fix(lisong): fix icm/rnd+onppo config bugs and app_key env bugs #564
Conversation
@@ -185,7 +186,7 @@ def step(self, action):
        obs = self.gen_obs()

-       return obs, reward, done, {}
+       return obs, reward, done, done, {}
Why are there two `done` values returned here?
Because in `minigrid_env.py` we call the `step` method of a Gymnasium env, whose return value is `(observation, reward, terminated, truncated, info)`:

obs, rew, done, _, info = self._env.step(action)
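For reference, a minimal sketch of the Gymnasium five-value convention versus the old gym four-value one (the `CartPole-v1` env name is only illustrative):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()

# Gymnasium API: step() returns five values.
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# Code written against the old gym 4-tuple API can merge the two flags:
done = terminated or truncated
```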
Please add a comment explaining this.
        self.tb_logger.add_scalar('icm_reward/icm_reward_mean', icm_reward.mean(), self.estimate_cnt_icm)
        self.tb_logger.add_scalar('icm_reward/icm_reward_min', icm_reward.min(), self.estimate_cnt_icm)
        self.tb_logger.add_scalar('icm_reward/icm_reward_std', icm_reward.std(), self.estimate_cnt_icm)
        icm_reward = (raw_icm_reward - raw_icm_reward.min()) / (raw_icm_reward.max() - raw_icm_reward.min() + 1e-8)
Why is the reward normalized twice here?
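For context, a minimal sketch of what this min-max scaling does (the tensor values are illustrative, not from the repository). After the first pass the minimum is already 0 and the maximum is already 1, so applying the same scaling a second time is essentially a no-op:

```python
import torch

raw_icm_reward = torch.tensor([0.2, 1.5, 3.0, 0.7])

# First min-max scaling: maps the batch into [0, 1].
scaled = (raw_icm_reward - raw_icm_reward.min()) / (raw_icm_reward.max() - raw_icm_reward.min() + 1e-8)

# Second pass: min is already ~0 and max is already ~1, so nothing changes.
rescaled = (scaled - scaled.min()) / (scaled.max() - scaled.min() + 1e-8)

print(scaled)    # tensor([0.0000, 0.4643, 1.0000, 0.1786])
print(rescaled)  # effectively identical to `scaled`
```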
@@ -171,8 +174,12 @@ def __init__(self, config: EasyDict, device: str, tb_logger: 'SummaryWriter') -> None:
        self.ce = nn.CrossEntropyLoss(reduction="mean")
        self.forward_mse = nn.MSELoss(reduction='none')
        self.reverse_scale = config.reverse_scale
        self.res = nn.Softmax(dim=-1)
Why use softmax here if we only need to select the action with an argmax operation?
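As a reminder of why the softmax does not change the selected action (values are illustrative): softmax is monotonic, so the argmax of the logits and the argmax of the softmax output are the same index.

```python
import torch
import torch.nn as nn

logits = torch.tensor([1.2, -0.3, 2.5, 0.1])
probs = nn.Softmax(dim=-1)(logits)

# Both pick index 2; the softmax only rescales, it never reorders.
assert torch.argmax(logits) == torch.argmax(probs)
```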
    value_weight=0.5,
    entropy_weight=0.001,
    clip_ratio=0.2,
-   adv_norm=False,
+   adv_norm=True,
    value_norm=True,
Off-policy PPO doesn't have `value_norm`.
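For reference, a hedged sketch of how this learn fragment could be annotated (the values are from the diff above; the per-field comments, and treating `value_norm` as meaningful only for the on-policy PPO pipeline, are assumptions based on this review comment):

```python
# Illustrative config fragment, not the exact repository config.
learn = dict(
    value_weight=0.5,      # weight of the value loss term
    entropy_weight=0.001,  # weight of the entropy bonus
    clip_ratio=0.2,        # PPO surrogate clipping ratio
    adv_norm=True,         # normalize advantages within each training batch
    value_norm=True,       # normalize value targets (assumed on-policy PPO only)
)
```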
@@ -151,6 +151,9 @@ class ICMRewardModel(BaseRewardModel):
        update_per_collect=100,
        # (float) the importance weight of the forward and reverse loss
        reverse_scale=1,
+       intrinsic_reward_weight=0.003,  # 1/300
Please add comments for each field in the default config.
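For example, a sketch of how each field might be documented (the wording is illustrative, not the final repository comments):

```python
# Illustrative only: one way to comment the default config fields above.
config = dict(
    # (int) number of gradient updates to run after each data collection
    update_per_collect=100,
    # (float) the importance weight of the forward and reverse loss
    reverse_scale=1,
    # (float) weight of the intrinsic (ICM) reward added to the extrinsic
    # reward, roughly 1/300
    intrinsic_reward_weight=0.003,
)
```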
Description
Fix the `max_envstep` config. Fix the `MiniGrid-AKTDT-7x7-1-v0` environment bug.
Related Issue
TODO
Check List