-
In ant_onppo, ignore_done is set to False, but in halfcheetah_onppo, ignore_done is set to True. Why is the value different between the two files?
-
The argument ignore_done was introduced for calculating the value target. The boolean done returned by gym.step() can be true for two reasons. Most of the time, the env terminates because some defined failure condition is met. Sometimes, done is true only because the episode has reached its time limit. OpenAI Gym is a rather unstable env manager whose API changes from time to time, which makes maintenance troublesome, and before version 0.25.0 it did not give separate signals for these two cases. But the two cases are different. In the Bellman equation, the target is `target_value = r + gamma * next_state_value`. If the env has truly terminated, the next state is an end state whose value is zero. If the env has not really terminated and only hit its time limit, the next state is an ordinary state with some nonzero value, so we should still bootstrap from it. HalfCheetah is stable and unlikely to fall to the ground and terminate, so almost every done it emits is a time-limit truncation, and we have to ignore the done given by gym.step(). But for other envs, we set ignore_done to False.
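Here is a minimal sketch of how such a flag typically enters the value target computation. The function name `td_target` and its signature are illustrative, not the project's actual code:

```python
import torch

def td_target(reward: torch.Tensor,
              next_value: torch.Tensor,
              done: torch.Tensor,
              gamma: float = 0.99,
              ignore_done: bool = False) -> torch.Tensor:
    """Bellman target r + gamma * V(s'), with optional done masking.

    Illustrative sketch; not DI-engine's actual implementation.
    """
    if ignore_done:
        # Treat every step as non-terminal: for an env like HalfCheetah,
        # done almost always means "time limit reached", and the next
        # state still has value we should bootstrap from.
        done = torch.zeros_like(done)
    # For a true termination (failure condition), V(terminal) = 0,
    # so the (1 - done) mask drops the bootstrap term.
    return reward + gamma * (1.0 - done) * next_value

# The same transition yields different targets depending on the flag:
reward = torch.tensor([1.0])
next_value = torch.tensor([5.0])
done = torch.tensor([1.0])  # episode ended at the time limit
td_target(reward, next_value, done, ignore_done=True)   # 1.0 + 0.99 * 5.0
td_target(reward, next_value, done, ignore_done=False)  # 1.0
```

Note that newer gym versions (the new step API, made the default around 0.26) return terminated and truncated separately, so the two cases can be told apart directly instead of relying on a flag like this.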