-
In ant_onppo, ignore_done is set to False, but in halfcheetah_onppo, ignore_done is set to True. Why is the value different between the two files?
-
The argument ignore_done was introduced for calculating the value target. The boolean done returned by gym.step() can be true for two reasons. Most of the time, the env terminates because some defined failure condition is met. Sometimes, done is true only because the episode has reached its time limit. OpenAI Gym is a rather unstable env manager whose API changes from time to time, which makes maintenance troublesome, and before version 0.25.0 it did not give separate signals for these two cases. But the two cases are different. In the Bellman equation, the target is `target_value = r + gamma * next_state_value`. If the env has truly terminated, the next state is an end state whose value is zero. If the env has not really terminated and only hit its time limit, the next state is an ordinary state with some nonzero value, so we should still bootstrap from it. HalfCheetah is stable and unlikely to fall to the ground and terminate, so almost every done it emits is a time-limit truncation, and we have to ignore the done given by gym.step(). But for other envs, we set ignore_done to False.
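Here is a minimal sketch of how such a flag typically enters the value target computation. The function name `td_target` and its signature are illustrative, not the project's actual code:

```python
import torch

def td_target(reward: torch.Tensor,
              next_value: torch.Tensor,
              done: torch.Tensor,
              gamma: float = 0.99,
              ignore_done: bool = False) -> torch.Tensor:
    """Bellman target r + gamma * V(s'), with optional done masking.

    Illustrative sketch; not DI-engine's actual implementation.
    """
    if ignore_done:
        # Treat every step as non-terminal: for an env like HalfCheetah,
        # done almost always means "time limit reached", and the next
        # state still has value we should bootstrap from.
        done = torch.zeros_like(done)
    # For a true termination (failure condition), V(terminal) = 0,
    # so the (1 - done) mask drops the bootstrap term.
    return reward + gamma * (1.0 - done) * next_value

# The same transition yields different targets depending on the flag:
reward = torch.tensor([1.0])
next_value = torch.tensor([5.0])
done = torch.tensor([1.0])  # episode ended at the time limit
td_target(reward, next_value, done, ignore_done=True)   # 1.0 + 0.99 * 5.0
td_target(reward, next_value, done, ignore_done=False)  # 1.0
```

Note that newer gym versions (the new step API, made the default around 0.26) return terminated and truncated separately, so the two cases can be told apart directly instead of relying on a flag like this.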