That said, I think this probably has a negligible effect on learning, given how large the replay buffer is, but it would be good for the author to check on this. @ghliu
Hi Guan-Horng,
Thanks for your great implementation! I am wondering why we append an additional (s, a, r) pair to the replay buffer after an episode is done. The reward in that pair is zero, and as far as I can tell this is not mentioned in the original paper.
pytorch-ddpg/main.py, line 64 (commit e9db328)
Thank you!
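One possible explanation, offered as an assumption since the buffer code is not shown here: if the replay memory is a keras-rl-style sequential memory, it stores one row per step as (observation, action, reward, terminal) and builds each (s, a, r, s', done) sample by pairing a row with the row after it. The last step of an episode then has no next observation until one more row is appended, so the extra pair would act as a placeholder holding the final observation, and its action and zero reward would never be sampled as a transition. The sketch below is a hypothetical minimal version of that scheme; `SequentialBuffer`, `Transition`, and `valid_indices` are illustrative names, not the repository's API.

```python
import random
from typing import Any, List, NamedTuple


class Transition(NamedTuple):
    state0: Any
    action: Any
    reward: float
    state1: Any
    terminal1: bool


class SequentialBuffer:
    """Hypothetical keras-rl-style memory: one row per step, unbounded, window length 1."""

    def __init__(self) -> None:
        self.observations: List[Any] = []
        self.actions: List[Any] = []
        self.rewards: List[float] = []
        self.terminals: List[bool] = []

    def append(self, observation: Any, action: Any, reward: float, terminal: bool) -> None:
        # Row i holds s_i, the action taken in s_i, and the reward and done flag
        # that action produced. s_{i+1} only becomes available when row i+1 arrives.
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(reward)
        self.terminals.append(terminal)

    def valid_indices(self) -> List[int]:
        # Index idx pairs row idx-1 (s, a, r, done) with row idx (s').
        # If row idx-2 ended an episode, row idx-1 is a placeholder and row idx
        # belongs to the next episode, so that pair is not a real transition.
        return [
            idx
            for idx in range(1, len(self.observations))
            if not (idx >= 2 and self.terminals[idx - 2])
        ]

    def transition(self, idx: int) -> Transition:
        return Transition(
            state0=self.observations[idx - 1],
            action=self.actions[idx - 1],
            reward=self.rewards[idx - 1],
            state1=self.observations[idx],
            terminal1=self.terminals[idx - 1],
        )

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling with replacement over valid transitions only.
        return [self.transition(idx) for idx in random.choices(self.valid_indices(), k=batch_size)]


# Demo: two short episodes, each followed by a placeholder row holding the
# final observation with a dummy action, reward 0.0 and terminal=False.
buf = SequentialBuffer()
buf.append("s0", "a0", 1.0, False)
buf.append("s1", "a1", 1.0, False)
buf.append("s2", "a2", 1.0, True)     # last real step of episode 1
buf.append("s3", "noop", 0.0, False)  # placeholder: final observation s3
buf.append("t0", "b0", 2.0, False)
buf.append("t1", "b1", 2.0, True)     # last real step of episode 2
buf.append("t2", "noop", 0.0, False)  # placeholder: final observation t2

for idx in buf.valid_indices():
    print(buf.transition(idx))
```

In this sketch every real step becomes a sampleable transition and none of them carries the placeholder's 0.0 reward. The placeholder's `terminal=False` only matters for the skip rule: it keeps the next episode's first transition sampleable. Without the placeholder rows, the final step of an episode would be paired with the next episode's first observation, and the skip rule would drop that episode's first transition instead.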