(Solved) No env.reset() at the end of each training epoch. #67

slDeng1003 · 2024-03-11T13:10:28Z

【Existing code:】
The environment is reset only at the beginning of the training loop; that is, env.reset() is called only in the first epoch.
【Possibly correct training paradigm】
I checked OpenAI Spinning Up's implementation of PPO https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/ppo/ppo.py, and it resets the env at the end of each epoch (which is equivalent to resetting it at the beginning of each epoch).

Correct me if I am wrong :)

P.S.: It's still nice code!

ZheruiHuang · 2024-04-18T07:49:39Z

Hello! I think the training code is logically the same as OpenAI's.

Maybe you are misled by these two similar loops: https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/pytorch/ppo/ppo.py#L299 and

PPO-PyTorch/train.py

Line 173 in 728cce8

for t in range(1, max_ep_len+1):

In the former (OpenAI's) implementation, this loop runs more than one episode: it calls reset whenever an episode is done, but does not break out of the loop. In the latter (this repo's) implementation, the loop runs only one episode: when the episode is done, it breaks out of the loop, and the env is reset before the next episode begins.
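
To make the difference concrete, here is a minimal sketch of the two loop structures. It is not the actual code from either repository: `ToyEnv`, `epoch_style`, and `episode_style` are hypothetical names made up for illustration, and the policy, buffer, and update steps are left out. It only counts how episodes and resets line up.

```python
class ToyEnv:
    """Hypothetical stand-in env: every episode ends after `episode_len` steps."""

    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self.t = 0
        self.reset_count = 0

    def reset(self):
        self.t = 0
        self.reset_count += 1
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done  # observation, reward, done


def epoch_style(env, epochs, steps_per_epoch):
    """Spinning Up-like structure (simplified): reset inside the step loop."""
    env.reset()  # single reset before training starts
    episodes = 0
    for _ in range(epochs):
        for t in range(steps_per_epoch):
            _, _, done = env.step(0)
            epoch_ended = t == steps_per_epoch - 1
            if done or epoch_ended:
                episodes += 1
                env.reset()  # no break: the same loop continues with a new episode
    return episodes


def episode_style(env, total_steps, max_ep_len):
    """This repo's structure (simplified): one episode per inner loop."""
    time_step = 0
    episodes = 0
    while time_step < total_steps:
        env.reset()  # reset at the start of every episode
        for t in range(1, max_ep_len + 1):
            _, _, done = env.step(0)
            time_step += 1
            if done:
                break  # leave the inner loop; the outer loop resets the env
        episodes += 1
    return episodes


env_a, env_b = ToyEnv(), ToyEnv()
episodes_a = epoch_style(env_a, epochs=2, steps_per_epoch=6)
episodes_b = episode_style(env_b, total_steps=12, max_ep_len=10)
```

In both variants every episode starts from a freshly reset env; the epoch-style loop just performs one extra reset after the final episode.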

Hope it makes sense to you!

slDeng1003 · 2024-04-18T16:53:08Z

Dear Huang,
I appreciate your reply. I have checked the code and found that you are right.
Thank you again for your help!👍
@ZheruiHuang

slDeng1003 changed the title ~~No env.reset() at the end of each training epoch.~~ (Solved) No env.reset() at the end of each training epoch. Apr 18, 2024
