
Issues Running MAPPO #38

Open
roggirg opened this issue Jul 18, 2022 · 11 comments · Fixed by #39
Comments

@roggirg

roggirg commented Jul 18, 2022

Hi Folks,

I'm trying to run "on-policy PPO" using `python examples/on_policy_files/nocturne_runner.py algorithm=ppo`, and there are a couple of issues I'm encountering.

  • `algo` vs. `algorithm`: the `config.yml` file uses `algorithm`, whereas the script uses `cfg.algo`. Switching `algo` to `algorithm` seems to fix the issue.
  • `wandb_name` seems to be missing from the cfg. To make it work, I just disabled use of wandb.
  • The wrapper environment calls `len(self.vehicles)` on line 30, which throws `AttributeError: 'BaseEnv' object has no attribute 'vehicles'`. Replacing `self.vehicles` with `self.controlled_vehicles` seems to solve the issue. Is this the correct way to fix it?
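For anyone hitting the same `AttributeError`, the third fix boils down to counting the vehicles the policy actually controls. A minimal sketch (the classes below are stand-ins for illustration, not the real Nocturne code):

```python
# Stand-in sketch of the wrapper fix; the real BaseEnv and wrapper live in
# the Nocturne repo, and these toy classes only illustrate the change.

class BaseEnv:
    def __init__(self):
        # BaseEnv exposes controlled_vehicles, not vehicles.
        self.controlled_vehicles = ["veh_0", "veh_1", "veh_2"]

class OnPolicyWrapper(BaseEnv):
    @property
    def num_agents(self):
        # Before: len(self.vehicles)            -> AttributeError
        # After:  len(self.controlled_vehicles) -> works
        return len(self.controlled_vehicles)

env = OnPolicyWrapper()
print(env.num_agents)  # 3 for this toy setup
```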

Thanks for your help.

@xiaomengy
Contributor

@eugenevinitsky Could you help take a look?

@eugenevinitsky
Collaborator

Hi, sorry this bug is here! I am out today, but this will be definitively fixed by tomorrow afternoon.

@eugenevinitsky
Collaborator

I believe the fixes you have there are correct, though.

@eugenevinitsky
Collaborator

Thanks for your patience, working on getting this merged but the relevant fixes are in:
#39

@eugenevinitsky
Collaborator

Heads up, though: that code has not been extensively hyperparameter-tuned.

@eugenevinitsky
Collaborator

eugenevinitsky commented Jul 20, 2022

No rush at all, but let us know if this resolves your issue.

@roggirg
Author

roggirg commented Jul 21, 2022

Hi @eugenevinitsky ,

Everything is running now, thanks for the fixes.

Just out of curiosity before we close this issue, what should the FPS be during training? I'm getting 25-30:

```
average episode rewards is 0.33026985824108124
maximum per step reward is 0.058307357132434845

 Algo rmappo Exp intersection updates 50/1250000 episodes, total num timesteps 4080/100000000.0, FPS 29.

average episode rewards is 2.849382162094116
maximum per step reward is 8.059619903564453
episode reward of rendered episode is: 0.8622641801569368

 Algo rmappo Exp intersection updates 55/1250000 episodes, total num timesteps 4480/100000000.0, FPS 25.

average episode rewards is 0.9344396740198135
maximum per step reward is 0.05804213136434555

 Algo rmappo Exp intersection updates 60/1250000 episodes, total num timesteps 4880/100000000.0, FPS 26.

average episode rewards is 1.3483695685863495
maximum per step reward is 8.056236267089844

 Algo rmappo Exp intersection updates 65/1250000 episodes, total num timesteps 5280/100000000.0, FPS 27.

average episode rewards is 1.1445978283882141
maximum per step reward is 0.057421959936618805
```
Thanks!

@xiaomengy
Contributor

> Just out of curiosity before we close this issue, what should the fps be during training? I'm getting 25-30

It's hard to say what the normal FPS is; it depends on lots of things. Could you provide more details, such as what machine you are using and what and how many CPU cores and GPUs you have?

@eugenevinitsky
Collaborator

eugenevinitsky commented Jul 21, 2022

Hey @roggirg, it depends on the number of rollout threads you're using and whether you are using a GPU or just a CPU; the MAPPO code uses an RNN by default and includes the time for backprop when computing the FPS. Can you try increasing the value of `algorithm.n_rollout_threads`? It should scale roughly linearly with the number of threads or workers.
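The scaling claim can be sketched with a toy model (illustrative only; the per-thread rate and update-time fraction below are made-up numbers, not measured Nocturne figures):

```python
# Toy model of rollout-thread scaling: env FPS grows roughly linearly with
# n_rollout_threads until backprop or CPU cores become the bottleneck.
def estimated_fps(per_thread_fps, n_rollout_threads, update_fraction=0.0):
    # update_fraction: share of wall-clock time spent in backprop,
    # which the reported FPS number includes.
    return per_thread_fps * n_rollout_threads * (1.0 - update_fraction)

print(estimated_fps(27, 1))  # one thread: ~27 FPS, in the observed range
print(estimated_fps(27, 4))  # four threads under perfect scaling: ~108 FPS
```

In practice scaling is sub-linear once another resource saturates, which is why the measured numbers are worth reporting.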

@roggirg
Author

roggirg commented Jul 21, 2022

Ah cool, thanks @eugenevinitsky @xiaomengy. I played around with `n_rollout_threads=4` (I did not know it existed) and the FPS jumped up to ~50.
FYI, I'm running on a 1080 Ti with a 12-core CPU.
Thanks for your help.

@roggirg roggirg closed this as completed Jul 21, 2022
@eugenevinitsky
Collaborator

We're going to re-open this because that's a good deal slower than we expect it to be. @xiaomengy, any chance you could run
`python examples/on_policy_files/nocturne_runner.py algorithm=ppo algorithm.n_rollout_threads=4` and report the FPS? I don't have GPU access for a little while, so I can't check it myself.
