Stable Transformer on Pong #16

Open
furmans opened this issue Dec 8, 2020 · 2 comments

Comments

furmans commented Dec 8, 2020

Hello,

I am currently unable to reproduce the results of the stable transformer on the Pong environment. From the paper, I believe the last-100-episode return should be ~17.62 for this model and environment.

I am running the training program with the arguments specified in the README for Best Performing Stable Transformer on Pong.

In train.py line 731, I changed ctx = mp.get_context("fork") to ctx = mp.get_context("spawn").
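
For readers unfamiliar with that change, here is a minimal sketch of what switching the start method does. It assumes mp refers to Python's standard multiprocessing module; the issue does not show what train.py actually imports. The worker, make_context, and run_one_process names are hypothetical and are not taken from train.py.

```python
import multiprocessing as mp


def worker(counter):
    # Hypothetical stand-in for a training worker; not code from train.py.
    with counter.get_lock():
        counter.value += 1


def make_context(method="spawn"):
    # "fork" copies the parent process (POSIX only), while "spawn" starts
    # a fresh interpreter that re-imports the main module.
    return mp.get_context(method)


def run_one_process(method):
    # Launch a single child with the chosen start method and report
    # which method was used and what the child wrote to shared memory.
    ctx = make_context(method)
    counter = ctx.Value("i", 0)
    proc = ctx.Process(target=worker, args=(counter,))
    proc.start()
    proc.join()
    return ctx.get_start_method(), counter.value


if __name__ == "__main__":
    # The guard matters under "spawn": children re-import this module
    # and must not launch processes themselves.
    print(run_one_process("spawn"))
```

Under "spawn", everything passed to a child has to be picklable and module-level state is rebuilt in each child rather than inherited, so this swap can change behavior beyond how the processes are launched. Whether that is related to the returns reported here is not established in this thread.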

The final results I obtained on one run:

[INFO:17181 train:962 2020-12-01 19:35:33,350] Steps 10001513 @ 668.5 SPS. Loss -15.672254. Return per episode: -12.7. Stats:
{'baseline_loss': 11.395485877990723,
 'entropy_loss': -18.699639002482098,
 'episode_returns': [-20.0, -18.0, -19.0],
 'last_100_episode_returns': -19.530000686645508,
 'learning_rate': 8.657589688233862e-05,
 'len_max_traj': 239,
 'max_return_achieved': '-14.0 at step 5366379',
 'mean_episode_return': -12.666666666666666,
 'num_unpadded_steps': 3346,
 'pg_loss': -8.368099212646484,
 'total_loss': -15.672253926595053}
[INFO:17181 train:969 2020-12-01 19:35:33,350] Learning finished after 10001513 steps.

Results from another run:

[INFO:15271 train:962 2020-12-04 19:47:48,776] Steps 10001156 @ 661.4 SPS. Loss -9.595014. Return per episode: -19.7. Stats:
{'baseline_loss': 14.119840621948242,
 'entropy_loss': -18.633128484090168,
 'episode_returns': [-21.0, -19.0, -20.0, -19.0],
 'last_100_episode_returns': -19.540000915527344,
 'learning_rate': 9.02709105067138e-05,
 'len_max_traj': 239,
 'max_return_achieved': '-14.0 at step 7824133',
 'mean_episode_return': -19.666666666666668,
 'num_unpadded_steps': 3309,
 'pg_loss': -5.081725597381592,
 'total_loss': -9.595013936360678}
[INFO:15271 train:969 2020-12-04 19:47:48,776] Learning finished after 10001156 steps.

I am on Ubuntu 18.04.4, using CUDA 10.2, cuDNN 7, and torch 1.6.0.

Thanks in advance for any help.

Best,
Sean

BKHMSI commented Mar 25, 2021

Hi @furmans,

I am having the same problem. Were you able to make it work?

skkuai commented May 3, 2022

I made it work. Please see #17.
