-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add atari environments #57
Conversation
Codecov Report
@@ Coverage Diff @@
## master #57 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 24 25 +1
Lines 704 749 +45
=========================================
+ Hits 704 749 +45
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@Rocamonde, would you mind taking an initial look at this? This week is pretty rammed for me so limited time for code review. Happy to answer questions / review subsections if anything requires more context on seals than you have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some things that I'm not very familiar with, I left some questions in the comments.
src/seals/__init__.py
Outdated
|
||
# Atari | ||
|
||
ATARI_ENV_NAMES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting this into a constant is a good idea, maybe we can do the same for the mujoco environments for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can automate this by something like:
set((env_spec.id.split('-')[0] for env_spec in gym.envs.registry.all() if env_spec.entry_point == 'gym.envs.atari:AtariEnv'))
This won't quite work -- there's some duplicates because of the Deterministic, NoFrameSkip, etc versions but it should be fairly easy to make a variant that handles that.
I'd feel better about interrogating the registry as that's always guaranteed to be up to date, whereas this will gradually drift. Though might want to add a test that asserts this is non-empty (or includes some known-good Atari names), so we'll find out in case the entry point or other detail of gym changes underneath us and breaks this.
Co-authored-by: Juan Rocamonde <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to flag is most Atari environments include the score. One feature that'd be nice to have is an option to gray out that score to avoid side-channels. I think this is probably out-of-scope for this PR (we can always add it in another one), especially as it'd be laborious to support all environments (score is in different place), but the README.md
is maybe a bit misleading when it says "we remove any side-channel sources of reward information" -- we're really just removing episode termination (a major but not the only one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these! In support of having them in seals
. But implementation needs to be more programmatic, less reliant on hard-coded constants or 100s of duplicate implementations ;)
I've made some suggestions on how to make that change.
src/seals/__init__.py
Outdated
|
||
# Atari | ||
|
||
ATARI_ENV_NAMES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can automate this by something like:
set((env_spec.id.split('-')[0] for env_spec in gym.envs.registry.all() if env_spec.entry_point == 'gym.envs.atari:AtariEnv'))
This won't quite work -- there's some duplicates because of the Deterministic, NoFrameSkip, etc versions but it should be fairly easy to make a variant that handles that.
I'd feel better about interrogating the registry as that's always guaranteed to be up to date, whereas this will gradually drift. Though might want to add a test that asserts this is non-empty (or includes some known-good Atari names), so we'll find out in case the entry point or other detail of gym changes underneath us and breaks this.
OK, Atari environments are now pulled from the gym registry. |
AFAICT the 'coverage reduction' comes from the tests checking if there are any environments named "SpaceInvaders" and calling an error if there aren't, and the tests not covering the lines where we say what to do if there aren't. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting some fairly minor changes -- please re-request review once addressed, I think we're almost there :)
Co-authored-by: Adam Gleave <[email protected]>
Co-authored-by: Adam Gleave <[email protected]>
Co-authored-by: Adam Gleave <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Minor suggestions to add type annotations and fix capitalization of atari.
Added type annotations and fixed spellings. |
Hi, I'd be interested in finishing up the patching reward leaks for several atari environments to allow for reproducing RLHF paper results. This would include Beam Rider, Breakout, Pong, Q*bert, Seaquest, and Space Invaders. In particular, covering the score with a black background and covering enemy ship count for BeamRider, covering speedometer for Enduro. Should I create a new issue for this? |
Yes, please open a new pull request (and optionally accompanying issue) to track this. We'd definitely welcome a contribution here. I'd suggest making some generic wrapper we can use to black out the score, and then we can just add versions of the Atari environments with that wrapper applied to the relevant region that contains the score. |
Great! |
Adds constant-environment-length versions of atari environments, accessible as (for example)
seals/Asteroids-v5
orseals/AsteroidsNoFrameskip-v4
. Modifies testing so that atari environments don't spend super long being tested.