Add score masking to seven atari environments #62

Merged
merged 10 commits into from
Nov 22, 2022

Conversation

stewy33
Contributor

@stewy33 stewy33 commented Oct 7, 2022

Fixes #61

Added score masking to the seven Atari environments from the RLHF paper. I used a black rectangle to cover the score in each game, and additionally the enemy ship count in BeamRider and the speedometer in Enduro.

Note that the number of lives is unmasked. This matches the original implementation in the RLHF paper. However, it seems that episode boundaries could be inferred from there. What do we think about this choice?

image

@AdamGleave AdamGleave requested a review from Rocamonde October 10, 2022 00:07
@AdamGleave
Member

@Rocamonde, do you mind reviewing this PR?

However, it seems that episode boundaries could be inferred from there. What do we think about this choice?

It's an interesting question. I'd guess number of lives is often decision relevant for the agent -- e.g. it wants to be more risk averse when only one life left. So I lean against masking it, though I agree it introduces a confounder. Having an option to mask it (that we could leave off by default) is probably the best thing to do. But, masking the score is a definitive improvement over not masking it so this shouldn't hold up the PR.

@codecov

codecov bot commented Oct 10, 2022

Codecov Report

Merging #62 (f480835) into master (eeb9beb) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #62   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           26        26           
  Lines          982      1047   +65     
=========================================
+ Hits           982      1047   +65     
Impacted Files Coverage Δ
src/seals/atari.py 100.00% <100.00%> (ø)
src/seals/base_envs.py 100.00% <100.00%> (ø)
src/seals/util.py 100.00% <100.00%> (ø)
tests/test_envs.py 100.00% <100.00%> (ø)
tests/test_util.py 100.00% <100.00%> (ø)


Member

@Rocamonde Rocamonde left a comment


Thanks so much for taking the time to submit this PR! Overall I agree with the design choices and implementation, and it seems like a pretty useful feature to add to seals. There are some major comments that I would like to see addressed:

  • Masking obs_from_state instead of environment interaction methods
  • Whether we want to force all registered environments to be masked (even if users could still manually register non-masked environments)
  • Whether we should have wrappers always inherit from our base class for type safety.

By default, I would favor the solutions I left in the comments, but happy to hear your thoughts on them. Since these are fairly major design choices for the project, maybe @AdamGleave wants to weigh in on these too?

src/seals/util.py Show resolved Hide resolved
src/seals/util.py Outdated Show resolved Hide resolved
src/seals/util.py Show resolved Hide resolved
src/seals/atari.py Outdated Show resolved Hide resolved
tests/test_envs.py Outdated Show resolved Hide resolved
@AdamGleave
Member

  • Masking obs_from_state instead of environment interaction methods
  • Whether we should have wrappers always inherit from our base class for type safety.

These suggestions seem to be based on the assumption that the Atari environments implement the ResettablePOMDP interface, but they don't -- they're base gym.Env implementations. There's no obs_from_state method for us to mask.

I guess we could make the state be the non-masked observation, and mask it in the observation? That could work but I don't see the benefit.

@stewy33
Contributor Author

stewy33 commented Oct 10, 2022

Hi, I just incorporated some of the suggested changes!

I added the option of having unmasked atari environments. The naming convention is now seals/BeamRider-v5 for masked environments and seals/BeamRider-Unmasked-v5 for unmasked environments. The idea for the naming was to suggest that the masked environments are the standard ones. Let me know if you think I should include a warning when loading unmasked environments.

@AdamGleave AdamGleave requested a review from Rocamonde October 12, 2022 03:10
if score_region is None:
    raise ValueError(
        "Requested environment does not yet support masking. "
        + "See https://github.com/HumanCompatibleAI/seals/issues/61.",
Member

The + is unnecessary (https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation). It actually introduces runtime overhead, even though this is largely irrelevant and it's more of a standard style choice.
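To illustrate the point about implicit concatenation, here is a minimal sketch comparing the two spellings of the message from the snippet above. The variable names are made up for the example. As far as I know, CPython folds the `+` between two literals at compile time as well, so in practice the difference is mainly one of style.

```python
# Adjacent string literals are joined by the parser, so no "+" is needed.
implicit = (
    "Requested environment does not yet support masking. "
    "See https://github.com/HumanCompatibleAI/seals/issues/61."
)

# The explicit form produces the same string; it is just noisier to read.
explicit = (
    "Requested environment does not yet support masking. "
    + "See https://github.com/HumanCompatibleAI/seals/issues/61."
)

print(implicit == explicit)
```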

Contributor Author

Thanks, done.

name = "seals/" + slash_separated[-1]

if not masked:
    last_hyphen_idx = name.rfind("-")
Member

Are we confident all environments will have a {name}-v{num} format? It's been the case everywhere that I've seen, but this would preclude us from registering environments without this format, and that's probably at least worth documenting.

Contributor Author

Well, in the _supported_atari_env method, we already only support an Atari environment if it ends with "-v4" or "-v5". So I think this is ok for now.
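For readers following the naming discussion, a rough sketch of how the masked and unmasked names could be derived is below. The helper `seals_atari_name` and its explicit suffix check are assumptions made for illustration, not the code in src/seals/atari.py; only the `rfind("-")` step and the `seals/BeamRider-Unmasked-v5` convention come from this thread.

```python
def seals_atari_name(gym_env_id: str, masked: bool) -> str:
    """Hypothetical helper: map a Gym Atari ID to a seals environment name."""
    base = gym_env_id.split("/")[-1]  # "ALE/BeamRider-v5" -> "BeamRider-v5"
    # Make the "{name}-v{num}" assumption explicit instead of implicit.
    if not (base.endswith("-v4") or base.endswith("-v5")):
        raise ValueError(f"Unexpected environment ID format: {gym_env_id!r}")
    name = "seals/" + base
    if not masked:
        # Insert "-Unmasked" just before the version suffix.
        last_hyphen_idx = name.rfind("-")
        name = name[:last_hyphen_idx] + "-Unmasked" + name[last_hyphen_idx:]
    return name


print(seals_atari_name("ALE/BeamRider-v5", masked=True))
print(seals_atari_name("ALE/BeamRider-v5", masked=False))
```

Raising on an unexpected suffix is one way to document the format restriction in code rather than relying on callers to have filtered IDs beforehand.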

src/seals/atari.py Show resolved Hide resolved
@@ -51,17 +52,21 @@ def __init__(

self.mask = np.ones(env.observation_space.shape, dtype=bool)
for r in score_regions:
self.mask[r["x0"] : r["x1"], r["y0"] : r["y1"]] = 0
assert r["x"][0] < r["x"][1] and r["y"][0] < r["y"][1]
Member

Since this is a public method (users could create their own wrapper beyond our internal usage for seals-defined environments) you should raise a ValueError instead (and add a corresponding test). Passing in the wrong values is very much a possibility.

assert is to be used when something should always behave in a certain way by virtue of the purported logic of the program. This allows catching logical bugs and reassuring code checkers. This would be fine if only our own internally defined masks could ever be used. However, when something is part of the public API and therefore contingent on user input, we cannot really assert that a user won't pass the wrong value. See https://stackoverflow.com/questions/17530627/python-assertion-style#:~:text=The%20assert%20statement%20should%20only,user%20input%20or%20the%20environment. and https://wiki.python.org/moin/UsingAssertionsEffectively

What you can do, however, is have tests that assert that our internally defined masks satisfy this condition. That, plus a pytest.raises test on the MaskScoreWrapper API, should be enough to thoroughly test this.
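A minimal sketch of the suggested change, assuming the region format `dict(x=(lo, hi), y=(lo, hi))` used elsewhere in this PR. The function name `validate_regions` is hypothetical; in the PR the check lives inside `MaskScoreWrapper.__init__`. One further reason to prefer an explicit exception is that `assert` statements are skipped entirely when Python runs with `-O`.

```python
from typing import Dict, List, Tuple


def validate_regions(score_regions: List[Dict[str, Tuple[int, int]]]) -> None:
    """Hypothetical helper: reject regions whose bounds are not increasing."""
    for r in score_regions:
        # An explicit check still runs under "python -O", unlike an assert.
        if r["x"][0] >= r["x"][1] or r["y"][0] >= r["y"][1]:
            raise ValueError('Invalid region: "x" and "y" must be increasing.')


validate_regions([dict(x=(0, 10), y=(5, 20))])  # valid input: no exception

try:
    validate_regions([dict(x=(0, 1), y=(1, 0))])  # y bounds are decreasing
except ValueError as err:
    print(f"rejected: {err}")
```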

Contributor Author

Thanks for the explanation!

assert r["x"][0] < r["x"][1] and r["y"][0] < r["y"][1]
self.mask[r["x"][0] : r["x"][1], r["y"][0] : r["y"][1]] = 0

def _mask_obs(self, obs):
Member

Nice! Thanks for adding this. Code looks cleaner now IMO.

tests/test_envs.py Show resolved Hide resolved
@stewy33 stewy33 requested a review from Rocamonde October 22, 2022 18:25
Member

@Rocamonde Rocamonde left a comment


LGTM, only a minor fix on the test cases. Please fix before merging. The tests say pending, you might want to re-trigger the pipeline.

def test_mask_score_wrapper_enforces_spec():
    """Test that MaskScoreWrapper enforces the spec."""
    atari_env = gym.make(GYM_ATARI_ENV_SPECS[0].id)
    with pytest.raises():
Member

Could you specify the error that is raised and use the match option to match the error message?

atari_env = gym.make(GYM_ATARI_ENV_SPECS[0].id)
with pytest.raises():
    util.MaskScoreWrapper(atari_env, [dict(x=(0, 1), y=(1, 0))])
with pytest.raises():
Member

Same as above (error type + error message match)
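What is being asked for could look roughly like the sketch below. To keep it self-contained it does not construct a real Atari environment: `check_region` is a hypothetical stand-in for the validation in `MaskScoreWrapper`, and the error message is assumed to be the one that appears later in this PR. `pytest.raises` takes the expected exception type, and `match` is a regular expression searched for in the error message.

```python
import pytest


def check_region(region):
    """Hypothetical stand-in for the validation done by MaskScoreWrapper."""
    if region["x"][0] >= region["x"][1] or region["y"][0] >= region["y"][1]:
        raise ValueError('Invalid region: "x" and "y" must be increasing.')


def test_check_region_rejects_decreasing_bounds():
    # Name the exception type and match part of the message, so an
    # unrelated error raised for a different reason fails the test.
    with pytest.raises(ValueError, match="must be increasing"):
        check_region(dict(x=(0, 1), y=(1, 0)))
    with pytest.raises(ValueError, match="must be increasing"):
        check_region(dict(x=(1, 0), y=(0, 1)))
```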

@AdamGleave
Member

@Rocamonde stewy33 doesn't have permission to trigger them (our tests don't run on fork), I have done this now with git-push-fork-to-upstream-branch origin stewy33:master

@stewy33
Contributor Author

stewy33 commented Oct 28, 2022

Hi, I'm new to contributing to open-source projects, so I'm wondering if

  1. I need to manually re-trigger the tests
  2. Whether you guys prefer me to rebase or merge my changes and whether I should squash my commits (and if so, how much of this is automated).

@Rocamonde
Member

@Rocamonde stewy33 doesn't have permission to trigger them (our tests don't run on fork), I have done this now with git-push-fork-to-upstream-branch origin stewy33:master

Thanks, just re-triggered this after the most recent changes.

@AdamGleave what's gonna happen when we do this for multiple open PRs? (apparently the way it works is by pushing to the trigger-integration branch.)

@stewy33:

  1. I need to manually re-trigger the tests

Don't worry, I've just taken care of that. Normally it's automatic, but since your PR is from a fork, it's not.

2. Whether you guys prefer me to rebase or merge my changes and whether I should squash my commits (and if so, how much of this is automated).

IIRC the desirable option is the only one enabled when you press the merge button on the PR. AFAIK we always squash and merge (Adam correct me if this is wrong). Just click the button on the PR once the tests pass and GitHub should automatically do this and close the PR.

@Rocamonde
Member

Rocamonde commented Oct 28, 2022

Seems like you have failing tests due to two small mistakes.

test_atari_unmasked_env_naming is failing because you mixed up two ways to check this and forgot to decide on one (or you just accidentally forgot a line of code). So currently you're doing

        noncompliant_envs = [
            (_get_score_region(name) is None and "Unmasked" not in name)
            for name in ATARI_ENVS
        ]
        assert len(noncompliant_envs) == 0

What you actually get is a list of bools with all elements False, one per environment, so its length is never 0. So you can either

        noncompliant_envs = [
            name
            for name in ATARI_ENVS
            if (_get_score_region(name) is None and "Unmasked" not in name)
        ]

so you actually apply the filter to the list comprehension (I think this is probably what you intended to do), or

        is_each_env_noncompliant = [
            (_get_score_region(name) is None and "Unmasked" not in name)
            for name in ATARI_ENVS
        ]
        assert not any(is_each_env_noncompliant)

which is also correct but less intuitive. I probably prefer the first TBH.
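The difference between mapping and filtering can be seen on a toy example. Everything below is a hypothetical stand-in: `ENVS` plays the role of ATARI_ENVS, `get_score_region` the role of _get_score_region, and the third environment is deliberately noncompliant so both forms have something to report.

```python
# Hypothetical data: only the first environment has a score region defined.
ENVS = ["seals/BeamRider-v5", "seals/BeamRider-Unmasked-v5", "seals/Pong-v5"]
MASKED = {"seals/BeamRider-v5"}


def get_score_region(name):
    """Stand-in lookup: a placeholder region if one is defined, else None."""
    return "placeholder-region" if name in MASKED else None


def is_noncompliant(name):
    return get_score_region(name) is None and "Unmasked" not in name


# Mapping: one bool per environment, so the length always equals len(ENVS).
flags = [is_noncompliant(name) for name in ENVS]

# Filtering: only the offending names survive, so an empty list means success.
noncompliant_envs = [name for name in ENVS if is_noncompliant(name)]

print(flags)
print(noncompliant_envs)
```

With the filtered form, a failing assertion can also report which environments are at fault, which is part of why it reads more naturally.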

Then for the second error, we manually hardcoded the names of some envs not to check because they take a while to show non-determinism. But since we now have both masked and unmasked versions, your unmasked envs are still getting checked and the tests are failing.

            # these environments take a while for their non-determinism to show.
            slow_random_envs = [
                "seals/Bowling-v5",
                "seals/Frogger-v5",
                "seals/KingKong-v5",
                "seals/Koolaid-v5",
                "seals/NameThisGame-v5",
            ]

I would just manually add the other tests to this list for now. While you're at it, check that we have no hardcoded lists of environments anywhere else that are not being updated.

@AdamGleave
Member

I would just manually add the other tests to this list for now.

Could also change the test to check if the prefix of the env name matches and skip accordingly.
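One way the prefix check might look, as a sketch only: the tuple of prefixes is derived from the hardcoded list quoted above, while the names `SLOW_RANDOM_ENV_PREFIXES` and `is_slow_random_env` are invented for the example.

```python
# Game-name prefixes taken from the hardcoded slow_random_envs list.
SLOW_RANDOM_ENV_PREFIXES = (
    "seals/Bowling",
    "seals/Frogger",
    "seals/KingKong",
    "seals/Koolaid",
    "seals/NameThisGame",
)


def is_slow_random_env(env_id: str) -> bool:
    """Hypothetical check covering both masked and unmasked variants."""
    # The trailing hyphen avoids matching a longer game name that merely
    # starts with one of the prefixes.
    return any(env_id.startswith(prefix + "-") for prefix in SLOW_RANDOM_ENV_PREFIXES)


print(is_slow_random_env("seals/Bowling-v5"))
print(is_slow_random_env("seals/Bowling-Unmasked-v5"))
print(is_slow_random_env("seals/BeamRider-v5"))
```

This keeps the list the same length whenever a new variant of an existing game is registered.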

@stewy33
Contributor Author

stewy33 commented Nov 3, 2022

Hi, I added another test to hopefully pass code coverage. Could one of you re-trigger tests? If they pass, I think we're ready to merge.

@Rocamonde
Member

Rocamonde commented Nov 7, 2022

@AdamGleave codecov is being annoying, I think it thinks the CI failed and is not reporting coverage, but I checked on the website and all looks good. Can you override this / retrigger codecov? the PR LGTM.

Member

@AdamGleave AdamGleave left a comment


@Rocamonde codecov seems to have fixed itself unless you did something?

@stewy33 thanks for all your work on this PR. Got around to reviewing it. Pretty much looks good, a couple of minor suggestions. After that should be ready to merge :)


SCORE_REGIONS: Dict[str, List[Dict[str, Tuple[int, int]]]] = {
Member

You use List[Dict[str, Tuple[int, int]]] in three places in your code -- consider defining it as a type? Like:

MaskedRegionSpecifier = List[Dict[str, Tuple[int, int]]]

I'd also consider using a named tuple instead of dict to enforce that x and y are both present.
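Both suggestions could be sketched as follows. `MaskedRegionSpecifier` is the alias name proposed above; the `MaskedRegion` class and the coordinates are made up for illustration and are not taken from the PR.

```python
from typing import Dict, List, NamedTuple, Tuple

# Option 1: a type alias for the existing dict-based format.
MaskedRegionSpecifier = List[Dict[str, Tuple[int, int]]]


# Option 2 (hypothetical): a named tuple makes both fields mandatory.
class MaskedRegion(NamedTuple):
    x: Tuple[int, int]
    y: Tuple[int, int]


dict_regions: MaskedRegionSpecifier = [{"x": (0, 16), "y": (35, 160)}]
tuple_regions: List[MaskedRegion] = [MaskedRegion(x=(0, 16), y=(35, 160))]

try:
    MaskedRegion(x=(0, 16))  # "y" is missing, so construction fails early
except TypeError as err:
    print(f"rejected: {err}")
```

With the dict form a missing key only surfaces as a KeyError when the region is first indexed; the named tuple moves that failure to construction time.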

score_region = _get_score_region(atari_env_id)
if score_region is None:
    raise ValueError(
        "Requested environment does not yet support masking. "
Member

Nice, thanks for adding the informative error message :)

def __init__(
    self,
    env: gym.Env,
    score_regions: List[Dict[str, Tuple[int, int]]],
Member

(If you did define a type alias this file would be the natural place to do it.)

class MaskScoreWrapper(gym.Wrapper):
    """Mask a list of box-shaped regions in the observation to hide reward info.

    Intended for environments whose observations are raw pixels (like atari
Member

Suggested change
Intended for environments whose observations are raw pixels (like atari
Intended for environments whose observations are raw pixels (like Atari

self.mask = np.ones(env.observation_space.shape, dtype=bool)
for r in score_regions:
    if r["x"][0] >= r["x"][1] or r["y"][0] >= r["y"][1]:
        raise ValueError('Invalid region: "x" and "y" must be increasing.')
Member

Nice input validation!

from seals import GYM_ATARI_ENV_SPECS, util


def test_mask_score_wrapper_enforces_spec():
Member

It might be nice to add a test that actually checks MaskScoreWrapper masks the observations -- e.g. you could have a dummy environment that returns all-ones, a dummy mask config, and then just check that region (and only that region) is zero.

I won't insist on it though, the MaskScoreWrapper implementation is simple and readable already so is unlikely to have a bug, and you've already done a lot of work in this PR!
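A sketch of the kind of test described here, using a plain all-ones array instead of a dummy Gym environment so it stays self-contained. The mask construction mirrors the snippet from the wrapper shown earlier; applying it by elementwise multiplication is an assumption, since `_mask_obs` itself is not shown in this thread, and `build_mask` is a hypothetical helper.

```python
import numpy as np


def build_mask(shape, score_regions):
    """Hypothetical helper mirroring the mask construction in the wrapper."""
    mask = np.ones(shape, dtype=bool)
    for r in score_regions:
        mask[r["x"][0] : r["x"][1], r["y"][0] : r["y"][1]] = 0
    return mask


def test_only_masked_region_is_zero():
    obs = np.ones((8, 8, 3), dtype=np.uint8)  # dummy all-ones observation
    regions = [dict(x=(1, 3), y=(2, 5))]  # dummy mask config
    masked = obs * build_mask(obs.shape, regions)  # assumed way to apply it

    expected = np.ones_like(obs)
    expected[1:3, 2:5] = 0
    # The region, and only the region, should have been zeroed out.
    assert np.array_equal(masked, expected)
```

A version against the real wrapper would replace the array with a small dummy environment whose `reset` and `step` return all-ones observations.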

@stewy33
Contributor Author

stewy33 commented Nov 21, 2022

Hi, could someone trigger the tests one last time before merge? Just incorporated the desired minor changes.

Member

@AdamGleave AdamGleave left a comment


LGTM. Thanks for your patience with this! Also I've changed the CI setup so tests should hopefully run OK on forked repos in the future.

@AdamGleave AdamGleave merged commit dc7a695 into HumanCompatibleAI:master Nov 22, 2022
Successfully merging this pull request may close these issues.

Seals Atari environments show game score
3 participants