
Add Gym 0.26 support #780

Closed
wants to merge 201 commits into from

Conversation

carlosluis (Contributor) commented Feb 19, 2022

EDIT: Please use the Gymnasium branch instead of this one: #1327
(the new branch is compatible with gym 0.21, gym 0.26 and gymnasium)

See comment #780 (comment) to use this PR

To install SB3 with gym 0.26+ support:

pip install git+https://github.com/carlosluis/stable-baselines3@fix_tests

(or as a requirement in a setup.py: git+https://github.com/carlosluis/stable-baselines3@fix_tests#egg=stable_baselines3[extra,tests,docs] together with "sb3_contrib @ git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib@feat/new-gym-version"; see DLR-RM/rl-baselines3-zoo#256)

Note: if you want to use gymnasium, you can do:

import sys
import gymnasium
sys.modules["gym"] = gymnasium

before any stable-baselines3 import.

For native Gymnasium support, you can take a look at #1327 or install it with:

pip install git+https://github.com/DLR-RM/stable-baselines3@feat/gymnasium-support

Description

Gym 0.26 has been released, and with it come breaking changes. The objective of this PR is to fix all the failing tests.

Moving from Gym 0.26.2 to Gymnasium 0.26.2 (which are the same) is part of this PR.

Missing:

Motivation and Context

Gym release notes

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

closes #840 #871
closes #271
closes #1156

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Status

  • There's still 28 failing tests when running locally, need to investigate further

@Miffyli added the "Maintainers on vacation" label Feb 20, 2022
@araffin mentioned this pull request Feb 21, 2022
araffin (Member) commented Feb 21, 2022

it seems that the failures come from pygame not being installed somehow, probably an issue from gym...

@araffin removed the "Maintainers on vacation" label Feb 21, 2022
carlosluis (Contributor, Author)

it seems that the failures come from pygame not being installed somehow, probably an issue from gym...

Only a subset of the failures, those are fixed now.

@araffin mentioned this pull request Feb 21, 2022
carlosluis (Contributor, Author)

@RedTachyon I'm currently investigating a bug which may be related to openai/gym#2422.

Here's a stack trace of the type of errors:

tests/test_save_load.py:216: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
stable_baselines3/dqn/dqn.py:258: in learn
    return super(DQN, self).learn(
stable_baselines3/common/off_policy_algorithm.py:354: in learn
    rollout = self.collect_rollouts(
stable_baselines3/common/off_policy_algorithm.py:584: in collect_rollouts
    actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
stable_baselines3/common/off_policy_algorithm.py:415: in _sample_action
    unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
stable_baselines3/dqn/dqn.py:238: in predict
    action = np.array([self.action_space.sample() for _ in range(n_batch)])
stable_baselines3/dqn/dqn.py:238: in <listcomp>
    action = np.array([self.action_space.sample() for _ in range(n_batch)])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = Discrete(10)

    def sample(self) -> int:
>       return self.start + self.np_random.randint(self.n)
E       AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'

I've been digging into the code and the problem seems to be that this line

def sample(self) -> int:
    return self.start + self.np_random.randint(self.n)

is being called with self.np_random of type numpy.random._generator.Generator, instead of gym.utils.seeding.RandomNumberGenerator. The former doesn't have a method randint so the code fails.

If it's any help, this is only happening after loading a model, so the problem may be related to that (I haven't checked much on that front yet; I wanted to rule out something obvious with the changes to seeding first). Here's an example test where it fails:

model.learn(total_timesteps=64, reset_num_timesteps=False)

It seems discrete.py in Gym assumes np_random will always be of some type but doesn't enforce it explicitly? Or there's something funny happening with the loading...
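The mismatch is easy to reproduce with plain numpy (a minimal sketch, independent of gym): the new-style `numpy.random.Generator` only exposes `integers`, while `randint` belongs to the legacy `RandomState` API.

```python
import numpy as np

# After unpickling, self.np_random ends up being a plain numpy Generator.
rng = np.random.default_rng(0)

print(rng.integers(10))         # new-style API: works
print(hasattr(rng, "randint"))  # False: calling rng.randint raises AttributeError
```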

RedTachyon

I think I found the issue - the custom RNG class inherits from the numpy Generator for compatibility purposes, but when it gets pickled/unpickled, it defaults to the numpy behavior which creates a new Generator.

I think I see two main ways to solve it - either in __setstate__ of Space make sure that a RNG is set instead of a Generator; or in the RNG code do some custom code to pickle/unpickle the proper class
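The subclass loss can be demonstrated with a small numpy sketch (`LegacyRNG` is a hypothetical stand-in for gym's `RandomNumberGenerator`, not gym's actual class): `Generator`'s inherited `__reduce__` reconstructs the base class, so the subclass methods disappear after a pickle round-trip.

```python
import pickle

import numpy as np

class LegacyRNG(np.random.Generator):
    """Hypothetical stand-in for gym's RandomNumberGenerator subclass."""

    def randint(self, n):
        return self.integers(n)

rng = LegacyRNG(np.random.PCG64(0))
restored = pickle.loads(pickle.dumps(rng))

# Generator.__reduce__ rebuilds a plain Generator, silently dropping the
# subclass (and its randint shim) on unpickling.
print(type(restored).__name__)       # Generator
print(hasattr(restored, "randint"))  # False
```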

carlosluis (Contributor, Author)

I think I found the issue - the custom RNG class inherits from the numpy Generator for compatibility purposes, but when it gets pickled/unpickled, it defaults to the numpy behavior which creates a new Generator.

I think I see two main ways to solve it - either in __setstate__ of Space make sure that a RNG is set instead of a Generator; or in the RNG code do some custom code to pickle/unpickle the proper class

I see, thanks for checking. I wonder what this means for SB3 adopting Gym 0.22, since the problem seems to be on Gym's side. I'm guessing the fix (whenever it comes) won't be available for SB3 until the next release by Gym?

araffin (Member) commented Feb 23, 2022

won't be available for SB3 until the next release by Gym?

openai/gym#2640 must be fixed first too

jkterry1 (Contributor)

We're going to do a 0.22.1 release in the near future that will fix all of these things; some of the relevant fixes have already been merged.

@araffin mentioned this pull request Feb 23, 2022
carlosluis (Contributor, Author) commented Mar 3, 2022

Update on this:

  1. Mean reward is below desired threshold.
FAILED tests/test_identity.py::test_discrete[env2-A2C] - AssertionError: Mean reward below threshold: 88...
FAILED tests/test_identity.py::test_discrete[env2-PPO] - AssertionError: Mean reward below threshold: 86...

Not sure exactly how gym's updates are causing this lower mean reward, need to check this further

  2. Warning not raised in HER test
FAILED tests/test_her.py::test_save_load_replay_buffer[True-False] - assert 0 == 1

Caused because the trajectory truncation is not raising a warning in these lines:

if truncate_last_trajectory:
    assert len(recwarn) == 1
    warning = recwarn.pop(UserWarning)
    assert "The last trajectory in the replay buffer will be truncated" in str(warning.message)

EDIT: CI also shows about 3 tests failing with EOFError, and I also observed those locally, but they seem to be also solved when using gym master (tested locally).

carlosluis (Contributor, Author)

Regarding the "mean reward below threshold" problem: after some further investigation, the root cause is the change in seeding behaviour in gym.

SB3 sets seeds here:

Before gym 0.22.0, the default seed() method was not actually setting any seed [source]. In gym 0.22 the default seed() does set the seed [source].

The consequence of it is, for instance, that the result of


is not consistent between gym==0.21.0 and gym >= 0.22.0. The solution I think makes the most sense is to simply adjust the threshold value in the test.
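The divergence can be mimicked with a plain numpy toy model (an analogy, not gym's code): when seed() is a no-op the sample stream continues from whatever state the RNG had already accumulated, whereas an effective seed() restarts it, so the two versions draw different values from that point on.

```python
import numpy as np

def sampled_values(seed_is_noop):
    rng = np.random.default_rng(123)    # state accumulated before "env.seed(0)"
    rng.integers(100, size=5)
    if not seed_is_noop:
        rng = np.random.default_rng(0)  # gym >= 0.22: seed() takes effect
    # gym <= 0.21: seed() was a no-op, the stream continues from prior state
    return list(rng.integers(100, size=5))

old_style = sampled_values(seed_is_noop=True)   # gym 0.21 behaviour
new_style = sampled_values(seed_is_noop=False)  # gym >= 0.22 behaviour
print(old_style == new_style)  # False: results diverge across versions
```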

araffin (Member) commented Mar 8, 2022

Before gym 0.22.0, the default seed() method was not actually setting any seed [source]. In gym 0.22 the default seed() does set the seed [source].

I see...

is not consistent between gym==0.21.0 and gym >= 0.22.0.

is there any way to make it consistent?

The solution I think makes the most sense is to simply adjust the threshold value in the test.

The whole point of those performance tests is to detect any change that may hurt performance or change results (they fail easily as soon as any minor change is made). So we should not change the threshold but rather fix the underlying change/issue; or, in the current case, I would prefer changing the seed if we cannot keep behavior consistent with the previous version.

btw, are all those warnings (see below) due to gym only?

gym/utils/seeding.py:47: 1 warning
tests/test_callbacks.py: 4153 warnings
tests/test_cnn.py: 377 warnings
tests/test_custom_policy.py: 106 warnings
tests/test_deterministic.py: 238 warnings
tests/test_dict_env.py: 14116 warnings
tests/test_envs.py: 2 warnings
tests/test_her.py: 1566 warnings
tests/test_identity.py: 18153 warnings
tests/test_logger.py: 300 warnings
tests/test_monitor.py: 3000 warnings
tests/test_predict.py: 109 warnings
tests/test_run.py: 222 warnings
tests/test_save_load.py: 6597 warnings
tests/test_spaces.py: 1110 warnings
tests/test_train_eval_mode.py: 31 warnings
tests/test_utils.py: 12 warnings
tests/test_vec_envs.py: 216 warnings
tests/test_vec_monitor.py: 4000 warnings
tests/test_vec_normalize.py: 6547 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:47: DeprecationWarning: WARN: Function `rng.randint(low, [high, size, dtype])` is marked as deprecated and will be removed in the future. Please use `rng.integers(low, [high, size, dtype])` instead.

same for

tests/test_envs.py::test_non_default_action_spaces[new_action_space4]
  /home/runner/work/stable-baselines3/stable-baselines3/stable_baselines3/common/env_checker.py:272: UserWarning: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) cf https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
    warnings.warn(

tests/test_utils.py: 16 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:138: DeprecationWarning: WARN: Function `hash_seed(seed, max_bytes)` is marked as deprecated and will be removed in the future. 
    deprecation(

tests/test_utils.py: 16 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/gym/utils/seeding.py:175: DeprecationWarning: WARN: Function `_bigint_from_bytes(bytes)` is marked as deprecated and will be removed in the future. 
    deprecation(

?

carlosluis (Contributor, Author)

Similarly, the HER test failure was also caused by the change in RNG. The failing test required that after training for 200 steps, the env had to be left in the middle of a trajectory. After the change in gym, the RNG gods decided that after 200 steps the env had just finished an episode, i.e., self.current_idx == 0 in

==> warning is never raised ==> test assertion fails.

A simple change in the seed fixes the test.

@araffin mentioned this pull request Mar 13, 2023
araffin (Member) commented Mar 29, 2023

Closing this one, as I just released an alpha version of the Gymnasium branch on PyPI:

# sb3 contrib installs sb3 too
pip install "sb3_contrib>=2.0.0a1" --upgrade

Documentation is available here: https://stable-baselines3.readthedocs.io/en/feat-gymnasium-support/

rfali commented May 16, 2023

@araffin if I understand correctly, SB3 directly jumped from supporting gym==0.21 to gymnasium==0.28.1?

araffin (Member) commented May 16, 2023

SB3 2.x (master version) supports gymnasium 0.28.1, and gym 0.21/0.26 via shimmy.

rfali commented May 16, 2023

@araffin is there any documentation/reference on how to use SB3 with gym 0.21/0.26 via shimmy? I saw the install requirements of the master branch, and they only mention gymnasium (gym is not included).

araffin (Member) commented May 17, 2023

is there any documentation/reference on how to use SB3 with gym 0.21/0.26 via shimmy?

It is automatic if you use make_vec_env (and also if you pass the env to the algorithm constructor); otherwise, take a look at the shimmy documentation.

env = env_id(**env_kwargs)
# Patch to support gym 0.21/0.26 and gymnasium
env = _patch_env(env)

https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/dummy_vec_env.py#L30

Patch env is defined here:
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/patch_gym.py#L15

and yes gym is an optional dependency.
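Conceptually, the patch wraps an old-style gym env so that it speaks the gymnasium API. A minimal sketch of the idea (an illustration only, not SB3's actual _patch_env from patch_gym.py, which relies on shimmy):

```python
class GymToGymnasiumCompat:
    """Illustrative adapter: old gym 4-tuple step() -> gymnasium 5-tuple."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        # Old gym seeded via a separate seed() call, not via reset()
        if seed is not None and hasattr(self.env, "seed"):
            self.env.seed(seed)
        obs = self.env.reset()
        return obs, {}  # gymnasium's reset also returns an info dict

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # gymnasium splits `done` into terminated / truncated
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info
```

Wrapping a toy old-style env then yields, e.g., step(a) == (obs, reward, terminated, truncated, info) with the two done flags split apart.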
