
[feat] Extend core Environment reset method with options or params argument like Gymnasium. #17

Closed
joeryjoery opened this issue May 11, 2023 · 3 comments · Fixed by #29
Labels: enhancement (New feature or request)

@joeryjoery (Owner)

Add flexibility to procedurally generate different environments based on reset keyword arguments.

So, instead of having to reinitialize the entire Environment object or mutate the attributes of an instance, this could be done functionally through an options dictionary or dataclass.

This is an extension of the dm_env API for parity with the gymnasium API. The only real problem with this extension is that jitted reset functions will have to recompile each time the options tree structure changes; this should be noted in the docstring.
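
For illustration, a minimal sketch of what such a reset could look like under jit (the names are hypothetical, not the actual API of this repo). Only a change in the structure of the options pytree triggers a recompile, not a change in its leaf values:

import jax
import jax.numpy as jnp

# Hypothetical reset taking an `options` pytree; names are illustrative.
def reset(key, options):
    scale = options["scale"]              # a regular traced leaf under jit
    return scale * jnp.zeros((8, 8))      # placeholder initial state

jit_reset = jax.jit(reset)
key = jax.random.PRNGKey(0)

jit_reset(key, {"scale": 1.0})                  # traces and compiles
jit_reset(key, {"scale": 2.0})                  # same structure: cache hit
jit_reset(key, {"scale": 2.0, "walls": 3.0})    # new structure: recompiles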

joeryjoery added the enhancement label on May 11, 2023
@Howuhh commented Nov 10, 2023

See also instadeepai/jumanji#212 and the discussion about this on the Gymnasium Discord: https://discord.com/channels/961771112864313344/1029089576398114846/1165900249798287391

I think the most useful and ergonomic implementation currently is gymnax's, with explicit EnvParams that are passed to all the env functions. There is no need to recompile with them, as they are just regular pytrees, not static. However, gymnax is not consistent and sometimes stores env params inside the class; imho all the params should live in EnvParams, using field(pytree_node=False) (from the flax dataclass) for any values that some code requires to be concrete rather than a tracer.
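
A minimal sketch of what that could look like with a flax struct dataclass (field names made up for illustration): dynamic leaves can change between calls without recompiling, while field(pytree_node=False) marks values that must stay concrete Python values inside jit:

from flax import struct

@struct.dataclass
class EnvParams:
    # Regular pytree leaves: become tracers under jit, so changing their
    # values between calls does not trigger recompilation.
    max_steps: int = 500
    gravity: float = 9.8
    # Static field: excluded from the pytree and kept as a concrete Python
    # value inside jit (changing it does trigger a recompile).
    grid_size: int = struct.field(pytree_node=False, default=8)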

@joeryjoery (Owner, Author)

Hey, the suggestion you made in the jumanji GitHub issue is exactly what I had in mind.

I have a problem with the gymnax API: in meta-RL the parameters do not change dynamically; if they did, they would just be part of the state. So having to manually carry this around is annoying and error-prone. This is also why I put the random key inside the state: fewer variables to take care of. Plus, it also prevents reproducibility problems, since the Environment manages its own randomness in a predictable sequential pattern.
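
Roughly, the idea (a sketch with made-up field names, not the actual State definition in this repo) is that the state carries its own PRNG key, so callers never have to thread keys by hand:

import jax
from flax import struct

@struct.dataclass
class State:
    key: jax.Array   # the environment owns and advances its own PRNG stream
    obs: jax.Array
    time: int

def step(state: State, action):
    key, subkey = jax.random.split(state.key)
    # ... use `subkey` for any stochastic transition logic ...
    return state.replace(key=key)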

I also don't agree that these env_params should be able to change how the environment specs are defined; e.g., gymnax uses this:

obs, state = env.reset(key_reset, env_params)

# Sample a random action.
action = env.action_space(env_params).sample(key_act)

This design is asking for unintentional recompilations if array shapes change. Preferably, when defining an Environment, its specification is bounded; otherwise one should just create a new Environment.

I'll try to get around to fixing stale issues in this repo, perhaps over the weekend :)

@Howuhh commented Nov 10, 2023

Actually, if we jit the entire training loop as in PureJaxRL, a shape change produces an error rather than a recompilation. But I agree that explicit params do not allow arbitrary changes, only ones compatible with the meta-RL task. I think the same will be true with options: once compiled, there is no way to change the tree structure. But this is okay; for meta-RL tasks we usually have the same structure for the problem specification, or can at least pad it to common shapes.
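
For the padding point, a small illustrative helper (hypothetical name, assuming the problem specification is a 2D occupancy grid): padding every task to one fixed shape keeps both the tree structure and the array shapes constant under jit:

import jax.numpy as jnp

def pad_spec(grid, max_size=16):
    # Hypothetical helper: pad a per-task grid to a fixed (max_size, max_size)
    # shape so all tasks share identical shapes inside the jitted loop.
    h, w = grid.shape
    return jnp.pad(grid, ((0, max_size - h), (0, max_size - w)))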
