
[feat] Extend core Environment reset method with options or params argument like Gymnasium. #17

Closed
joeryjoery opened this issue May 11, 2023 · 3 comments · Fixed by #29
Labels: enhancement (New feature or request)

@joeryjoery (Owner)

Add flexibility to procedurally generate different environments based on reset keyword arguments.

So, instead of having to reinitialize the entire Environment object or mutate the attributes of an instance, this could be done functionally through an options dictionary or dataclass.

This is an extension of the dm_env API for parity with the gymnasium API. The only real problem with this extension is that jitted reset functions will have to recompile each time the options tree structure changes; this should be noted in the docstring.
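
For illustration, a minimal sketch of what such a reset could look like under jit (the names are hypothetical, not the actual API of this repo). Only a change in the structure of the options pytree triggers a recompile, not a change in its leaf values:

import jax
import jax.numpy as jnp

# Hypothetical reset taking an `options` pytree; names are illustrative.
def reset(key, options):
    scale = options["scale"]              # a regular traced leaf under jit
    return scale * jnp.zeros((8, 8))      # placeholder initial state

jit_reset = jax.jit(reset)
key = jax.random.PRNGKey(0)

jit_reset(key, {"scale": 1.0})                  # traces and compiles
jit_reset(key, {"scale": 2.0})                  # same structure: cache hit
jit_reset(key, {"scale": 2.0, "walls": 3.0})    # new structure: recompiles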

joeryjoery added the enhancement label on May 11, 2023
@Howuhh commented Nov 10, 2023

See also instadeepai/jumanji#212 and the discussion about this on the Gymnasium Discord: https://discord.com/channels/961771112864313344/1029089576398114846/1165900249798287391

I think the most useful and ergonomic implementation currently is gymnax's, with explicit EnvParams that are passed to all the env functions. There is no need to recompile with them, as they are just regular pytrees, not static. However, gymnax is not consistent and sometimes stores env params inside the class; imho all the params should live in EnvParams, using field(pytree_node=False) (from the flax dataclass) for any values that some code requires to be concrete rather than a tracer.
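
A minimal sketch of what that could look like with a flax struct dataclass (field names made up for illustration): dynamic leaves can change between calls without recompiling, while field(pytree_node=False) marks values that must stay concrete Python values inside jit:

from flax import struct

@struct.dataclass
class EnvParams:
    # Regular pytree leaves: become tracers under jit, so changing their
    # values between calls does not trigger recompilation.
    max_steps: int = 500
    gravity: float = 9.8
    # Static field: excluded from the pytree and kept as a concrete Python
    # value inside jit (changing it does trigger a recompile).
    grid_size: int = struct.field(pytree_node=False, default=8)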

@joeryjoery (Owner, Author)

Hey, the suggestion you made in the jumanji GitHub issue is exactly what I had in mind.

I have a problem with the gymnax API: in meta-RL the parameters do not change dynamically; if they did, they would just be part of the state. So having to manually carry this around is annoying and error-prone. This is also why I put the random key inside the state: fewer variables to take care of. Plus, it also prevents reproducibility problems, since the Environment manages its own randomness in a predictable sequential pattern.
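
Roughly, the idea (a sketch with made-up field names, not the actual State definition in this repo) is that the state carries its own PRNG key, so callers never have to thread keys by hand:

import jax
from flax import struct

@struct.dataclass
class State:
    key: jax.Array   # the environment owns and advances its own PRNG stream
    obs: jax.Array
    time: int

def step(state: State, action):
    key, subkey = jax.random.split(state.key)
    # ... use `subkey` for any stochastic transition logic ...
    return state.replace(key=key)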

I also don't agree that these env_params should be able to change how the environment specs are defined; e.g., gymnax uses this:

obs, state = env.reset(key_reset, env_params)

# Sample a random action.
action = env.action_space(env_params).sample(key_act)

This design is asking for unintentional recompilations if array shapes change. Preferably, when defining an Environment, its specification is bounded; otherwise one should just create a new Environment.

I'll try to get around to fixing stale issues in this repo, perhaps over the weekend :)

@Howuhh commented Nov 10, 2023

Actually, if we jit the entire training loop as in PureJaxRL, a shape change produces an error rather than a recompilation. But I agree that explicit params do not allow arbitrary changes, only ones compatible with the meta-RL task. I think the same will be true with options: once compiled, there is no way to change the tree structure. But this is okay; for meta-RL tasks we usually have the same structure for the problem specification, or can at least pad it to common shapes.
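
For the padding point, a small illustrative helper (hypothetical name, assuming the problem specification is a 2D occupancy grid): padding every task to one fixed shape keeps both the tree structure and the array shapes constant under jit:

import jax.numpy as jnp

def pad_spec(grid, max_size=16):
    # Hypothetical helper: pad a per-task grid to a fixed (max_size, max_size)
    # shape so all tasks share identical shapes inside the jitted loop.
    h, w = grid.shape
    return jnp.pad(grid, ((0, max_size - h), (0, max_size - w)))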
