[feat] Extend core `Environment` `reset` method with `options` or `params` argument like Gymnasium. #17
Comments
See also instadeepai/jumanji#212 and the discussion about this on the Gymnasium Discord (https://discord.com/channels/961771112864313344/1029089576398114846/1165900249798287391). I think the most useful and ergonomic implementation currently is gymnax, with explicit `EnvParams` that should be passed to all the env functions. There will be no need to recompile with them, as these are just regular pytrees and not static. However, gymnax is not consistent and sometimes stores env params inside the class; imho all the params should be in the `EnvParams` as …
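A minimal sketch of the explicit-params style, under stated assumptions: `EnvParams`, its fields `max_speed` and `goal_radius`, and this `reset` are hypothetical and are not gymnax's actual classes. Because the params are an ordinary pytree (a `NamedTuple`) passed as a traced argument to a `jax.jit`-compiled function, calling it again with new values of the same shape and dtype reuses the compiled code. The counter is a Python side effect, so it only increments when JAX traces the function.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EnvParams(NamedTuple):
    # Hypothetical parameters, for illustration only.
    max_speed: jax.Array
    goal_radius: jax.Array


trace_count = 0


@jax.jit
def reset(key: jax.Array, params: EnvParams):
    global trace_count
    trace_count += 1  # Python side effect: runs only while JAX traces
    # Sample a 2-D starting position in [-max_speed, max_speed).
    obs = jax.random.uniform(
        key, (2,), minval=-params.max_speed, maxval=params.max_speed
    )
    state = {"pos": obs, "goal_radius": params.goal_radius}
    return obs, state


key = jax.random.PRNGKey(0)
obs_a, _ = reset(key, EnvParams(jnp.float32(1.0), jnp.float32(0.1)))
obs_b, _ = reset(key, EnvParams(jnp.float32(5.0), jnp.float32(0.3)))
print(trace_count)  # 1: new values, same shapes/dtypes, so no retrace
```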
Hey, the suggestion you made in the … I have a problem with the Gymnax API, as in meta-RL the parameters do not change dynamically. If they did, they would just be state. So having to carry them around manually is annoying and error-prone. This is also why I put the random key inside … I also don't agree that these …

```python
obs, state = env.reset(key_reset, env_params)
# Sample a random action.
action = env.action_space(env_params).sample(key_act)
```

This design is asking for unintentional recompilations if array shapes change. Preferably, when defining an `Environment`, its specification is bounded; otherwise one should just make a new `Environment`. I'll try to get around to fixing stale issues in this repo, perhaps over the weekend :)
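To illustrate the recompilation concern, here is a sketch with a hypothetical `EnvParams` holding a `goals` array (not taken from any library). `jax.jit` caches compiled code per input shape and dtype, so passing params whose array shapes differ triggers a silent retrace and recompile rather than an error.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EnvParams(NamedTuple):
    # Hypothetical: one candidate goal position per row.
    goals: jax.Array


trace_count = 0


@jax.jit
def reset(key: jax.Array, params: EnvParams):
    global trace_count
    trace_count += 1  # increments only when JAX (re)traces
    # Pick one of the goals uniformly at random; shape[0] is static at trace time.
    idx = jax.random.randint(key, (), 0, params.goals.shape[0])
    return params.goals[idx]


key = jax.random.PRNGKey(0)
reset(key, EnvParams(jnp.zeros((3, 2))))
reset(key, EnvParams(jnp.ones((3, 2))))   # same shape: cached
reset(key, EnvParams(jnp.zeros((5, 2))))  # new shape: retrace + recompile
print(trace_count)  # 2
```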
Actually, if we jit the entire training loop like in PureJaxRL, shape changes will raise an error, not trigger a recompilation. But I agree that explicit params do not allow arbitrary changes to them, only changes that are compatible with a meta-RL task. I think with …
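A sketch of that point, assuming the training loop is expressed with `jax.lax.scan` as in PureJaxRL: the scan carry must keep the same shape and dtype on every step, so a carried parameter array that changes shape is rejected with a `TypeError` at trace time instead of being recompiled. The `step` function is made up for illustration.

```python
import jax
import jax.numpy as jnp


def step(params, _):
    # Hypothetical update that doubles the length of the carried params.
    return jnp.concatenate([params, params]), None


raised = False
try:
    # The carry starts with shape (2,) but the body returns shape (4,).
    jax.lax.scan(step, jnp.zeros(2), None, length=3)
except TypeError as err:
    raised = True
    print("scan rejected the shape change:", type(err).__name__)
```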
Add flexibility to procedurally generate different environments based on `reset` keyword arguments.

So, instead of reinitializing the entire `Environment` object or mutating the attributes of an instance, this could be done functionally through an `options` dictionary or dataclass.

This is an extension of the `dm_env` API for parity with the `gymnasium` API. The only real problem with this extension is that `jit`ted reset functions will have to recompile each time the `options` tree structure (or the shape or dtype of any of its leaves) changes; this should be noted in the docstring.
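A minimal sketch of the proposal, assuming a hypothetical functional `reset(key, options)` that takes a plain `dict` of options (the keys `arena_size` and `n_obstacles` are invented for illustration and are not part of `dm_env` or `gymnasium`). Under `jax.jit` the dict is a pytree: new values with the same keys, shapes and dtypes reuse the compiled function, while adding a key changes the tree structure and forces a retrace.

```python
import jax
import jax.numpy as jnp

trace_count = 0


@jax.jit
def reset(key, options):
    """Functional reset driven by an `options` pytree (hypothetical).

    Note: jit recompiles whenever the tree structure of `options`
    (its keys or nesting) or the shape/dtype of any leaf changes.
    """
    global trace_count
    trace_count += 1  # runs only during tracing
    size = options["arena_size"]
    # Hypothetical default: no obstacles when the option is absent.
    n_obstacles = options.get("n_obstacles", jnp.int32(0))
    pos = jax.random.uniform(key, (2,), maxval=size)
    return {"pos": pos, "n_obstacles": n_obstacles}


key = jax.random.PRNGKey(0)
reset(key, {"arena_size": jnp.float32(10.0)})
reset(key, {"arena_size": jnp.float32(20.0)})  # same structure: cached
reset(key, {"arena_size": jnp.float32(10.0), "n_obstacles": jnp.int32(3)})  # new key: retrace
print(trace_count)  # 2
```

For comparison, Gymnasium's `Env.reset` takes keyword-only `seed` and `options` arguments; this sketch only mimics the `options` part, with the key passed explicitly as in JAX-style APIs.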