
Jumanji is not suitable to meta-learning, but adding options parameter to reset method can fix this #212

Open
Howuhh opened this issue Oct 20, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@Howuhh

Howuhh commented Oct 20, 2023

Is your feature request related to a problem? Please describe

Hi, I'm currently developing a library of environments for meta-RL research. In order not to reinvent the wheel, I wanted to use the Jumanji interface (I like it better than gymnax, and Jumanji is more actively maintained). However, I've found that with the current interface this is extremely difficult or impossible.

In meta-RL we need to be able to adaptively change the environment parameters, or the problem-generator parameters, and we need to do it from outside the environment: if sampling happens only inside the environment, we lose the ability to implement different training curricula (beyond the one hardcoded by default). Therefore, we need to be able to pass these parameters when resetting the environment. Gymnax does something similar.

Describe the solution you'd like

It seems to me that it would be enough to change the reset interface to:

```python
reset(self, key: chex.PRNGKey, options: None | EnvOptions = None) -> Tuple[State, TimeStep]
```

where EnvOptions is an arbitrary jit-compatible dataclass. The step method can be left as it is, because these options can be stored in State, so there is no need to pass them explicitly further; the only important thing is the ability to change them on reset. Gymnasium also does this (though for different reasons, I guess...)
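To make the idea concrete, here is a minimal stdlib-only sketch of the proposed pattern. `EnvOptions`, `State`, and the field names (`grid_size`, `max_steps`) are hypothetical placeholders, not Jumanji's actual types; the point is only that options arrive via `reset` and then ride along inside `State`, so `step` is untouched.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class EnvOptions:
    # Hypothetical generator parameters a curriculum might want to control.
    grid_size: int = 5
    max_steps: int = 100

@dataclass(frozen=True)
class State:
    options: EnvOptions  # stored in State, so step() needs no extra argument
    t: int = 0

def reset(key, options: Optional[EnvOptions] = None) -> Tuple[State, dict]:
    # Fall back to default options when the caller passes none.
    options = options if options is not None else EnvOptions()
    state = State(options=options)
    timestep = {"step_type": "FIRST", "observation": 0}  # placeholder TimeStep
    return state, timestep

# An external curriculum can now change parameters per reset:
state, _ = reset(key=0, options=EnvOptions(grid_size=9))
```

In a real JAX environment `EnvOptions` would be registered as a pytree (e.g. via `chex.dataclass` or `flax.struct`) so it can cross the `jit` boundary as a traced argument.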

Currently I plan to add this argument in a subclass for my environments, but this will break compatibility with Jumanji wrappers, for example (and with the ecosystem in general).

Describe alternatives you've considered

We cannot use the common meta-RL interface of env.set_task(task_params), because after jitting the step and reset methods this mutation would have no effect. We also cannot pass task parameters at initialization, since the base Environment class is not jit-compatible and should be created once, outside the jitted region.
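The failure mode of the set_task-style approach can be demonstrated with a toy example (ToyEnv is hypothetical, not a Jumanji environment): a Python attribute read during tracing gets baked into the compiled function as a constant, so later mutation is silently ignored on cached calls.

```python
import jax
import jax.numpy as jnp

class ToyEnv:
    def __init__(self, goal: float):
        self.goal = goal  # task parameter stored on the (non-pytree) env object

    def step(self, state):
        # `self.goal` is read at trace time and baked in as a constant
        return state + 1, -jnp.abs(state - self.goal)

env = ToyEnv(goal=3.0)
jitted_step = jax.jit(env.step)

_, r0 = jitted_step(jnp.float32(0.0))  # traces and compiles with goal=3.0

env.goal = 10.0                        # "set_task"-style mutation
_, r1 = jitted_step(jnp.float32(0.0))  # cached trace: still uses goal=3.0
```

Here r1 equals r0, even though the task was "changed" in between, which is why the parameters have to flow through a traced argument such as the proposed options.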

Misc

  • Check for duplicate requests.
@Howuhh Howuhh added the enhancement New feature or request label Oct 20, 2023
@Howuhh Howuhh changed the title Jumanji is not suitable to meta-learning, but adding options parameter to reset method could fix this Jumanji is not suitable to meta-learning, but adding options parameter to reset method can fix this Oct 20, 2023
@sash-a
Collaborator

sash-a commented Oct 25, 2024

Hey @Howuhh really sorry for only getting back to you literally a year later. Really great work on xland minigrid, it's an awesome benchmark!

I'd be open to this, although as you can probably tell (seeing as we took a year to reply), time is unfortunately a bit limited. Is this the solution you've settled on: passing some optional pytree to reset?

Importantly, would this still benefit you, as I assume you're not using Jumanji wrappers etc.? Or is it useful to be able to perform meta-learning on some of these envs?

Apologies again for leaving this so long 😄
