Jumanji is not suitable to meta-learning, but adding options parameter to reset method can fix this #212

Howuhh · 2023-10-20T14:23:38Z

Is your feature request related to a problem? Please describe

Hi, I'm currently developing a library with environments for meta-RL research. In order not to reinvent the wheel, I wanted to use the Jumanji interface (I like it better than gymnax, and Jumanji is more actively maintained). However, I've found that with the current interface this is extremely difficult or impossible.

In meta-RL we need to be able to adaptively change the environment parameters or the problem generator parameters, and we need to do it from outside the environment: if sampling happens inside the environment, we lose the ability to implement training curricula other than the ones hardcoded by default. Therefore, we need to be able to pass these parameters when resetting the environment. Gymnax does something similar.

Describe the solution you'd like

It seems to me that it would be enough to change the reset interface to:

reset(self, key: chex.PRNGKey, options: None | EnvOptions = None) -> Tuple[State, TimeStep]

where EnvOptions is an arbitrary jit-compatible dataclass. The step method can be left as it is, because these options can be stored in State, so there is no need to pass them explicitly afterwards. The only important thing is the ability to change them on reset. Gymnasium actually does this too (but for different reasons, I guess...).
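A minimal sketch of how this could look, assuming a toy 1-D reaching task. `ToyEnv`, the `goal` field, `default_options`, and the simplified `State` and `TimeStep` tuples are hypothetical stand-ins for illustration and are not Jumanji's actual classes:

```python
from typing import NamedTuple, Optional, Tuple

import jax
import jax.numpy as jnp


class EnvOptions(NamedTuple):
    # Hypothetical task parameter: goal position of a 1-D reaching task.
    goal: jax.Array


class State(NamedTuple):
    key: jax.Array
    position: jax.Array
    options: EnvOptions  # stored in the state, so step() needs no extra argument


class TimeStep(NamedTuple):
    observation: jax.Array
    reward: jax.Array


class ToyEnv:
    def default_options(self) -> EnvOptions:
        return EnvOptions(goal=jnp.float32(1.0))

    def reset(
        self, key: jax.Array, options: Optional[EnvOptions] = None
    ) -> Tuple[State, TimeStep]:
        # Fall back to the default task when no options are passed.
        if options is None:
            options = self.default_options()
        state = State(key=key, position=jnp.float32(0.0), options=options)
        return state, TimeStep(observation=state.position, reward=jnp.float32(0.0))

    def step(self, state: State, action: jax.Array) -> Tuple[State, TimeStep]:
        position = state.position + action
        # The task parameters are read back from the state.
        reward = -jnp.abs(position - state.options.goal)
        return state._replace(position=position), TimeStep(position, reward)


env = ToyEnv()
reset_fn = jax.jit(env.reset)
step_fn = jax.jit(env.step)

key = jax.random.PRNGKey(0)
# Two different tasks chosen from outside the environment, e.g. by a curriculum.
state_a, _ = reset_fn(key, EnvOptions(goal=jnp.float32(2.0)))
state_b, _ = reset_fn(key, EnvOptions(goal=jnp.float32(-3.0)))
_, ts_a = step_fn(state_a, jnp.float32(1.0))
_, ts_b = step_fn(state_b, jnp.float32(1.0))

# Because the options are a pytree, a batch of tasks can also be vmapped over.
keys = jax.random.split(key, 3)
goals = jnp.array([0.5, 1.0, 1.5])
states, _ = jax.vmap(env.reset)(keys, EnvOptions(goal=goals))
```

Since the options are ordinary traced inputs here, changing the goal between resets does not require recompilation, and the same jitted `step_fn` serves every task.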

Currently I plan to add this argument in a subclass for my environments, but this will break compatibility with, for example, Jumanji wrappers (and with the interface in general).

Describe alternatives you've considered

We cannot use the common meta-RL interface of env.set_task(task_params), because after jitting the step and reset methods it will not have any effect. We also cannot pass the parameters at initialization, as the base Environment class is not jit-compatible and should be created once outside the jitted region.
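To illustrate the set_task problem, here is a small sketch. `TaskEnv` and its `goal` attribute are hypothetical and not part of Jumanji. A Python attribute read inside a jitted method is baked into the compiled function as a constant at trace time, so later mutation is not seen:

```python
import jax
import jax.numpy as jnp


class TaskEnv:
    def __init__(self):
        self.goal = 1.0  # plain Python attribute, not a traced value

    def set_task(self, goal):
        self.goal = goal

    def reward(self, position):
        # self.goal is captured as a constant when this method is traced.
        return -jnp.abs(position - self.goal)


env = TaskEnv()
reward_fn = jax.jit(env.reward)

before = reward_fn(jnp.float32(0.0))  # traced and compiled with goal = 1.0
env.set_task(5.0)
after = reward_fn(jnp.float32(0.0))  # same input shape and dtype, so the cached trace is reused


# Passing the task parameter as an explicit argument avoids the problem.
def reward_with_goal(position, goal):
    return -jnp.abs(position - goal)


explicit = jax.jit(reward_with_goal)(jnp.float32(0.0), jnp.float32(5.0))
```

In this sketch `before` and `after` are equal even though the task was changed in between, while `explicit` reflects the new goal.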

Misc

Check for duplicate requests.


sash-a · 2024-10-25T10:15:58Z

Hey @Howuhh really sorry for only getting back to you literally a year later. Really great work on xland minigrid, it's an awesome benchmark!

I'd be open to this although as you can probably tell (seeing as we took a year to reply) time is unfortunately a bit limited. Is this the solution you've settled on - passing some optional pytree to reset?

Importantly, would this still benefit you? I assume you're not using Jumanji wrappers etc. Or is it useful to be able to perform meta-learning on some of these envs?

Apologies again for leaving this so long 😄

Howuhh added the enhancement New feature or request label Oct 20, 2023

Howuhh changed the title ~~Jumanji is not suitable to meta-learning, but adding options parameter to reset method could fix this~~ Jumanji is not suitable to meta-learning, but adding options parameter to reset method can fix this Oct 20, 2023

Howuhh mentioned this issue Nov 10, 2023

[feat] Extend core Environment reset method with options or params argument like Gymnasium. joeryjoery/jit_env#17

Closed
