Sample a successor state by passing a key to Environment.step #211

carlosgmartin · 2023-10-13T04:32:33Z

Is your feature request related to a problem? Please describe

Currently, one obtains a successor state by calling Environment.step(state, action). The state itself contains a key, which is derived from the key argument of Environment.reset via splitting and propagation throughout the episode. This lets Jumanji simulate stochastic environments.

However, this approach has some disadvantages:

It does not allow one to (re)sample successor states.
If an agent receives a State as input, it can plan (think AlphaZero) with access to future environment randomness, breaking the assumption that the latter is unpredictable and letting the agent "cheat".

Describe the solution you'd like

Allow Environment.step to receive a key argument directly, as in Environment.step(state, action, key). This is the approach taken by gymnax. It is also the approach pgx intends to take: sotetsuk/pgx#1043.

In the medium/long term, I would support deprecating the State.key attribute entirely, which is currently the only constraint enforced by the StateProtocol. Its removal would allow State objects to be completely generic (they could be strings, ints, tuples, dicts, etc.).

Describe alternatives you've considered

A possible alternative is to create a copy of the state, replace its key attribute, and pass it into Environment.step (for the first issue) or the agent (for the second issue). However, this approach seems hacky and error-prone.

Fundamentally, it seems like step should be treated as an intrinsically stochastic function, implying that it should receive its own key at call time. (The key can be None if it's not needed.)

The text was updated successfully, but these errors were encountered:

sash-a · 2024-10-25T11:16:55Z

Really sorry for taking a year to get back to you on this. I think this is a great suggestion, I don't have time to do this right now (I may have some time early next year) as it would quite a bit of work, but I'll will leave this up for anyone who wants to implement it I'll be happy to review 😄

carlosgmartin added the enhancement New feature or request label Oct 13, 2023

This was referenced Oct 13, 2023

[Enhancement] Observation from state in public API #195

Open

Specify rng explicitly in step function sotetsuk/pgx#1043

Closed

sash-a added the help wanted Extra attention is needed label Oct 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sample a successor state by passing a key to Environment.step #211

Sample a successor state by passing a key to Environment.step #211

carlosgmartin commented Oct 13, 2023 •

edited

Loading

sash-a commented Oct 25, 2024 •

edited

Loading

Sample a successor state by passing a key to Environment.step #211

Sample a successor state by passing a key to Environment.step #211

Comments

carlosgmartin commented Oct 13, 2023 • edited Loading

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Describe alternatives you've considered

sash-a commented Oct 25, 2024 • edited Loading

carlosgmartin commented Oct 13, 2023 •

edited

Loading

sash-a commented Oct 25, 2024 •

edited

Loading