Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sample a successor state by passing a key to Environment.step #211

Open
carlosgmartin opened this issue Oct 13, 2023 · 1 comment
Open
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@carlosgmartin
Copy link

carlosgmartin commented Oct 13, 2023

Is your feature request related to a problem? Please describe

Currently, one obtains a successor state by calling Environment.step(state, action). The state itself contains a key, which is derived from the key argument of Environment.reset via splitting and propagation throughout the episode. This lets Jumanji simulate stochastic environments.

However, this approach has some disadvantages:

  1. It does not allow one to (re)sample successor states.
  2. If an agent receives a State as input, it can plan (think AlphaZero) with access to future environment randomness, breaking the assumption that the latter is unpredictable and letting the agent "cheat".

Describe the solution you'd like

Allow Environment.step to receive a key argument directly, as in Environment.step(state, action, key). This is the approach taken by gymnax. It is also the approach pgx intends to take: sotetsuk/pgx#1043.

In the medium/long term, I would support deprecating the State.key attribute entirely, which is currently the only constraint enforced by the StateProtocol. Its removal would allow State objects to be completely generic (they could be strings, ints, tuples, dicts, etc.).

Describe alternatives you've considered

A possible alternative is to create a copy of the state, replace its key attribute, and pass it into Environment.step (for the first issue) or the agent (for the second issue). However, this approach seems hacky and error-prone.

Fundamentally, it seems like step should be treated as an intrinsically stochastic function, implying that it should receive its own key at call time. (The key can be None if it's not needed.)

@sash-a
Copy link
Collaborator

sash-a commented Oct 25, 2024

Really sorry for taking a year to get back to you on this. I think this is a great suggestion, I don't have time to do this right now (I may have some time early next year) as it would quite a bit of work, but I'll will leave this up for anyone who wants to implement it I'll be happy to review 😄

@sash-a sash-a added the help wanted Extra attention is needed label Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants