Is your feature request related to a problem? Please describe
Currently, one obtains a successor state by calling Environment.step(state, action). The state itself contains a key, which is derived from the key argument of Environment.reset via splitting and propagation throughout the episode. This lets Jumanji simulate stochastic environments.
However, this approach has some disadvantages:
It does not allow one to (re)sample successor states.
If an agent receives a State as input, it can plan (think AlphaZero) with access to future environment randomness, breaking the assumption that the latter is unpredictable and letting the agent "cheat".
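Both points can be illustrated with a toy example. The sketch below is not Jumanji code: `ToyState`, `reset`, and `step` are hypothetical stand-ins that only mimic the pattern described above, where the key lives inside the state.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class ToyState(NamedTuple):
    """Hypothetical state that carries its own PRNG key."""

    position: jax.Array
    key: jax.Array


def reset(key: jax.Array) -> ToyState:
    # The episode's randomness is fixed here, once, by the reset key.
    return ToyState(position=jnp.zeros((), dtype=jnp.int32), key=key)


def step(state: ToyState, action: jax.Array) -> ToyState:
    # The randomness for this transition comes from the key stored in the state.
    key, noise_key = jax.random.split(state.key)
    noise = jax.random.randint(noise_key, (), -1, 2)  # integer noise in {-1, 0, 1}
    return ToyState(position=state.position + action + noise, key=key)


state = reset(jax.random.PRNGKey(0))
first = step(state, jnp.int32(1))
second = step(state, jnp.int32(1))

# The same (state, action) pair always produces the same successor, so the
# transition cannot be resampled, and anyone holding `state` can predict it.
same_successor = bool(first.position == second.position)
```

Because `step` is a pure function of `(state, action)`, a planner that is handed the state can roll the environment forward and observe exactly the noise the real environment will produce.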
Describe the solution you'd like
Allow Environment.step to receive a key argument directly, as in Environment.step(state, action, key). This is the approach taken by gymnax. It is also the approach pgx intends to take: sotetsuk/pgx#1043.
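A minimal sketch of what the proposed signature could look like, using a toy environment rather than Jumanji's actual classes. `KeylessState` and this `step` are hypothetical names, and the argument order simply follows the `step(state, action, key)` form written above.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class KeylessState(NamedTuple):
    """Hypothetical state with no key attribute."""

    position: jax.Array


def step(state: KeylessState, action: jax.Array, key: jax.Array) -> KeylessState:
    # All randomness for this transition comes from the key passed at call time.
    noise = jax.random.randint(key, (), -1, 2)  # integer noise in {-1, 0, 1}
    return KeylessState(position=state.position + action + noise)


state = KeylessState(position=jnp.zeros((), dtype=jnp.int32))
action = jnp.int32(1)

# Resample eight successors of the same (state, action) pair by varying the key.
keys = jax.random.split(jax.random.PRNGKey(0), 8)
successors = jax.vmap(step, in_axes=(None, None, 0))(state, action, keys)
```

With this shape, resampling is just a `vmap` over keys, and an agent that receives only `state` has no access to the randomness the environment will use next.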
In the medium/long term, I would support deprecating the State.key attribute entirely, which is currently the only constraint enforced by the StateProtocol. Its removal would allow State objects to be completely generic (they could be strings, ints, tuples, dicts, etc.).
Describe alternatives you've considered
A possible alternative is to create a copy of the state, replace its key attribute, and pass it into Environment.step (for the first issue) or the agent (for the second issue). However, this approach seems hacky and error-prone.
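For reference, the workaround might look roughly like the sketch below. It uses a hypothetical `ToyState` and `step` rather than Jumanji's real types, and the helper names `resample_step` and `hide_key` are made up for illustration.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class ToyState(NamedTuple):
    """Hypothetical state that carries its own PRNG key."""

    position: jax.Array
    key: jax.Array


def step(state: ToyState, action: jax.Array) -> ToyState:
    # Stand-in for the current API: randomness is read from state.key.
    key, noise_key = jax.random.split(state.key)
    noise = jax.random.randint(noise_key, (), -1, 2)  # integer noise in {-1, 0, 1}
    return ToyState(position=state.position + action + noise, key=key)


def resample_step(state: ToyState, action: jax.Array, key: jax.Array) -> ToyState:
    # First issue: swap in a fresh key on a copy of the state before stepping.
    return step(state._replace(key=key), action)


def hide_key(state: ToyState, decoy_key: jax.Array) -> ToyState:
    # Second issue: hand the agent a copy whose key is unrelated to the real one.
    return state._replace(key=decoy_key)


real_key, decoy_key, k1, k2 = jax.random.split(jax.random.PRNGKey(0), 4)
state = ToyState(position=jnp.zeros((), dtype=jnp.int32), key=real_key)

sample_a = resample_step(state, jnp.int32(1), k1)
sample_b = resample_step(state, jnp.int32(1), k2)
agent_view = hide_key(state, decoy_key)
```

Every call site has to remember to do the swap, and the caller must keep track of which copy holds the real key, which is where the error-proneness comes from.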
Fundamentally, it seems like step should be treated as an intrinsically stochastic function, implying that it should receive its own key at call time. (The key can be None if it's not needed.)
Really sorry for taking a year to get back to you on this. I think this is a great suggestion. I don't have time to do this right now (I may have some time early next year), as it would be quite a bit of work, but I'll leave this open for anyone who wants to implement it, and I'll be happy to review 😄