[Feature] step_and_maybe_reset in env #1611
Changes from 118 commits
@@ -58,6 +58,23 @@ With these, the following methods are implemented:
- :meth:`env.step`: a step method that takes a :class:`tensordict.TensorDict` input
  containing an input action as well as other inputs (for model-based or stateless
  environments, for instance).
- :meth:`env.step_and_maybe_reset`: executes a step and (partially) resets the
  environments if needed. It returns the updated input with a ``"next"``
  key containing the data of the next step, as well as a tensordict containing
  the input data for the next step (i.e., the reset output or the result of
  :func:`~torchrl.envs.utils.step_mdp`).
  This is done by reading the ``done_keys`` and
  assigning a ``"_reset"`` signal to each done state. This method makes it
  easy to write non-stopping rollout functions:

  >>> data_ = env.reset()
  >>> result = []
  >>> for i in range(N):
  ...     data, data_ = env.step_and_maybe_reset(data_)
  ...     result.append(data)
  ...
  >>> result = torch.stack(result)

- :meth:`env.set_seed`: a seeding method that will return the next seed
  to be used in a multi-env setting. This next seed is deterministically computed
  from the preceding one, such that one can seed multiple environments with a different
@@ -169,7 +186,95 @@ one can simply call:
>>> print(a)
9.81

It is also possible to reset some but not all of the environments (or, more
generally, of the components, such as sub-environments or agents).
TorchRL uses a private ``"_reset"`` key to indicate to the environment which
components should be reset.

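To make the interplay between :meth:`env.step_and_maybe_reset` and the ``"_reset"`` key concrete, here is a toy sketch in plain Python. It is hypothetical throughout: ``ToyBatchedEnv`` is not a TorchRL class, nested dicts and lists stand in for tensordicts and tensors, and the methods only mimic the documented contract (step, move the ``"next"`` data to the root as :func:`~torchrl.envs.utils.step_mdp` would, then reset only the done components). It is not TorchRL's implementation.

```python
# Hypothetical toy environment: plain dicts and lists stand in for TensorDicts
# and tensors. This only mimics the documented contract, not TorchRL's code.


class ToyBatchedEnv:
    """Each sub-environment adds its action to a counter and is done once the
    counter reaches ``horizon``."""

    def __init__(self, batch_size, horizon):
        self.batch_size = batch_size
        self.horizon = horizon

    def reset(self, data=None):
        n = self.batch_size
        if data is None:
            data = {"count": [0] * n, "done": [False] * n}
        # No "_reset" entry means a total reset of this level.
        mask = data.get("_reset", [True] * n)
        # Reset flagged components and copy the others from the input;
        # the "_reset" entry itself is not carried over to the output.
        return {
            "count": [0 if m else c for m, c in zip(mask, data["count"])],
            "done": [False if m else d for m, d in zip(mask, data["done"])],
        }

    def step(self, data):
        # The result of the step is written under "next".
        count = [c + a for c, a in zip(data["count"], data["action"])]
        done = [c >= self.horizon for c in count]
        data["next"] = {"count": count, "done": done}
        return data

    def step_and_maybe_reset(self, data):
        data = self.step(data)
        # step_mdp-like move: the "next" entries become the next step's input.
        data_ = dict(data["next"])
        if any(data_["done"]):
            # One "_reset" signal per done state: only done components reset.
            data_["_reset"] = list(data_["done"])
            data_ = self.reset(data_)
        return data, data_


# Non-stopping rollout, mirroring the doctest loop shown earlier.
env = ToyBatchedEnv(batch_size=2, horizon=2)
data_ = env.reset()
result = []
for _ in range(3):
    data_["action"] = [1, 2]  # stands in for a policy
    data, data_ = env.step_and_maybe_reset(data_)
    result.append(data)
```

Because only ``data_`` is reset while ``data["next"]`` keeps the values produced by the step, the stored trajectory still records the last step of each finished episode.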
The ``"_reset"`` key has two distinct functionalities:

1. During a call to :meth:`~.EnvBase._reset`, the ``"_reset"`` key may or may
   not be present in the input tensordict. TorchRL's convention is that the
   absence of the ``"_reset"`` key at a given ``"done"`` level indicates
   a total reset of that level (unless a ``"_reset"`` key was found at a level
   above, see details below).
   If it is present, only the components whose ``"_reset"`` entry is ``True``
   (along the key and shape dimensions) are expected to be reset.

   The way an environment deals with the ``"_reset"`` keys in its :meth:`~.EnvBase._reset`
   method is specific to its class.
   Designing an environment that behaves according to ``"_reset"`` inputs is the
   developer's responsibility, as TorchRL has no control over the inner logic
   of :meth:`~.EnvBase._reset`. Nevertheless, the following point should be
   kept in mind when designing that method.

2. After a call to :meth:`~.EnvBase._reset`, the output will be masked with the
   ``"_reset"`` entries, and the output of the previous :meth:`~.EnvBase.step`
   will be written wherever ``"_reset"`` was ``False``. In practice, this
   means that any modification that :meth:`~.EnvBase._reset` makes to the data
   of components that are not being reset will be lost. After this masking
   operation, the ``"_reset"`` entries will be erased from the
   :meth:`~.EnvBase.reset` outputs.

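The masking described in point 2 can be sketched as follows. ``mask_reset_output`` is a hypothetical helper, not part of TorchRL, and dicts of equal-length lists stand in for tensordicts; it only illustrates why changes that ``_reset`` makes to non-reset components are discarded.

```python
# Hypothetical helper illustrating the masking applied after ``_reset``.
# Dicts of equal-length lists stand in for TensorDicts.


def mask_reset_output(previous, reset_output, reset_mask):
    """Keep reset values where ``reset_mask`` is True, previous values elsewhere."""
    masked = {}
    for key, new_values in reset_output.items():
        if key == "_reset":
            continue  # "_reset" entries are erased from the reset output
        old_values = previous[key]
        masked[key] = [
            new if reset else old
            for reset, new, old in zip(reset_mask, new_values, old_values)
        ]
    return masked


previous = {"val": [1, 1]}
# A ``_reset`` that zeroes the second component but also writes 99 into the
# first one, which is not being reset.
reset_output = {"val": [99, 0], "_reset": [False, True]}
masked = mask_reset_output(previous, reset_output, [False, True])
# The 99 written for the non-reset component is lost: masked == {"val": [1, 0]}
```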
It must be pointed out that ``"_reset"`` is a private key, and it should only be
used when coding specific environment features that are internal facing.
In other words, it should NOT be used outside of the library, and developers
reserve the right to modify the logic of partial resets through ``"_reset"``
without prior warning, as long as the changes do not affect TorchRL's
internal tests.

Finally, the following assumptions are made and should be kept in mind when
designing reset functionalities:

- Each ``"_reset"`` is paired with a ``"done"`` entry (+ ``"terminated"`` and,
  possibly, ``"truncated"``). This means that the following structure is not
  allowed: ``TensorDict({"done": done, "nested": {"_reset": reset}}, [])``, as
  the ``"_reset"`` lives at a different nesting level than the ``"done"``.
- A reset at one level does not preclude the presence of a ``"_reset"`` at lower
  levels, but it overrides their effects. The reason is simply that
  whether the ``"_reset"`` at the root level corresponds to an ``all()``, ``any()``
  or custom call to the nested ``"done"`` entries cannot be known in advance,
  and it is explicitly assumed that the ``"_reset"`` at the root was placed
  there to supersede the nested values (for an example, have a look at the
  :class:`~.PettingZooWrapper` implementation, where each group has one or more
  associated ``"done"`` entries, which are aggregated at the root level with an
  ``any`` or ``all`` logic depending on the task).
- When calling :meth:`env.reset(tensordict)` with a partial ``"_reset"`` entry
  that will reset some but not all of the done sub-environments, the input data
  should contain (at least) the data of the sub-environments that are **not**
  being reset.

Review comment: How can the input contain only the data "of the sub-environments that are not being reset." It has to contain all data no? otherwise it is sparse
Review comment: who said only? :)
Review comment: but if we specify that it should contain the data for those that are not being reset it might be read as implying that it contains only that. I think it is easier if we say it contains all data
Review comment: if we want to be precise, we don't really care that the data of the envs being reset is there actually. As long as
Review comment: Example:

    import torch
    from tensordict import TensorDict
    reset = torch.tensor([True, False])
    td = torch.stack([
        TensorDict({}, []),
        TensorDict({"a": 1}, []),
    ])
    td_reset = torch.stack([
        TensorDict({"a": 0}, []),
        TensorDict({}, []),
    ])
    td_reset.where(~reset, other=td, pad=0)

which returns a non-sparse td

The reason for this constraint lies in the fact that the output of
``env._reset(data)`` can only be predicted for the entries that are reset.
For the others, TorchRL cannot know in advance whether they will be meaningful
or not. For instance, an implementation could simply pad the values of the
non-reset components, in which case the non-reset data will be meaningless and
should be discarded.

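The first assumption listed above (each ``"_reset"`` sits next to a ``"done"`` entry at the same nesting level) is easy to check mechanically. The sketch below is hypothetical: ``check_reset_pairing`` is not a TorchRL function, nested dicts stand in for nested tensordicts, and only ``"done"`` (not ``"terminated"`` or ``"truncated"``) is inspected.

```python
# Hypothetical validator: nested dicts stand in for nested TensorDicts.


def check_reset_pairing(data, path=()):
    """Raise ValueError if a "_reset" entry has no "done" sibling."""
    if "_reset" in data and "done" not in data:
        raise ValueError(f'"_reset" at {path} has no "done" entry at the same level')
    for key, value in data.items():
        if isinstance(value, dict):
            check_reset_pairing(value, path + (key,))


# Allowed: "_reset" and "done" live at the same level.
check_reset_pairing({"nested": {"done": [False, True], "_reset": [False, True]}})

# Not allowed, as in the structure quoted above: "_reset" is nested while
# "done" is at the root.
try:
    check_reset_pairing({"done": [False, True], "nested": {"_reset": [False, True]}})
except ValueError as err:
    print(err)
```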
Below, we give some examples of the expected effect that ``"_reset"`` keys will
have on an environment returning zeros after reset:

>>> # single reset at the root
>>> data = TensorDict({"val": [1, 1], "_reset": [False, True]}, [])
>>> data = env.reset(data)
>>> print(data.get("val"))  # only the second value is 0
tensor([1, 0])
>>> # nested resets
>>> data = TensorDict({
...     ("agent0", "val"): [1, 1], ("agent0", "_reset"): [False, True],
...     ("agent1", "val"): [2, 2], ("agent1", "_reset"): [True, False],
... }, [])
>>> data = env.reset(data)
>>> print(data.get(("agent0", "val")))  # only the second value is 0
tensor([1, 0])
>>> print(data.get(("agent1", "val")))  # only the first value is 0
tensor([0, 2])
>>> # nested resets are overridden by a "_reset" at the root
>>> data = TensorDict({
...     "_reset": [True, True],
...     ("agent0", "val"): [1, 1], ("agent0", "_reset"): [False, True],
...     ("agent1", "val"): [2, 2], ("agent1", "_reset"): [True, False],
... }, [])
>>> data = env.reset(data)
>>> print(data.get(("agent0", "val")))  # reset at the root overrides nested
tensor([0, 0])
>>> print(data.get(("agent1", "val")))  # reset at the root overrides nested
tensor([0, 0])

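The three examples above can be reproduced with a small pure-Python sketch of an environment that returns zeros after reset. ``reset_zeros`` is hypothetical (not a TorchRL function) and nested dicts of lists stand in for a tensordict; it only encodes the rules stated in this section: a missing ``"_reset"`` means a total reset of that level unless a ``"_reset"`` was found above, and a ``"_reset"`` at one level overrides the nested ones.

```python
# Hypothetical sketch: nested dicts of lists stand in for a TensorDict, and the
# environment resets every value to zero.


def reset_zeros(data, inherited=None):
    """Return a copy of ``data`` with the flagged components reset to zero.

    ``inherited`` is the "_reset" mask found at an upper level, if any; it
    supersedes any "_reset" entry found at this level or below.
    """
    mask = inherited if inherited is not None else data.get("_reset")
    out = {}
    for key, value in data.items():
        if key == "_reset":
            continue  # "_reset" entries are erased from the output
        if isinstance(value, dict):
            # Pass down the mask only if a "_reset" was found at or above here.
            out[key] = reset_zeros(value, mask)
        else:
            # No "_reset" at or above this level: total reset of the level.
            level_mask = mask if mask is not None else [True] * len(value)
            out[key] = [0 if m else v for m, v in zip(level_mask, value)]
    return out


# single reset at the root
print(reset_zeros({"val": [1, 1], "_reset": [False, True]}))  # {'val': [1, 0]}
# nested resets
nested = {
    "agent0": {"val": [1, 1], "_reset": [False, True]},
    "agent1": {"val": [2, 2], "_reset": [True, False]},
}
print(reset_zeros(nested))  # {'agent0': {'val': [1, 0]}, 'agent1': {'val': [0, 2]}}
# a "_reset" at the root overrides the nested ones
print(reset_zeros({"_reset": [True, True], **nested}))
# {'agent0': {'val': [0, 0]}, 'agent1': {'val': [0, 0]}}
```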
.. code-block::
   :caption: Parallel environment reset

Review comment: Why using the word environment? I would use dimensions as this could be tasks or agents
Review comment: yes, fair point. In the context of batched envs it seemed clearer to refer to it as envs
we could say "envs, tasks or agent"