[Feature] step_and_maybe_reset in env #1611
Conversation
torchrl/envs/libs/pettingzoo.py (outdated)

```
@@ -464,12 +480,18 @@ def _reset(
    self, tensordict: Optional[TensorDictBase] = None, **kwargs
) -> TensorDictBase:
    ...
    _reset = tensordict.get("_reset", None)
```
This crashes when tensordict is None
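A minimal guard along these lines would avoid the crash (a sketch only; the PR's actual fix may differ):

```python
# tolerate a missing input tensordict before reading "_reset"
_reset = tensordict.get("_reset", None) if tensordict is not None else None
```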
Yeah sorry about that, I can't test petting zoo locally so I'm always moving in the dark...
@matteobettini @albertbou92 @BY571 this should be (almost) mergeable.

Why would vmas sporadically fail?

Not sure, can you have a look?

Was this happening before this PR?

I was referring to the VMAS CI beyond this PR.
Some final questions/comments
```python
# goes through the tensordict and brings the _reset information to
# a boolean tensor of the shape of the tensordict.
batch_size = data.batch_size
n = len(batch_size)
...
if done_keys is not None and reset_keys is None:
    reset_keys = {_replace_last(key, "done") for key in done_keys}
```
I am not following this.

We are taking the done keys (with all the terminated and truncated entries), replacing the ending of each with "done", and calling the resulting set reset keys. This is counterintuitive, since reset keys end in "_reset", not "done".

This change seems to come from the fact that you aim to use this function in two contexts:

- normally, on the root td with the reset_keys as input
- on the "next" td in collectors, with the done_keys as input

I think we should try to write this better; here are some suggestions (see the sketch after this list):

- always call the function on the root td, and pass the keys with "next" prepended if you want to use that
- do the key filtering and conversion outside of the function, and let the function just operate on a set of keys to be considered as reset keys
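For illustration, the second suggestion could look like this (hypothetical name `_aggregate_end_of_traj`, not the PR's actual code):

```python
import torch

def _aggregate_end_of_traj(data, keys):
    # OR-reduce the listed boolean entries down to a single mask with
    # the shape of the tensordict's batch dims; no key conversion inside
    out = torch.zeros(data.batch_size, dtype=torch.bool)
    for key in keys:
        entry = data.get(key)
        # collapse any extra (e.g. per-agent) trailing dims
        while entry.ndim > len(data.batch_size):
            entry = entry.any(-1)
        out |= entry
    return out

# root td: pass the reset keys as-is
# mask = _aggregate_end_of_traj(tensordict, keys=env.reset_keys)
# "next" td in collectors: pass the done keys instead
# mask = _aggregate_end_of_traj(tensordict.get("next"), keys=env.done_keys)
```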
I don't see the problem, can you elaborate on why we should try to write this better? Is it a naming problem? We can rename the function `_aggregate_stop` or something similar.

> always call the function on the root td and pass the keys with "next" prepended if you want to use that

That introduces some unwanted overhead when we can directly access "next" and read the done_keys. Recall that `td.get(("next", "key"))` is considerably slower than `next_td.get("key")`, which is what we do here.
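For illustration, a quick micro-benchmark of that access pattern (a sketch; absolute numbers depend on the tensordict version):

```python
import timeit

import torch
from tensordict import TensorDict

td = TensorDict({("next", "key"): torch.zeros(3)}, [])
next_td = td.get("next")

# nested access resolves the ("next", "key") tuple on every call
print(timeit.timeit(lambda: td.get(("next", "key")), number=100_000))
# direct access on the sub-tensordict skips that traversal
print(timeit.timeit(lambda: next_td.get("key"), number=100_000))
```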
> do this key filtering and conversion outside of the function and let the function just operate on a set of keys to be considered as reset keys

What's your suggestion for `_update_traj_ids` in `collectors.py`, for instance? We don't have a "_reset" in the "next" tensordict, but I think this function does its job of aggregating the done signals to read what the trajectory ids are.

What I understand is that the confusion comes from the "reset" in the function name, but what this function really does is just aggregate end-of-trajectory signals (either reset or done) at the root.

Given this, I don't see why it should be changed. It's a private function, properly tested, and I think it serves its purpose.
Connecting to the comment below: we could have an `_aggregate_keys(keys=)` which can be called on anything. Alternatively, we could have both `_aggregate_dones` and `_aggregate_resets`, where one calls the other; a sketch follows.
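As a sketch, both names could delegate to a shared keys-based helper such as the `_aggregate_end_of_traj` drafted earlier (hypothetical, not the PR's code):

```python
def _aggregate_dones(data, done_keys):
    # aggregate done entries, e.g. in a "next" tensordict
    return _aggregate_end_of_traj(data, keys=done_keys)

def _aggregate_resets(data, reset_keys):
    # aggregate "_reset" entries in a root tensordict
    return _aggregate_end_of_traj(data, keys=reset_keys)
```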
torchrl/collectors/collectors.py (outdated)

```python
traj_sop = _aggregate_resets(
    tensordict.get("next"), done_keys=self.env.done_keys
)
```
I assume we have to do this because there are no reset keys visible to the collector anymore. Since this is quite counterintuitive, what about an `_aggregate_dones()`? I also have other suggestions relating to this function in the other comment.
Yep, we can rename the function. I would rather say that it's more direct to aggregate the dones than the "_reset" entries, which come from the dones.
```python
    action_keys=self.action_keys,
    done_keys=self.done_keys,
)
any_done = _terminated_or_truncated(
```
Help me understand this a little better. In an example case (e.g., PettingZoo) where I have

```python
{
    "done": [False],
    "agents": {"done": [True, False]},
}
```

is `any_done` triggered? If so, this is a problem for envs like PettingZoo, where `_reset()` will be called with `{"_reset": [False]}`.
The best thing is trying it out :)

```python
from tensordict import TensorDict
from torchrl.envs.utils import _terminated_or_truncated

data = TensorDict({"done": [False], ("agent", "done"): [True, False]}, [])
print(_terminated_or_truncated(data))
```

which returns True.

So what you're saying is that it should be False, since there's a False at the root? I can correct that.
I think it should, to follow the dominance rule we imposed, right? Or at least in this context, definitely, because we do not want to call reset. I don't know in what other contexts this function is used, but if its primary use is to decide when to call reset, then yes.
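A sketch of that dominance rule on the example above (hypothetical helper `any_done_with_root_dominance`, not torchrl's actual fix):

```python
from tensordict import TensorDict

def any_done_with_root_dominance(data):
    # a False root "done" overrides nested per-agent dones,
    # so no reset would be triggered
    root_done = data.get("done", None)
    if root_done is not None and not root_done.any():
        return False
    return any(
        bool(data.get(key).any())
        for key in data.keys(include_nested=True, leaves_only=True)
        if (key[-1] if isinstance(key, tuple) else key) == "done"
    )

data = TensorDict({"done": [False], ("agent", "done"): [True, False]}, [])
assert not any_done_with_root_dominance(data)  # root False dominates
```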
@matteobettini I addressed all your comments
LGTM
Contribution
This PR proposes the `step_and_maybe_reset` method in `EnvBase`. This method executes a step followed by a reset, if necessary.

We also make `reset` more robust by ensuring that partial resets are handled uniformly. This is necessary since `step_and_maybe_reset` must take care of this functionality, and from our perspective handling partial resets is the responsibility of `reset` (the user should not have to worry about data not being updated properly). This has repercussions on the logic behind `TransformedEnv._reset` and `BatchedEnv._reset`.

I'm now considering having batched envs call `reset` and not `_reset` to make sure that the data is well presented, since the update of the input tensordict with the `tensordict_reset` now occurs after `_reset` (hence, the output of `_reset` in `SerialEnv` is incomplete). This could introduce some overhead, but the impact is limited since `step_and_maybe_reset` is now there to handle things faster.
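For illustration, a usage sketch of the proposed method (the exact return layout at merge time may differ):

```python
from torchrl.envs.libs.gym import GymEnv

env = GymEnv("CartPole-v1")
td = env.reset()
for _ in range(100):
    td = env.rand_action(td)
    # steps the env; where a done was triggered, a (partial) reset runs.
    # td_out carries the step result, td is the root for the next step
    td_out, td = env.step_and_maybe_reset(td)
```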