[Feature Request] Batched specs of heterogeneous shape and related stacked tensordicts #766
Comments
Where did the 3 come from here? Shouldn't it be a 2 (as the first dimension comes from the fact that we stacked 2 tensors)? Or am I missing something? Apart from this the issue looks perfect. Thank you for transcribing this. |
Edited :) |
Centralizing comments from the earlier discussion. The API I would propose will look like this:
from torchrl.data import UnboundedContinuousTensorSpec, CompositeSpec, StackedCompositeSpec, StackedSpec
spec1 = UnboundedContinuousTensorSpec(shape=[3, 4])
spec2 = UnboundedContinuousTensorSpec(shape=[3, 4])
c_spec = StackedCompositeSpec(
CompositeSpec(action=spec1),
CompositeSpec(action=spec2)
)
spec = c_spec["action"]
assert isinstance(spec, StackedSpec)
spec.rand() # returns a nestedtensor
ntensor = spec.zeros() # returns a nestedtensor
spec.is_in(ntensor) # works
spec.is_in(ntensor.unbind(0)) # works
c_spec.rand() # returns a LazyStackedTensorDict
lstack = c_spec.zero() # returns a LazyStackedTensorDict
c_spec.is_in(lstack)
print(spec)
# StackedSpec(shape=torch.Size([2, 3, *]), device="cpu", dtype=torch.float32)
print(c_spec)
# StackedCompositeSpec(
# "action": StackedSpec(shape=torch.Size([2, 3, *], device="cpu", dtype=torch.float32),
# ) Another question to solve is that se sometimes do stuff like
to get as many actions as there are envs. Also, what do you think of this feature? If you're happy with it I can assign someone on it. |
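As a rough illustration of the semantics proposed above, here is a minimal pure-Python sketch (a hypothetical class, not the torchrl implementation): a stack of specs reports dims that agree by their size and diverging dims as -1 (printed as "*" in the proposal), while rand and is_in operate per stacked element, with plain lists standing in for nested tensors.

```python
import random

class StackedSpec:
    """Hypothetical sketch of the proposed StackedSpec: a stack of
    per-element specs whose shapes may differ in some dimension."""

    def __init__(self, *shapes):
        self.shapes = [tuple(s) for s in shapes]  # one shape per stacked element

    @property
    def shape(self):
        # Dims that agree across elements keep their size;
        # diverging dims are reported as -1 ("*" in the proposal).
        dims = [len(self.shapes)]
        for sizes in zip(*self.shapes):
            dims.append(sizes[0] if len(set(sizes)) == 1 else -1)
        return tuple(dims)

    @staticmethod
    def _numel(shape):
        n = 1
        for s in shape:
            n *= s
        return n

    def rand(self):
        # Stand-in for a nested tensor: one flat list of floats per element.
        return [[random.random() for _ in range(self._numel(s))]
                for s in self.shapes]

    def is_in(self, nested):
        # A sample belongs to the spec if every stacked part matches
        # the numel of the corresponding element spec.
        return len(nested) == len(self.shapes) and all(
            len(part) == self._numel(s) for part, s in zip(nested, self.shapes)
        )

spec = StackedSpec((3, 4), (3, 5))
print(spec.shape)  # (2, 3, -1): stack dim first, last dim diverges
```

The -1 here plays the role of the "*" in the printed repr above: a dimension that cannot be summarized by a single integer.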
First of all thanks so much for recapping this! It looks good and matches what we thought. Here are some points:
spec1 = UnboundedContinuousTensorSpec(shape=[3, 4])
spec2 = MultOneHotDiscreteTensorSpec(shape=[2, 6])
c_spec = StackedCompositeSpec(
CompositeSpec(action=spec1),
CompositeSpec(action=spec2)
)
Would this be allowed?
I think the issue you brought up here is extremely important, and thanks for spotting it. I'll try to give an example to see if I got it right. Environments (for example vmas) can reach very complex batch_dims. When vmas is used with parallel workers and rollout collection, the batch size looks like:
(n_parallel_workers,
n_agents, # the dimension which can be heterogeneous
n_vectorized_envs,
n_rollout_samples)
Now, if you ask the spec for a random sample with this batch size, the heterogeneous dimension (n_agents) is no longer the leading one. What just happened is that we leaked part of the batch size in front of the stack dimension. If we then call what you mentioned (retrieving the nested tensor), it would fail. I have been thinking about this issue for some time and I can think of a few ways to tackle it, but I still haven't found a great one:
random_action = env.action_spec.rand(
    batch_size=(n_parallel_workers, n_agents, n_vectorized_envs, n_rollout_samples),
    stack_dim=1,
)
# rand takes the second dim (n_agents) as the heterogeneous stack dim
random_action.batch_size  # (n_parallel_workers, n_agents, n_vectorized_envs, n_rollout_samples)
random_action.stack_dim  # 1
Correct. The problem here is that to get the nested tensor you have to have the heterogeneous dim as the first one, but this should be fine because you can do:
random_action_per_worker = random_action[0]
random_action_per_worker.stack_dim  # 0
random_action_per_worker.get_nestedtensor("action")  # success!
So overall I think this might be the best solution (given that the last snippet is something feasible), but I think we are facing a spicy issue and have to act carefully. |
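The indexing behavior relied on in the last snippet could be sketched as follows. This is a toy class with assumed semantics (not the torchrl implementation): indexing away a leading batch dim shifts stack_dim left by one, until the heterogeneous dim is first and a nested tensor could be extracted.

```python
class LazyStack:
    """Hypothetical sketch: a lazily stacked batch that remembers
    which batch dim is the heterogeneous stack dim."""

    def __init__(self, batch_size, stack_dim):
        self.batch_size = tuple(batch_size)
        self.stack_dim = stack_dim

    def __getitem__(self, i):
        # Indexing drops the leading dim, shifting the stack dim left.
        if self.stack_dim == 0:
            raise IndexError("cannot index the heterogeneous stack dim itself")
        return LazyStack(self.batch_size[1:], self.stack_dim - 1)

# (n_parallel_workers, n_agents, n_vec_envs, n_rollout_samples),
# with n_agents (dim 1) heterogeneous
random_action = LazyStack((4, 2, 32, 10), stack_dim=1)
per_worker = random_action[0]
print(per_worker.batch_size, per_worker.stack_dim)  # (2, 32, 10) 0
```

Once stack_dim reaches 0, the per-element tensors line up along the first dim, which is the layout nested tensors require.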
The more I think about this, the more I'm convinced we need a clear convention here.
Here are two solutions, one being less disruptive than the other: the leading shapes of specs must match the env's batch_size.
|
I also have a preference for the first solution. I think the first solution is particularly nice because the specs get even closer to tensors and tensordicts. Am I right in thinking that the second, with rollouts and parallel envs, would order the dims as: multi-agent, vectorized, batched, rolled out? |
That's up to us to decide, but it'll be confusing no matter what. With the first we can stick to the motto "time dim at the end" |
There is one doubt I have had in my mind for a while: in environments that have a heterogeneous dimension, that dimension has to be the first one of the batch_size in order to get the relative nested tensors. But when such an environment is wrapped in a ParallelEnv, a worker dimension is prepended to its batch_size. I was wondering if this can be done for heterogeneous environments too, i.e.:
env.batch_size # (n_agents, n_vec_envs)
env.rollout(10)["action"].shape # (n_agents, n_vec_envs, 10, *)
env.rollout(10)["action"].stack_dim  # 0
env = ParallelEnv(2, lambda: VmasEnv("simple_crypto", num_envs=32))
env.batch_size # (2, n_agents, n_vec_envs)
env.rollout(10)["action"].shape # (2, n_agents, n_vec_envs, 10, *)
env.rollout(10)["action"].stack_dim  # 1
Are the lazy stacked tensordicts already able to support this? I.e., prepending a dimension to their batch_size. |
You'll get a lazy stack of lazy stacks, yes. That should work |
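The dim bookkeeping involved could be sketched as below (a hypothetical helper for illustration; the real wrapper would perform this on the tensordict itself): prepending a worker dim shifts an existing heterogeneous stack dim one position to the right.

```python
def prepend_batch_dim(batch_size, stack_dim, n_workers):
    """Hypothetical helper: wrapping an env (e.g. in ParallelEnv)
    prepends a worker dim, so the heterogeneous stack dim moves
    from position stack_dim to stack_dim + 1."""
    return (n_workers,) + tuple(batch_size), stack_dim + 1

# (n_agents, n_vec_envs) with n_agents heterogeneous at dim 0,
# wrapped in a ParallelEnv with 2 workers:
bs, sd = prepend_batch_dim((3, 32), stack_dim=0, n_workers=2)
print(bs, sd)  # (2, 3, 32) 1
```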
Hey guys! Also, I got a question, maybe out of context: how would users map agent indices to agent names in multi-agent environments? |
This is a very interesting point you bring up, thanks! This is done in multi-agent libraries such as PettingZoo (https://github.com/Farama-Foundation/PettingZoo). The first and simplest solution is that an environment could keep an index_to_name list, whose length is the number of agents and which users can query to retrieve the name from an index. More complex things can be thought of, but they would look less nice. For example, one could make the observation a composite spec whose entry keys are the agent names. This would bring n_agents outside the batch_size. The same could also be done for action and reward, but "done", for example, would have issues as it cannot be composite, so you could not define a per-agent done. I think keeping everything in tensors (eventually lazy stacks if agents are heterogeneous) would be cleaner and lets us benefit from all the goodies of torchrl. |
I like this idea. |
Yep, +1 on having per-env dicts that link index to name. It's also something we had in Nocturne and it's not the easiest to handle. We should use (and document) "lists" of envs where each has a numerical index. If this number can change, the list has a max number of agents and each env is assigned one and only one place in the list. If they have dedicated names, we store a dict. |
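The index-to-name bookkeeping described above could look like this (hypothetical names, stdlib only, not a torchrl API): a fixed-capacity list where each agent occupies exactly one slot, plus the reverse dict for named lookup.

```python
class AgentRegistry:
    """Sketch of the per-env index <-> name mapping discussed above."""

    def __init__(self, names, max_agents=None):
        # If the agent count can change, the list has a max number of
        # slots; otherwise capacity defaults to the initial roster size.
        self.max_agents = max_agents if max_agents is not None else len(names)
        if len(names) > self.max_agents:
            raise ValueError("more agents than available slots")
        # Each agent is assigned one and only one place in the list.
        self.index_to_name = list(names)
        self.name_to_index = {n: i for i, n in enumerate(names)}

reg = AgentRegistry(["eve", "bob", "alice"])
print(reg.index_to_name[1], reg.name_to_index["alice"])  # bob 2
```

With this in place, per-agent tensors keep n_agents inside the batch_size and the registry is only consulted when a human-readable name is needed.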
We can close this.
Motivation
In multiagent settings, each agent's individual spec can differ.
It would be nice to have a way of building heterogeneous composite specs, and to carry data using tensordict following this logic.
Solution
StackedCompositeSpec
We could use a StackedCompositeSpec that would essentially work as a tuple of Boxes in gym.
Constructor 1
Constructor 2
This would basically mean that the environment expects an action of shape Size([3]) for the first agent and Size([5]) for the second.
LazyStackedTensorDict
We could allow LazyStackedTensorDict to host tensors of different shape across a dimension, which would show a tensor where the diverging shapes have been hidden.
With this kind of tensordict, the get operation would be prohibited for heterogeneous keys. Instead, one could retrieve the stacked values through a nested tensor. Similarly, the set_ method would not work (as we don't have a data format to pass the mixed input tensor except nestedtensor). That way we could carry data in ParallelEnv and through the collector while keeping the key with mixed attributes visible to the users.
One could also access a nestedtensor provided that not more than one LazyTensorDict layer is used (as we can't currently build nested nested tensors).
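To make the prohibition concrete, here is a toy sketch using plain lists in place of tensors (a hypothetical class; get_nestedtensor stands in for the proposed nested-tensor accessor): get refuses keys whose stacked entries differ in shape, while the nested accessor always succeeds.

```python
class LazyStackedDict:
    """Toy sketch of the proposed behavior: get() is prohibited for
    keys whose stacked entries differ in shape; the nested accessor
    is the only way to retrieve them."""

    def __init__(self, *dicts):
        self.dicts = dicts

    def get(self, key):
        parts = [d[key] for d in self.dicts]
        # len() stands in for a tensor's shape here.
        if len({len(p) for p in parts}) > 1:
            raise RuntimeError(
                f"key {key!r} is heterogeneous; use get_nestedtensor")
        return parts

    def get_nestedtensor(self, key):
        # Stand-in for building a nested tensor from the parts.
        return [d[key] for d in self.dicts]

td = LazyStackedDict({"action": [1, 2, 3]}, {"action": [1, 2, 3, 4, 5]})
nested = td.get_nestedtensor("action")
print([len(p) for p in nested])  # [3, 5]
```

Under this design, td.get("action") raises, surfacing the heterogeneity to the user instead of silently returning a ragged stack.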
TensorDictSequential is already capable of handling lazy stacked tensordicts that have different keys. We could also think about allowing it (?) to gather tensors that do not share the same shape, for instance, although this is harder to implement as not every module has a precise signature of the input tensor it expects.
cc @matteobettini