
[Feature Request] Transform that stacks data for agents with identical specs #2566

Open
kurtamohler opened this issue Nov 14, 2024 · 15 comments · May be fixed by #2567
Labels: enhancement New feature or request

@kurtamohler
Collaborator

Motivation

Some multi-agent environments, like VmasEnv, stack all of the tensors for observations, rewards, etc. for different agents that have identical specs. For instance, in one of these stacked environments, if there are 2 agents that each have 8 observations, the observation spec might look like this:

Composite(
    group_0: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([2, 8]),
            ...),
        shape=torch.Size([2])),
    shape=torch.Size([]))

In contrast, other environments, like UnityMLAgentsEnv, have separate keys for each agent, even if the agents' specs are identical. For instance, with 2 agents that each have 8 observations, the observation spec might look like this:

Composite(
    group_0: Composite(
        agent_0: Composite(
            observation: UnboundedContinuous(
                shape=torch.Size([8]),
                ...),
            shape=torch.Size([])),
        agent_1: Composite(
            observation: UnboundedContinuous(
                shape=torch.Size([8]),
                ...),
            shape=torch.Size([])),
        shape=torch.Size([])),
    shape=torch.Size([]))

It is not easy to apply the same training script to two environments that use these two different formats. For instance, applying the multi-agent PPO tutorial to a Unity env is not straightforward.

Solution

If we had an environment transform that could stack all the data from different keys, we could convert an environment that uses the unstacked format into an environment that uses the stacked format. Then it should be straightforward to use the same (or almost the same) training script on the two different environments.
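For illustration, here is a toy sketch of the key re-arrangement such a transform would perform. It uses plain Python dicts and lists in place of TensorDicts and torch.stack, and the function name and key layout are hypothetical, not the actual torchrl API:

```python
def stack_identical_agents(group, agent_keys, entry="observation"):
    # Toy model: gather `entry` from each per-agent sub-dict and stack the
    # rows along a new leading agent dimension (here, a list of rows).
    # In torchrl this would stack tensors inside a TensorDict instead.
    rows = [group[agent][entry] for agent in agent_keys]
    return {entry: rows}

# Unstacked layout: one key per agent, each holding shape [8]
group = {
    "agent_0": {"observation": [0.0] * 8},
    "agent_1": {"observation": [1.0] * 8},
}

# Stacked layout: a single key holding shape [2, 8]
stacked = stack_identical_agents(group, ["agent_0", "agent_1"])
```

The same re-arrangement applied in reverse would let actions written against the stacked spec be split back per agent before being handed to the wrapped env.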

Alternatives

Additional context

Checklist

  • I have checked that there is no similar issue in the repo (required)
@kurtamohler kurtamohler added the enhancement New feature or request label Nov 14, 2024
@kurtamohler kurtamohler self-assigned this Nov 14, 2024
@kurtamohler kurtamohler linked a pull request Nov 14, 2024 that will close this issue
@thomasbbrunner
Contributor

thomasbbrunner commented Nov 15, 2024

Have you taken a look at the group_map argument? When set to MarlGroupMapType.ALL_IN_ONE_GROUP, the environment should return all agents in a single group (when possible, otherwise in more than one group).

If you are using this setting, then imo there's an issue in the implementation of the grouping of agents in UnityMLAgentsEnv.

@kurtamohler
Collaborator Author

kurtamohler commented Nov 18, 2024

@thomasbbrunner, the group_map=ALL_IN_ONE_GROUP argument does combine all the agents into one group key, but just to be clear, the specific issue is that the data for each of the agents do not all get stacked into the same set of tensors--each agent still has a separate key containing separate tensors, like the example I showed in the issue description.

One of the reasons why UnityMLAgentsEnv does not stack the tensors for all the agents is that each of the agents in a Unity env is free to have different specs--for instance, the strikers and goalies have different specs in the "StrikersVsGoalies" example environment. In cases like that, it's not possible to stack all of the agents' data together. So when I was writing the wrapper, it seemed natural to keep the agents under separate agent keys, rather than try to automatically figure out which agents' specs match each other and can be stacked together. Furthermore, it seemed best to organize the data in a way that is consistent with how the ml_agents library itself does it.

But I'm happy to change the behavior if there is good reason to do so.

@vmoens, wdyt?

@thomasbbrunner
Contributor

Ah, I see! Thanks for the explanation. It seems that the UnityMLAgentsEnv has a different behavior than the other multi-agent environments (my experience is mostly based on PettingZoo here).

PettingZooEnv has three possible grouping types:

  1. ONE_GROUP_PER_AGENT (self explanatory)
  2. None (default): will group as many agents as possible, while keeping the groups homogeneous (i.e., all agents in the group have the same obs and action spec).
  3. ALL_IN_ONE_GROUP: (self explanatory) + if the agents cannot be grouped together (due to mismatched specs), it will return some weird spec and any environment operation will lead to an error.

Personally, I don't see the benefit of grouping heterogeneous agents. Maybe that makes sense in your use-case, but I'd argue that there is no difference between having one group for each agent and a group containing sub-groups of heterogeneous agents.

I quite like the default behavior of PettingZooEnv (grouping where possible) and I don't understand why there isn't a GROUP_HOMOGENEOUS (or similar) in MarlGroupMapType.

Would be interested in hearing more about your use-case!

@thomasbbrunner
Contributor

thomasbbrunner commented Nov 18, 2024

It would be nice if the behavior of all TorchRL environments would be consistent with each other. It sometimes feels like multi-agent is not part of the "core" TorchRL capabilities and I'm hoping we can change that!

cc @matteobettini

@kurtamohler
Collaborator Author

I agree that we should make environments consistent with each other when possible. At the time, I didn't think that putting the Unity env agents under separate keys would be a significant inconsistency with other TorchRL envs (at least compared to VMAS, PettingZoo, and OpenSpiel)--it just seemed like the right choice because it's more consistent with the underlying ml_agents library. But I see that it probably would be better to stack the data of agents that share the same group, by default.

@vmoens, do you agree with this?

I don't understand why there isn't a GROUP_HOMOGENEOUS

That sounds like a good idea to me. Feel free to submit an issue

Personally, I don't see the benefit of grouping heterogeneous agents.

I don't either, but then again, I'm fairly new here. Maybe I took ALL_IN_ONE_GROUP too literally

@vmoens
Contributor

vmoens commented Nov 19, 2024

It would be nice if the behavior of all TorchRL environments would be consistent with each other. It sometimes feels like multi-agent is not part of the "core" TorchRL capabilities and I'm hoping we can change that!

I'm sorry this is the impression you have, @matteobettini gave a great deal of effort in unifying MARL APIs, but if there are inconsistencies we should address them!

One of the reasons why UnityMLAgentsEnv does not stack the tensors for all the agents is that each of the agents in a Unity env is free to have different specs--for instance, the strikers and goalies have different specs in the "StrikersVsGoalies" example environment. In cases like that, it's not possible to stack all of the agents' data together. So when I was writing the wrapper, it seemed natural to keep the agents under separate agent keys, rather than try to automatically figure out which agents' specs match each other and can be stacked together. Furthermore, it seemed best to organize the data in a way that is consistent with how the ml_agents library itself does it.

I 100% agree on this.
My take when writing wrappers is usually that I want the wrapper to have the minimal interference with the underlying API – the only thing we want is to present the data in a tensordict-friendly manner: i.e., convert to tensors, make sure that the data is on the desired device, minimize the cost of operations, etc. Once that is done, we leave it to the user to deal with the data.
The reason we want to do this is that any change we make in the way the data is presented to the user will create edge cases that we may not know of.

Now RE the "consistency" problem highlighted by @thomasbbrunner, as I said earlier we should make sure that things can be made uniform at a low cost.
My vision here would be for TorchRL to handle the various grouping strategies itself with its own API, possibly inspired by PettingZoo if that is a convention in the community.
The API would consist of appending a transform that re-arranges the tensordict given some flag; for instance, you could create your Unity env in its wrapper and pass a transform

env = UnityMLAgentsEnv(...)
transform = GroupMARLAgents(MARLGroups.ONE_GROUP_PER_AGENT)
env = env.append_transform(transform)

and then, internally, we make sure that the transform does its job for every MARL library we support.

Would that make sense?

nb: Using metaclasses, we could even pass a group argument directly in each wrapper that automatically appends the transform if required.

@matteobettini
Contributor

matteobettini commented Nov 19, 2024

Hey guys so I think there might have been some general confusion about the multi-agent environment API here.

If you can, watch this section of the video to understand how it works: https://youtu.be/1tOIMgJf_VQ?si=1RJ7PGD3s5--hI2o&t=1235

Here is a recap:

The choice of which agents should be stacked together and which kept separate (what we call grouping) has to be 100% up to the user.

That is why multi-agent envs should take a group map that maps groups to agents in that group.
This is a dictionary passed by the user.

The MarlGroupMapType options are just some sensible defaults for this value. As you know, these right now are either all in one group or one group per agent.

Then environments have a default behaviour for building the map when nothing is passed.
The default behavior that makes the most sense to me (and it is the one used in vmas and pettingzoo) is to group agents together based on their name. Basically, if you have "evader_1", "evader_2", "attacker_1", it would make 2 groups named "attacker" and "evader".
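That name-based default can be sketched in a few lines (a hypothetical helper for illustration, not the actual torchrl implementation):

```python
import re
from collections import defaultdict

def group_map_from_names(agent_names):
    # Strip a trailing "_<index>" from each agent name and group agents
    # that share the remaining prefix.
    groups = defaultdict(list)
    for name in agent_names:
        groups[re.sub(r"_\d+$", "", name)].append(name)
    return dict(groups)

group_map = group_map_from_names(["evader_1", "evader_2", "attacker_1"])
# {'evader': ['evader_1', 'evader_2'], 'attacker': ['attacker_1']}
```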

This to me was better than GROUP_HOMOGENEOUS because it is difficult to define a sensible and consistent behavior for that. But I am open to discussing this.

The choice of the grouping needs to be passed at environment construction. If MLAgents is overfitted to one specific choice, I believe we should extend it. Modifying the grouping in a transform later is very inefficient and I think should be avoided.

MLAgents also has a concept of an agent type or something similar which I remember would be great for the default grouping strategy

@matteobettini
Contributor

Maybe i am missing what the group_map input to the MLAgents constructor does. Cause that is what I would use to change the data grouping and address this issue

@kurtamohler
Collaborator Author

kurtamohler commented Nov 26, 2024

I think I understand what you're saying, @matteobettini. I think two main points to summarize are:

  1. By default, UnityMLAgentsEnv should not stack agent data, and the structure of the specs should reflect the group IDs assigned by MLAgents. (At least Vincent and I seem to agree on this)
  2. All agents that are in the same MARL group should always be stacked together.

An important distinction is that the MLAgents group IDs are not the same thing as MARL group keys.

At the moment, UnityMLAgentsEnv honors point 1, but not point 2. So how do we honor both points?

Consider an example Unity env where there are two MLAgents group IDs, each with 3 agents.

Right now, the default group map given by UnityMLAgentsEnv would be this:

{
    'group_0': ['agent_0', 'agent_1', 'agent_2'],
    'group_1': ['agent_3', 'agent_4', 'agent_5'],
}

In order to honor both points 1 and 2, we would need to change UnityMLAgentsEnv's default group map to:

{
    ('group_0', 'agent_0'): ['agent_0'],
    ('group_0', 'agent_1'): ['agent_1'],
    ('group_0', 'agent_2'): ['agent_2'],
    ('group_1', 'agent_3'): ['agent_3'],
    ('group_1', 'agent_4'): ['agent_4'],
    ('group_1', 'agent_5'): ['agent_5'],
}

Of course, to allow the user to specify a different group map where some agents are in the same MARL group, UnityMLAgentsEnv would need to be updated to know how to stack/unstack the data coming out of and going into MLAgents.
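As a toy illustration of that stack/unstack round trip (lists stand in for tensors, and the helper names are hypothetical): the wrapper would stack per-agent data on the way out and split stacked actions back per agent on the way into ml_agents.

```python
def stack_group(per_agent, agents):
    # env -> user: gather each agent's rows into one stacked group entry
    # (torchrl would stack tensors inside a TensorDict instead)
    return [per_agent[a] for a in agents]

def unstack_group(stacked, agents):
    # user -> env: split stacked rows back into per-agent entries, e.g.
    # to hand actions back to the underlying ml_agents API
    return {a: row for a, row in zip(agents, stacked)}

agents = ["agent_0", "agent_1", "agent_2"]
actions = {a: [float(i)] * 4 for i, a in enumerate(agents)}
stacked = stack_group(actions, agents)
roundtrip = unstack_group(stacked, agents)
```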

However, a completely different alternative to all of the above could be to just remove the group_map arg from UnityMLAgentsEnv, and then if the user wants to stack agents, they can use the Stack transform. One benefit is that UnityMLAgentsEnv wouldn't have the burden of implementing the logic for stacking/unstacking MARL groupings.

@kurtamohler
Collaborator Author

kurtamohler commented Nov 26, 2024

Modifying the grouping in a transform later is very inefficient and I think should be avoided

Of course, you're right that creating a MARL grouping and then modifying it later with a transform would be inefficient. But what if the env wrappers didn't create any MARL groupings, and instead we always use a transform, like the GroupMARLAgents transform that Vincent suggested?

If I understand correctly, the stacked tensordict for each MARL group is typically created during each EnvBase.step/reset call by assembling a list of the agents' tensordicts and then stacking them. If instead the env wrappers just put all the agents' tensordicts into the output tensordict under different keys and then a transform stacks them, I would think that the performance is pretty similar. There will be some overhead from accessing each agent from the tensordict given to the transform, but I wouldn't expect it to be much, since dict accesses are typically constant time on average, though I'm not specifically familiar with how efficient TensorDict accesses are.

A further point is that if we used a transform for all MARL groupings, rather than requiring the env wrappers to implement the MARL grouping logic, then the implementation of env wrappers would become a bit simpler and less prone to potentially inefficient or bugged implementations of the MARL grouping logic. It seems like it would be nice to have the MARL grouping logic implemented in one common place.

@matteobettini
Contributor

matteobettini commented Nov 26, 2024

Thanks for the answer.

I'll leave the discussion of the transform aside for now and focus on the env.

Yes point 2 makes perfect sense!

By default, UnityMLAgentsEnv should not stack agent data, and the structure of the specs should reflect the group IDs assigned by MLAgents. (At least Vincent and I seem to agree on this)

This is what is causing the problem in my opinion. It is creating a discrepancy with the other MARL environments.

We are having a meeting to discuss this tomorrow but for now let me just try to explain the MARL API a bit better.

Axioms

Let me list some axioms to begin with, these should always be true, if you don't agree with these then you can stop me here.

  1. The structure of the specs and the input/output tensordicts should always match. Therefore the same grouping should be applied to both.
  2. Such structure is dictated purely and solely by the TorchRL group_map. Values in the same group are stacked, different groups are separate entries
  3. When a key is shared across groups, it has to be located in the root of the tensordict without any group dimension

Your choice when implementing a multi-agent wrapper

If you follow the above axioms, you realise that you do not have much freedom as the implementer of a task. You have to support any possible group_map given by the user.

What you have choice in is some sensible defaults for group_map.
This is where the discussion of MLAgents comes in.

Since MLAgents has this nice internal concept of group IDs, a perfect default value for torchrl group_map is to match the internal one from MLAgents.

Users can also request other groupings (like ONE_GROUP_PER_AGENT which seems to be the only one provided now). In this case the torchrl group_map won't align with the MLAgents group map but this is fine! What is important is that on the torchrl side of things both the specs and the data always follow the same torchrl group_map (given at construction)

If the user requests an infeasible group_map, for example stacking heterogeneous agents, this will be provided on a best-effort basis, resulting in errors if it is not possible.
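A minimal sketch of such a feasibility check, modelling each agent's spec as just its observation shape (a simplification; real specs also compare dtypes, bounds, and nested keys):

```python
def can_stack(agent_shapes):
    # Agents can share a group only if their specs match; here a spec is
    # reduced to a shape tuple for illustration.
    first = agent_shapes[0]
    return all(shape == first for shape in agent_shapes[1:])

homogeneous = can_stack([(8,), (8,), (8,)])  # e.g. three identical strikers
heterogeneous = can_stack([(8,), (6,)])      # e.g. a striker and a goalie
```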

The current problem

My impression is that the current implementation of MLAgents violates axioms 1 and 2

@kurtamohler
Collaborator Author

kurtamohler commented Nov 26, 2024

Those axioms make sense.

My impression is that the current implementation of MLAgents violates axioms 1 and 2

Isn't only axiom 2 violated? The specs and tensordicts currently do match for the MLAgents wrapper. That is, unless you've seen a bugged case where that's not true. As far as I understand, the only thing that MLAgents violates is the part of axiom 2 that says the specs and tensordicts in a group must be stacked.

Is my solution here insufficient in some way?

@matteobettini
Contributor

matteobettini commented Nov 27, 2024

Isn't only axiom 2 violated? The specs and tensordicts currently do match for the MLAgents wrapper. That is, unless you've seen a bugged case where that's not true. As far as I understand, the only thing that MLAgents violates is the part of axiom 2 that says the specs and tensordicts in a group must be stacked.

Nice! I think I had understood wrongly

Is my solution #2566 (comment) insufficient in some way?

I think it makes sense, but I am not sure I understood it fully, since it is not clear to me what the group_map arg currently does.

Two points about that solution that I am still wondering about:

  • isn't group_map == mlagents_group_map the best default value? Why are we having one group per agent as default?
  • The names of the groups cannot be tuples (this is just a nit)

It is totally fine to completely remove the group_map arg from MLAgents, but I personally think it is a nice feature to have and it aligns with the many envs already present.

There is also a point about the transform adding an overhead which we can talk about today in our meeting

@matteobettini
Contributor

matteobettini commented Nov 27, 2024

A further point is that if we used a transform for all MARL groupings, rather than requiring the env wrappers to implement the MARL grouping logic, then the implementation of env wrappers would become a bit simpler and less prone to potentially inefficient or bugged implementations of the MARL grouping logic. It seems like it would be nice to have the MARL grouping logic implemented in one common place.

I agree with this but I think it could be done in a utility function (which many MARL envs can use).

Since all MARL envs already provide this functionality and it is now part of the MARL API, I don't see why removing it. Instead we could consider making a util function to help in providing it.

Overfitting to a specific grouping strategy in the wrapper and later modifying this with a transform requires looping over the agents twice for each input and output interaction. Instead, it is possible to do all this in only one loop over the agents. +1 for having a standard function to do this

Also note that users that implement MARL envs are not necessarily forced to have a group map argument and be flexible with respect to that. For example SMACv2 is overfitted to ALL_AGENTS_IN_ONE_GROUP. However, in wrappers provided in the torchrl repo where it makes sense to have this flexibility, it is a nice feature to have.

We can still provide a transform for users who want to implement their own env quickly and do not have much time to support that.

@matteobettini
Contributor

matteobettini commented Nov 27, 2024

Let me take a concrete example.

This is how VMAS stacks the data coming from the step output into the groups dictated by the group_map

for group, agent_names in self.group_map.items():
    agent_tds = []
    for agent_name in agent_names:
        i = self.agent_names_to_indices_map[agent_name]
        agent_obs = self.read_obs(obs[i])
        agent_rew = self.read_reward(rews[i])
        agent_info = self.read_info(infos[i])
        agent_td = TensorDict(
            source={
                "observation": agent_obs,
                "reward": agent_rew,
            },
            batch_size=self.batch_size,
            device=self.device,
        )
        if agent_info is not None:
            agent_td.set("info", agent_info)
        agent_tds.append(agent_td)
    agent_tds = LazyStackedTensorDict.maybe_dense_stack(agent_tds, dim=1)
    if not self.het_specs_map[group]:
        agent_tds = agent_tds.to_tensordict()
    source.update({group: agent_tds})
tensordict_out = TensorDict(
    source=source,
    batch_size=self.batch_size,
    device=self.device,
)
return tensordict_out

The original vmas interface just outputs lists (e.g., obs[i] is the obs of agent i).

Flexible stacking at this stage should be more efficient than creating a tensordict for each individual agent, outputting it, and doing the stacking later with another for loop over the agents.

The amount of code you would have in _step if you do the "one td per agent" solution would be the same.
