
[Feature Request] Transform that stacks data for agents with identical specs #2566

Open
kurtamohler opened this issue Nov 14, 2024 · 15 comments · May be fixed by #2567
Labels: enhancement New feature or request

@kurtamohler
Collaborator

Motivation

Some multi-agent environments, like VmasEnv, stack all of the tensors for observations, rewards, etc. for different agents that have identical specs. For instance, in one of these stacked environments, if there are 2 agents that each have 8 observations, the observation spec might look like this:

Composite(
    group_0: Composite(
        observation: UnboundedContinuous(
            shape=torch.Size([2, 8]),
            ...),
        shape=torch.Size([2])),
    shape=torch.Size([]))

In contrast, other environments, like UnityMLAgentsEnv, have separate keys for each agent, even if the agents' specs are identical. For instance, with 2 agents that each have 8 observations, the observation spec might look like this:

Composite(
    group_0: Composite(
        agent_0: Composite(
            observation: UnboundedContinuous(
                shape=torch.Size([8]),
                ...),
            shape=torch.Size([])),
        agent_1: Composite(
            observation: UnboundedContinuous(
                shape=torch.Size([8]),
                ...),
            shape=torch.Size([])),
        shape=torch.Size([])),
    shape=torch.Size([]))

It is not easy to apply the same training script to two environments that use these two different formats. For instance, applying the multi-agent PPO tutorial to a Unity env is not straightforward.

Solution

If we had an environment transform that could stack all the data from different keys, we could convert an environment that uses the unstacked format into an environment that uses the stacked format. Then it should be straightforward to use the same (or almost the same) training script on the two different environments.
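For illustration, here is a toy sketch of the key re-arrangement such a transform would perform. It uses plain Python dicts and lists in place of TensorDicts and torch.stack, and the function name and key layout are hypothetical, not the actual torchrl API:

```python
def stack_identical_agents(group, agent_keys, entry="observation"):
    # Toy model: gather `entry` from each per-agent sub-dict and stack the
    # rows along a new leading agent dimension (here, a list of rows).
    # In torchrl this would stack tensors inside a TensorDict instead.
    rows = [group[agent][entry] for agent in agent_keys]
    return {entry: rows}

# Unstacked layout: one key per agent, each holding shape [8]
group = {
    "agent_0": {"observation": [0.0] * 8},
    "agent_1": {"observation": [1.0] * 8},
}

# Stacked layout: a single key holding shape [2, 8]
stacked = stack_identical_agents(group, ["agent_0", "agent_1"])
```

The same re-arrangement applied in reverse would let actions written against the stacked spec be split back per agent before being handed to the wrapped env.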

Alternatives

Additional context

Checklist

  • I have checked that there is no similar issue in the repo (required)
@kurtamohler kurtamohler added the enhancement New feature or request label Nov 14, 2024
@kurtamohler kurtamohler self-assigned this Nov 14, 2024
@kurtamohler kurtamohler linked a pull request Nov 14, 2024 that will close this issue
@thomasbbrunner
Contributor

thomasbbrunner commented Nov 15, 2024

Have you taken a look at the group_map argument? When set to MarlGroupMapType.ALL_IN_ONE_GROUP, the environment should return all agents in a single group (when possible, otherwise in more than one group).

If you are using this setting, then imo there's an issue in the implementation of the grouping of agents in UnityMLAgentsEnv.

@kurtamohler
Collaborator Author

kurtamohler commented Nov 18, 2024

@thomasbbrunner, the group_map=ALL_IN_ONE_GROUP argument does combine all the agents into one group key, but just to be clear, the specific issue is that the data for each of the agents do not all get stacked into the same set of tensors--each agent still has a separate key containing separate tensors, like the example I showed in the issue description.

One of the reasons why UnityMLAgentsEnv does not stack the tensors for all the agents is that each of the agents in a Unity env is free to have different specs--for instance, the strikers and goalies have different specs in the "StrikersVsGoalies" example environment. In cases like that, it's not possible to stack all of the agents' data together. So when I was writing the wrapper, it seemed natural to keep the agents under separate agent keys, rather than try to automatically figure out which agents' specs match each other and can be stacked together. Furthermore, it seemed best to organize the data in a way that is consistent with how the ml_agents library itself does it.

But I'm happy to change the behavior if there is good reason to do so.

@vmoens, wdyt?

@thomasbbrunner
Contributor

Ah, I see! Thanks for the explanation. It seems that the UnityMLAgentsEnv has a different behavior than the other multi-agent environments (my experience is mostly based on PettingZoo here).

PettingZooEnv has three possible grouping types:

  1. ONE_GROUP_PER_AGENT (self explanatory)
  2. None (default): will group as many agents as possible, while keeping the groups homogeneous (i.e., all agents in the group have the same obs and action spec).
  3. ALL_IN_ONE_GROUP: (self explanatory) + if the agents cannot be grouped together (due to mismatched specs), it will return some weird spec and any environment operation will lead to an error.

Personally, I don't see the benefit of grouping heterogeneous agents. Maybe that makes sense in your use-case, but I'd argue that there is no difference between having one group for each agent and a group containing sub-groups of heterogeneous agents.

I quite like the default behavior of PettingZooEnv (grouping where possible) and I don't understand why there isn't a GROUP_HOMOGENEOUS (or similar) in MarlGroupMapType.

Would be interested in hearing more about your use-case!

@thomasbbrunner
Contributor

thomasbbrunner commented Nov 18, 2024

It would be nice if the behavior of all TorchRL environments would be consistent with each other. It sometimes feels like multi-agent is not part of the "core" TorchRL capabilities and I'm hoping we can change that!

cc @matteobettini

@kurtamohler
Collaborator Author

I agree that we should make environments consistent with each other when possible. At the time, I didn't think that putting the Unity env agents under separate keys would be a significant inconsistency with other TorchRL envs (at least compared to VMAS, PettingZoo, and OpenSpiel)--it just seemed like the right choice because it's more consistent with the underlying ml_agents library. But I see that it probably would be better to stack the data of agents that share the same group, by default.

@vmoens, do you agree with this?

I don't understand why there isn't a GROUP_HOMOGENEOUS

That sounds like a good idea to me. Feel free to submit an issue

Personally, I don't see the benefit of grouping heterogeneous agents.

I don't either, but then again, I'm fairly new here. Maybe I took ALL_IN_ONE_GROUP too literally

@vmoens
Contributor

vmoens commented Nov 19, 2024

It would be nice if the behavior of all TorchRL environments would be consistent with each other. It sometimes feels like multi-agent is not part of the "core" TorchRL capabilities and I'm hoping we can change that!

I'm sorry this is the impression you have, @matteobettini gave a great deal of effort in unifying MARL APIs, but if there are inconsistencies we should address them!

One of the reasons why UnityMLAgentsEnv does not stack the tensors for all the agents is that each of the agents in a Unity env is free to have different specs--for instance, the strikers and goalies have different specs in the "StrikersVsGoalies" example environment. In cases like that, it's not possible to stack all of the agents' data together. So when I was writing the wrapper, it seemed natural to keep the agents under separate agent keys, rather than try to automatically figure out which agents' specs match each other and can be stacked together. Furthermore, it seemed best to organize the data in a way that is consistent with how the ml_agents library itself does it.

I 100% agree on this.
My take when writing wrappers is usually that I want the wrapper to have the minimal interference with the underlying API – the only thing we want is to present the data in a tensordict-friendly manner: i.e., convert to tensors, make sure that the data is on the desired device, minimize the cost of operations, etc. Once that is done, we leave it to the user to deal with the data.
The reason we want to do this is that any change we make in the way the data is presented to the user will create edge cases that we may not know of.

Now RE the "consistency" problem highlighted by @thomasbbrunner, as I said earlier we should make sure that things can be made uniform at a low cost.
My vision here would be for TorchRL to handle the various grouping strategies itself with its own API, possibly inspired by PettingZoo if that is a convention in the community.
The API would consist of appending a transform that re-arranges the tensordict given some flag; for instance, you could create your Unity env in its wrapper and pass a transform

env = UnityMLAgentsEnv(...)
transform = GroupMARLAgents(MARLGroups.ONE_GROUP_PER_AGENT)
env = env.append_transform(transform)

and then, internally, we make sure that the transform does its job for every MARL library we support.

Would that make sense?

nb: Using metaclasses, we could even pass a group argument directly in each wrapper that automatically appends the transform if required.

@matteobettini
Contributor

matteobettini commented Nov 19, 2024

Hey guys so I think there might have been some general confusion about the multi-agent environment API here.

If you can, watch this section of the video to understand how it works: https://youtu.be/1tOIMgJf_VQ?si=1RJ7PGD3s5--hI2o&t=1235

Here is a recap:

The choice of which agents should be stacked together and which kept separate (what we call grouping) has to be 100% up to the user.

That is why multi-agent envs should take a group map that maps groups to agents in that group.
This is a dictionary passed by the user.

The MarlGroupMapType options are just some sensible defaults for this value. As you know, these right now are either all in one group or one group per agent.

Then environments have a default behaviour for building the map when nothing is passed.
The default behavior that makes the most sense to me (and it is the one used in vmas and pettingzoo) is to group agents together based on their name. Basically, if you have "evader_1", "evader_2", "attacker_1", it would make 2 groups named "attacker" and "evader".
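That name-based default can be sketched in a few lines (a hypothetical helper for illustration, not the actual torchrl implementation):

```python
import re
from collections import defaultdict

def group_map_from_names(agent_names):
    # Strip a trailing "_<index>" from each agent name and group agents
    # that share the remaining prefix.
    groups = defaultdict(list)
    for name in agent_names:
        groups[re.sub(r"_\d+$", "", name)].append(name)
    return dict(groups)

group_map = group_map_from_names(["evader_1", "evader_2", "attacker_1"])
# {'evader': ['evader_1', 'evader_2'], 'attacker': ['attacker_1']}
```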

This to me was better than GROUP_HOMOGENEOUS because it is difficult to define a sensible and consistent behavior for that. But I am open to discussing this.

The choice of the grouping needs to be passed at environment construction. If MLAgents is overfitted to one specific choice, I believe we should extend it. Modifying the grouping in a transform later is very inefficient and I think should be avoided.

MLAgents also has a concept of an agent type or something similar which I remember would be great for the default grouping strategy

@matteobettini
Contributor

Maybe i am missing what the group_map input to the MLAgents constructor does. Cause that is what I would use to change the data grouping and address this issue

@kurtamohler
Collaborator Author

kurtamohler commented Nov 26, 2024

I think I understand what you're saying, @matteobettini. I think two main points to summarize are:

  1. By default, UnityMLAgentsEnv should not stack agent data, and the structure of the specs should reflect the group IDs assigned by MLAgents. (At least Vincent and I seem to agree on this)
  2. All agents that are in the same MARL group should always be stacked together.

An important distinction is that the MLAgents group IDs are not the same thing as MARL group keys.

At the moment, UnityMLAgentsEnv honors point 1, but not point 2. So how do we honor both points?

Consider an example Unity env where there are two MLAgents group IDs, each with 3 agents.

Right now, the default group map given by UnityMLAgentsEnv would be this:

{
    'group_0': ['agent_0', 'agent_1', 'agent_2'],
    'group_1': ['agent_3', 'agent_4', 'agent_5'],
}

In order to honor both points 1 and 2, we would need to change UnityMLAgentsEnv's default group map to:

{
    ('group_0', 'agent_0'): ['agent_0'],
    ('group_0', 'agent_1'): ['agent_1'],
    ('group_0', 'agent_2'): ['agent_2'],
    ('group_1', 'agent_3'): ['agent_3'],
    ('group_1', 'agent_4'): ['agent_4'],
    ('group_1', 'agent_5'): ['agent_5'],
}

Of course, to allow the user to specify a different group map where some agents are in the same MARL group, UnityMLAgentsEnv would need to be updated to know how to stack/unstack the data coming out of and going into MLAgents.
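As a toy illustration of that stack/unstack round trip (lists stand in for tensors, and the helper names are hypothetical): the wrapper would stack per-agent data on the way out and split stacked actions back per agent on the way into ml_agents.

```python
def stack_group(per_agent, agents):
    # env -> user: gather each agent's rows into one stacked group entry
    # (torchrl would stack tensors inside a TensorDict instead)
    return [per_agent[a] for a in agents]

def unstack_group(stacked, agents):
    # user -> env: split stacked rows back into per-agent entries, e.g.
    # to hand actions back to the underlying ml_agents API
    return {a: row for a, row in zip(agents, stacked)}

agents = ["agent_0", "agent_1", "agent_2"]
actions = {a: [float(i)] * 4 for i, a in enumerate(agents)}
stacked = stack_group(actions, agents)
roundtrip = unstack_group(stacked, agents)
```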

However, a completely different alternative to all of the above could be to just remove the group_map arg from UnityMLAgentsEnv, and then if the user wants to stack agents, they can use the Stack transform. One benefit is that UnityMLAgentsEnv wouldn't have the burden of implementing the logic for stacking/unstacking MARL groupings.

@kurtamohler
Collaborator Author

kurtamohler commented Nov 26, 2024

Modifying the grouping in a transform later is very inefficient and I think should be avoided

Of course, you're right that creating a MARL grouping and then modifying it later with a transform would be inefficient. But what if the env wrappers didn't create any MARL groupings, and instead we always use a transform, like the GroupMARLAgents transform that Vincent suggested?

If I understand correctly, the stacked tensordict for each MARL group is typically created during each EnvBase.step/reset call by assembling a list of the agents' tensordicts and then stacking them. If instead the env wrappers just put all the agents' tensordicts into the output tensordict under different keys and then a transform stacks them, I would think that the performance is pretty similar. There will be some overhead from accessing each agent from the tensordict given to the transform, but I wouldn't expect it to be much, since dict accesses are typically constant time on average, though I'm not specifically familiar with how efficient TensorDict accesses are.

A further point is that if we used a transform for all MARL groupings, rather than requiring the env wrappers to implement the MARL grouping logic, then the implementation of env wrappers would become a bit simpler and less prone to potentially inefficient or bugged implementations of the MARL grouping logic. It seems like it would be nice to have the MARL grouping logic implemented in one common place.

@matteobettini
Contributor

matteobettini commented Nov 26, 2024

Thanks for the answer.

I'll leave the discussion of the transform aside for now and focus on the env.

Yes point 2 makes perfect sense!

By default, UnityMLAgentsEnv should not stack agent data, and the structure of the specs should reflect the group IDs assigned by MLAgents. (At least Vincent and I seem to agree on this)

This is what is causing the problem in my opinion. It is creating a discrepancy with the other MARL environments.

We are having a meeting to discuss this tomorrow but for now let me just try to explain the MARL API a bit better.

Axioms

Let me list some axioms to begin with, these should always be true, if you don't agree with these then you can stop me here.

  1. The structure of the specs and the input/output tensordicts should always match. Therefore the same grouping should be applied to both.
  2. Such structure is dictated purely and solely by the TorchRL group_map. Values in the same group are stacked, different groups are separate entries
  3. When a key is shared across groups, it has to be located in the root of the tensordict without any group dimension

Your choice when implementing a multi-agent wrapper

If you follow the above axioms, you realise that you do not have much freedom as the implementer of a task. You have to support any possible group_map given by the user.

What you have choice in is some sensible defaults for group_map.
This is where the discussion of MLAgents comes in.

Since MLAgents has this nice internal concept of group IDs, a perfect default value for torchrl group_map is to match the internal one from MLAgents.

Users can also request other groupings (like ONE_GROUP_PER_AGENT which seems to be the only one provided now). In this case the torchrl group_map won't align with the MLAgents group map but this is fine! What is important is that on the torchrl side of things both the specs and the data always follow the same torchrl group_map (given at construction)

If the user requests an infeasible group_map, for example stacking heterogeneous agents, this will be provided on a best-effort basis, resulting in errors if it is not possible.
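A minimal sketch of such a feasibility check, modelling each agent's spec as just its observation shape (a simplification; real specs also compare dtypes, bounds, and nested keys):

```python
def can_stack(agent_shapes):
    # Agents can share a group only if their specs match; here a spec is
    # reduced to a shape tuple for illustration.
    first = agent_shapes[0]
    return all(shape == first for shape in agent_shapes[1:])

homogeneous = can_stack([(8,), (8,), (8,)])  # e.g. three identical strikers
heterogeneous = can_stack([(8,), (6,)])      # e.g. a striker and a goalie
```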

The current problem

My impression is that the current implementation of MLAgents violates axioms 1 and 2

@kurtamohler
Collaborator Author

kurtamohler commented Nov 26, 2024

Those axioms make sense.

My impression is that the current implementation of MLAgents violates axioms 1 and 2

Isn't only axiom 2 violated? The specs and tensordicts currently do match for the MLAgents wrapper. That is, unless you've seen a bugged case where that's not true. As far as I understand, the only thing that MLAgents violates is the part of axiom 2 that says the specs and tensordicts in a group must be stacked.

Is my solution here insufficient in some way?

@matteobettini
Contributor

matteobettini commented Nov 27, 2024

Isn't only axiom 2 violated? The specs and tensordicts currently do match for the MLAgents wrapper. That is, unless you've seen a bugged case where that's not true. As far as I understand, the only thing that MLAgents violates is the part of axiom 2 that says the specs and tensordicts in a group must be stacked.

Nice! I think I had understood wrongly

Is my solution #2566 (comment) insufficient in some way?

I think it makes sense, but I am not sure I understood it fully, since it is not clear to me what the group_map arg currently does.

Two points about that solution that I am still wondering about:

  • isn't group_map == mlagents_group_map the best default value? Why are we having one group per agent as default?
  • The names of the groups cannot be tuples (this is just a nit)

It is totally fine to completely remove the group_map arg from MLAgents, but I personally think it is a nice feature to have and it aligns with the many envs already present.

There is also a point about the transform adding an overhead which we can talk about today in our meeting

@matteobettini
Contributor

matteobettini commented Nov 27, 2024

A further point is that if we used a transform for all MARL groupings, rather than requiring the env wrappers to implement the MARL grouping logic, then the implementation of env wrappers would become a bit simpler and less prone to potentially inefficient or bugged implementations of the MARL grouping logic. It seems like it would be nice to have the MARL grouping logic implemented in one common place.

I agree with this but I think it could be done in a utility function (which many MARL envs can use).

Since all MARL envs already provide this functionality and it is now part of the MARL API, I don't see why removing it. Instead we could consider making a util function to help in providing it.

Overfitting to a specific grouping strategy in the wrapper and later modifying this with a transform requires looping over the agents twice for each input and output interaction. Instead, it is possible to do all this in only one loop over the agents. +1 for having a standard function to do this

Also note that users that implement MARL envs are not necessarily forced to have a group map argument and be flexible with respect to that. For example SMACv2 is overfitted to ALL_AGENTS_IN_ONE_GROUP. However, in wrappers provided in the torchrl repo where it makes sense to have this flexibility, it is a nice feature to have.

We can still provide a transform for users who want to implement their own env quickly and do not have much time to support that.

@matteobettini
Contributor

matteobettini commented Nov 27, 2024

Let me take a concrete example.

This is how VMAS stacks the data coming from the step output into the groups dictated by the group_map

for group, agent_names in self.group_map.items():
    agent_tds = []
    for agent_name in agent_names:
        i = self.agent_names_to_indices_map[agent_name]
        agent_obs = self.read_obs(obs[i])
        agent_rew = self.read_reward(rews[i])
        agent_info = self.read_info(infos[i])
        agent_td = TensorDict(
            source={
                "observation": agent_obs,
                "reward": agent_rew,
            },
            batch_size=self.batch_size,
            device=self.device,
        )
        if agent_info is not None:
            agent_td.set("info", agent_info)
        agent_tds.append(agent_td)
    agent_tds = LazyStackedTensorDict.maybe_dense_stack(agent_tds, dim=1)
    if not self.het_specs_map[group]:
        agent_tds = agent_tds.to_tensordict()
    source.update({group: agent_tds})
tensordict_out = TensorDict(
    source=source,
    batch_size=self.batch_size,
    device=self.device,
)
return tensordict_out

The original vmas interface just outputs lists (e.g., obs[i] is the obs of agent i).

Flexible stacking at this stage should be more efficient than creating a tensordict for each individual agent, outputting it, and doing the stacking later with another for loop over the agents.

The amount of code you would have in _step if you do the "one td per agent" solution would be the same.
