[Feature Request] Transform that stacks data for agents with identical specs #2566
Comments
Have you taken a look at the If you are using this setting, then imo there's an issue in the implementation of the grouping of agents in |
@thomasbbrunner, the One of the reasons why But I'm happy to change the behavior if there is good reason to do so. @vmoens, wdyt? |
Ah, I see! Thanks for the explanation. It seems that the
Personally, I don't see the benefit of grouping heterogeneous agents. Maybe that makes sense in your use-case, but I'd argue that there is no difference between having one group for each agent and a group containing sub-groups of heterogeneous agents. I quite like the default behavior of Would be interested in hearing more about your use-case!
It would be nice if the behavior of all TorchRL environments would be consistent with each other. It sometimes feels like multi-agent is not part of the "core" TorchRL capabilities and I'm hoping we can change that! |
I agree that we should make environments consistent with each other when possible. At the time, I didn't think that putting the Unity env agents under separate keys would be a significant inconsistency with other TorchRL envs (at least compared to VMAS, PettingZoo, and OpenSpiel)--it just seemed like the right choice because it's more consistent with the underlying @vmoens, do you agree with this?
That sounds like a good idea to me. Feel free to submit an issue
I don't either, but then again, I'm fairly new here. Maybe I took |
I'm sorry this is the impression you have. @matteobettini put a great deal of effort into unifying the MARL APIs, but if there are inconsistencies we should address them!
I 100% agree on this.
Now, regarding the "consistency" problem highlighted by @thomasbbrunner: as I said earlier, we should make sure that things can be made uniform at a low cost.
    env = UnityMLAgentsEnv(...)
    transform = GroupMARLAgents(MARLGroups.ONE_GROUP_PER_AGENT)
    env = env.append_transform(transform)
and then, internally, we make sure that the transform does its job for every MARL library we support. Would that make sense?
nb: Using metaclasses, we could even pass a group argument directly to each wrapper, which would automatically append the transform if required.
Hey guys, I think there might have been some general confusion about the multi-agent environment API here. If you can, watch this section of this video to understand how it works: https://youtu.be/1tOIMgJf_VQ?si=1RJ7PGD3s5--hI2o&t=1235
Here is a recap:
The choice of which agents should be stacked together and which kept separate (what we call grouping) has to be 100% up to the user. That is why multi-agent envs should take a group map that maps groups to agents in that group.
The Then environments have a default behaviour for building the map when nothing is passed. This to me was better than GROUP_HOMOGENEOUS because it is difficult to define a sensible and consistent behavior for that. But I am open to discussing this.
The choice of the grouping needs to be passed at environment construction. If MLAgents is overfitted to one specific choice, I believe we should extend it. Modifying the grouping in a transform later is very inefficient and I think it should be avoided.
MLAgents also has a concept of an agent type or something similar, which I remember would be great for the default grouping strategy.
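To make the recap concrete, here is a minimal, library-free sketch of a group map and of falling back to a default when the user passes nothing. The helper names (all_in_one_group, one_group_per_agent, resolve_group_map) are hypothetical and are not TorchRL functions; which default a given wrapper should use is exactly what this thread is debating, so the default chosen below is arbitrary.

```python
from typing import Callable, Dict, List, Optional

GroupMap = Dict[str, List[str]]  # group name -> names of the agents in that group


def all_in_one_group(agent_names: List[str], group_name: str = "agents") -> GroupMap:
    """Hypothetical default: every agent shares a single group."""
    return {group_name: list(agent_names)}


def one_group_per_agent(agent_names: List[str]) -> GroupMap:
    """Hypothetical default: every agent gets a group of its own."""
    return {name: [name] for name in agent_names}


def resolve_group_map(
    agent_names: List[str],
    group_map: Optional[GroupMap] = None,
    default: Callable[[List[str]], GroupMap] = all_in_one_group,  # arbitrary choice
) -> GroupMap:
    """Use the user's map when one is given, otherwise build the default."""
    return group_map if group_map is not None else default(agent_names)


agents = ["agent_0", "agent_1", "agent_2"]
default_map = resolve_group_map(agents)
custom_map = resolve_group_map(agents, {"team": ["agent_0", "agent_1"], "solo": ["agent_2"]})
per_agent_map = resolve_group_map(agents, default=one_group_per_agent)
```

The point of the sketch is only that the grouping is decided once, at construction time, and that the user's choice always wins over the default.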
Maybe i am missing what the |
I think I understand what you're saying, @matteobettini. I think two main points to summarize are:
An important distinction is that the MLAgents group IDs are not the same thing as MARL group keys. At the moment, UnityMLAgentsEnv honors point 1, but not point 2. So how do we honor both points? Consider an example Unity env where there are two MLAgents group IDs, which each have 3 agents. Right now, the default group map given by UnityMLAgentsEnv would be this:
In order to honor both points 1 and 2, we would need to change UnityMLAgentsEnv's default group map to:
Of course, to allow the user to specify a different group map where some agents are in the same MARL group, UnityMLAgentsEnv would need to be updated to know how to stack/unstack the data coming out of and going into MLAgents. However, a completely different alternative to all of the above could be to just remove the |
Of course, you're right that creating a MARL grouping and then modifying it later with a transform would be inefficient. But what if the env wrappers didn't create any MARL groupings, and instead we always use a transform, like the GroupMARLAgents transform that Vincent suggested? If I understand correctly, the stacked tensordict for each MARL group is typically created during each A further point is that if we used a transform for all MARL groupings, rather than requiring the env wrappers to implement the MARL grouping logic, then the implementation of env wrappers would become a bit simpler and less prone to potentially inefficient or bugged implementations of the MARL grouping logic. It seems like it would be nice to have the MARL grouping logic implemented in one common place. |
Thanks for the answer. I'll leave the discussion of the transform aside for now and focus on the env. Yes point 2 makes perfect sense!
This is what is causing the problem in my opinion. It is creating a discrepancy with the other MARL environments. We are having a meeting to discuss this tomorrow, but for now let me just try to explain the MARL API a bit better.
Axioms
Let me list some axioms to begin with. These should always be true; if you don't agree with them, you can stop me here.
Your choice when implementing a multi-agent wrapper
If you follow the above axioms, you realise that you do not have much freedom as the implementer of a task. You have to support any possible What you have choice in is some sensible defaults for Since MLAgents has this nice internal concept of group IDs, a perfect default value for torchrl Users can also request other groupings (like ONE_GROUP_PER_AGENT which seems to be the only one provided now). In this case the torchrl If the user requests an unfeasible
The current problem
My impression is that the current implementation of MLAgents violates axioms 1 and 2
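One possible reading of the default described above, sketched without any library: agents that share an MLAgents group ID land in the same group, and a user-supplied map is rejected when it cannot be stacked. Everything here is an assumption made for illustration: the function names, the "group_<id>" naming scheme, and the definition of "unfeasible" (an agent missing or repeated, or a group mixing different observation shapes) are not taken from TorchRL or ML-Agents.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GroupMap = Dict[str, List[str]]


def default_group_map(agent_group_ids: Dict[str, int]) -> GroupMap:
    """Assumed default: one group per MLAgents group ID."""
    groups = defaultdict(list)
    for agent, group_id in agent_group_ids.items():
        groups[f"group_{group_id}"].append(agent)  # hypothetical naming scheme
    return dict(groups)


def check_group_map(group_map: GroupMap, obs_shapes: Dict[str, Tuple[int, ...]]) -> None:
    """Raise ValueError if the map cannot be honored (assumed feasibility rules)."""
    assigned = [agent for members in group_map.values() for agent in members]
    if sorted(assigned) != sorted(obs_shapes):
        raise ValueError("every agent must appear in exactly one group")
    for name, members in group_map.items():
        shapes = {obs_shapes[agent] for agent in members}
        if len(shapes) > 1:
            raise ValueError(f"group {name!r} mixes shapes {sorted(shapes)}")


# Two group IDs with three agents each, as in the example discussed earlier.
ids = {f"agent_{i}": i // 3 for i in range(6)}
shapes = {agent: (8,) for agent in ids}
group_map = default_group_map(ids)
check_group_map(group_map, shapes)  # passes: each group is homogeneous
```

A request such as putting agents with different observation shapes into one group would raise instead of silently producing data that cannot be stacked.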
Those axioms make sense.
Isn't only axiom 2 violated? The specs and tensordicts currently do match for the MLAgents wrapper. That is, unless you've seen a bugged case where that's not true. As far as I understand, the only thing that MLAgents violates is the part of axiom 2 that says the specs and tensordicts in a group must be stacked. Is my solution here insufficient in some way? |
Nice! I think I had understood wrongly
I think it makes sense but I am not sure I understood it fully. I think it is because I have not clear what the Two points about that solution that I am still wondering about:
It is totally fine to completely remove the group_map arg from MLAgents, but I personally think it is a nice feature to have and it aligns with the many envs already present. There is also a point about the transform adding an overhead, which we can talk about today in our meeting.
I agree with this, but I think it could be done in a utility function (which many MARL envs can use). Since all MARL envs already provide this functionality and it is now part of the MARL API, I don't see why we should remove it. Instead, we could consider making a util function to help in providing it.
Overfitting to a specific grouping strategy in the wrapper and later modifying it with a transform requires looping over the agents twice for each input and output interaction. Instead, it is possible to do all this in only one loop over the agents.
+1 for having a standard function to do this.
Also note that users who implement MARL envs are not necessarily forced to have a group map argument and be flexible with respect to that. For example, SMACv2 is overfitted to ALL_AGENTS_IN_ONE_GROUP. However, in wrappers provided in the torchrl repo where it makes sense to have this flexibility, it is a nice feature to have. We can still provide a transform for users who want to implement their own env quickly and do not have much time to support that.
Let me take a concrete example. This is how VMAS stacks the data coming from the step output into the groups dictated by the Lines 577 to 608 in d537dcb
The original vmas interface just outputs lists (e.g., flexible stacking at this stage should be more efficient than creating a tensordict for each individual agent, outputting it, and doing the stacking later with another for loop over the agents. The amount of code you would have in
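The one-loop argument can be sketched as follows. This is not the VMAS wrapper code referenced above; it is a simplified stand-in that uses nested lists where a real wrapper would stack tensors, and stack_step_output is a hypothetical name. It assumes the simulator returns one entry per agent, in a known agent order.

```python
from typing import Dict, List, Sequence

GroupMap = Dict[str, List[str]]


def stack_step_output(
    agent_names: Sequence[str],
    per_agent_values: Sequence[List[float]],
    group_map: GroupMap,
) -> Dict[str, List[List[float]]]:
    """Place each agent's value directly into its group, visiting each agent once."""
    index_of = {name: i for i, name in enumerate(agent_names)}
    return {
        group: [per_agent_values[index_of[name]] for name in members]
        for group, members in group_map.items()
    }


agent_names = ["agent_0", "agent_1", "agent_2"]
observations = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]  # one list entry per agent
group_map = {"team": ["agent_0", "agent_1"], "adversary": ["agent_2"]}
grouped = stack_step_output(agent_names, observations, group_map)
```

Because the grouping is applied while the raw per-agent list is being read, no intermediate per-agent container is built and then regrouped in a second pass.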
Motivation
Some multi-agent environments, like VmasEnv, stack all of the tensors for observations, rewards, etc. for different agents that have identical specs. For instance, in one of these stacked environments, if there are 2 agents that each have 8 observations, the observation spec might look like this:
In contrast, other environments, like UnityMLAgentsEnv, have separate keys for each agent, even if the agents' specs are identical. For instance, with 2 agents that each have 8 observations, the observation spec might look like this:
It is not easy to apply the same training script to two environments that use these two different formats. For instance, applying the multi-agent PPO tutorial to a Unity env is not straightforward.
Solution
If we had an environment transform that could stack all the data from different keys, we could convert an environment that uses the unstacked format into an environment that uses the stacked format. Then it should be straightforward to use the same (or almost the same) training script on the two different environments.
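As a rough illustration of what such a transform would have to do, here is a library-free sketch that takes data keyed per agent and stacks the agents whose entries have identical "specs". Plain dicts and lists stand in for tensordicts, a spec is approximated by entry names and vector lengths, and stack_identical_agents and the "group_<n>" names are hypothetical; none of this is an existing TorchRL transform.

```python
from typing import Dict, List, Tuple

AgentData = Dict[str, List[float]]  # entry name -> flat vector for one agent


def stack_identical_agents(
    per_agent: Dict[str, AgentData],
) -> Tuple[Dict[str, List[str]], Dict[str, Dict[str, List[List[float]]]]]:
    """Group agents by spec, then stack each entry along a new leading agent axis."""
    by_spec: Dict[tuple, List[str]] = {}
    for agent, data in per_agent.items():
        spec = tuple(sorted((key, len(value)) for key, value in data.items()))
        by_spec.setdefault(spec, []).append(agent)

    group_map: Dict[str, List[str]] = {}
    stacked: Dict[str, Dict[str, List[List[float]]]] = {}
    for i, members in enumerate(by_spec.values()):
        group = f"group_{i}"  # hypothetical naming scheme
        group_map[group] = members
        stacked[group] = {
            key: [per_agent[agent][key] for agent in members]
            for key in per_agent[members[0]]
        }
    return group_map, stacked


unstacked = {
    "agent_0": {"observation": [0.0] * 8},
    "agent_1": {"observation": [1.0] * 8},
    "agent_2": {"observation": [2.0] * 3},  # different spec, so it stays separate
}
group_map, stacked = stack_identical_agents(unstacked)
```

In this example the two agents with 8 observations end up in one group whose observation entry has 2 rows of 8 values, which is the stacked layout described in the Motivation section.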