
Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

This repository contains the code for the MIR3 defense method along with baseline methods such as MADDPG, M3DDPG, ROM-Q, ERNIE, and ROMAX. It also includes implementations of the Multi-Agent Reinforcement Learning (MARL) environments on which we evaluate our results: SMAC and multi-agent rendezvous.

Supported Algorithms

  • MARL training algorithms:
    • MADDPG
  • MARL defense algorithms:
    • M3DDPG
    • ROM-Q
    • ERNIE
    • ROMAX
    • MIR3
  • MARL attack algorithms:
    • Adversarial Policy

Supported Environments

  • SMAC
  • Multi-agent rendezvous

How to run the code

Choose the Training Algorithm and Environment

The code uses different parameters for different algorithms and environments. The default parameters are located in ./configs. Algorithm parameters are stored in YAML files named {algorithm}_{env}.yaml within the ./configs/alg directory:

  • {algorithm}:
    • MADDPG: maddpg
    • M3DDPG: m3ddpg
    • ROM-Q: romq
    • ERNIE: ernie
    • ROMAX: romax
    • MIR3: mir3
    • Adversarial policy: maddpg_traitor_mi
  • {env}:
    • SMAC: smac
    • Multi-agent rendezvous: robot

For instance, parameters for MADDPG on SMAC environments are found in ./configs/alg/maddpg_smac.yaml.

Environment parameters are saved in YAML files named {env}.yaml within the ./configs/env directory:

  • {env}:
    • SMAC for training: sc2
    • SMAC for attack: sc2_traitor
    • Multi-agent rendezvous: robot

For instance, parameters for SMAC environments during the attack phase are in ./configs/env/sc2_traitor.yaml.
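
As a minimal sketch of how these config files might fit together (assuming PyYAML, a hypothetical load_config helper, and a default → algorithm → environment merge order; the repository's actual loading logic may differ):

```python
import yaml

def load_config(alg: str, env: str) -> dict:
    """Hypothetical helper: merge default, algorithm, and environment
    parameters following the config layout described above."""
    with open("./configs/default.yaml") as f:
        config = yaml.safe_load(f) or {}
    # Algorithm parameters: ./configs/alg/{algorithm}_{env}.yaml
    with open(f"./configs/alg/{alg}.yaml") as f:
        config.update(yaml.safe_load(f) or {})
    # Environment parameters: ./configs/env/{env}.yaml
    with open(f"./configs/env/{env}.yaml") as f:
        config.update(yaml.safe_load(f) or {})
    return config

# Example: MIR3 on SMAC, trained with the sc2 environment config.
params = load_config("mir3_smac", "sc2")
```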

Train the Agents and Save the Models

To train the agents, run a command such as:

python -u main.py --alg mir3_smac --env sc2 --token train --map 4m_vs_3m --param_club 0.1 --seed 0
  • --alg: the algorithm config file to use; here, the default MIR3 parameters in ./configs/alg/{alg}.yaml.
  • --env: the environment config file to use; here, the default sc2 training parameters in ./configs/env/{env}.yaml.
  • --map: the map name. If --map is none, the map name in the environment config file is used instead.
  • --param_club: the weight of the mutual-information penalty (illustrated in the sketch after this list). Its default value is saved in ./configs/alg/mir3_smac.yaml.
  • --token: the experiment name. Its default value is saved in ./configs/default.yaml.
  • --seed: the random seed. Its default value is saved in ./configs/default.yaml.
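
To make the role of --param_club concrete, the sketch below shows one schematic way a mutual-information estimate could be weighted into a training loss. It assumes PyTorch and hypothetical policy_loss and mi_estimate tensors; it illustrates only the weighting, not the repository's actual MIR3 implementation.

```python
import torch

# Schematic only: param_club trades off the policy objective against a
# mutual-information penalty (e.g. a CLUB-style upper-bound estimate).
# Both tensors below are placeholders for values computed elsewhere.
param_club = 0.1                   # matches --param_club above
policy_loss = torch.tensor(0.5)    # placeholder actor loss
mi_estimate = torch.tensor(0.2)    # placeholder MI estimate

total_loss = policy_loss + param_club * mi_estimate
```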

The models and the training data are saved in directories like the following:

models: ./results/{env}/{map}/none/{learner}/{token}/{seed}/models/{timestep}/
data:   ./results/{env}/{map}/none/{learner}/{token}/{seed}/logs/
  • {env}:
    • SMAC: sc2
    • Multi-agent rendezvous: robot
  • {learner}:
    • MADDPG: maddpg
    • M3DDPG: m3ddpg
    • ROM-Q: romq
    • ERNIE: ernie
    • ROMAX: romax
    • MIR3: mir3
  • {token}, {seed}: the parameters described above.
  • {timestep}: models saved at different training steps; the save interval can be changed in the algorithm config file. (A path-building sketch follows this list.)
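
The hypothetical helper below assembles these directories from the parameters; it mirrors the templates above but is not code from the repository.

```python
import os

def result_dirs(env, map_name, learner, token, seed, timestep):
    """Build the model and log directories from the templates above."""
    base = os.path.join("results", env, map_name, "none",
                        learner, token, str(seed))
    models = os.path.join(base, "models", str(timestep))
    logs = os.path.join(base, "logs")
    return models, logs

# Example: MIR3 on SMAC map 4m_vs_3m, token "train", seed 0, step 5000000.
print(result_dirs("sc2", "4m_vs_3m", "mir3", "train", 0, 5000000))
# ('results/sc2/4m_vs_3m/none/mir3/train/0/models/5000000',
#  'results/sc2/4m_vs_3m/none/mir3/train/0/logs')
```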

Attack the Models and Save the Adversarial Policy

To attack the trained models, run a command such as:

python -u main.py --alg maddpg_traitor_mi_smac --env sc2 --token attack --map 4m_vs_3m --seed 0 --victim_checkpoint ./results/sc2/4m_vs_3m/none/mir3/train/0/models/5000000/
  • --alg, --env, --token, --map, --seed: the parameters described above.
  • --victim_checkpoint: the directory of the model to attack.

Adversarial agent IDs can be modified in the algorithm config file (./configs/alg/maddpg_traitor_mi_smac.yaml).
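
Since --victim_checkpoint points at a single {timestep} directory, a small helper like the hypothetical one below can select the latest saved checkpoint; it is a convenience sketch, not part of the repository.

```python
import os

def latest_checkpoint(model_root):
    """Return the subdirectory of model_root with the largest timestep.

    model_root is a .../models/ directory that contains one numeric
    subdirectory per saved timestep, as described above."""
    steps = [d for d in os.listdir(model_root) if d.isdigit()]
    if not steps:
        raise FileNotFoundError(f"no checkpoints under {model_root}")
    return os.path.join(model_root, max(steps, key=int))

# Example: pass the newest MIR3 checkpoint as --victim_checkpoint.
# latest_checkpoint("./results/sc2/4m_vs_3m/none/mir3/train/0/models")
```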

The models and the attack data are saved in directories like the following:

models: ./results/{env}/{map}/traitor_ca_mi/maddpg_ca_mi/{token}/{seed}/models/{timestep}/
data:   ./results/{env}/{map}/traitor_ca_mi/maddpg_ca_mi/{token}/{seed}/logs/
  • {env}, {map}, {token}, {seed}, {timestep}: the parameters described above.

Demo Videos

We recorded the behaviors of the agents under attack. These videos showcase our method alongside the baseline methods in the 4m vs 3m and 9m vs 8m scenarios of the SMAC environment, the rendezvous environment, and the real-robot environment, as illustrated in the table below. You can find the videos in the ./video directory, or click on the corresponding GIF image.

[Demo table: one GIF per cell. Rows: MADDPG, M3DDPG, ROM-Q, ERNIE, ROMAX, MIR3. Columns: SMAC 4m vs 3m, SMAC 9m vs 8m, rendezvous, real robot.]
