If life exists on Mars, shall we humans cooperate or compete with it?
If you have any questions (please open an issue if it is a general problem) or want to contribute to this repository, feel free to contact me: [email protected]
MARS is a comprehensive library for benchmarking multi-player zero-sum Markov games, including our proposed Nash-DQN algorithm as well as other baseline methods (Self-play, Fictitious Self-play, Neural Fictitious Self-play, Policy Space Response Oracle, etc.). A standalone implementation of the Nash-DQN algorithm is provided in another repo if you want a quick understanding of it.
Installation:
```bash
git clone --depth=1 https://github.com/quantumiracle/MARS.git  # --depth=1 fetches only the latest commit, keeping the download small
cd MARS
conda env create -f conda_env_mars.yml
conda activate mars
```
MARS is mainly built for solving multi-agent Atari games in PettingZoo, especially competitive (zero-sum) games.
A comprehensive usage document is provided.
Some tutorials are provided for simple MARL concepts, including building an arbitrary matrix game, finding its Nash equilibrium with different algorithms, building an arbitrary Markov game, and solving Markov games.
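As a rough illustration of what solving matrix and Markov games involves, here is a minimal sketch (not the MARS tutorial code) that solves a 2x2 zero-sum matrix game in closed form and then uses it as the stage-game solver inside plain Nash value iteration on a tabular Markov game. The names `solve_2x2` and `nash_value_iteration`, the restriction to two actions per player, and the `R[s][a][b]` / `P[s][a][b][s2]` layout are assumptions for illustration; the sketch also omits the optimistic exploration bonus of the Nash Value Iteration method in the algorithms table.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_2x2(A: Matrix) -> Tuple[float, float, float]:
    """Solve a 2x2 zero-sum game where the row player maximizes A[i][j].

    Returns (game value, probability of row 0, probability of column 0).
    """
    row_mins = [min(A[0]), min(A[1])]
    col_maxs = [max(A[0][0], A[1][0]), max(A[0][1], A[1][1])]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        # Pure saddle point: each player commits to a single action.
        i, j = row_mins.index(maximin), col_maxs.index(minimax)
        return maximin, 1.0 - i, 1.0 - j
    # No saddle point: each player mixes to make the opponent indifferent.
    d = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    p = (A[1][1] - A[1][0]) / d
    q = (A[1][1] - A[0][1]) / d
    value = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) / d
    return value, p, q


def nash_value_iteration(R, P, gamma: float, iters: int = 500) -> List[float]:
    """Repeat V(s) <- val[ R(s,a,b) + gamma * sum_s' P(s'|s,a,b) V(s') ]."""
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(iters):
        new_V = []
        for s in range(n_states):
            # Stage game at state s: immediate reward plus discounted continuation value.
            Q = [[R[s][a][b] + gamma * sum(P[s][a][b][s2] * V[s2] for s2 in range(n_states))
                  for b in range(2)]
                 for a in range(2)]
            new_V.append(solve_2x2(Q)[0])
        V = new_V
    return V


# Matching pennies: no saddle point, value 0, both players mix 50/50.
print(solve_2x2([[1, -1], [-1, 1]]))  # (0.0, 0.5, 0.5)

# One-state Markov game repeating a shifted matching-pennies stage game (stage value 1),
# so V[0] converges to 1 / (1 - 0.9) = 10.
R = [[[2, 0], [0, 2]]]
P = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
print(nash_value_iteration(R, P, gamma=0.9))
```

Because the Nash value operator is a gamma-contraction, the iteration converges from any starting V; for more than two actions per player, the closed-form `solve_2x2` would typically be replaced by a linear-programming solver.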
MARS is still under development and not yet ready for release. You may find the repository slow to clone because the author is testing algorithms with some models hosted on Git.
By convention in MARS, EnvSpec = Environment Type + '_' + Environment Name (e.g., pettingzoo_boxing_v1).
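The convention can be illustrated with a small parser. `split_env_spec` is a hypothetical helper, not part of MARS; it relies on the observation that none of the environment types in the table below contains an underscore, so splitting at the first underscore is enough even when the environment name itself contains underscores.

```python
from typing import Tuple

# Environment types as listed in the supported-environments table.
KNOWN_ENV_TYPES = {"gym", "pettingzoo", "lasertag", "slimevolley", "robosumo", "mdp"}


def split_env_spec(env_spec: str) -> Tuple[str, str]:
    """Split '<type>_<name>' at the first underscore only.

    Names such as 'boxing_v1' or 'arbitrary_mdp' contain underscores,
    but the types do not, so the first underscore is the separator.
    """
    env_type, sep, env_name = env_spec.partition("_")
    if not sep or not env_name:
        raise ValueError(f"expected '<type>_<name>', got {env_spec!r}")
    if env_type not in KNOWN_ENV_TYPES:
        raise ValueError(f"unknown environment type {env_type!r}")
    return env_type, env_name


print(split_env_spec("pettingzoo_boxing_v1"))        # ('pettingzoo', 'boxing_v1')
print(split_env_spec("slimevolley_SlimeVolley-v0"))  # ('slimevolley', 'SlimeVolley-v0')
```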
Supported environments are as follows:
Environment Type | Environment Name |
---|---|
gym | all standard envs in OpenAI Gym |
pettingzoo | 'basketball_pong_v3', 'boxing_v2', 'combat_jet_v1', 'combat_tank_v2', 'double_dunk_v3', 'entombed_competitive_v3', 'entombed_cooperative_v3', 'flag_capture_v2', 'foozpong_v3', 'ice_hockey_v2', 'joust_v3', 'mario_bros_v3', 'maze_craze_v3', 'othello_v3', 'pong_v3', 'quadrapong_v4', 'space_invaders_v2', 'space_war_v2', 'surround_v2', 'tennis_v3', 'video_checkers_v4', 'volleyball_pong_v2', 'warlords_v3', 'wizard_of_wor_v3'; 'dou_dizhu_v4', 'go_v5', 'leduc_holdem_v4', 'rps_v2', 'texas_holdem_no_limit_v6', 'texas_holdem_v4', 'tictactoe_v3', 'uno_v4' |
lasertag | 'LaserTag-small2-v0', 'LaserTag-small3-v0', 'LaserTag-small4-v0' |
slimevolley | 'SlimeVolley-v0', 'SlimeVolleySurvivalNoFrameskip-v0', 'SlimeVolleyNoFrameskip-v0', 'SlimeVolleyPixel-v0' |
robosumo | 'RoboSumo-Ant-vs-Ant-v0', 'RoboSumo-Ant-vs-Bug-v0', 'RoboSumo-Ant-vs-Spider-v0', 'RoboSumo-Bug-vs-Ant-v0', 'RoboSumo-Bug-vs-Bug-v0', 'RoboSumo-Bug-vs-Spider-v0', 'RoboSumo-Spider-vs-Ant-v0', 'RoboSumo-Spider-vs-Bug-v0', 'RoboSumo-Spider-vs-Spider-v0' |
mdp | 'arbitrary_mdp', 'arbitrary_richobs_mdp', 'attack', 'combinatorial_lock' |
Supported algorithms are as follows:
Method | Descriptions |
---|---|
Self-play | iterative best response |
Fictitious Self-Play | iterative best response to the opponent's historical average strategy |
Neural Fictitious Self-Play | a neural approximation version of FSP |
Policy Space Response Oracle | neural version of Double Oracle: iterative best response to the opponent's meta-Nash strategy |
Nash Q-learning | model-free; provable convergence assuming a unique Nash equilibrium in each stage game |
Nash Value Iteration | model-based; provably efficient convergence with optimistic value estimation (exploration bonus) |
Nash DQN | neural version of Nash Q-learning or Nash Value Iteration |
Nash DQN with Exploiter | Nash DQN with asymmetric learning scheme, opponent as an exploiter |
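The difference between the first two rows of the table can be seen on a plain matrix game. The sketch below is a tabular analogue, not MARS's neural implementation: on rock-paper-scissors it runs (a) self-play, where each player best-responds to the opponent's latest strategy, and (b) fictitious play, where each player best-responds to the opponent's historical average, and scores both with NashConv (the sum of both players' best-response gains, 0 at a Nash equilibrium). All helper names and the lowest-index tie-breaking rule are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Row player's payoff in rock-paper-scissors; actions ordered (Rock, Paper, Scissors).
RPS: Matrix = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def row_payoffs(A: Matrix, y: List[float]) -> List[float]:
    """Row player's payoff for each row action against column weights y."""
    return [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A))]


def col_payoffs(A: Matrix, x: List[float]) -> List[float]:
    """Row player's payoff for each column action against row weights x."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def best_responses(A: Matrix, x: List[float], y: List[float]) -> Tuple[int, int]:
    """Row maximizes, column minimizes; ties go to the lowest action index."""
    r, c = row_payoffs(A, y), col_payoffs(A, x)
    return max(range(len(r)), key=r.__getitem__), min(range(len(c)), key=c.__getitem__)


def nash_conv(A: Matrix, x: List[float], y: List[float]) -> float:
    """Sum of both players' best-response gains; 0 exactly at a Nash equilibrium."""
    return max(row_payoffs(A, y)) - min(col_payoffs(A, x))


def one_hot(k: int, n: int) -> List[float]:
    return [1.0 if i == k else 0.0 for i in range(n)]


def self_play(A: Matrix, iters: int) -> Tuple[List[float], List[float]]:
    """Iterated best response to the opponent's latest strategy."""
    x, y = one_hot(0, len(A)), one_hot(0, len(A[0]))
    for _ in range(iters):
        i, j = best_responses(A, x, y)
        x, y = one_hot(i, len(A)), one_hot(j, len(A[0]))
    return x, y


def fictitious_play(A: Matrix, iters: int) -> Tuple[List[float], List[float]]:
    """Best response to the opponent's historical average strategy."""
    row_counts, col_counts = one_hot(0, len(A)), one_hot(0, len(A[0]))  # both open with action 0
    for _ in range(iters - 1):
        # Raw counts are proportional to the averages, so the best responses are identical.
        i, j = best_responses(A, row_counts, col_counts)
        row_counts[i] += 1
        col_counts[j] += 1
    return [c / iters for c in row_counts], [c / iters for c in col_counts]


print(nash_conv(RPS, *self_play(RPS, 5000)))        # 2.0: the latest pure strategies are fully exploitable
print(nash_conv(RPS, *fictitious_play(RPS, 5000)))  # small, and shrinks as iters grows
```

Self-play keeps cycling Rock, Paper, Scissors, so its latest strategy pair never improves, while the fictitious-play average drifts toward the uniform equilibrium. This cycling is the usual motivation for responding to an average or mixture of past policies (FSP, NFSP, PSRO) rather than only to the latest one.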
1. Train with a MARL algorithm:
Format:
python general_train.py --env **EnvSpec** --method **Method** --save_id **WheretoSave**
Example:
```bash
# PettingZoo Boxing_v1, neural fictitious self-play
python general_train.py --env pettingzoo_boxing_v1 --method nfsp --save_id train_0
# PettingZoo Pong_v2, fictitious self-play
python general_train.py --env pettingzoo_pong_v2 --method fictitious_selfplay --save_id train_1
# PettingZoo Surround_v1, policy space response oracle
python general_train.py --env pettingzoo_surround_v1 --method prso --save_id train_3
# SlimeVolley SlimeVolley-v0, self-play
python general_train.py --env slimevolley_SlimeVolley-v0 --method selfplay --save_id train_4
```
To see all user input arguments:
```bash
python general_train.py --help
```
2. Exploit a trained model:
Format:
python general_exploit.py --env **EnvSpec** --method **Method** --load_id **TrainedModelID** --save_id **WheretoSave** --to_exploit **ExploitWhichPlayer**
Example:
```bash
python general_exploit.py --env pettingzoo_boxing_v1 --method nfsp --load_id train_0 --save_id exploit_0 --to_exploit second
```
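Conceptually, exploiting a trained model means freezing one player's policy and training a fresh learner against it; the payoff the learner reaches estimates how exploitable the frozen policy is. The toy below shows that idea on rock-paper-scissors, with an epsilon-greedy bandit learner standing in for the RL exploiter. It is an assumption-laden sketch, not what `general_exploit.py` does internally: `train_exploiter`, the frozen policy numbers, and the hyperparameters are all hypothetical.

```python
import random
from typing import List, Tuple

# Row player's payoff in rock-paper-scissors; actions ordered (Rock, Paper, Scissors).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def train_exploiter(A, frozen_policy: List[float], steps: int = 20000,
                    epsilon: float = 0.1, seed: int = 0) -> Tuple[int, List[float]]:
    """Learn a best response to a fixed (frozen) column-player policy.

    The exploiter is an epsilon-greedy bandit learner that keeps a running
    average of the sampled reward for each of its actions.
    """
    rng = random.Random(seed)
    n = len(A)
    values, counts = [0.0] * n, [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                       # explore uniformly
        else:
            a = max(range(n), key=values.__getitem__)  # exploit the current estimates
        # The frozen opponent samples its action; its policy never changes.
        b = rng.choices(range(len(frozen_policy)), weights=frozen_policy)[0]
        counts[a] += 1
        values[a] += (A[a][b] - values[a]) / counts[a]  # incremental mean of rewards
    best = max(range(n), key=values.__getitem__)
    return best, values


# A hypothetical "trained" second player that over-plays Rock.
best, values = train_exploiter(RPS, frozen_policy=[0.6, 0.2, 0.2])
print(best, round(values[best], 2))
```

Against this biased policy, Paper earns 0.6 - 0.2 = 0.4 per round in expectation, so the learner should settle on Paper (index 1) with a value estimate near 0.4. Against the uniform equilibrium policy every action is worth 0, so no exploiter could gain anything.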
More examples are provided in `./examples/` and `./unit_test/`. Note that these files need to be placed in the root directory (`./`) to run.
1. Use Wandb for logging training results:
Format:
python general_train.py --env **EnvSpec** --method **Method** --save_id **WheretoSave** --wandb_activate True --wandb_entity **YourWandbAccountName** --wandb_project **ProjectName**
Example:
```bash
python general_train.py --env pettingzoo_boxing_v1 --method nfsp --save_id multiprocess_train_0 --wandb_activate True --wandb_entity name --wandb_project pettingzoo_boxing_v1_nfsp
```
2. Train with a MARL algorithm using multiprocess sampling and updates:
Format:
python general_launch.py --env **EnvSpec** --method **Method** --save_id **WheretoSave**
Example:
```bash
python general_launch.py --env pettingzoo_boxing_v1 --method nfsp --save_id multiprocess_train_0
```
3. Exploit a trained model (same as above):
Example:
```bash
python general_exploit.py --env pettingzoo_boxing_v1 --method nfsp --load_id multiprocess_train_0 --save_id exploit_0 --to_exploit second
```
4. Test a trained MARL model in single-agent Atari:
This function only supports a limited set of environments (such as Boxing), since not every PettingZoo Atari environment has a single-agent counterpart in OpenAI Gym.
Example:
```bash
python general_test.py --env pettingzoo_boxing_v1 --method nfsp --load_id train_0 --save_id test_0
```
5. Bash scripts for servers:
Bash scripts for running multiple tasks on servers are provided in `./server_bash_scripts`. For example, to run a training script (after moving it to the root directory):
Example:
```bash
./general_train.sh
```
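If you want to write your own batch of runs, a sweep script can be generated programmatically. This is a generic sketch, not the contents of the provided scripts: `make_sweep` and the `train_<k>` save-ID scheme are hypothetical, and it uses only the `--env`, `--method`, and `--save_id` flags documented for `general_train.py`.

```python
import shlex
from itertools import product
from typing import List


def make_sweep(envs: List[str], methods: List[str], prefix: str = "train") -> List[str]:
    """Build one general_train.py command per (env, method) pair.

    Save IDs are numbered so that every run writes to its own location.
    """
    commands = []
    for k, (env, method) in enumerate(product(envs, methods)):
        args = ["python", "general_train.py", "--env", env,
                "--method", method, "--save_id", f"{prefix}_{k}"]
        commands.append(shlex.join(args))  # quotes any argument that needs it
    return commands


script = "\n".join(["#!/bin/bash"] + make_sweep(
    ["pettingzoo_boxing_v1", "slimevolley_SlimeVolley-v0"], ["selfplay", "nfsp"]))
print(script)
```

Saving `script` to a file in the root directory and running it executes the four runs one after another; appending ` &` to each line and a final `wait` would run them concurrently instead.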
Basic single-agent RL algorithms (for best response, etc.) to do:
- DQN
- PPO
MARL algorithms:
- Self-Play
- Fictitious Self-Play
- Neural Fictitious Self-Play
- Policy Space Response Oracle
- Nash-DQN
- Nash-DQN-Exploiter
Supported environments:
- OpenAI Gym
- PettingZoo
- LaserTag
- SlimeVolley
- RoboSumo (requires gym==0.16)
- Matrix Markov Game
Two agents in SlimeVolley-v0 trained with self-play.
Two agents in PettingZoo Boxing-v1 trained with self-play.
Exploitability tests are also conducted.
MARS is distributed under the terms of the Apache License (Version 2.0).
See the Apache License for details.
If you find MARS useful, please cite it in your publications.
```bibtex
@article{ding2022deep,
title={A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games},
author={Ding, Zihan and Su, Dijia and Liu, Qinghua and Jin, Chi},
journal={arXiv preprint arXiv:2207.08894},
year={2022}
}
@software{MARS,
author = {Zihan Ding and Andy Su and Qinghua Liu and Chi Jin},
title = {MARS},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/quantumiracle/MARS}},
}
```