MARS - Multi-Agent Research Studio


If life exists on Mars, shall we humans cooperate or compete with it?


Description

If you have any questions (open an ISSUE if it is a general problem) or want to contribute to this repository, feel free to contact me: [email protected]

MARS is a comprehensive library for benchmarking multi-player zero-sum Markov games. It includes our proposed Nash-DQN algorithm as well as baseline methods such as Self-Play, Fictitious Self-Play, Neural Fictitious Self-Play, and Policy Space Response Oracle. An independent implementation of the Nash-DQN algorithm is provided in a separate repo if you want a quick overview.

Installation

git clone --depth=1 https://github.com/quantumiracle/MARS.git  # --depth=1 keeps the download small
cd MARS
conda env create -f conda_env_mars.yml
conda activate mars

Usage

Description

MARS is mainly built for solving multi-agent Atari games in PettingZoo, especially competitive (zero-sum) games.

A comprehensive usage document is provided.

Tutorials are provided for basic MARL concepts, including building an arbitrary matrix game, solving for the Nash equilibrium of matrix games with different algorithms, building arbitrary Markov games, solving Markov games, etc.
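To give a flavor of what the matrix-game tutorials cover, here is a minimal, self-contained sketch (illustrative only, not MARS code) of fictitious play on rock-paper-scissors: each player repeatedly best-responds to the opponent's empirical average strategy, and the empirical frequencies approach the uniform Nash equilibrium.

```python
# Fictitious play on a 2-player zero-sum matrix game (illustrative sketch).
# Payoff matrix for the row player in rock-paper-scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def best_response(payoffs):
    # Index of the pure strategy with the highest expected payoff.
    return max(range(len(payoffs)), key=lambda i: payoffs[i])

def fictitious_play(A, iters=10000):
    n = len(A)
    row_counts = [0] * n   # empirical action counts for each player
    col_counts = [0] * n
    row_counts[0] += 1     # start from arbitrary pure strategies
    col_counts[0] += 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical average.
        row_payoffs = [sum(A[i][j] * col_counts[j] for j in range(n)) for i in range(n)]
        col_payoffs = [-sum(A[i][j] * row_counts[i] for i in range(n)) for j in range(n)]
        row_counts[best_response(row_payoffs)] += 1
        col_counts[best_response(col_payoffs)] += 1
    total = iters + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]

row_strategy, col_strategy = fictitious_play(A)
print(row_strategy)  # approaches the uniform Nash strategy [1/3, 1/3, 1/3]
```

On games with a strict saddle point, this procedure locks onto the pure Nash equilibrium almost immediately; on rock-paper-scissors, only the empirical averages converge while the actual play keeps cycling.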

MARS is still under development and not ready for release yet. Cloning may be slow because the author is testing algorithms with some models hosted on Git.

Support

By convention in MARS, an EnvSpec is Environment Type + '_' + Environment Name.
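For illustration, here is a hypothetical helper (not part of the MARS API) that splits such a spec on the first underscore only, since environment names such as 'boxing_v2' or 'SlimeVolley-v0' may themselves contain underscores or hyphens:

```python
# Hypothetical helper illustrating the EnvSpec naming convention:
# Environment Type + '_' + Environment Name.
def parse_env_spec(spec):
    # Split on the FIRST underscore only; the remainder is the env name,
    # which may itself contain underscores (e.g. 'boxing_v2').
    env_type, _, env_name = spec.partition('_')
    return env_type, env_name

print(parse_env_spec('pettingzoo_boxing_v2'))        # ('pettingzoo', 'boxing_v2')
print(parse_env_spec('slimevolley_SlimeVolley-v0'))  # ('slimevolley', 'SlimeVolley-v0')
```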

Supported environments are as follows:

| Environment Type | Environment Name |
| --- | --- |
| gym | all standard envs in OpenAI Gym |
| pettingzoo | 'basketball_pong_v3', 'boxing_v2', 'combat_jet_v1', 'combat_tank_v2', 'double_dunk_v3', 'entombed_competitive_v3', 'entombed_cooperative_v3', 'flag_capture_v2', 'foozpong_v3', 'ice_hockey_v2', 'joust_v3', 'mario_bros_v3', 'maze_craze_v3', 'othello_v3', 'pong_v3', 'quadrapong_v4', 'space_invaders_v2', 'space_war_v2', 'surround_v2', 'tennis_v3', 'video_checkers_v4', 'volleyball_pong_v2', 'warlords_v3', 'wizard_of_wor_v3'; 'dou_dizhu_v4', 'go_v5', 'leduc_holdem_v4', 'rps_v2', 'texas_holdem_no_limit_v6', 'texas_holdem_v4', 'tictactoe_v3', 'uno_v4' |
| lasertag | 'LaserTag-small2-v0', 'LaserTag-small3-v0', 'LaserTag-small4-v0' |
| slimevolley | 'SlimeVolley-v0', 'SlimeVolleySurvivalNoFrameskip-v0', 'SlimeVolleyNoFrameskip-v0', 'SlimeVolleyPixel-v0' |
| robosumo | 'RoboSumo-Ant-vs-Ant-v0', 'RoboSumo-Ant-vs-Bug-v0', 'RoboSumo-Ant-vs-Spider-v0', 'RoboSumo-Bug-vs-Ant-v0', 'RoboSumo-Bug-vs-Bug-v0', 'RoboSumo-Bug-vs-Spider-v0', 'RoboSumo-Spider-vs-Ant-v0', 'RoboSumo-Spider-vs-Bug-v0', 'RoboSumo-Spider-vs-Spider-v0' |
| mdp | 'arbitrary_mdp', 'arbitrary_richobs_mdp', 'attack', 'combinatorial_lock' |

Supported algorithms are as follows:

| Method | Description |
| --- | --- |
| Self-Play | iterative best response |
| Fictitious Self-Play | iterative best response to the opponent's historical average strategy |
| Neural Fictitious Self-Play | a neural approximation of FSP |
| Policy Space Response Oracle | a neural version of Double Oracle: iterative best response to the opponent's meta-Nash strategy |
| Nash Q-Learning | model-free; provable convergence under the assumption of a unique Nash equilibrium at each stage game |
| Nash Value Iteration | model-based; provably efficient convergence with optimistic value estimation (exploration bonus) |
| Nash DQN | a neural version of Nash Q-learning / Nash value iteration |
| Nash DQN with Exploiter | Nash DQN with an asymmetric learning scheme, where the opponent acts as an exploiter |
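The stage-game computation at the heart of Nash Q-learning (and, by extension, Nash-DQN) is solving a two-player zero-sum matrix game, which reduces to a small linear program. The following is a hedged, illustrative sketch (not MARS code), assuming NumPy and SciPy are available:

```python
# Solving a zero-sum matrix game by linear programming: the row player
# maximizes v subject to (A^T x)_j >= v for every column j, sum(x) = 1, x >= 0.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Return (value, row_strategy) for the zero-sum game with payoff matrix A."""
    A = np.asarray(A, dtype=float)
    n_rows, n_cols = A.shape
    # Decision variables: x (row mixed strategy) and v (game value).
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0                       # linprog minimizes, so minimize -v
    # For every column j: v - sum_i A[i, j] * x_i <= 0
    A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # Probabilities sum to one (v unconstrained in this row).
    A_eq = np.append(np.ones(n_rows), 0.0).reshape(1, -1)
    b_eq = np.ones(1)
    bounds = [(0, None)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]

# Rock-paper-scissors: value 0, uniform Nash strategy.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
value, strategy = solve_zero_sum(rps)
print(round(value, 6), np.round(strategy, 3))
```

In Nash Q-learning this LP would be solved at each encountered state over the current Q-value matrix; Nash-DQN replaces the tabular Q-values with a neural network.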

Quick Start:

1. Train with MARL algorithm:

Format:

python general_train.py --env **EnvSpec** --method **Method** --save_id **WheretoSave**

Example:

# PettingZoo Boxing_v1, neural fictitious self-play
python general_train.py --env pettingzoo_boxing_v1 --method nfsp --save_id train_0

# PettingZoo Pong_v2, fictitious self-play
python general_train.py --env pettingzoo_pong_v2 --method fictitious_selfplay --save_id train_1

# PettingZoo Surround_v1, policy space response oracle
python general_train.py --env pettingzoo_surround_v1 --method psro --save_id train_3

# SlimeVolley SlimeVolley-v0, self-play
python general_train.py --env slimevolley_SlimeVolley-v0 --method selfplay --save_id train_4

To see all user input arguments:

python general_train.py --help

2. Exploit a trained model:

Format:

python general_exploit.py --env **EnvSpec** --method **Method** --load_id **TrainedModelID** --save_id **WheretoSave** --to_exploit **ExploitWhichPlayer**

Example:

python general_exploit.py --env pettingzoo_boxing_v1 --method nfsp --load_id train_0 --save_id exploit_0 --to_exploit second

More examples are provided in ./examples/ and ./unit_test/. Note that these files need to be put under the root directory (./) to run.

Advanced Usage:

1. Use Wandb for logging training results:

Format:

python general_train.py --env **EnvSpec** --method **Method** --save_id **WheretoSave** --wandb_activate True --wandb_entity **YourWandbAccountName** --wandb_project **ProjectName**

Example:

python general_train.py --env pettingzoo_boxing_v1 --method nfsp --save_id multiprocess_train_0 --wandb_activate True --wandb_entity name --wandb_project pettingzoo_boxing_v1_nfsp

2. Train with MARL algorithm with multiprocess sampling and update:

Format:

python general_launch.py --env **EnvSpec** --method **Method** --save_id **WheretoSave**

Example:

python general_launch.py --env pettingzoo_boxing_v1 --method nfsp --save_id multiprocess_train_0

3. Exploit a trained model (same as above):

Example:

python general_exploit.py --env pettingzoo_boxing_v1 --method nfsp --load_id multiprocess_train_0 --save_id exploit_0 --to_exploit second

4. Test a trained MARL model in single-agent Atari:

This function is only available for certain environments (like boxing), since not all envs in PettingZoo Atari have a single-agent counterpart in OpenAI Gym.

Example:

python general_test.py --env pettingzoo_boxing_v1 --method nfsp --load_id train_0 --save_id test_0

5. Bash script for server:

Bash scripts for running multiple tasks on servers are provided in ./server_bash_scripts. For example, to run a training bash script (first put it in the root directory):

Example:

./general_train.sh

Development

Basic single-agent RL algorithms (for best response, etc.) to do:

  • DQN
  • PPO

MARL algorithms:

Supported environments:

Primary Results

Two agents in SlimeVolley-v0 trained with self-play.

Two agents in Boxing-v1 (PettingZoo) trained with self-play.

Exploitability tests are also conducted.
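The quantity such tests measure can be sketched for a matrix game as follows. This is an illustrative computation (not the MARS test code), assuming NumPy: exploitability is the sum of both players' best-response gains, which is zero exactly at a Nash equilibrium.

```python
# Exploitability of a strategy profile (x, y) in a zero-sum matrix game.
import numpy as np

def exploitability(A, x, y):
    """Row best-response value minus column best-response value; 0 at a Nash."""
    A = np.asarray(A, dtype=float)
    x, y = np.asarray(x), np.asarray(y)
    row_br = np.max(A @ y)   # best the row player could do against y
    col_br = np.min(x @ A)   # best the column player could do against x
    return row_br - col_br

rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1/3, 1/3, 1/3]
print(exploitability(rps, uniform, uniform))        # 0.0 at the Nash equilibrium
print(exploitability(rps, [1, 0, 0], uniform) > 0)  # a pure strategy is exploitable
```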

License

MARS is distributed under the terms of Apache License (Version 2.0).

See Apache License for details.

Citation

If you find MARS useful, please cite it in your publications.

@article{ding2022deep,
  title={A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games},
  author={Ding, Zihan and Su, Dijia and Liu, Qinghua and Jin, Chi},
  journal={arXiv preprint arXiv:2207.08894},
  year={2022}
}

@software{MARS,
  author = {Ding, Zihan and Su, Andy and Liu, Qinghua and Jin, Chi},
  title = {MARS},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/quantumiracle/MARS}},
}