This project was completed as part of the senior requirement for Yale Computer Science. Spring 2019. More Information
This work is an implementation and exploration of current work in Multiagent Reinforcement Learning (MARL). It is highly recommended that you read the following two papers before diving in.
- Multi-agent reinforcement learning in sequential social dilemmas
- Intrinsic Social Motivation via Causal Influence in Multi-Agent RL
- Switch to your virtual env
pip install -r requirements.txt
python train.py
python test.py ~/ray_results/prison_A3C/[training_instance]/ [checkpoint_num]
Training results are usually saved in your ray_results directory located in the root directory
Pycolab provides the abstraction for creating environments. Although this repository includes three environments, only the PrisonEnvironment has been fully developed and tested.
The PrisonEnvironment instantiates a gridworld variant of the classic Prisoner's Dilemma. At each step of the game, both agents independently choose to move left, move right, or stay still. The left side of the board represents full defection and the right side of the board represents full cooperation. Intermediate positions are a linear combination of the extremes. Rewards are distributed every 10 timesteps of the game. The figure below shows the corresponding rewards for four primary states of the game.
python play.py
allows you to quickly run a manual version of the game. The
script is extremely helpful when debugging the environment alone.
Reinforcement Learning is handled by RLLib. Currently all training is done using the A3C algorithm.
When initializing A3C agents in test.py
, asynchronous changes to the environment mess with the game visualization. One potential solution is to wait until the interactions with the environment have finished before starting the game. This only takes a few seconds. The better solution would be to fix the issue and submit a PR!
-
Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017). Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (pp. 464-473).
-
Hughes, E., Leibo, J. Z., Phillips, M., Tuyls, K., Dueñez-Guzman, E., Castañeda, A. G., Dunning, I., Zhu, T., McKee, K., Koster, R., Tina Zhu, Roff, H., Graepel, T. (2018). Inequity aversion improves cooperation in intertemporal social dilemmas. In Advances in Neural Information Processing Systems (pp. 3330-3340).
-
Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P. A., Strouse, D. J., Leibo, J. Z. & de Freitas, N. (2018). Intrinsic Social Motivation via Causal Influence in Multi-Agent RL. arXiv preprint arXiv:1810.08647.
-
Credit to Sequential Social Dilemma Games for providing a useful example of RLLib.