This is the source code for my master's dissertation project, Multi-Agent Cooperation in Hanabi with Policy Optimisation. Improvements have been made since the dissertation was submitted, so please see tag:final-report for the version used in my dissertation final report.
The implementation is based on Proximal Policy Optimisation (PPO) and the Hanabi Learning Environment; a minimal sketch of the PPO objective is shown after the dependency list below.
- Python 3.7+ (because I can't survive without f-strings)
- PyTorch (currently CPU only) (someone buy me an NVIDIA laptop plz?)
- Hanabi Learning Environment
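For readers unfamiliar with PPO, its core is the clipped surrogate objective. The snippet below is a minimal, self-contained PyTorch sketch of that loss for illustration only; the function and variable names are mine, not the exact implementation in ppo/.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (Schulman et al., 2017).

    All arguments are 1-D tensors over a batch of sampled (state, action) pairs.
    Returns the loss to *minimise* (the negative of the clipped objective).
    """
    ratio = torch.exp(log_probs_new - log_probs_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Tiny usage example with dummy data.
if __name__ == "__main__":
    lp_new = torch.randn(8, requires_grad=True)
    lp_old = lp_new.detach() + 0.1 * torch.randn(8)
    adv = torch.randn(8)
    loss = ppo_clip_loss(lp_new, lp_old, adv)
    loss.backward()
    print(loss.item())
```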
- configs/: JSON configuration files for agent training. Some are outdated, use for reference only.
- figures/: figures for this README.
- ppo/: main source files.
- scripts/: plotting scripts. Very volatile, use at own risk (I detest matplotlib).
Hanabi-Small is a smaller version of Hanabi with a maximum score of 10. Currently we can train an agent on Hanabi-Small in under 8 hours on an 8-core CPU machine.
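If you just want to poke at Hanabi-Small without the training code, the rl_env wrapper in the Hanabi Learning Environment can build it directly. A rough sketch (key names follow the HLE observation dict; this is independent of anything in ppo/):

```python
from hanabi_learning_environment import rl_env

# Hanabi-Small is a reduced game, hence the maximum score of 10.
env = rl_env.make(environment_name="Hanabi-Small", num_players=2)
observations = env.reset()

# Each player gets its own observation dict with an encoded vector and legal moves.
obs0 = observations["player_observations"][0]
print("vectorized observation length:", len(obs0["vectorized"]))
print("legal moves for player 0:", obs0["legal_moves"])
```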
Hanabi-Full is the full version with a maximum score of 25. We are still tuning the hyperparameters for Hanabi-Full, but these are our current results.
We can also do ad hoc evaluation using trained agents.
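As a sketch of what ad hoc (cross-play) evaluation looks like, two independently trained policies can be assigned to different seats of the same game. Below, a random placeholder policy stands in for a loaded PPO checkpoint; the real evaluation code and checkpoint format in ppo/ and scripts/ may differ.

```python
import random
from hanabi_learning_environment import rl_env

def placeholder_policy(observation):
    # Stand-in for a trained agent: a real agent would map observation["vectorized"]
    # to an action via its policy network instead of choosing at random.
    return random.choice(observation["legal_moves"])

# In a real run these would be two separately trained PPO agents loaded from checkpoints.
policies = {0: placeholder_policy, 1: placeholder_policy}

env = rl_env.make(environment_name="Hanabi-Small", num_players=2)
observations = env.reset()
done, episode_return = False, 0
while not done:
    seat = observations["current_player"]
    action = policies[seat](observations["player_observations"][seat])
    observations, reward, done, _ = env.step(action)
    episode_return += reward
print("cross-play episode return:", episode_return)
```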