Implementation of Deep Reinforcement Learning algorithms in PyTorch, with support for distributed data collection and data-parallel training.
(In progress...)
This repo contains flexible implementations of several deep reinforcement learning algorithms. It supports the algorithms, architectures, and rewards from several papers, including:
- Mnih et al., 2015 - 'Human Level Control through Deep Reinforcement Learning'
- van Hasselt et al., 2015 - 'Deep Reinforcement Learning with Double Q-learning'
- Wang et al., 2015 - 'Dueling Network Architectures for Deep Reinforcement Learning'
- Schaul et al., 2015 - 'Prioritized Experience Replay'
- Schulman et al., 2017 - 'Proximal Policy Optimization Algorithms'
- Burda et al., 2018 - 'Exploration by Random Network Distillation'
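
Among these, the replication results later in this README focus on PPO (Schulman et al., 2017), whose core is the clipped surrogate objective. Below is a minimal PyTorch sketch of that loss, for illustration only; the function and variable names are ours rather than this repo's, and `clip_eps=0.2` follows the paper's default:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss from Schulman et al., 2017 (illustrative sketch)."""
    ratio = torch.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate objective, i.e. minimize its negation.
    return -torch.min(unclipped, clipped).mean()

# Example usage with dummy tensors:
logp_new = torch.randn(8, requires_grad=True)
logp_old = torch.randn(8)
advantages = torch.randn(8)
loss = ppo_clip_loss(logp_new, logp_old, advantages)
loss.backward()
```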
Additionally, support is planned for:
- Munos et al., 2016 - 'Safe and Efficient Off-Policy Reinforcement Learning'
- Pathak et al., 2017 - 'Curiosity-driven Exploration by Self-supervised Prediction'
- Haarnoja et al., 2018 - 'Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor'
- Horgan et al., 2018 - 'Distributed Prioritized Experience Replay'
- Kapturowski et al., 2018 - 'Recurrent Experience Replay in Distributed Reinforcement Learning'
- Badia et al., 2020 - 'Never Give Up: Learning Directed Exploration Strategies'
On Linux, install the following system dependencies:

```bash
sudo apt-get update
sudo apt-get install -y libglu1-mesa-dev libgl1-mesa-dev libosmesa6-dev xvfb ffmpeg curl patchelf \
    libglfw3 libglfw3-dev cmake libjpeg-dev zlib1g zlib1g-dev swig python3-dev
```
Installation of the system packages on Mac requires Homebrew. With Homebrew installed, run the following:

```bash
brew update
brew install cmake
```
We recommend creating a conda environment for this project. You can download the miniconda package manager from https://docs.conda.io/en/latest/miniconda.html. Then you can set up the new conda environment as follows:

```bash
conda create --name pytorch_drl python=3.9.2
conda activate pytorch_drl
git clone https://github.com/lucaslingle/pytorch_drl
cd pytorch_drl
pip install -e .
```
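
After installing, you can sanity-check the environment from Python. This is a minimal check; the import name `pytorch_drl` is an assumption and may differ from the package's actual module name:

```python
import torch
import pytorch_drl  # assumed import name for this repo's package; adjust if it differs

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible
```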
This repo comes in two parts: a Python package and a script.
Support for readthedocs integration will be added in the future.
To generate the documentation as HTML locally, you can follow the instructions here.
For details on how to use the script, refer to the script usage docs.
A formal description of how the configuration files are structured can be found in the config usage docs.
Some example config files for different algorithms are also provided in the subdirectories of `models_dir`.
In this section, we describe the algorithms whose published results we've replicated using our codebase.
| Game | OpenAI Baselines | Schulman et al., 2017 | Ours |
| --- | --- | --- | --- |
| Beamrider | 1299.3 | 1590.0 | 3406.5 |
| Breakout | 114.3 | 274.8 | 424.4 |
| Enduro | 350.2 | 758.3 | 749.4 |
| Pong | 13.7 | 20.7 | 19.8 |
| Qbert | 7012.1 | 14293.3 | 16600.8 |
| Seaquest | 1218.9 | 1204.5 | 947.3 |
| Space Invaders | 557.3 | 942.5 | 1151.9 |
- For computational efficiency, we tested only the seven Atari games first examined by Mnih et al., 2013.
- For consistency with Schulman et al., 2017, each of our results above is the mean performance over the last 100 real episodes of training, averaged over three random seeds; a code sketch of this computation appears after this list.
- The OpenAI baselines results were obtained here.
- As the table shows, our implementation closely reproduces, and on several games exceeds, the results of the paper.
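
To make the evaluation protocol above concrete, here is a minimal sketch of the computation, using synthetic returns in place of real training logs (names and shapes are illustrative, not taken from this repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-episode returns for three seeds (stand-ins for real training logs).
returns_per_seed = [rng.normal(loc=400.0, scale=50.0, size=5000) for _ in range(3)]

# For each seed, take the mean return over the last 100 episodes of training,
# then average those per-seed means over the three seeds.
per_seed_means = [returns[-100:].mean() for returns in returns_per_seed]
final_score = float(np.mean(per_seed_means))
print(final_score)
```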