Note: At the moment, running the code from the Docker container (below) is the only supported setup. Docker provides a single environment that is more likely to work on all systems. Basically, I install and configure all packages for you, except Docker itself, and you just run the code in a tested environment.
To install Docker, I recommend a web search for "installing docker on <your os here>". To run the code on a GPU, you also have to install nvidia-docker, which lets Docker containers use the host's GPUs. Once you have Docker (and nvidia-docker, if using a GPU) installed, follow the three steps below.
- Pull the gdrl image with:
docker pull mimoralea/gdrl:v0.12
- Spin up a container:
docker run -it --rm -p 8888:8888 -v "$PWD"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.12
(Remember to use nvidia-docker if you are using a GPU.)
- Open a browser and go to the URL shown in the terminal (likely http://localhost:8888). The password is:
gdrl
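For GPU users, the only change to the run step is the launcher. As a minimal sketch, the helper below prints the launch command for either case so you can inspect it before running it. The function name `gdrl_run_cmd` is hypothetical and not part of the repository, and the sketch assumes the nvidia-docker wrapper accepts the same arguments as `docker run`.

```shell
# Hypothetical helper (not part of the gdrl repository): print the
# container launch command for CPU (default) or GPU use.
gdrl_run_cmd() {
  launcher="docker"
  # Assumption: on GPU hosts the nvidia-docker wrapper replaces docker.
  if [ "${1:-cpu}" = "gpu" ]; then
    launcher="nvidia-docker"
  fi
  printf '%s run -it --rm -p 8888:8888 -v "%s"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.12\n' \
    "$launcher" "$PWD"
}

gdrl_run_cmd        # prints the CPU command
gdrl_run_cmd gpu    # prints the GPU command
```

Run the printed command from the repository root, so that "$PWD"/notebooks/ points at the notebooks folder.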
https://www.manning.com/books/grokking-deep-reinforcement-learning
- Introduction to deep reinforcement learning
- Mathematical foundations of reinforcement learning
- Balancing immediate and long-term goals
- Balancing the gathering and utilization of information
- Estimating agents' behaviors
- Improving agents' behaviors
- Achieving goals more effectively and efficiently
- Introduction to value-based deep reinforcement learning
- More stable value-based methods
- Sample-efficient value-based methods
- Introduction to policy-based deep reinforcement learning
- Parallelizing policy-based methods
- Deterministic policy gradient methods
- Conservative policy optimization methods
- Towards artificial general intelligence
- Introduction to deep reinforcement learning
- Mathematical foundations of reinforcement learning
  - Implementations of several MDPs:
    - Bandit Walk
    - Bandit Slippery Walk
    - Slippery Walk Three
    - Random Walk
    - Russell and Norvig's Gridworld from AIMA
    - FrozenLake
    - FrozenLake8x8
- Balancing immediate and long-term goals
  - Implementations of methods for finding optimal policies:
    - Policy Evaluation
    - Policy Improvement
    - Policy Iteration
    - Value Iteration
- Balancing the gathering and utilization of information
  - Implementations of exploration strategies for bandit problems:
    - Random
    - Greedy
    - Epsilon-greedy
    - Epsilon-greedy with linearly decaying epsilon
    - Epsilon-greedy with exponentially decaying epsilon
    - Optimistic initialization
    - SoftMax
    - Upper Confidence Bound
    - Bayesian
- Estimating agents' behaviors
  - Implementation of algorithms that solve the prediction problem (policy estimation):
    - On-policy Monte-Carlo prediction
    - Temporal-Difference prediction (TD)
    - n-step Temporal-Difference prediction (n-step TD)
    - TD(λ)
- Improving agents' behaviors
  - Implementation of algorithms that solve the control problem (policy improvement):
    - On-policy first-visit Monte-Carlo control
    - On-policy TD control: SARSA
    - Off-policy TD control: Q-Learning
- Achieving goals more effectively and efficiently
  - Implementation of more effective and efficient reinforcement learning algorithms:
    - Double Q-Learning
    - SARSA(λ)
    - Q(λ)
    - Dyna-Q (model-based method)
- Introduction to value-based deep reinforcement learning
  - Implementation of a value-based deep reinforcement learning baseline:
    - Neural Fitted Q-iteration (NFQ)
- More stable value-based methods
  - Implementation of "classic" value-based deep reinforcement learning methods:
    - Deep Q-Networks (DQN)
    - Double Deep Q-Networks (DDQN)
- Sample-efficient value-based methods
  - Implementation of main improvements for value-based deep reinforcement learning methods:
    - Dueling Deep Q-Networks (Dueling DQN)
    - Prioritized Experience Replay (PER)
- Introduction to policy-based deep reinforcement learning
  - Implementation of classic policy-based deep reinforcement learning methods:
    - Policy Gradients without value function and Monte-Carlo returns (REINFORCE)
    - Policy Gradients with value function baseline trained with Monte-Carlo returns (VPG)
- Parallelizing policy-based methods
  - Implementation of main improvements to policy-based deep reinforcement learning methods:
    - Asynchronous Advantage Actor-Critic (A3C)
    - Generalized Advantage Estimation (GAE)
    - [Synchronous] Advantage Actor-Critic (A2C)
- Deterministic policy gradient methods
  - Implementation of deterministic policy gradient deep reinforcement learning methods:
    - Deep Deterministic Policy Gradient (DDPG)
    - Twin Delayed Deep Deterministic Policy Gradient (TD3)
- Conservative policy optimization methods
  - Implementation of conservative policy gradient deep reinforcement learning methods:
    - Trust Region Policy Optimization (TRPO)
    - Proximal Policy Optimization (PPO)
- Towards artificial general intelligence