Grokking Deep Reinforcement Learning

Note: At the moment, only running the code from the Docker container (below) is supported. Docker provides a single, reproducible environment that is more likely to work on all systems. In short, I install and configure all the packages for you, except Docker itself, and you just run the code in a tested environment.

To install Docker, I recommend a web search for "installing Docker on <your os here>". To run the code on a GPU, you also need to install nvidia-docker, which makes the host's GPUs available inside Docker containers. Once you have Docker (and nvidia-docker, if using a GPU) installed, follow the four steps below.

Running the code

  1. Clone this repo:
    git clone --depth 1 https://github.com/mimoralea/gdrl.git && cd gdrl
  2. Pull the gdrl image with:
    docker pull mimoralea/gdrl:v0.14
  3. Spin up a container:
    • On Mac or Linux:
      docker run -it --rm -p 8888:8888 -v "$PWD"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14
    • On Windows:
      docker run -it --rm -p 8888:8888 -v %CD%/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14
    • NOTE: If you are using a GPU, use nvidia-docker or add --gpus all after --rm, for example:
      docker run -it --rm --gpus all -p 8888:8888 -v "$PWD"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14
  4. Open a browser and go to the URL shown in the terminal (likely to be: http://localhost:8888). The password is: gdrl

About the book

Book's website

https://www.manning.com/books/grokking-deep-reinforcement-learning

Table of contents

  1. Introduction to deep reinforcement learning
  2. Mathematical foundations of reinforcement learning
  3. Balancing immediate and long-term goals
  4. Balancing the gathering and utilization of information
  5. Evaluating agents' behaviors
  6. Improving agents' behaviors
  7. Achieving goals more effectively and efficiently
  8. Introduction to value-based deep reinforcement learning
  9. More stable value-based methods
  10. Sample-efficient value-based methods
  11. Policy-gradient and actor-critic methods
  12. Advanced actor-critic methods
  13. Towards artificial general intelligence

Detailed table of contents

1. Introduction to deep reinforcement learning

2. Mathematical foundations of reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementations of several MDPs (a minimal transition-table sketch follows this list):
      • Bandit Walk
      • Bandit Slippery Walk
      • Slippery Walk Three
      • Random Walk
      • Russell and Norvig's Gridworld from AIMA
      • FrozenLake
      • FrozenLake8x8
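
A convenient way to write such small MDPs down is as a Gym-style transition table, where P[state][action] is a list of (probability, next_state, reward, done) tuples. As a rough, illustrative sketch (not the book's code), here is a tiny deterministic walk in the spirit of the Bandit Walk:

    # A minimal sketch: a three-state deterministic walk written as a Gym-style
    # transition table, P[state][action] -> [(prob, next_state, reward, done), ...].
    P = {
        0: {  # left terminal state (the hole): absorbing, no reward
            0: [(1.0, 0, 0.0, True)],
            1: [(1.0, 0, 0.0, True)],
        },
        1: {  # starting state: action 0 moves left, action 1 moves right
            0: [(1.0, 0, 0.0, True)],
            1: [(1.0, 2, 1.0, True)],
        },
        2: {  # right terminal state (the goal): absorbing, no further reward
            0: [(1.0, 2, 0.0, True)],
            1: [(1.0, 2, 0.0, True)],
        },
    }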

3. Balancing immediate and long-term goals

  • (Livebook)
  • (Notebook)
    • Implementations of methods for finding optimal policies (see the sketch after this list):
      • Policy Evaluation
      • Policy Improvement
      • Policy Iteration
      • Value Iteration
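
As a rough illustration of the planning methods listed above (a sketch, not the book's code), here is value iteration over a Gym-style transition table such as the one in the chapter 2 sketch:

    import numpy as np

    def value_iteration(P, gamma=0.99, theta=1e-10):
        """Minimal value-iteration sketch over a Gym-style transition table P."""
        V = np.zeros(len(P))
        while True:
            # One-step lookahead: expected return of every action in every state.
            Q = np.zeros((len(P), len(P[0])))
            for s in P:
                for a in P[s]:
                    for prob, next_s, reward, done in P[s][a]:
                        Q[s][a] += prob * (reward + gamma * V[next_s] * (not done))
            if np.max(np.abs(V - np.max(Q, axis=1))) < theta:
                break
            V = np.max(Q, axis=1)      # greedy backup
        pi = np.argmax(Q, axis=1)      # greedy policy with respect to the final Q
        return V, pi

Applied to the transition table from the chapter 2 sketch, this recovers V = [0, 1, 0] and a greedy policy that moves right from the start state.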

4. Balancing the gathering and utilization of information

  • (Livebook)
  • (Notebook)
    • Implementations of exploration strategies for bandit problems (see the sketch after this list):
      • Random
      • Greedy
      • E-greedy
      • E-greedy with linearly decaying epsilon
      • E-greedy with exponentially decaying epsilon
      • Optimistic initialization
      • SoftMax
      • Upper Confidence Bound
      • Bayesian
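
As a minimal sketch of one of these strategies (not the book's code), here is epsilon-greedy with an exponentially decaying epsilon on a toy two-armed Bernoulli bandit; the payout probabilities and decay schedule are made up for illustration:

    import numpy as np

    def e_greedy_bandit(payout_probs=(0.3, 0.7), n_episodes=1000,
                        init_epsilon=1.0, min_epsilon=0.01, decay=0.995, seed=0):
        """Epsilon-greedy with exponentially decaying epsilon on a Bernoulli bandit."""
        rng = np.random.default_rng(seed)
        n_arms = len(payout_probs)
        Q = np.zeros(n_arms)      # estimated value of each arm
        N = np.zeros(n_arms)      # number of pulls of each arm
        epsilon = init_epsilon
        for _ in range(n_episodes):
            # Explore with probability epsilon, otherwise exploit the current best estimate.
            if rng.random() < epsilon:
                arm = rng.integers(n_arms)
            else:
                arm = int(np.argmax(Q))
            reward = float(rng.random() < payout_probs[arm])   # Bernoulli payout
            N[arm] += 1
            Q[arm] += (reward - Q[arm]) / N[arm]               # incremental sample mean
            epsilon = max(min_epsilon, epsilon * decay)        # exponential decay
        return Q

    print(e_greedy_bandit())   # estimates should end up roughly near (0.3, 0.7)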

5. Evaluating agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementations of algorithms that solve the prediction problem (policy estimation); a minimal sketch follows this list:
      • On-policy first-visit Monte-Carlo prediction
      • On-policy every-visit Monte-Carlo prediction
      • Temporal-Difference prediction (TD)
      • n-step Temporal-Difference prediction (n-step TD)
      • TD(λ)
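
As a minimal sketch of the prediction problem (not the book's code), the snippet below runs tabular TD(0) to estimate the value of the uniformly random policy on a Random Walk with five non-terminal states, where only the right exit pays a reward of 1:

    import numpy as np

    def td0_random_walk(n_episodes=500, alpha=0.05, gamma=1.0, seed=0):
        """TD(0) prediction of the uniformly random policy on a 7-state Random Walk.
        States 0 and 6 are terminal; only reaching state 6 pays a reward of 1."""
        rng = np.random.default_rng(seed)
        V = np.zeros(7)
        for _ in range(n_episodes):
            s = 3                                    # every episode starts in the middle
            while s not in (0, 6):
                next_s = s + (1 if rng.random() < 0.5 else -1)   # random policy
                reward = 1.0 if next_s == 6 else 0.0
                done = next_s in (0, 6)
                # TD(0): move V[s] toward the one-step bootstrapped target.
                target = reward + gamma * V[next_s] * (not done)
                V[s] += alpha * (target - V[s])
                s = next_s
        return V

    print(td0_random_walk())   # V[1..5] should hover around 1/6 .. 5/6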

6. Improving agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementations of algorithms that solve the control problem (policy improvement); a minimal sketch follows this list:
      • On-policy first-visit Monte-Carlo control
      • On-policy every-visit Monte-Carlo control
      • On-policy TD control: SARSA
      • Off-policy TD control: Q-Learning
      • Double Q-Learning
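
As a minimal sketch of the control problem (not the book's code), here is tabular Q-learning with a decaying epsilon-greedy behavior policy on a small deterministic walk; the environment and exploration schedule are made up for illustration:

    import numpy as np

    def q_learning_walk(n_states=7, n_episodes=1000, alpha=0.1, gamma=0.99,
                        init_epsilon=1.0, min_epsilon=0.1, decay=0.99, seed=0):
        """Tabular Q-learning on a deterministic walk: action 0 moves left, action 1
        moves right; states 0 and n_states - 1 are terminal, and only the right end
        pays a reward of 1."""
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, 2))
        epsilon = init_epsilon
        for _ in range(n_episodes):
            s = n_states // 2                        # start in the middle
            while s not in (0, n_states - 1):
                # Epsilon-greedy behavior policy.
                a = rng.integers(2) if rng.random() < epsilon else int(np.argmax(Q[s]))
                next_s = s + (1 if a == 1 else -1)
                reward = 1.0 if next_s == n_states - 1 else 0.0
                done = next_s in (0, n_states - 1)
                # Off-policy target: bootstrap from the greedy action in the next state.
                target = reward + gamma * np.max(Q[next_s]) * (not done)
                Q[s, a] += alpha * (target - Q[s, a])
                s = next_s
            epsilon = max(min_epsilon, epsilon * decay)   # decay once per episode
        return np.argmax(Q, axis=1)

    print(q_learning_walk())   # non-terminal states should come to prefer action 1 (right)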

7. Achieving goals more effectively and efficiently

  • (Livebook)
  • (Notebook)
    • Implementations of more effective and efficient reinforcement learning algorithms (see the sketch after this list):
      • SARSA(λ) with replacing traces
      • SARSA(λ) with accumulating traces
      • Q(λ) with replacing traces
      • Q(λ) with accumulating traces
      • Dyna-Q
      • Trajectory Sampling
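
As a rough illustration of eligibility traces (a sketch, not the book's code), here is SARSA(λ) with replacing traces on the same deterministic walk used in the chapter 6 sketch:

    import numpy as np

    def sarsa_lambda_walk(n_states=7, n_episodes=1000, alpha=0.1, gamma=0.99,
                          lambda_=0.9, init_epsilon=1.0, min_epsilon=0.1,
                          decay=0.99, seed=0):
        """SARSA(lambda) with replacing traces on a deterministic walk
        (action 0 = left, action 1 = right, reward 1 at the right end)."""
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, 2))
        epsilon = init_epsilon
        for _ in range(n_episodes):
            E = np.zeros_like(Q)                     # eligibility traces, reset per episode
            s = n_states // 2
            a = rng.integers(2) if rng.random() < epsilon else int(np.argmax(Q[s]))
            while s not in (0, n_states - 1):
                next_s = s + (1 if a == 1 else -1)
                reward = 1.0 if next_s == n_states - 1 else 0.0
                done = next_s in (0, n_states - 1)
                next_a = rng.integers(2) if rng.random() < epsilon else int(np.argmax(Q[next_s]))
                # On-policy TD error, bootstrapping from the action actually chosen next.
                td_error = reward + gamma * Q[next_s, next_a] * (not done) - Q[s, a]
                E[s, a] = 1.0                        # replacing trace: clamp to 1
                Q += alpha * td_error * E            # every eligible pair moves toward the target
                E *= gamma * lambda_                 # decay all traces
                s, a = next_s, next_a
            epsilon = max(min_epsilon, epsilon * decay)
        return np.argmax(Q, axis=1)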

8. Introduction to value-based deep reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementation of a value-based deep reinforcement learning baseline:
      • Neural Fitted Q-iteration (NFQ)

9. More stable value-based methods

  • (Livebook)
  • (Notebook)
    • Implementation of "classic" value-based deep reinforcement learning methods:
      • Deep Q-Networks (DQN)
      • Double Deep Q-Networks (DDQN)
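
As a small illustration of what separates DDQN from DQN (not the book's code), the sketch below computes both bootstrapped targets for a batch of transitions, with plain numpy arrays standing in for the online and target networks' outputs:

    import numpy as np

    rng = np.random.default_rng(0)
    batch, n_actions, gamma = 4, 3, 0.99

    # Stand-ins for network outputs on the next states: in a real agent these come
    # from the online network and the periodically synchronized target network.
    q_online_next = rng.normal(size=(batch, n_actions))
    q_target_next = rng.normal(size=(batch, n_actions))
    rewards = rng.normal(size=batch)
    dones = np.array([0.0, 0.0, 1.0, 0.0])   # 1.0 marks a terminal next state

    # DQN target: the target network both selects and evaluates the next action.
    dqn_target = rewards + gamma * q_target_next.max(axis=1) * (1.0 - dones)

    # Double DQN target: the online network selects the action, the target network
    # evaluates it, which reduces the overestimation bias of the max operator.
    best_actions = q_online_next.argmax(axis=1)
    ddqn_target = rewards + gamma * q_target_next[np.arange(batch), best_actions] * (1.0 - dones)

    print(dqn_target, ddqn_target, sep="\n")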

10. Sample-efficient value-based methods

  • (Livebook)
  • (Notebook)
    • Implementations of main improvements for value-based deep reinforcement learning methods (see the sketch after this list):
      • Dueling Deep Q-Networks (Dueling DQN)
      • Prioritized Experience Replay (PER)
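
As small numerical illustrations (not the book's code), the sketch below shows the dueling aggregation of state values and advantages into Q-values, and the priority-to-probability conversion with importance-sampling weights used by prioritized experience replay; all numbers are made up:

    import numpy as np

    # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    # V and A stand in for the two streams of a dueling network, for a batch of two states.
    V = np.array([[1.0], [0.5]])
    A = np.array([[0.2, -0.1, 0.4], [0.0, 0.3, -0.3]])
    Q = V + A - A.mean(axis=1, keepdims=True)
    print(Q)

    # Prioritized experience replay: sample transitions in proportion to priority**alpha,
    # then correct the induced bias with importance-sampling weights.
    priorities = np.array([0.1, 1.0, 2.0, 0.5])
    alpha, beta = 0.6, 0.4
    probs = priorities ** alpha / np.sum(priorities ** alpha)
    weights = (len(priorities) * probs) ** -beta
    weights /= weights.max()        # normalize so the largest weight is 1
    print(probs, weights, sep="\n")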

11. Policy-gradient and actor-critic methods

  • (Livebook)
  • (Notebook)
    • Implementations of classic policy-based and actor-critic deep reinforcement learning methods (see the sketch after this list):
      • Policy Gradients without value function and Monte-Carlo returns (REINFORCE)
      • Policy Gradients with value function baseline trained with Monte-Carlo returns (VPG)
      • Asynchronous Advantage Actor-Critic (A3C)
      • Generalized Advantage Estimation (GAE)
      • [Synchronous] Advantage Actor-Critic (A2C)
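
As a minimal sketch of the policy-gradient idea behind REINFORCE (not the book's code), here is a softmax policy over the two arms of a Bernoulli bandit, updated with the score-function (log-probability) gradient weighted by the return; the payout probabilities are made up:

    import numpy as np

    def reinforce_bandit(payout_probs=(0.3, 0.7), n_episodes=2000, lr=0.1, seed=0):
        """REINFORCE on a one-step bandit: stochastic gradient ascent on E[R]."""
        rng = np.random.default_rng(seed)
        theta = np.zeros(len(payout_probs))          # one preference per arm

        for _ in range(n_episodes):
            probs = np.exp(theta - theta.max())
            probs /= probs.sum()                     # softmax policy pi(a)
            a = rng.choice(len(probs), p=probs)
            reward = float(rng.random() < payout_probs[a])   # the whole return here
            # Score-function gradient of log pi(a) w.r.t. theta: one_hot(a) - probs.
            grad_log_pi = -probs
            grad_log_pi[a] += 1.0
            theta += lr * reward * grad_log_pi       # ascend the expected return
        return theta

    print(reinforce_bandit())   # the preference for the better arm should dominate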

12. Advanced actor-critic methods

  • (Livebook)
  • (Notebook)
    • Implementations of advanced actor-critic methods (see the sketch after this list):
      • Deep Deterministic Policy Gradient (DDPG)
      • Twin Delayed Deep Deterministic Policy Gradient (TD3)
      • Soft Actor-Critic (SAC)
      • Proximal Policy Optimization (PPO)
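
One detail these methods share is the Polyak (soft) update that slowly moves target-network weights toward the online networks' weights. As a rough illustration (not the book's code), with numpy arrays standing in for network parameters:

    import numpy as np

    def polyak_update(target_params, online_params, tau=0.005):
        """Soft target update used by DDPG, TD3, and SAC:
        target <- tau * online + (1 - tau) * target, applied every step."""
        return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

    # Toy stand-ins for one layer's weights and biases.
    target = [np.zeros((2, 2)), np.zeros(2)]
    online = [np.ones((2, 2)), np.ones(2)]
    for _ in range(1000):               # many small steps make the target track the online net
        target = polyak_update(target, online)
    print(target[1])                    # close to 1.0 after many updates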

13. Towards artificial general intelligence