---
layout: page
title: Reinforcement Learning
permalink: /theory/
---

# Deep Reinforcement Learning notes

In this section you can find our summaries of UC Berkeley's CS-285 Deep Reinforcement Learning course, taught by Sergey Levine (Google, UC Berkeley).

### Basics

{% include card.html title="Introduction" brief="Supervised vs Unsupervised vs Reinforcement;Ways to learn;How to build intelligent machines;State of the art" img="/_rl/lecture_1/icon.jpg" url="/lectures/lecture1" type="bulletlist" %}

{% include card.html title="Reinforcement Learning Overview" brief="Markov Decision Process;The goal of RL;RL algorithms" img="/_rl/lecture_4/icon.png" url="/lectures/lecture4" type="bulletlist"%}

{% include card.html title="Imitation Learning" brief="Behavioral Cloning;Why doesn't it work" img="/_rl/lecture_2/icon.png" url="/lectures/lecture2" type="bulletlist" star="no"%}



### Model-Free RL

{% include card.html title="Policy Gradients" brief="Policy differentiation;The REINFORCE algorithm;Solving the causality issue;Baselines;Off-policy Policy Gradient" img="/_rl/lecture_5/icon.png" url="/lectures/lecture5" type="bulletlist" %}

{% include card.html title="Actor-Critic (AC) Algorithms" brief="Policy Gradients variance reduction;Policy Evaluation (Monte Carlo vs Bootstrapping);Infinite horizon problems;Batch AC algorithm;Online AC algorithm" img="/_rl/lecture_6/icon.png" url="/lectures/lecture6" type="bulletlist" %}

{% include card.html title="Value Function Methods" brief="Policy Iteration;Value Iteration;Q iteration with Deep Learning;Q Learning;Exploration" img="/_rl/lecture_7/icon.png" url="/lectures/lecture7" type="bulletlist" %}

{% include card.html title="Deep RL with Q-functions" brief="Replay buffer and target network;DQN (Deep Q Networks);Double Q-Learning;Multi-step returns;Continuous actions;DDPG (Deep Deterministic Policy Gradient)" img="/_rl/lecture_8/icon.png" url="/lectures/lecture8" type="bulletlist"%}

{% include card.html title="Advanced Policy Gradients" brief="Policy Gradient as Policy Iteration;The KL Divergence constraint;Dual Gradient Descent;Natural Gradients and Trust Region Policy Optimization;Proximal Policy Optimization" img="/_rl/lecture_9/icon.png" url="/lectures/lecture9" type="bulletlist"%}



### Model-Based RL

{% include card.html title="Model-based Planning" brief="Deterministic vs Stochastic environments;Stochastic optimization methods;Monte Carlo Tree Search (MCTS);Collocation trajectory optimization;Shooting trajectory optimization" img="/_rl/lecture_10/icon.png" url="/lectures/lecture10" type="bulletlist"%}

{% include card.html title="Model-based Reinforcement Learning" brief="Naive Model-Based RL;Uncertainty in model-based RL;Model-based RL with complex observations" img="/_rl/lecture_11/icon.png" url="/lectures/lecture11" type="bulletlist"%}

{% include card.html title="Model-based Policy Learning" brief="How to use env. models to learn policies;Local vs Global policies;Guided policy search;Policy Distillation;Divide & conquer RL" img="/_rl/lecture_12/icon.png" url="/lectures/lecture12" type="bulletlist"%}



### Advanced Topics

{% include card.html title="Inverse Reinforcement Learning" brief="Underspecification problem;Feature Matching IRL;Maximum Entropy IRL" img="/_rl/lecture_15/icon.png" url="/lectures/lecture15" type="bulletlist"%}

{% include card.html title="Transfer and Multi-task Learning" brief="Forward Transfer;Multi-task Transfer" img="/_rl/lecture_16/icon.png" url="/rl/transfer_and_multitask_rl" type="bulletlist"%}

{% include card.html title="Meta-RL" brief="Meta-Learning;Recurrent Meta-RL;Gradient-Based Meta-RL (MAML);Meta-RL as a POMDP;Model-Based Meta-RL" img="/_rl/lecture_20/optimization_idea_2.png" url="/rl/meta-rl" type="bulletlist" %}

{% include card.html title="Distributed RL" brief="Original DQN;GORILA;A3C;IMPALA;Ape-X;R2D3;QT-Opt;Evolution Strategies;Population-based Training" img="/_rl/lecture_17/icon.png" url="/lectures/lecture17" type="bulletlist"%}


## Annex

This section contains both basic RL knowledge assumed throughout the course above and some demonstrations we found interesting enough to include as an annex. In addition, we added our own interpretations of some concepts in the hope of making them easier to understand.

{% include paper-card.html title="MDP Basics" subtitle="" url="/lectures/basic_concepts" %} {% include paper-card.html title="Policy Expectations, Explained" subtitle="" url="/lectures/policy_expectations" %} {% include paper-card.html title="Policy Gradients" subtitle="" url="/lectures/policy_gradients_annex" %}


## Other great resources