---
layout: page
title: Reinforcement Learning
permalink: /theory/
---

# Deep Reinforcement Learning notes

In this section you can find our summaries of UC Berkeley's CS-285 Deep Reinforcement Learning course, taught by Sergey Levine (Google, UC Berkeley).

### Basics

{% include card.html title="Introduction" brief="Supervised vs Unsupervised vs Reinforcement;Ways to learn;How to build intelligent machines;State of the art" img="/_rl/lecture_1/icon.jpg" url="/lectures/lecture1" type="bulletlist" %}

{% include card.html title="Reinforcement Learning Overview" brief="Markov Decision Process;The goal of RL;RL algorithms" img="/_rl/lecture_4/icon.png" url="/lectures/lecture4" type="bulletlist"%}

{% include card.html title="Imitation Learning" brief="Behavioral Cloning;Why doesn't it work" img="/_rl/lecture_2/icon.png" url="/lectures/lecture2" type="bulletlist" star="no"%}



### Model-Free RL

{% include card.html title="Policy Gradients" brief="Policy differentiation;The REINFORCE algorithm;Solving the causality issue;Baselines;Off-policy Policy Gradient" img="/_rl/lecture_5/icon.png" url="/lectures/lecture5" type="bulletlist" %}

{% include card.html title="Actor-Critic (AC) Algorithms" brief="Policy Gradients variance reduction;Policy Evaluation (Monte Carlo vs Bootstrapping);Infinite horizon problems;Batch AC algorithm;Online AC algorithm" img="/_rl/lecture_6/icon.png" url="/lectures/lecture6" type="bulletlist" %}

{% include card.html title="Value Function Methods" brief="Policy Iteration;Value Iteration;Q iteration with Deep Learning;Q Learning;Exploration" img="/_rl/lecture_7/icon.png" url="/lectures/lecture7" type="bulletlist" %}

{% include card.html title="Deep RL with Q-functions" brief="Replay buffer and target network;DQN (Deep Q Networks);Double Q-Learning;Multi-step returns;Continuous actions;DDPG (Deep Deterministic Policy Gradient)" img="/_rl/lecture_8/icon.png" url="/lectures/lecture8" type="bulletlist"%}

{% include card.html title="Advanced Policy Gradients" brief="Policy Gradient as Policy Iteration;The KL Divergence constraint;Dual Gradient Descent;Natural Gradients and Trust Region Policy Optimization;Proximal Policy Optimization" img="/_rl/lecture_9/icon.png" url="/lectures/lecture9" type="bulletlist"%}



### Model-Based RL

{% include card.html title="Model-based Planning" brief="Deterministic vs Stochastic environments;Stochastic optimization methods;Monte Carlo Tree Search (MCTS);Collocation trajectory optimization;Shooting trajectory optimization" img="/_rl/lecture_10/icon.png" url="/lectures/lecture10" type="bulletlist"%}

{% include card.html title="Model-based Reinforcement Learning" brief="Naive Model-Based RL;Uncertainty in model-based RL;Model-based RL with complex observations" img="/_rl/lecture_11/icon.png" url="/lectures/lecture11" type="bulletlist"%}

{% include card.html title="Model-based Policy Learning" brief="How to use env. models to learn policies;Local vs Global policies;Guided policy search;Policy Distillation;Divide & conquer RL" img="/_rl/lecture_12/icon.png" url="/lectures/lecture12" type="bulletlist"%}



### Advanced Topics

{% include card.html title="Inverse Reinforcement Learning" brief="Underspecification problem;Feature Matching IRL;Maximum Entropy IRL" img="/_rl/lecture_15/icon.png" url="/lectures/lecture15" type="bulletlist"%}

{% include card.html title="Transfer and Multi-task Learning" brief="Forward Transfer;Multi-task Transfer" img="/_rl/lecture_16/icon.png" url="/rl/transfer_and_multitask_rl" type="bulletlist"%}

{% include card.html title="Meta-RL" brief="Meta-Learning;Recurrent Meta-RL;Gradient-Based Meta-RL (MAML);Meta-RL as a POMDP;Model-Based Meta-RL" img="/_rl/lecture_20/optimization_idea_2.png" url="/rl/meta-rl" type="bulletlist" %}

{% include card.html title="Distributed RL" brief="Original DQN;GORILA;A3C;IMPALA;Ape-X;R2D3;QT-Opt;Evolution Strategies;Population-based Training" img="/_rl/lecture_17/icon.png" url="/lectures/lecture17" type="bulletlist"%}


## Annex

This section contains both basic RL knowledge assumed throughout the course above and some demonstrations we found interesting enough to include as an annex. In addition, we added our own interpretations of some concepts in the hope of making them easier to understand.

{% include paper-card.html title="MDP Basics" subtitle="" url="/lectures/basic_concepts" %} {% include paper-card.html title="Policy Expectations, Explained" subtitle="" url="/lectures/policy_expectations" %} {% include paper-card.html title="Policy Gradients" subtitle="" url="/lectures/policy_gradients_annex" %}


## Other great resources