- Yu, Yang. "Towards Sample Efficient Reinforcement Learning." IJCAI. 2018. [pdf]
- Meta-Learning in Neural Networks: A Survey, arxiv, 2020. [pdf]
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, arxiv, 2020. [pdf]
- Reinforcement Learning: Theory and Algorithms, 2020. [pdf]
- TRPO
- PPO
- DDPG
- DQN
- A Game Theoretic Framework for Model Based Reinforcement Learning, arxiv, 2020. [pdf]
- Benchmarking Model-Based Reinforcement Learning
- Calibrated model-based deep reinforcement learning
- Clavera, Ignasi, et al. "Model-based reinforcement learning via meta-policy optimization." arXiv preprint arXiv:1809.05214 (2018).
- Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization. AAAI, 2020. [pdf]
- End-to-end differentiable physics for learning and control
- Exploring model-based planning with policy networks
- Intelligent Trainer for Dyna-Style Model-Based Deep Reinforcement Learning
- Deep reinforcement learning in a handful of trials using probabilistic dynamics models. arXiv preprint. arXiv:1805.12114, 2018.
- Learning latent dynamics for planning from pixels
- Meta-Model-Based Meta-Policy Optimization
- Model Based Reinforcement Learning for Atari
- Model-based active exploration
- Model-based Adversarial Meta-Reinforcement Learning, arxiv, 2020. [pdf]
- Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs, arxiv, 2020. [pdf]
- Model-based Trust-Region Policy Optimization
- MOReL: Model-Based Offline Reinforcement Learning, NeurIPS, 2020. [pdf]
- Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning
- On Optimism in Model-Based Reinforcement Learning
- On the Expressivity of Neural Networks for Deep Reinforcement Learning, ICML, 2020. [pdf]
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning, arxiv, 2020. [pdf]
- Policy optimization with model-based explorations, AAAI, 2019. [pdf]
- Ready Policy One: World Building Through Active Learning
- Recurrent World Models Facilitate Policy Evolution
- Sample complexity of reinforcement learning using linearly combined model ensembles
- Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
- Scalable Bayesian Optimization Using Deep Neural Networks
- Search on the replay buffer: Bridging planning and reinforcement learning
- Self-Supervised Exploration via Disagreement
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning, arxiv, 2020. [pdf]
- When to trust your model: Model-based policy optimization
- If MaxEnt RL is the Answer, What is the Question?
- Provably Efficient Q-Learning with Low Switching Cost
- A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning, arxiv, 2019. [pdf]
- Learning to adapt in dynamic, real-world environments through meta-reinforcement learning
- RL2: Fast Reinforcement Learning via Slow Reinforcement Learning, ICLR, 2017. [pdf]
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data, arxiv, 2020. [pdf]
- Bellemare, M. G., Dabney, W., and Munos, R. A distributional perspective on reinforcement learning. ICML, 2017. [pdf]
- Dabney, W., Rowland, M., Bellemare, M. G., and Munos, R. Distributional reinforcement learning with quantile regression. AAAI, 2018. [pdf]
Researchers: Sergey Levine, Nando de Freitas, Nan Jiang, Emma Brunskill
- Accelerating Online Reinforcement Learning with Offline Datasets, arxiv, 2020. [pdf]
- Accelerating Reinforcement Learning with Learned Skill Priors, arxiv, 2020. [pdf]
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning, arxiv, 2020. [pdf]
- Behavior regularized offline reinforcement learning. arXiv preprint arXiv:1911.11361, 2019. [pdf]
- Critic Regularized Regression, arxiv, 2020. [pdf]
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020. [pdf]
- Defining Admissible Rewards for High-Confidence Policy Evaluation in Batch Reinforcement Learning, ACM CHIL, 2020. [pdf]
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization, arxiv, 2020. [pdf]
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL, 2020. [pdf]
- Hyperparameter selection for offline reinforcement learning, arxiv, 2020. [pdf]
- Information-Theoretic Considerations in Batch Reinforcement Learning, ICML, 2019. [pdf]
- Learning Deep Features in Instrumental Variable Regression, ICLR, 2021. [pdf]
- MOPO: Model-based Offline Policy Optimization, arxiv, 2020. [pdf]
- Model-Based Offline Planning, arxiv, 2020. [pdf]
- Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning [pdf]
- NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning, arxiv, 2021. [pdf]
- Offline Meta Reinforcement Learning, arxiv, 2020. [pdf]
- Offline Meta-Reinforcement Learning with Advantage Weighting, arxiv, 2020. [pdf]
- Overcoming Model Bias for Robust Offline Deep Reinforcement Learning, arxiv, 2020. [pdf]
- Overfitting and Optimization in Offline Policy Learning, arxiv, 2020. [pdf]
- Q Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison, arxiv, 2020. [pdf]
- RL Unplugged: Benchmarks for Offline Reinforcement Learning. NeurIPS, 2020. [pdf]
- Semi-supervised reward learning for offline reinforcement learning, arxiv, 2020. [pdf]
- Stabilizing off-policy Q-learning via bootstrapping error reduction (BEAR). In Advances in Neural Information Processing Systems, pages 11761–11771, 2019. [pdf]
- The Importance of Pessimism in Fixed-Dataset Policy Optimization, arxiv, 2020. [pdf]
- An Optimistic Perspective on Offline Reinforcement Learning, ICML, 2020. [pdf]
- Conservative Q-Learning for Offline Reinforcement Learning, arxiv, 2020. [pdf]
- MOReL: Model-Based Offline Reinforcement Learning, NeurIPS, 2020. [pdf]
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, arxiv, 2020. [pdf]
- Minimax Weight and Q-Function Learning for Off-Policy Evaluation, arxiv, 2019. [pdf]
- CoinDICE: Off-Policy Confidence Interval Estimation, arxiv, 2020. [pdf]
- An Optimal Policy for Patient Laboratory Tests in Intensive Care Units, arxiv, 2019. [pdf]
- Phasic Policy Gradient, arxiv, 2020. [pdf]
- Eastman, Peter, et al. "Solving the RNA design problem with reinforcement learning." PLoS computational biology 14.6 (2018): e1006176.
- Adaptive Droplet Routing in Digital Microfluidic Biochips Using Deep Reinforcement Learning, ICML, 2020. [pdf]
- Angermueller, Christof, et al. "Model-based reinforcement learning for biological sequence design." International Conference on Learning Representations. 2020. [pdf]
- Automated Optical Multi-layer Design via Deep Reinforcement Learning, arxiv, 2020. [pdf]
- Micro/Nano Motor Navigation and Localization via Deep Reinforcement Learning, Advanced Theory and Simulations, 2020. [pdf]
- Mills, Kyle, Pooya Ronagh, and Isaac Tamblyn. "Finding the ground state of spin Hamiltonians with reinforcement learning." Nature Machine Intelligence (2020): 1-9.
- Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning, arxiv, 2020. [pdf]
- Optimal policy learning for COVID-19 prevention using reinforcement learning, Journal of Information Science, 2020. [pdf]
- Learning to Drive in a Day, 2018. [pdf]
- Data Valuation with Reinforcement Learning, ICML, 2020. [pdf]
- Learning When-to-Treat Policies, arxiv, 2020. [pdf]
- SoundSpaces: Audio-Visual Navigation in 3D Environments, ECCV, 2020. [pdf]
- Model Inversion Networks for Model-Based Optimization
- Incomplete Conditional Density Estimation for Fast Materials Discovery
- Autofocused oracles for model-based design, arxiv, 2020. [pdf]
- Diversity-Guided Multi-Objective Bayesian Optimization With Batch Evaluations, NeurIPS, 2020. [pdf]
- Predictive Entropy Search for Multi-objective Bayesian Optimization, ICML, 2016. [pdf]
ICML 2021 Reading List
Offline RL
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
- Offline Contextual Bandits with Overparameterized Models
- Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- Offline Reinforcement Learning with Fisher Divergence Critic Regularization
- Offline Meta-Reinforcement Learning with Advantage Weighting
- Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning
- Offline Reinforcement Learning with Pseudometric Learning
- Representation Matters: Offline Pretraining for Sequential Decision Making
- Is Pessimism Provably Efficient for Offline RL?
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
- Conservative Objective Models for Effective Offline Model-Based Optimization
- Instabilities of Offline RL with Pre-Trained Neural Representation
Meta Learning
- A Distribution-dependent Analysis of Meta Learning
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
- Improving Generalization in Meta-learning via Task Augmentation
- PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees
- Data Augmentation for Meta-Learning
- How Important is the Train-Validation Split in Meta-Learning?
- Provable Meta-Learning of Linear Representations
- Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
- A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning
- Meta-Learning Bidirectional Update Rules
- Function Contrastive Learning of Transferable Meta-Representations
- Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
- Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation
- Meta-Thompson Sampling
- Memory Efficient Online Meta Learning
- Meta-learning Hyperparameter Performance Prediction with Neural Processes
Model-based RL
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning
- Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
- Model-based Reinforcement Learning for Continuous Control with Posterior Sampling
- Continuous-time Model-based Reinforcement Learning
- Model-Based Reinforcement Learning via Latent-Space Collocation
- PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
- Temporal Predictive Coding For Model-Based Planning In Latent Space
- Awesome Meta Learning [link]
- [AWAC: Accelerating Online Reinforcement Learning with Offline Datasets]
- [D4RL: Building Better Benchmarks for Offline Reinforcement Learning]
- Stanford CS234 Reinforcement Learning: http://web.stanford.edu/class/cs234/index.html
- Stanford CS330 Deep Multi-task and meta-learning: https://cs330.stanford.edu/
- UIUC CS598 Reinforcement learning theory: https://nanjiang.cs.illinois.edu/cs598/
- Berkeley CS285 Deep Reinforcement Learning: http://rail.eecs.berkeley.edu/deeprlcourse
- NYU Deep Learning: https://atcold.github.io/pytorch-Deep-Learning/