Guozheng Ma · Zhen Wang · Zhecheng Yuan · Xueqian Wang · Bo Yuan · Dacheng Tao
Visual reinforcement learning (RL) that makes decisions directly from high-dimensional visual inputs has demonstrated significant potential in various domains. However, deploying visual RL techniques to the real world remains challenging due to their low sample efficiency and large generalization gap.
To tackle these obstacles, data augmentation (DA) has become a widely used technique in visual RL to acquire sample-efficient and generalizable policies by diversifying the training data.
This repository is based on the paper "A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning", authored by Guozheng Ma, Zhen Wang, Zhecheng Yuan, Xueqian Wang, Bo Yuan and Dacheng Tao.
In this repository, we conduct a systematic collection of existing papers, focusing on 1️⃣ How to Augment the Data and 2️⃣ How to Leverage the Augmented Data in visual RL.
You can click the 🔖 button to jump to the details of the corresponding paper!
The list of related papers will be continuously updated. Feel free to follow and star this repo! ⭐
If this repository is useful to you, please consider citing our paper:
@article{guozheng2022comprehensive,
  title={A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning},
  author={Guozheng Ma and Zhen Wang and Zhecheng Yuan and Xueqian Wang and Bo Yuan and Dacheng Tao},
  journal={arXiv preprint arXiv:2210.04561},
  year={2022}
}
The aim of data augmentation (DA) is to increase the amount and diversity of the original training data, so that agents can learn more efficient and robust policies. Thus, a primary focus of previous research is to design effective augmentation approaches.
Depending on the type of data that the DA technique aims to modify, we divide DA in visual RL into Observation Augmentation, Transition Augmentation and Trajectory Augmentation.
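For instance, observation augmentation only perturbs the image observation while leaving actions and rewards untouched. Below is a minimal sketch of a random-shift observation augmentation in the style popularized by DrQ/RAD; the tensor shapes and pad size are illustrative assumptions rather than the exact settings of any specific paper.

```python
import torch
import torch.nn.functional as F

def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Observation augmentation: pad the image with replicated borders,
    then randomly crop back to the original size (a small random shift).

    obs: (B, C, H, W) float tensor of stacked image observations.
    """
    b, c, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(obs)
    # Sample an independent (dx, dy) offset for every observation in the batch.
    for i in range(b):
        dx = torch.randint(0, 2 * pad + 1, (1,)).item()
        dy = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, dy:dy + h, dx:dx + w]
    return out

# Usage: augment a sampled batch before feeding it to the encoder.
obs = torch.rand(8, 9, 84, 84)      # e.g. 3 stacked RGB frames (illustrative shape)
aug_obs = random_shift(obs)         # actions, rewards, and transitions stay unchanged
```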
Related Studies |
---|
Rotation, Translation, and Cropping for Zero-Shot Generalization (CoG 2020) (paper) |
[RandFM] Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning (ICLR 2020) (paper) (code) |
[RAD] Reinforcement Learning with Augmented Data (NeurIPS 2020) (paper) (code) 🔖 |
[MixReg] Improving Generalization in Reinforcement Learning with Mixture Regularization (NeurIPS 2020) (paper) (code) 🔖 |
[PAADA] Generalization of Reinforcement Learning with Policy-Aware Adversarial Data Augmentation (ICML 2022 Workshop) (arxiv paper) (workshop paper) |
[MixStyle] Domain Generalization with MixStyle (ICLR 2021) (paper) (code) |
[PlayVirtual] Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning (NeurIPS 2021) (paper) (code) 🔖 |
[CLOP] Local Feature Swapping for Generalization in Reinforcement Learning (ICLR 2022) (paper) |
[SRM] Spectrum Random Masking for Generalization in Image-based Reinforcement Learning (NeurIPS 2022) (paper) (code) |
[CoIT] On the Data-Efficiency with Contrastive Image Transformation in Reinforcement Learning (ICLR 2023) (paper) (code) |
[CG2A] Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation (ICCV 2023) (paper) |
[HAVE] Hierarchical Adaptive Value Estimation for Multi-modal Visual Reinforcement Learning (NeurIPS 2023) (paper) (code) |
A detailed introduction and analysis of each representative study can be viewed by clicking its 🔖.
Automatic data augmentation is receiving growing attention due to the demand for task-specific augmentation. Generally, different tasks benefit from different augmentations, and selecting the most appropriate one requires expert knowledge. Therefore, it is imperative to design methods that can automatically identify the most effective augmentation.
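As a rough illustration, the UCB-style selection (as in UCB-DrAC and UCB-RAD listed below) can be viewed as a multi-armed bandit over candidate augmentations. The sketch below assumes a generic usefulness signal (e.g. recent improvement in return) and an illustrative exploration coefficient; it is not the exact procedure of any single paper.

```python
import math
import random

class UCBAugmentationSelector:
    """Treat each candidate augmentation as a bandit arm and pick the one
    with the highest upper confidence bound on its estimated usefulness."""

    def __init__(self, augmentations, c: float = 0.1):
        self.augs = list(augmentations)
        self.c = c                        # exploration coefficient (assumed value)
        self.counts = [0] * len(self.augs)
        self.values = [0.0] * len(self.augs)
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i, self.augs[i]
        ucb = [v + self.c * math.sqrt(math.log(self.t) / n)
               for v, n in zip(self.values, self.counts)]
        i = max(range(len(ucb)), key=ucb.__getitem__)
        return i, self.augs[i]

    def update(self, i: int, reward: float):
        # "reward" could be, e.g., the recent change in policy return observed
        # while training with augmentation i (a placeholder signal here).
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

# Usage sketch
selector = UCBAugmentationSelector(["crop", "cutout", "color-jitter"])
idx, aug_name = selector.select()
selector.update(idx, reward=random.random())   # placeholder usefulness signal
```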
Related Studies |
---|
[UCB-DrAC] Automatic Data Augmentation for Generalization in Reinforcement Learning (NeurIPS 2021) (paper) (code) |
[UCB-RAD] Automatic Data Augmentation by Upper Confidence Bounds for Deep Reinforcement Learning (ICCAS 2021) (paper) |
Another deficiency of current data augmentation methods is that they rely on pixel-level image transformations, where each pixel is treated in a context-agnostic manner. However, in visual RL, pixels in the observation are likely to have different levels of relevance to decision making. This context-agnostic augmentation may mask or destroy the regions in the original observation that are vital for decision-making. Therefore, it is necessary to incorporate context awareness into augmentation.
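A minimal sketch of this idea, assuming a per-pixel task-relevance mask is already available (e.g. from human saliency annotations as in EXPAND, or from Lipschitz analysis as in TLDA), might look like the following; how the mask is obtained is method-specific and left abstract here.

```python
import torch

def context_aware_augment(obs: torch.Tensor,
                          relevance: torch.Tensor,
                          threshold: float = 0.5) -> torch.Tensor:
    """Perturb only pixels deemed task-irrelevant.

    obs:        (B, C, H, W) image observations.
    relevance:  (B, 1, H, W) per-pixel task-relevance scores in [0, 1];
                obtaining these scores is method-specific and assumed here.
    """
    keep = (relevance > threshold).float()        # 1 = decision-critical pixel
    noise = torch.rand_like(obs)                  # a simple "strong" perturbation
    return keep * obs + (1.0 - keep) * noise      # augment only irrelevant regions

obs = torch.rand(4, 3, 84, 84)
relevance = torch.rand(4, 1, 84, 84)
aug_obs = context_aware_augment(obs, relevance)
```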
Related Studies |
---|
[EXPAND] Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation (NeurIPS 2021) (paper) 🔖 |
[TLDA] Don’t Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning (IJCAI 2022) (paper) 🔖 |
The application scenarios where data augmentation plays a vital role can be divided into three cases:
- RL agents are trained with task-specific rewards in Case 1 and Case 2, where DA acts as an implicit regularization by enlarging the training set.
- However, the effect of implicit regularization is limited, and many studies have attempted to design auxiliary losses to further exploit the potential of data augmentation.
- Some studies also aim to decouple representation learning from policy optimization to obtain more generalizable policies.
- In Case 3, the environment is explored without any task-specific rewards, and DA can be exploited to learn task-agnostic representations via unsupervised learning.
The definitions of implicit and explicit regularization in 📑WIKIPEDIA:
Explicit regularization is regularization whenever one explicitly adds a term to the optimization problem. These terms could be priors, penalties, or constraints.
Implicit regularization is all other forms of regularization. This includes, for example, early stopping, using a robust loss function, and discarding outliers.
The initial and naive practice of DA is to expand the training set with augmented (synthesized) samples. This practice introduces prior human knowledge into the data rather than designing explicit penalty terms or modifying the optimization procedure; hence, it is often classified as a type of implicit regularization.
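In pseudocode, this recipe (in the spirit of RAD) amounts to augmenting the sampled observations and then running the unchanged RL update; `agent` and `replay_buffer` below follow a hypothetical off-policy interface.

```python
# Minimal sketch of DA as implicit regularization (in the spirit of RAD):
# the RL objective is untouched; only the sampled observations are transformed.
# `agent` and `replay_buffer` are hypothetical off-policy components.

def update_with_augmentation(agent, replay_buffer, augment, batch_size=128):
    obs, action, reward, next_obs, done = replay_buffer.sample(batch_size)

    # Synthesize "new" samples by transforming observations only;
    # actions, rewards, and the loss function are left exactly as before.
    obs, next_obs = augment(obs), augment(next_obs)

    # Standard off-policy update (e.g. a SAC-style actor-critic step).
    agent.update(obs, action, reward, next_obs, done)
```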
Related Studies |
---|
[RAD] Reinforcement Learning with Augmented Data (NeurIPS 2020) (paper) (code) 🔖 |
[DrQ] Image Augmentation is All You Need: Regularizing Deep Reinforcement Learning from Pixels (ICLR 2021) (paper) (code) 🔖 |
[DrQ-v2] Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning (ICLR 2022) (paper) (code) |
[SVEA] Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation (NeurIPS 2021) (paper) (code) 🔖 |
To further facilitate the representation learning process, we can design auxiliary objectives (with DA) as explicit regularization. In general, an auxiliary task can be considered an additional cost function that the RL agent can predict and observe from the environment in a self-supervised fashion.
For example, the last layer of the network can be split into multiple parts (heads), each working on a specific task. Multiple heads then propagate errors back to the shared network layers that form the complete representations required by all heads.
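The sketch below illustrates this multi-head structure with a shared encoder, an RL head, and a contrastive auxiliary head computed on two augmented views (in the spirit of CURL). For brevity it uses flattened vectors and Gaussian noise as the "augmentation"; the network sizes and the placeholder RL loss are purely illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderAgent(nn.Module):
    """Shared encoder whose gradients come from both the RL head and an
    auxiliary contrastive head operating on augmented views."""

    def __init__(self, obs_dim=64, feat_dim=50, num_actions=6):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, feat_dim))
        self.q_head = nn.Linear(feat_dim, num_actions)       # RL head
        self.proj_head = nn.Linear(feat_dim, feat_dim)        # auxiliary head

    def auxiliary_loss(self, view1, view2, temperature=0.1):
        # InfoNCE-style loss: augmented views of the same observation should be
        # closer to each other than to views of different observations.
        z1 = F.normalize(self.proj_head(self.encoder(view1)), dim=-1)
        z2 = F.normalize(self.proj_head(self.encoder(view2)), dim=-1)
        logits = z1 @ z2.t() / temperature
        labels = torch.arange(z1.size(0))
        return F.cross_entropy(logits, labels)

# Both losses back-propagate into the same encoder layers:
agent = SharedEncoderAgent()
obs = torch.rand(32, 64)                                   # flattened observations
view1 = obs + 0.05 * torch.randn_like(obs)                 # illustrative "augmentation"
view2 = obs + 0.05 * torch.randn_like(obs)
q_loss = agent.q_head(agent.encoder(obs)).pow(2).mean()    # placeholder RL loss
total_loss = q_loss + agent.auxiliary_loss(view1, view2)
total_loss.backward()
```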
Related Studies |
---|
[CURL] Contrastive Unsupervised Representations for Reinforcement Learning (ICML 2020) (paper) (code) 🔖 |
[PI-SAC] Predictive Information Accelerates Learning in RL (NeurIPS 2020) (paper) (code) |
[SPR] Data-Efficient Reinforcement Learning with Self-Predictive Representations (ICLR 2021) (paper) (code) 🔖 |
[UCB-DrAC] Automatic Data Augmentation for Generalization in Reinforcement Learning (NeurIPS 2021) (paper) (code) |
[PlayVirtual] Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning (NeurIPS 2021) (paper) (code) |
[CCFDM] Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model (IROS 2021) (paper) |
[SIM] Generalizing Reinforcement Learning through Fusing Self-Supervised Learning into Intrinsic Motivation (AAAI 2022) (paper) |
[CRESP] Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions (KDD 2022) (paper) (code) |
[CCLF] CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning (IJCAI 2022) (paper) |
[DRIBO] DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck (ICML 2022) (paper) (code) 🔖 |
[CoDy] Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images (Neurocomputing 2022) (paper) |
[M-CURL] Masked Contrastive Representation Learning for Reinforcement Learning (TPAMI 2022) (paper) |
[VCD] Accelerating Representation Learning with View-Consistent Dynamics in Data-Efficient Reinforcement Learning (arxiv 2022) (paper) |
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels? (NeurIPS 2022) (paper) |
[InDA, ExDA] Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning (NeurIPS 2022) (paper) (code) |
[A2LS] Reinforcement Learning with Automated Auxiliary Loss Search (NeurIPS 2022) (paper) (code) |
[MLR] Mask-based Latent Reconstruction for Reinforcement Learning (NeurIPS 2022) (paper) |
The fragility of RL poses a dilemma:
- aggressive augmentations are necessary for achieving good generalization in the visual domain,
- while injecting heavy data augmentations into the optimization of the RL objective may deteriorate both sample efficiency and training stability.
Recent works argue that this is mainly due to the conflation of two objectives: policy optimization and robust representation learning. Hence, an intuitive idea is to decouple the training data flow (a minimal sketch follows this list):
- using non-augmented or weakly augmented data for RL optimization,
- while using strongly augmented data for representation learning.
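A minimal sketch of this decoupled data flow (in the spirit of SODA) is given below; `agent`, `weak_aug`, `strong_aug`, and `repr_loss` are hypothetical names standing in for method-specific components.

```python
# Minimal sketch of decoupling the two data flows (in the spirit of SODA);
# all interfaces here are hypothetical placeholders.

def decoupled_update(agent, replay_buffer, weak_aug, strong_aug, repr_loss,
                     batch_size=128):
    obs, action, reward, next_obs, done = replay_buffer.sample(batch_size)

    # (1) Policy/value optimization sees only non- or weakly augmented data,
    #     which keeps the RL objective stable.
    agent.update(weak_aug(obs), action, reward, weak_aug(next_obs), done)

    # (2) Representation learning alone consumes the strongly augmented data,
    #     so heavy augmentations never destabilize the RL gradients.
    loss = repr_loss(agent.encoder(strong_aug(obs)), agent.target_encoder(obs))
    loss.backward()
```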
Related Studies |
---|
[SODA] Generalization in Reinforcement Learning by Soft Data Augmentation (ICRA 2021) (paper) (code) 🔖 |
[SECANT] Self-Expert Cloning for Zero-Shot Generalization of Visual Policies (ICML 2021) (paper) (code) 🔖 |
The visual representations of standard end-to-end RL methods heavily rely on the task-specific reward, making them ineffective for other tasks. To overcome this limitation, the environment can be first explored in a task-agnostic fashion to learn its visual representations without any task-specific rewards, and specific downstream tasks can be subsequently solved efficiently.
DA is exploited as a key part of unsupervised representation learning.
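The two-stage pipeline can be sketched as follows (in the spirit of ATC / Proto-RL): a reward-free pretraining stage driven by an augmentation-based contrastive objective, followed by downstream RL that reuses the pretrained encoder. All interfaces in this sketch are hypothetical.

```python
# Two-stage sketch (in the spirit of ATC / Proto-RL); `exploration_buffer`,
# `contrastive_loss`, `augment`, and `rl_agent` are hypothetical interfaces.

def pretrain_encoder(encoder, exploration_buffer, contrastive_loss, augment,
                     steps=10_000):
    """Stage 1: task-agnostic pretraining. No task reward is used; the learning
    signal comes from augmented views of reward-free exploration data."""
    for _ in range(steps):
        obs = exploration_buffer.sample_observations()
        loss = contrastive_loss(encoder(augment(obs)), encoder(augment(obs)))
        loss.backward()

def solve_downstream_task(encoder, task_env, rl_agent):
    """Stage 2: reuse the pretrained encoder to solve a specific rewarded task."""
    rl_agent.attach_encoder(encoder)     # optionally frozen or fine-tuned
    rl_agent.train(task_env)
```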
Related Studies |
---|
[ATC] Decoupling Representation Learning from Reinforcement Learning (ICML 2021) (paper) (code) 🔖 |
[Proto-RL] Reinforcement Learning with Prototypical Representations (ICML 2021) (paper) (code) 🔖 |
[SGI] Pretraining Representations for Data-Efficient Reinforcement Learning (NeurIPS 2021) (paper) (code) |
[CIC] CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery (arxiv 2022) (paper) (code) |