
Awesome Transformers for Sequential Decision Making


Recent progress on Transformers has prompted researchers to re-think sequential decision making.

This repo tracks literature and additional online resources on transformers for reinforcement learning and more general sequential decision making problems. We provide a short summary of each paper. Though we have tried our best to include all relevant works, it's possible that we might have missed your work. Please feel free to create an issue if you want your work to be added.

While preparing this repo, we noticed the Awesome-Decision-Transformer repo, which also covers the Decision Transformer literature; it does not provide paper summaries, but it lists the experiment environments used in each paper. We believe both repos are helpful for beginners getting started with Transformers for RL. If you find these resources useful, please follow and star both repos!

Papers

🆕 ArXiv

Is Conditional Generative Modeling All You Need for Decision-Making?

[paper]


How Crucial is Transformer in Decision Transformer?

[paper]


🆕 NeurIPS'22

Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing

[paper]


Pre-Trained Language Models for Interactive Decision-Making

[paper]


Masked Autoencoding for Scalable and Generalizable Decision Making

# MaskDP

[paper]

One limitation of DT is its requirement for reward-labeled datasets. In this paper, the authors borrow ideas from masked language modeling and develop MaskDP, which pretrains transformers to predict sequential decisions without reward-labeled data. They show that both goal-reaching and offline RL can be handled with different masking strategies at inference time. However, offline RL is slightly more complex than simple goal-reaching because the objective is to maximize return, so the authors also add a critic head and an actor head on top of the pretrained transformer backbone.
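As a rough illustration of the masking idea (a minimal sketch with a hypothetical token layout, not the paper's implementation), different inference tasks correspond to different visibility masks over an interleaved state/action sequence, and the pretrained model reconstructs the masked positions:

```python
import numpy as np

# Interleaved token sequence [s_0, a_0, s_1, a_1, ..., s_T, a_T]; 1 = visible, 0 = masked.
def build_mask(T, task, p_keep=0.5):
    n = 2 * (T + 1)
    mask = np.zeros(n, dtype=int)
    states, actions = np.arange(0, n, 2), np.arange(1, n, 2)
    if task == "random_pretrain":      # BERT-style random masking used for pretraining
        mask[np.random.rand(n) < p_keep] = 1
    elif task == "goal_reaching":      # reveal the current state and the goal state, infer the actions in between
        mask[states[0]] = mask[states[-1]] = 1
    elif task == "forward_dynamics":   # reveal the past, predict the next state/action
        mask[: n - 2] = 1
    return mask

print(build_mask(T=3, task="goal_reaching"))  # [1 0 0 0 0 0 1 0]
```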


UniMASK: Unified Inference in Sequential Decision Problems

# Uni[MASK]

[paper]

This paper also investigates pretraining for sequential decision making and, similar to MaskDP, the authors point out that many sequential decision making tasks can be achieved by different masking schemes. Together with MaskDP, Uni[MASK] could inspire a new paradigm for sequential decision making.


You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environment

# ESPER

[Paper]

This paper analyzes why return-conditioned transformers fail in stochastic environments. The proposed method (ESPER) learns to cluster trajectories and conditions the policy on average cluster returns instead of individual trajectory returns.
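A minimal sketch of the relabeling idea (using off-the-shelf k-means as a stand-in for the paper's learned clustering):

```python
import numpy as np
from sklearn.cluster import KMeans

def relabel_with_cluster_returns(traj_features, traj_returns, n_clusters=8):
    """traj_features: (N, d) per-trajectory features, traj_returns: (N,) episode returns."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(traj_features)
    cluster_avg = np.array([traj_returns[labels == c].mean() for c in range(n_clusters)])
    return cluster_avg[labels]  # conditioning target for each trajectory
```

The policy is then trained to condition on these cluster-average returns rather than on individual (possibly lucky) trajectory returns.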


Behavior Transformers: Cloning k modes with one stone

# BeT

[Paper] [Code]

The authors propose the Behavior Transformer (BeT) to model unlabeled demonstration data with multiple modes. It introduces an action-correction mechanism to predict multi-modal continuous actions.


On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning

[paper]


Multi-Game Decision Transformers

[Paper]

Similar to GATO, this paper studies applying a single transformer-based RL agent to play multiple games.


Bootstrapped Transformer for Offline Reinforcement Learning

[Paper]

To address the limited coverage of offline data, this paper uses the learned trajectory model to generate additional trajectories, i.e., a data augmentation approach. Trajectory Transformer is used as the underlying model.
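The bootstrapping loop can be sketched as follows (hypothetical `fit`/`sample_trajectory` interfaces, not TT's actual API):

```python
def bootstrap_dataset(offline_trajs, sequence_model, rounds=5, n_generated=100):
    """Alternately train the sequence model and grow the dataset with its own samples."""
    dataset = list(offline_trajs)
    for _ in range(rounds):
        sequence_model.fit(dataset)  # train on real + previously generated trajectories
        dataset += [sequence_model.sample_trajectory() for _ in range(n_generated)]
    return sequence_model, dataset
```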


Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

[Paper]


Previous

Stabilizing Transformers for Reinforcement Learning

ICML'20 [Paper]

One of the first works to successfully apply transformers in RL settings. This work aims to replace the LSTMs used in online RL with Transformers. The authors observed that training large-scale transformers in RL settings is unstable, so they proposed the Gated Transformer-XL (GTrXL) architecture and showed that it outperforms LSTMs on the DMLab-30 benchmark with good training stability.


Representation Matters: Offline Pretraining for Sequential Decision Making

ICML'21 [Paper]


Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

NeurIPS'21 [Paper]


Decision Transformer: Reinforcement Learning via Sequence Modeling

NeurIPS'21 [Paper] [Code]
A seminal work that proposes a supervised learning framework based on transformers for sequential decision making. It tackles RL as a sequence generation task: given a pre-collected dataset of decision making sequences, the Decision Transformer (DT) is trained to generate actions conditioned on the desired return-to-go, which is provided as an input to the model.
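At evaluation time, the return-to-go conditioning works roughly as in the sketch below (assuming a classic gym-style environment and a `policy` callable standing in for the trained transformer; an illustration, not the authors' code):

```python
def decision_transformer_rollout(env, policy, target_return, horizon=1000, context_len=20):
    """Roll out a return-conditioned policy, decrementing the return-to-go after each reward."""
    state = env.reset()
    rtg, states, actions = [target_return], [state], []
    total_reward = 0.0
    for _ in range(horizon):
        # The model conditions on the last `context_len` (return-to-go, state, action) triples.
        action = policy(rtg[-context_len:], states[-context_len:], actions[-context_len:])
        state, reward, done, _ = env.step(action)
        total_reward += reward
        rtg.append(rtg[-1] - reward)  # condition the next step on the *remaining* desired return
        states.append(state)
        actions.append(action)
        if done:
            break
    return total_reward
```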


Offline Reinforcement Learning as One Big Sequence Modeling Problem

NeurIPS'21 [Paper] [Code]
This is another seminal work on applying transformers to RL, concurrent with Decision Transformer. The authors proposed the Trajectory Transformer (TT), which combines transformers and beam search as a model-based approach for offline RL.
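A generic beam-search sketch over discretized trajectory tokens (illustrative only; TT scores beams with predicted rewards/values rather than likelihood alone, and `logprob_fn` is a hypothetical interface to the trained model):

```python
import numpy as np

def beam_search(logprob_fn, horizon, beam_width=8):
    """logprob_fn(token_seq) -> array of log-probabilities over the next token."""
    beams = [([], 0.0)]
    for _ in range(horizon):
        candidates = []
        for seq, score in beams:
            logp = logprob_fn(seq)
            for t in np.argsort(logp)[-beam_width:]:        # expand the most likely tokens
                candidates.append((seq + [int(t)], score + float(logp[t])))
        beams = sorted(candidates, key=lambda b: b[1])[-beam_width:]  # keep the best beams
    return max(beams, key=lambda b: b[1])[0]
```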


Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

ICLR'21 [Paper]

This paper introduces a distillation procedure that transfers learning progress from a large-capacity learner model to a small-capacity actor model. The proposed method can reduce the inference latency of the deployed RL agent.


Generalized Decision Transformer for Offline Hindsight Information Matching

ICLR'22 [Paper] [Code]
The paper derives an RL problem formulation called Hindsight Information Matching (HIM) from many recently proposed RL algorithms that use future trajectory information to accelerate the learning of a conditional policy. The authors discuss three HIM variations: Generalized DT, Categorical DT, and Bi-Directional DT.


Scene Transformer: A unified architecture for predicting multiple agent trajectories

ICLR'22 [Paper]


RvS: What is Essential for Offline RL via Supervised Learning?

ICLR'22 [Paper] [Code]

DT solves reinforcement learning through supervised learning, and it has been hypothesized that the large model capacity of transformers is what leads to better policies. The authors challenge this hypothesis and show that a simple two-layer feedforward MLP achieves performance similar to transformer-based methods. The findings imply that current transformer-based reinforcement learning algorithms may not fully leverage the potential advantages of transformers.
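The kind of conditioned policy studied in the paper can be sketched as follows (a minimal PyTorch version, assuming the outcome is a goal or target return concatenated to the state; not the authors' exact architecture):

```python
import torch
import torch.nn as nn

class ConditionedMLPPolicy(nn.Module):
    """Two hidden layers; input = [state, outcome], output = action."""
    def __init__(self, state_dim, outcome_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + outcome_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, outcome):
        # Concatenate the state with the conditioning outcome and regress the action.
        return self.net(torch.cat([state, outcome], dim=-1))
```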


Online Decision Transformer

ICML'22 [Paper] [Code]

This work combines offline pretraining and online finetuning.


Prompting Decision Transformer for Few-Shot Policy Generalization

ICML'22 [Paper] [Code]

The authors introduce prompting to DT for few-shot policy learning.


Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

ICML'22 [Paper] [Code]

This work combines a VAE and TT for policy learning in stochastic environments.


Can Wikipedia help offline reinforcement learning?

arXiv [Paper] [Code]

Training transformers on RL datasets from scratch can lead to slow convergence. This paper studies whether it's possible to transfer knowledge from the vision and language domains to offline RL tasks. The authors show that Wikipedia pretraining can improve convergence by 3-6x.


A Generalist Agent

arXiv [Paper]

A transformer-based RL agent (GATO) is trained on multi-modal data to perform robot manipulation, chat, play Atari games, and caption images, all with the same network and weights. The agent determines what to output based on its context.


Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

ICLR'22 Generalizable Policy Learning in the Physical World Workshop [Paper]

Applied random masking to pretrain transformers for RL.


Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

ICML'22 [Paper]

This work combines online RL and offline SL. The online phase is used for both RL training and data collection. In the offline phase, only successful trajectories are used for SL. The authors show that this approach performs well in sparse-reward settings. They tested DT for the SL phase and found that it was brittle and performed worse than simple behavior cloning (BC). This result shows that DT training stability requires more research.


Deep Reinforcement Learning with Swin Transformer

arXiv [Paper]

This paper studies replacing the convolutional neural networks used in online RL with the Swin Transformer and shows that it leads to better performance.


Efficient Planning in a Compact Latent Action Space

arXiv [Paper]

This work combines VQ-VAE with TT to allow efficient planning in the latent space.


Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

arXiv [Paper]

A new transformer architecture is proposed, and RL experiments show large improvements over LSTMs in several Atari games.


GPT-critic: offline reinforcement learning for end-to-end task-oriented dialogue systems

ICLR'22 [Paper]

GPT-2 trained in an offline RL manner for dialogue generation.


Offline pre-trained multi-agent decision transformer: one big sequence model tackles all SMAC tasks

arXiv [Paper]

The authors study offline pre-training and online finetuning in the MARL setting and show that offline pretraining significantly improves sample efficiency.


Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

arXiv [Paper]

The original DT relies entirely on supervised learning to learn a return-conditioned behavior policy; by increasing the conditioning value, DT can potentially obtain returns greater than the maximum return in the offline dataset. However, the RCSL framework does not support trajectory stitching, i.e., combining sub-trajectories of multiple sub-optimal trajectories into an optimal trajectory. In this paper, the authors combine Q-learning and DT: the estimated Q-values are used to relabel the return-to-gos in the training data.
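One plausible way to implement the relabeling step (the paper's exact rule differs in its details; `q_fn` is assumed to be a Q-function learned offline, e.g. with a conservative objective):

```python
import numpy as np

def relabel_returns_to_go(states, actions, rewards, q_fn, gamma=1.0):
    """Propagate return-to-go backwards, replacing it with the Q estimate when that is larger."""
    T = len(rewards)
    rtg = np.zeros(T + 1)  # rtg[T] = 0 once the episode has ended
    for t in reversed(range(T)):
        rtg[t] = max(rewards[t] + gamma * rtg[t + 1], q_fn(states[t], actions[t]))
    return rtg[:T]
```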


StARformer: Transformer with State-Action-Reward Representations for Robot Learning

arXiv [Paper]

Proposes a transformer architecture that learns state-action-reward representations for robot learning.


Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning

arXiv [Paper]

Multi-task offline RL problems. The value function is modeled as a distribution.


Transfer learning with causal counterfactual reasoning in Decision Transformers

arXiv [Paper]

The authors leverage the causal knowledge of a source environment's structure to generate a set of counterfactual environments, improving the agent's adaptability to new environments.


Transformers are Adaptable Task Planners

arXiv [Paper]

Prompt-based task planning.


Transformers are Meta-Reinforcement Learners

arXiv [Paper]

Applied transformers for meta-RL.


Transformers are Sample Efficient World Models

arXiv [Paper] [Code]

With the goal of improving the sample efficiency of RL methods, the authors build a transformer world model for Atari environments. They borrow ideas from VQGAN and DALL-E to map raw image pixels to a much smaller number of image tokens, which are used as the input to an autoregressive transformer. After training the transformer world model, the RL agent learns exclusively from the model's imagined rollouts.
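Learning exclusively inside the world model can be sketched as below (hypothetical `world_model.step` and `policy` interfaces, not the paper's code):

```python
def train_in_imagination(world_model, policy, start_tokens, horizon=20):
    """Roll out imagined transitions from the tokenized world model and update the policy on them."""
    tokens, actions, rewards = [start_tokens], [], []
    for _ in range(horizon):
        action = policy.act(tokens[-1])
        next_tokens, reward = world_model.step(tokens[-1], action)  # imagined transition
        tokens.append(next_tokens)
        actions.append(action)
        rewards.append(reward)
    policy.update(tokens, actions, rewards)  # e.g. an actor-critic update on the imagined rollout
```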


Hierarchical Decision Transformer

arXiv [Paper]

The original DT depends heavily on a carefully chosen return-to-go as the initial conditioning input. To address this challenge, this work proposes predicting subgoals (or options) to replace the return-to-go. Two transformers are trained together: one predicts the subgoals, and the other predicts actions conditioned on the subgoals. Through experiments on D4RL, the authors show that this hierarchical approach can outperform the original DT, especially in tasks that involve long episodes.


PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training

arXiv [Paper]

A generative transformer-based architecture for pretraining with robot data in a self-supervised manner.


When does return-conditioned supervised learning work for offline reinforcement learning?

arXiv [Paper]

Focusing on offline reinforcement learning, the authors study the capabilities and limitations of return-conditioned supervised learning (RCSL). They find that RCSL requires stronger assumptions than dynamic-programming-based methods to return optimal policies; specifically, RCSL requires nearly deterministic dynamics and properly chosen condition values. The authors conclude that RCSL alone is unlikely to be a general solution for offline RL, though it may perform well with high-quality behavior data.


Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

OpenReview Submission to ICLR'23 [Paper]

Recurrent neural networks are often used to encode an agent's history when solving POMDP tasks. This paper proposes replacing the recurrent networks with transformers. Results show that transformers can solve POMDPs faster and more stably than recurrent baselines.


Contextual Transformer for Offline Meta Reinforcement Learning

Foundation Models for Decision Making Workshop, NeurIPS'22 [Paper]

This paper proposes an approach for learning context vectors that can be used as prompts for transformers. With these prompts, the authors develop a contextual meta transformer that leverages the prompt as the task context to improve performance on unseen tasks.


MCTransformer: combining transformers and monte-carlo tree search for offline reinforcement learning

OpenReview Submission to ICLR'23 [Paper]

The authors combine transformers and MCTS for efficient online finetuning. MCTS is used as an effective approach to balance exploration and exploitation.


Pretraining the vision transformer using self-supervised methods for vision based deep reinforcement learning

OpenReview Submission to ICLR'23 [Paper]

This work replaces the CNNs used in image-based RL agents with pre-trained Vision Transformers. Interestingly, the authors found that the Vision Transformers still perform similarly to or worse than CNNs.


Preference Transformer: Modeling Human Preferences using Transformers for RL

OpenReview Submission to ICLR'23 [Paper]


Skill discovery decision transformer

OpenReview Submission to ICLR'23 [Paper]

This work applies unsupervised skill discovery to DT; the discovered skill embedding is used as an input to the DT. This can be thought of as a hierarchical RL approach.


Decision transformer under random frame dropping

OpenReview Submission to ICLR'23 [Paper]


Token turing machines

OpenReview Submission to ICLR'23 [Paper]


SMART: self-supervised multi-task pretraining with control transformers

OpenReview Submission to ICLR'23 [Paper]


Hyper-decision transformer for efficient online policy adaptation

OpenReview Submission to ICLR'23 [Paper]

This work focuses on adapting DT to unseen novel tasks. An adaptation module is added to the DT, with its parameters initialized by a hyper-network. When adapting to a new task, only the parameters of the adaptation module are finetuned. The results show that adapting only this module leads to faster learning than finetuning the full model.


Multi-agent multi-game entity transformer

OpenReview Submission to ICLR'23 [Paper]


Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

arXiv'22 [Paper]

The authors compared ViTs and CNNs in image-based DRL tasks. They found that CNNs still perform better than ViTs.


Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

arXiv'22 [Paper]

The authors apply semi-supervised imitation learning to enable agents to learn to act by watching online unlabeled videos.


Behavior Cloned Transformers are Neurosymbolic Reasoners

arXiv'22 [Paper]


Exploiting Transformer in Reinforcement Learning for Interpretable Temporal Logic Motion Planning

arXiv'22 [Paper]


Transformers for One-Shot Visual Imitation

CoRL'21 [paper]


Self-Attentional Credit Assignment for Transfer in Reinforcement Learning

IJCAI'20 [paper]


In-context Reinforcement Learning with Algorithm Distillation

arXiv'22 [paper]


Other Resources

License

This repo is released under Apache License 2.0.
