A clean implementation of the conditional and unconditional Behavior Transformer. The API is heavily inspired by lucidrains' implementations, and the implementation is heavily indebted to Andrej Karpathy's nanoGPT.
```bash
git clone [email protected]:notmahi/miniBET.git
cd miniBET
pip install --upgrade .
```
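A quick way to confirm the install worked is to import the package:

```python
# Should run without error after `pip install --upgrade .`
from behavior_transformer import BehaviorTransformer, GPT, GPTConfig
```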
```python
import torch

from behavior_transformer import BehaviorTransformer, GPT, GPTConfig

conditional = True

obs_dim = 50
act_dim = 8
goal_dim = 50 if conditional else 0  # Use goal_dim=0 for an unconditional BeT.
K = 32
T = 16
batch_size = 256

cbet = BehaviorTransformer(
    obs_dim=obs_dim,
    act_dim=act_dim,
    goal_dim=goal_dim,
    gpt_model=GPT(
        GPTConfig(
            block_size=144,
            input_dim=obs_dim,
            n_layer=6,
            n_head=8,
            n_embd=256,
        )
    ),  # The sequence model to use.
    n_clusters=K,  # Number of clusters to use for k-means discretization.
    kmeans_fit_steps=5,  # k-means is fit on the actions seen in the first kmeans_fit_steps training steps.
)

optimizer = cbet.configure_optimizers(
    weight_decay=2e-4,
    learning_rate=1e-5,
    betas=[0.9, 0.999],
)

for i in range(10):
    obs_seq = torch.randn(batch_size, T, obs_dim)
    goal_seq = torch.randn(batch_size, T, goal_dim)
    action_seq = torch.randn(batch_size, T, act_dim)
    if i <= 7:
        # Training: pass the ground-truth actions to get a loss.
        train_action, train_loss, train_loss_dict = cbet(obs_seq, goal_seq, action_seq)
    else:
        # Action inference: pass None in place of the actions.
        eval_action, eval_loss, eval_loss_dict = cbet(obs_seq, goal_seq, None)
```
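Under the hood, BeT casts continuous control as classification with a residual correction: the actions seen early in training are clustered with k-means, the sequence model predicts a cluster per timestep, and a continuous offset from the cluster center recovers the exact action. Below is a minimal sketch of that discretize-and-offset idea, assuming plain k-means; the function names are illustrative, not the library's API:

```python
import torch


def fit_action_bins(actions: torch.Tensor, n_bins: int, n_iters: int = 50) -> torch.Tensor:
    """Plain k-means over a flat (N, act_dim) batch of actions; returns (n_bins, act_dim) centers."""
    centers = actions[torch.randperm(len(actions))[:n_bins]].clone()
    for _ in range(n_iters):
        # Assign every action to its nearest center, then recompute each center.
        assignments = torch.cdist(actions, centers).argmin(dim=-1)
        for k in range(n_bins):
            members = actions[assignments == k]
            if len(members) > 0:
                centers[k] = members.mean(dim=0)
    return centers


def discretize(actions: torch.Tensor, centers: torch.Tensor):
    """Map each continuous action to a (bin index, residual offset) pair."""
    bins = torch.cdist(actions, centers).argmin(dim=-1)
    offsets = actions - centers[bins]
    return bins, offsets


# A predicted bin plus its predicted offset reconstructs a continuous action.
actions = torch.randn(1024, 8)
centers = fit_action_bins(actions, n_bins=32)
bins, offsets = discretize(actions, centers)
assert torch.allclose(centers[bins] + offsets, actions, atol=1e-5)
```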
If you want to use your own sequence model, you can pass it in as the `gpt_model` argument to the `BehaviorTransformer` constructor. The only extra requirement for the sequence model (beyond being an `nn.Module` subclass with inputs and outputs of the right shape) is a `configure_optimizers` method that takes `weight_decay`, `learning_rate`, and `betas` arguments and returns a `torch.optim.Optimizer` object.
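For example, a custom sequence model might look like the sketch below. The class, its dimensions, and its `forward` signature are hypothetical (check the library's `GPT` class for the exact interface the wrapper expects); only the `configure_optimizers` method shown is the stated requirement:

```python
import torch
from torch import nn


class TinySequenceModel(nn.Module):
    """Hypothetical drop-in for gpt_model: maps (batch, T, input_dim) -> (batch, T, output_dim)."""

    def __init__(self, input_dim: int, output_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)
        return self.head(out)

    def configure_optimizers(self, weight_decay, learning_rate, betas) -> torch.optim.Optimizer:
        # The one hard requirement: accept these three arguments
        # and return a torch.optim.Optimizer.
        return torch.optim.AdamW(
            self.parameters(),
            lr=learning_rate,
            weight_decay=weight_decay,
            betas=tuple(betas),
        )
```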
Try out the example task on the Franka kitchen environment. You will need to install the extra requirements listed in the `examples/requirements-dev.txt` file. Then, fill out the details in `examples/train.yaml` with the paths to your dataset, downloaded from here.
If you have installed all the dependencies and have downloaded the dataset, you can run the example with:
```bash
cd examples
python train.py
```
Training should take about 50 minutes on a single GPU and reach a final performance of ~3.2 conditioned tasks completed on average.
If you use this code in your research, please cite the following papers whenever appropriate:
```bibtex
@inproceedings{shafiullah2022behavior,
  title={Behavior Transformers: Cloning $k$ modes with one stone},
  author={Nur Muhammad Mahi Shafiullah and Zichen Jeff Cui and Ariuntuya Altanzaya and Lerrel Pinto},
  booktitle={Advances in Neural Information Processing Systems},
  editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
  year={2022},
  url={https://openreview.net/forum?id=agTr-vRQsa}
}

@article{cui2022play,
  title={From play to policy: Conditional behavior generation from uncurated robot data},
  author={Cui, Zichen Jeff and Wang, Yibin and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel},
  journal={arXiv preprint arXiv:2210.10047},
  year={2022}
}
```