VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (ICLR 2023, Spotlight)
Jason Yecheng Ma¹˒², Shagun Sodhani¹, Dinesh Jayaraman², Osbert Bastani², Vikash Kumar*¹, Amy Zhang*¹
1Meta AI, 2University of Pennsylvania
This is the official repository for VIP, a self-supervised zero-shot visual reward and representation for downstream unseen robot tasks. This repository contains examples for using the pre-trained VIP model as well as training VIP from scratch using any custom video dataset.
Create a conda environment where the packages will be installed.
conda create --name vip python=3.9
conda activate vip
Then, in the root directory of this repository, run:
pip install -e .
To load the VIP model pre-trained on Ego4D, simply do:
from vip import load_vip
vip = load_vip()
vip.eval()
Example code to use the released VIP representation is located here.
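As a rough illustration of how the representation might be queried (the preprocessing pipeline, input range, and file name below are assumptions for the sketch, not the repository's exact example), encoding a single frame could look like:

```python
import torch
import torchvision.transforms as T
from PIL import Image
from vip import load_vip

vip = load_vip()
vip.eval()

# Assumed preprocessing: resize/crop to 224x224 and scale to the range the
# model expects; consult the repository's example script for the exact pipeline.
transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

frame = transform(Image.open("frame.png")) * 255.0  # hypothetical input frame
with torch.no_grad():
    embedding = vip(frame.unsqueeze(0))  # shape: (1, embedding_dim)
```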
We have also included an example for generating embedding distance curves as in our paper using our real-robot demonstrations. You can try it here:
cd vip/examples
python plot_reward_curves.py
This should generate the following plots in vip/examples/embedding_curves/:
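For intuition, the curves produced by this script measure progress toward a goal image as the negative distance between each frame's VIP embedding and the goal frame's embedding. A minimal sketch of that computation (frame loading and preprocessing are assumed to be done already) is:

```python
import torch

def embedding_distance_curve(vip, frames):
    """Negative L2 distance from each frame's embedding to the goal frame's embedding.

    `frames` is assumed to be a (T, 3, 224, 224) tensor already preprocessed the
    way the model expects; the last frame is treated as the goal image.
    """
    with torch.no_grad():
        embeddings = vip(frames)          # (T, embedding_dim)
    goal = embeddings[-1]
    # The curve should rise toward 0 as the demonstration approaches the goal.
    return -torch.norm(embeddings - goal, dim=-1)
```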
We also include an example for generating animated embedding distance curves for VIP and other models on robot videos from three different domains. You can try it here:
cd vip/examples
python plot_reward_curves_video.py
This should generate the following plots (and more!) in vip/examples/embedding_curves/:
In addition to this official repository, VIP has also been incorporated into TorchRL as an out-of-the-box visual representation for any Gym environment. After you install TorchRL, using VIP is as simple as:
from torchrl.envs import TransformedEnv
from torchrl.envs.transforms import VIPTransform
env = TransformedEnv(my_env, VIPTransform(keys_in=["next_pixels"], download=True))
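Once wrapped, the transformed environment replaces the specified pixel keys with VIP embeddings in its observations; consult the TorchRL documentation for the exact output key names and available options, as those details are set by TorchRL rather than by this repository.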
Our codebase supports training VIP both on the Ego4D dataset used to pre-train our released VIP model and on any custom video dataset. The video dataset directory should use the following structure:
my_dataset_path/
video0/
0.png
1.png
...
video1/
video2/
...
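If your raw data is stored as video files rather than frame folders, a small conversion script along these lines can produce the layout above (the paths, folder names, and the use of OpenCV here are assumptions for illustration):

```python
import os
import cv2  # assumption: OpenCV is available; any frame extractor works

def dump_frames(video_file, out_dir):
    """Write each frame of video_file as out_dir/<index>.png (0.png, 1.png, ...)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_file)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx}.png"), frame)
        idx += 1
    cap.release()

# Hypothetical usage: one folder per clip under my_dataset_path/
for i, name in enumerate(sorted(os.listdir("raw_videos"))):
    dump_frames(os.path.join("raw_videos", name), f"my_dataset_path/video{i}")
```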
Then, you can train VIP on your dataset by running:
python train_vip.py --config-name=config_vip dataset=my_dataset_name datapath=my_dataset_path
For Ego4D or equivalent large-scale pre-training, we suggest using the config config_vip_ego4d.yaml (the config for the released VIP model):
python train_vip.py --config-name=config_vip_ego4d dataset=ego4d datapath=ego4d_dataset_path
The source code in this repository is licensed under the CC BY-NC 4.0 License.
If you find this repository or paper useful for your research, please cite:
@article{ma2022vip,
title={VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training},
author={Ma, Yecheng Jason and Sodhani, Shagun and Jayaraman, Dinesh and Bastani, Osbert and Kumar, Vikash and Zhang, Amy},
journal={arXiv preprint arXiv:2210.00030},
year={2022}
}
Parts of this code are adapted from the R3M codebase.