This is the official PyTorch implementation of SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching, presented at the International Conference on Machine Learning (ICML), 2022.
For the tabular experiments:
- Offline Imitation Learning from Mismatched Experts:
  ```bash
  python smodice_tabular/run_tabular_mismatched.py
  ```
- Offline Imitation Learning from Examples:
  ```bash
  python smodice_tabular/run_tabular_example.py
  ```
- Create the conda environment, activate it, and install the remaining dependencies:
  ```bash
  conda env create -f environment.yml
  conda activate smodice
  pip install --upgrade numpy
  pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  git clone https://github.com/rail-berkeley/d4rl
  cd d4rl
  pip install -e .
  ```
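As a quick sanity check after installation, the following one-liner (an optional suggestion, not part of the original setup instructions) should import the key dependencies without errors; it assumes MuJoCo is already installed as required by d4rl:

```bash
# Optional sanity check: verify that torch and d4rl import inside the smodice environment.
# Assumes MuJoCo is set up as required by d4rl/mujoco_py.
python -c "import torch, d4rl; print(torch.__version__)"
```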
For offline imitation learning from observations:
- Run the following command with the variable `ENV` set to any of `hopper`, `walker2d`, `halfcheetah`, `ant`, `kitchen` (a loop over all five is sketched below):
  ```bash
  python run_oil_observations.py --env_name $ENV
  ```
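For example, a minimal shell loop to sweep all five environments could look like this (a sketch built from the command above):

```bash
# Run the imitation-from-observations experiment for every supported environment.
for ENV in hopper walker2d halfcheetah ant kitchen; do
    python run_oil_observations.py --env_name $ENV
done
```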
- For the AntMaze environment, first generate the random dataset:
  ```bash
  cd envs
  python generate_antmaze_random.py --noise
  ```
  Then, run:
  ```bash
  python run_oil_antmaze.py
  ```
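Putting the two AntMaze steps together, a minimal end-to-end sketch might look as follows; the `cd ..` back to the repository root is an assumption about where the training script is meant to be launched from:

```bash
# Generate the random AntMaze dataset, then launch training.
cd envs
python generate_antmaze_random.py --noise
cd ..  # assumption: run the training script from the repository root
python run_oil_antmaze.py
```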
For offline imitation learning from mismatched experts:
1. For `halfcheetah` and `ant`, run
   ```bash
   python run_oil_observations.py --env_name halfcheetah --dataset 0.5 --mismatch True
   ```
   and
   ```bash
   python run_oil_observations.py --env_name ant --dataset disabled --mismatch True
   ```
   respectively.
2. For AntMaze, run
   ```bash
   python run_oil_antmaze.py --mismatch True
   ```
For offline imitation learning from examples:
- For the PointMass-4Direction task, run
  ```bash
  python run_oil_examples_pointmass.py
  ```
- For the AntMaze task, run
  ```bash
  python run_oil_antmaze.py --mismatch False --example True
  ```
- For the Franka Kitchen based tasks, run
  ```bash
  python run_oil_examples_kitchen.py --dataset $DATASET
  ```
  where `DATASET` can be one of `microwave`, `kettle` (a loop over both is sketched below).
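As with the observation experiments, both kitchen datasets can be swept with a small loop (a sketch built from the command above):

```bash
# Run the example-based Franka Kitchen experiment for both supported datasets.
for DATASET in microwave kettle; do
    python run_oil_examples_kitchen.py --dataset $DATASET
done
```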
For any task, the BC baseline can be run by appending `--disc_type bc` to the above commands. For the RCE-TD3-BC and ORIL baselines, on the appropriate tasks, append `--algo_type $ALGO`, where `ALGO` can be one of `rce`, `oril`. Concrete examples are sketched below.
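For instance, using the hopper imitation-from-observations command as the base (the flags are the ones listed above; whether each baseline applies to this particular task is an assumption):

```bash
# BC baseline on the hopper imitation-from-observations task.
python run_oil_observations.py --env_name hopper --disc_type bc

# RCE-TD3-BC and ORIL baselines on the same task
# (assumption: hopper is among the "appropriate tasks" for these baselines).
python run_oil_observations.py --env_name hopper --algo_type rce
python run_oil_observations.py --env_name hopper --algo_type oril
```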
If you find this repository useful for your research, please cite:
```bibtex
@article{ma2022smodice,
  title={SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching},
  author={Yecheng Jason Ma and Andrew Shen and Dinesh Jayaraman and Osbert Bastani},
  year={2022},
  url={https://arxiv.org/abs/2202.02433}
}
```
If you have any questions regarding the code or paper, feel free to contact me at [email protected].
This codebase is partially adapted from optidice, rce, relay-policy-learning, and d4rl; we thank the authors and contributors for open-sourcing their code.