This is a PyTorch implementation of the REDQ+AdaptiveBC method proposed in the paper "Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning" by Yi Zhao, Rinu Boney, Alexander Ilin, Juho Kannala, and Joni Pajarinen.
conda env create -f environment.yaml
conda activate adaptive
Note: For d4rl, a MuJoCo license is still required. You can obtain a free license from http://roboti.us/ and copy it into your MuJoCo installation folder. The mujoco_py installation instructions may also be helpful.
Training consists of two stages: pretraining on a d4rl dataset and finetuning on the corresponding online task. To run an experiment:
python3 main.py --env=<TASK_NAME> --seed=<SEED>
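For example, a typical run might look like the following (here `hopper-medium-v2` is a standard d4rl task name used for illustration; substitute whichever task and seed you want):

```shell
python3 main.py --env=hopper-medium-v2 --seed=0
```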
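To give an intuition for the method, the core idea is to regularize the actor with a behavior-cloning term whose weight is adapted during online finetuning. The sketch below is purely illustrative and not the repo's code: the function name `update_bc_weight` and the concrete decay/grow schedule are assumptions; the exact adaptive rule is specified in the paper.

```python
def update_bc_weight(alpha, recent_return, best_return,
                     decay=0.95, grow=1.05, alpha_min=0.0, alpha_max=1.0):
    """Illustrative sketch (hypothetical, not the repo's implementation):
    shrink the BC weight while online returns keep improving, and grow it
    back when performance degrades, to stabilize offline-to-online transfer."""
    if recent_return >= best_return:
        alpha *= decay   # agent is improving: trust the RL objective more
    else:
        alpha *= grow    # performance dropped: lean back on behavior cloning
    return min(max(alpha, alpha_min), alpha_max)

# The actor objective then mixes the critic's value estimate with the
# weighted BC penalty, conceptually:
#   loss = -Q(s, pi(s)) + alpha * ||pi(s) - a_dataset||^2
```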
We use wandb for logging. Please check the wandb documentation for details.