Source code to accompany Off-Policy Adversarial Inverse Reinforcement Learning.
If you use this code for your research, please consider citing the paper:
@article{arnob2020off,
title={Off-Policy Adversarial Inverse Reinforcement Learning},
author={Arnob, Samin Yeasar},
journal={arXiv preprint arXiv:2005.01138},
year={2020}
}
* `inverse_rl` (from https://github.com/justinjfu/inverse_rl)
* `rllab`
* `sandbox`
* rllab (https://github.com/openai/rllab)
* PyTorch
* Python 2
* mjpro131
* `pip install mujoco-py==0.5.7`
* PyTorch
* Python 3
* mujoco-py==1.50.1.68
python Train.py --seed 0 \
--env_name "HalfCheetah-v2" \
--learn_temperature \
--policy_name "SAC"
The arguments are as follows:
- Environment options:
  - OpenAI gym: `HalfCheetah-v2`, `Ant-v2`, `Hopper-v2`, `Walker2d-v2`
  - Custom environments: `CustomAnt-v0`, `PointMazeLeft-v0`
- `learn_temperature`: makes the SAC temperature (entropy coefficient) a learnable parameter instead of a fixed value
- Policy options: `SAC`, `SAC_MCP` (k=8 primitive policies), `SAC_MCP2` (k=4 primitive policies)
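The `learn_temperature` flag is described only briefly. As a point of reference, a common way to learn the SAC temperature α is automatic entropy tuning (Haarnoja et al., 2018), which adjusts α so that the policy's entropy tracks a target value. The pure-Python sketch below performs one gradient step on log α for a batch of sampled log-probabilities. The function name `update_log_alpha`, the target entropy of -dim(A) and the learning rates are assumptions for illustration; `Train.py` may implement the update differently.

```python
import math
from typing import Sequence


def update_log_alpha(
    log_alpha: float,
    log_probs: Sequence[float],
    target_entropy: float,
    lr: float = 3e-4,
) -> float:
    """One gradient-descent step on the SAC temperature loss (illustrative sketch).

    Uses the common log-alpha form of the loss
        L(log_alpha) = -mean(log_alpha * (log_pi + target_entropy)),
    with log_pi treated as a constant, so
        dL/dlog_alpha = -mean(log_pi + target_entropy).
    """
    if not log_probs:
        raise ValueError("need at least one sampled log-probability")
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


if __name__ == "__main__":
    action_dim = 6  # HalfCheetah-v2 has a 6-dimensional action space
    target_entropy = -float(action_dim)  # assumed heuristic: -dim(A)
    log_alpha = 0.0  # start from alpha = 1

    # High log-probabilities mean low entropy, so alpha should increase.
    low_entropy = update_log_alpha(log_alpha, [10.0, 12.0, 8.0], target_entropy, lr=0.1)
    # Low log-probabilities mean high entropy, so alpha should decrease.
    high_entropy = update_log_alpha(log_alpha, [-10.0, -12.0, -8.0], target_entropy, lr=0.1)
    print("alpha after low-entropy batch:", math.exp(low_entropy))
    print("alpha after high-entropy batch:", math.exp(high_entropy))
```

Optimizing log α rather than α itself keeps the temperature positive without needing an explicit constraint.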
The transfer-learning experiments are run on the custom environments from https://github.com/justinjfu/inverse_rl/tree/master/inverse_rl
python ReTrain.py --seed 0 \
--env_name "DisabledAnt-v0" \
--learn_temperature \
--policy_name "SAC" \
--initial_state "random" \
--initial_runs "policy_sample" \
--load_gating_func \
--learn_actor
The arguments are as follows:
- Environment options:
  - Custom environments: `DisabledAnt-v0`, `PointMazeRight-v0`
- `learn_temperature`: makes the SAC temperature (entropy coefficient) a learnable parameter instead of a fixed value
- Policy options: `SAC`, `SAC_MCP` (k=8 primitive policies), `SAC_MCP2` (k=4 primitive policies)
- `initial_state`:
  - `zero`: the environment starts from the same state
  - `random`: the environment starts from random states
- `initial_runs`: `"policy_sample"`
- `load_gating_func` (applicable only to `SAC_MCP` and `SAC_MCP2`):
  - if flagged, loads the gating function from imitation training
  - if not flagged, initializes the gating function randomly
- `learn_actor` (applicable only to `SAC_MCP` and `SAC_MCP2`):
  - if flagged, retrains both the policy and the gating function
  - if not flagged, retrains only the gating function
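The `SAC_MCP` and `SAC_MCP2` options combine k primitive policies through a gating function, and the retraining flags decide which of these parts are updated. Assuming these options follow multiplicative compositional policies (MCP; Peng et al., 2019), the gating weights w_i ≥ 0 act as exponents in a product of diagonal-Gaussian primitives, and the composite action distribution is again Gaussian. The sketch below computes its per-dimension mean and standard deviation. The function names and the sigmoid gate are hypothetical and are not taken from this repository.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid_gate(logits: Sequence[float]) -> List[float]:
    """Hypothetical gating head: maps one logit per primitive to a weight in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def compose_gaussian_primitives(
    means: Sequence[Sequence[float]],  # means[i][j]: mean of primitive i, action dim j
    stds: Sequence[Sequence[float]],   # stds[i][j]: std of primitive i, action dim j
    weights: Sequence[float],          # weights[i] >= 0: gating weight of primitive i
) -> Tuple[List[float], List[float]]:
    """Multiplicatively compose k diagonal-Gaussian primitives.

    prod_i N(mu_i, sigma_i^2)^{w_i} is Gaussian with, per action dimension j,
        precision_j = sum_i w_i / sigma_ij^2
        mu_j        = (sum_i w_i * mu_ij / sigma_ij^2) / precision_j
        sigma_j     = precision_j ** -0.5
    """
    k = len(weights)
    if k == 0 or len(means) != k or len(stds) != k:
        raise ValueError("need one mean vector, std vector and weight per primitive")
    if any(w < 0.0 for w in weights):
        raise ValueError("gating weights must be non-negative")
    action_dim = len(means[0])
    comp_mean: List[float] = []
    comp_std: List[float] = []
    for j in range(action_dim):
        precision = sum(weights[i] / stds[i][j] ** 2 for i in range(k))
        if precision <= 0.0:
            raise ValueError("at least one primitive needs a positive weight")
        weighted_sum = sum(weights[i] * means[i][j] / stds[i][j] ** 2 for i in range(k))
        comp_mean.append(weighted_sum / precision)
        comp_std.append(1.0 / math.sqrt(precision))
    return comp_mean, comp_std


if __name__ == "__main__":
    k, action_dim = 4, 2  # k=4 primitives, as in SAC_MCP2; toy 2-D action space
    means = [[0.5 * i, -0.5 * i] for i in range(k)]
    stds = [[1.0] * action_dim for _ in range(k)]
    weights = sigmoid_gate([2.0, 0.0, -2.0, -4.0])  # gate favors primitive 0
    mean, std = compose_gaussian_primitives(means, stds, weights)
    print("composite mean:", mean)
    print("composite std:", std)
```

Under the same assumption, running `ReTrain.py` without `--learn_actor` would keep each primitive's means and standard deviations fixed and update only the network that produces `weights`, while `--load_gating_func` decides whether that network starts from the imitation-trained gating function or from a random initialization.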