TL;DR: An online imitation learning algorithm that achieves better convergence behavior than Generative Adversarial Imitation Learning through a simple modification.
Implementation based on the paper Non-Adversarial Imitation Learning and its Connections to Adversarial Methods, Arenz & Neumann, 2020. The official implementation can be found here.
The training loop follows Discriminator Actor-Critic (DAC) and uses Soft Actor-Critic (SAC) as the reinforcement learning algorithm.
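For orientation, below is a minimal sketch of the discriminator update and the discriminator-derived reward that DAC-style loops plug into SAC. The class and function names, network sizes, and the GAIL/AIRL-style log-ratio reward are illustrative assumptions, not this repository's API; how NAIL's modification changes this construction is described in the paper and the linked note.

```python
# Sketch only: a DAC-style discriminator and reward (PyTorch).
# All names, sizes, and the reward form are assumptions, not the repo's code.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Scores (state, action) pairs: demonstration (label 1) vs. policy (label 0)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))  # raw logits


def discriminator_loss(disc, demo_obs, demo_act, pol_obs, pol_act):
    """Binary cross-entropy on demonstrations vs. replay-buffer samples."""
    bce = nn.BCEWithLogitsLoss()
    demo_logits = disc(demo_obs, demo_act)
    pol_logits = disc(pol_obs, pol_act)
    return (bce(demo_logits, torch.ones_like(demo_logits))
            + bce(pol_logits, torch.zeros_like(pol_logits)))


def discriminator_reward(disc, obs, act):
    """GAIL/AIRL-style reward log D - log(1 - D), which equals the raw logit."""
    with torch.no_grad():
        return disc(obs, act).squeeze(-1)


# Example with random stand-in batches (batch 32, obs_dim 2, act_dim 1):
disc = Discriminator(obs_dim=2, act_dim=1)
loss = discriminator_loss(disc, torch.randn(32, 2), torch.randn(32, 1),
                          torch.randn(32, 2), torch.randn(32, 1))
reward = discriminator_reward(disc, torch.randn(32, 2), torch.randn(32, 1))
```

In a DAC-style loop, the SAC critic is then trained on this learned reward in place of the environment reward.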
For a short note on the algorithm and implementation see here.
We perform experiments in the discrete-action mountain car environment. The demonstration policy is obtained by discretizing the state space and computing the optimal soft value function in closed form. Only a critic is needed in the imitation learning algorithm.
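As an illustration of this demonstration step, here is a hedged sketch of tabular soft value iteration on a discretized MountainCar-v0. The bin counts, temperature, discount, and iteration budget are assumptions and the repository's script may differ; the dynamics follow the standard gym MountainCar equations.

```python
# Sketch only: closed-form (tabular) soft value iteration on a discretized
# MountainCar-v0. Grid sizes, temperature, and discount are assumptions.
import numpy as np

N_POS, N_VEL, N_ACT = 120, 120, 3
GAMMA, ALPHA = 0.99, 1.0                  # discount and entropy temperature
pos_grid = np.linspace(-1.2, 0.6, N_POS)
vel_grid = np.linspace(-0.07, 0.07, N_VEL)


def step(pos, vel, act):
    """Deterministic MountainCar-v0 dynamics; reward is -1 per step."""
    vel = np.clip(vel + (act - 1) * 0.001 - 0.0025 * np.cos(3 * pos), -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos <= -1.2 and vel < 0:
        vel = 0.0
    return pos, vel, pos >= 0.5


def to_idx(pos, vel):
    return np.abs(pos_grid - pos).argmin(), np.abs(vel_grid - vel).argmin()


# Tabulate the (deterministic) transitions on the grid.
next_idx = np.zeros((N_POS, N_VEL, N_ACT, 2), dtype=int)
terminal = np.zeros((N_POS, N_VEL, N_ACT), dtype=bool)
for i, p in enumerate(pos_grid):
    for j, v in enumerate(vel_grid):
        for a in range(N_ACT):
            p2, v2, done = step(p, v, a)
            next_idx[i, j, a] = to_idx(p2, v2)
            terminal[i, j, a] = done

# Soft value iteration: V(s) = alpha * log sum_a exp(Q(s, a) / alpha).
V = np.zeros((N_POS, N_VEL))
for _ in range(2000):
    V_next = V[next_idx[..., 0], next_idx[..., 1]] * (~terminal)
    Q = -1.0 + GAMMA * V_next             # -1 reward until the goal is reached
    V = ALPHA * np.log(np.exp(Q / ALPHA).sum(axis=-1))

# The soft-optimal demonstration policy is the softmax over the soft Q-values.
demo_policy = np.exp((Q - V[..., None]) / ALPHA)
```

Sampling actions from `demo_policy` at the discretized states then yields demonstration trajectories for the imitation learner.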
To generate demonstrations, run:
```
python ./scripts/create_demonstrations.py
```
To train the NAIL policy, run:
```
sh ./scripts/train_agent.sh
```
You can modify the `-algo` argument in the `.sh` file to train an AIL policy instead.
To test trained agents, run:
```
python ./scripts/test_agent.py --exp_name "your_experiment_name"
```