This is the repository for the paper "Benchmarking Reinforcement Learning Techniques for Autonomous Navigation".
The results shown in the paper use Condor Cluster to distribute 100 actors for collecting trajectories. This setting can greatly speed up the training and make it feasible to finish all the experiments presented in the paper, however Condor Cluster is relatively inaccessible to most users. Instead, to guarantee reproducibility, we provide this version of repository that distributes the actors over 10 Singularity containers that can run locally on a single machine.
- Clone this repository
git clone https://github.com/Daffan/ros_jackal.git
cd ros_jackal
- In your virtual environment, install the python dependencies:
pip install -r requirements.txt
-
Follow this instruction to install Singularity: https://docs.sylabs.io/guides/latest/admin-guide/installation.html#installation-on-linux. Singularity version >= 3.6.3 is required to build the image.
-
(Only do following step if you really need!) The code does not require ROS installation, since the rollout happens in the container, but if you have need to develop based on our repo, running ROS and Gazebo simulation out of the container enables GUI and is easier to debug. Follow steps below to install ROS dependencies (assume
melodic
ROS installed already):
- Create ROS workspace
mkdir -p /<YOUR_HOME_DIR>/jackal_ws/src
cd /<YOUR_HOME_DIR>/jackal_ws/src
- Clone this repo and required ros packages
git clone https://github.com/Daffan/ros_jackal.git
git clone https://github.com/jackal/jackal.git --branch melodic-devel
git clone https://github.com/jackal/jackal_simulator.git --branch melodic-devel
git clone https://github.com/jackal/jackal_desktop.git --branch melodic-devel
git clone https://github.com/utexas-bwi/eband_local_planner.git
- Install ROS package dependencies
cd ..
source /opt/ros/melodic/setup.bash
rosdep init; rosdep update
rosdep install -y --from-paths . --ignore-src --rosdistro=melodic
- Build the workspace
catkin_make
source devel/setup.bash
- Verify your installation: (this script will run open-ai gym environment for 5 episodes)
Pull image file (modify the <FOLDER_PATH_TO_SAVE_IMAGE> in the command, image file size ~ 3G
singularity pull --name <PATH_TO_THIS_REPO>/local_buffer/image:latest.sif library://zifanxu/ros_jackal_image/image:latest
./singularity_run.sh <PATH_TO_THIS_REPO>/local_buffer/nav_benchmark.sif python3 test_env.py
To train a navigation policy, you just need to specify a .yaml
file that includes the parameters for specific experiment. For instance,
python train.py --config configs/e2e_default_TD3.yaml
We provide the full list of .yaml
files used in our experiment in the end.
This repo saves the collected trajectories from each actor in a local buffer folder, also actors load the recent policy from this folder. By default, buffer folder is a folder named local_buffer
in current dictionary. You can specify a new folder as export BUFFER_FOLDER=/PATH/TO/YOUR/BUFFER_FOLDER
. The logging files can be found under folder logging
.
Cluster requires a shared file system, where multiple actors load the lastest policy, rollout, and save the trajectory in the BUFFER_FOLDER
. Then, a critic collects trajectories from BUFFER_FOLDER
and updates the policy.
This is asyncronized training pipeline, namely the actors might fall behind and do not generate trajectories from the latest policy.
- Download the Singularity image
singularity pull --name <PATH/TO/IMAGE>/image:latest.sif library://zifanxu/ros_jackal_image/image:latest
- On critic computing node
export BUFFER_PATH=<BUFFER_PATH>
./singularity_run.sh <PATH/TO/IMAGE>/image:latest.sif python train.py --config configs/e2e_default_TD3_cluster.yaml
- On actor computing node 0 (you need to run
0-50
computing nodes as defined in line 60 incontainer_config.yaml
).
export BUFFER_PATH=<BUFFER_PATH>
./singularity_run.sh <PATH/TO/IMAGE>/image:latest.sif python actor.py --id 0
Success rate of policies trained with different neural network architectures and history lengths in static (top) and dynamic-wall (bottom) environments.
Static | |||
---|---|---|---|
History length | 1 | 4 | 8 |
MLP | 65 ± 4% | 57 ± 7% | 42 ± 2% |
GRU | - | 51 ± 2% | 43 ± 4% |
CNN | - | 55 ± 4% | 45 ± 5% |
Transformer | - | 68 ± 2% | 46 ± 3% |
Dynamic box | |||
---|---|---|---|
History length | 1 | 4 | 8 |
MLP | 50 ± 5% | 35 ± 2% | 46 ± 3% |
GRU | - | 48 ± 4% | 45 ± 1% |
CNN | - | 42 ± 5% | 40 ± 1% |
Transformer | - | 52 ± 1% | 44 ± 4% |
Dynamic wall | |||
---|---|---|---|
History length | 1 | 4 | 8 |
MLP | 67 ± 7% | 72 ± 1% | 69 ± 4% |
GRU | - | 82 ± 4% | 78 ± 5% |
CNN | - | 63 ± 3% | 43 ± 3% |
Transformer | - | 33 ± 28% | 15 ± 13% |
Success rate, survival time and traversal time of policies trained with different safe-RL methods, MPC with probabilistic transition model and DWA.
Safe-RL method | MLP | Lagrangian | MPC | DWA |
---|---|---|---|---|
Success rate | 65 ± 4% | 74 ± 2% | 70 ± 3% | 43% |
Survival time | 8.0 ± 1.5s | 16.2 ± 2.5s | 55.7 ± 4.9s | 88.6s |
Traversal time | 7.5 ± 0.3s | 8.6 ± 0.2s | 24.7 ± 2.0s | 38.5s |
Success rate of policies trained with different model-based methods and different number of transition samples
Transition samples | 100k | 500k | 2000k |
---|---|---|---|
MLP | 13 ± 7% | 58 ± 2% | 65 ± 4% |
Dyna-style deterministic | 8 ± 2% | 30 ± 10% | 66 ± 5% |
MPC deterministic | 0 ± 0% | 21 ± 10% | 62 ± 3% |
Dyna-style probabilistic | 0 ± 0% | 48 ± 4% | 70 ± 1% |
MPC probabilistic | 0 ± 0% | 45 ± 4% | 70 ± 3% |
Success rate of policies trained with different number of training environments
Environments | 5 | 10 | 50 | 100 | 250 |
---|---|---|---|---|---|
Success rate | 43 ± 3% | 54 ± 8% | 65 ± 4% | 72 ± 6% | 74 ± 2 % |
(See below for all the config files used to reproduce the experiments)
└─configs
│ └─safe_rl
│ │ └─mpc.yaml
│ │ └─mlp.yaml
│ │ └─lagrangian.yaml
│ └─architecture_static
│ │ └─mlp_history_length_4.yaml
│ │ └─cnn_history_length_8.yaml
│ │ └─cnn_history_length_4.yaml
│ │ └─mlp_history_length_8.yaml
│ │ └─rnn_history_length_4.yaml
│ │ └─mlp_history_length_1.yaml
│ │ └─cnn_history_length_1.yaml
│ │ └─rnn_history_length_8.yaml
│ │ └─rnn_history_length_1.yaml
│ │ └─transformer_history_length_1.yaml
│ │ └─transformer_history_length_4.yaml
│ │ └─transformer_history_length_8.yaml
│ └─architecture_dynamic_wall
│ │ └─cnn_history_length_1.yaml
│ │ └─cnn_history_length_4.yaml
│ │ └─cnn_history_length_8.yaml
│ │ └─mlp_history_length_1.yaml
│ │ └─mlp_history_length_4.yaml
│ │ └─mlp_history_length_8.yaml
│ │ └─rnn_history_length_1.yaml
│ │ └─rnn_history_length_4.yaml
│ │ └─rnn_history_length_8.yaml
│ │ └─transformer_history_length_1.yaml
│ │ └─transformer_history_length_4.yaml
│ │ └─transformer_history_length_8.yaml
│ └─architecture_dynamic_box
│ │ └─cnn_history_length_1.yaml
│ │ └─cnn_history_length_4.yaml
│ │ └─cnn_history_length_8.yaml
│ │ └─mlp_history_length_1.yaml
│ │ └─mlp_history_length_4.yaml
│ │ └─mlp_history_length_8.yaml
│ │ └─rnn_history_length_1.yaml
│ │ └─rnn_history_length_4.yaml
│ │ └─rnn_history_length_8.yaml
│ │ └─transformer_history_length_1.yaml
│ │ └─transformer_history_length_4.yaml
│ │ └─transformer_history_length_8.yaml
│ └─model_based
│ │ └─dyna.yaml
│ │ └─mpc.yaml
│ └─generalization
│ │ └─num_world_50.yaml
│ │ └─num_world_5.yaml
│ │ └─num_world_10.yaml
│ │ └─num_world_100.yaml
│ │ └─num_world_250.yamlconfigs
│ └─safe_rl
│ │ └─mpc.yaml
│ │ └─mlp.yaml
│ │ └─lagrangian.yaml
│ └─architecture_static
│ │ └─mlp_history_length_4.yaml
│ │ └─cnn_history_length_8.yaml
│ │ └─cnn_history_length_4.yaml
│ │ └─mlp_history_length_8.yaml
│ │ └─rnn_history_length_4.yaml
│ │ └─mlp_history_length_1.yaml
│ │ └─cnn_history_length_1.yaml
│ │ └─rnn_history_length_8.yaml
│ │ └─rnn_history_length_1.yaml
│ │ └─transformer_history_length_1.yaml
│ │ └─transformer_history_length_4.yaml
│ │ └─transformer_history_length_8.yaml
│ └─architecture_dynamic_wall
│ │ └─cnn_history_length_1.yaml
│ │ └─cnn_history_length_4.yaml
│ │ └─cnn_history_length_8.yaml
│ │ └─mlp_history_length_1.yaml
│ │ └─mlp_history_length_4.yaml
│ │ └─mlp_history_length_8.yaml
│ │ └─rnn_history_length_1.yaml
│ │ └─rnn_history_length_4.yaml
│ │ └─rnn_history_length_8.yaml
│ │ └─transformer_history_length_1.yaml
│ │ └─transformer_history_length_4.yaml
│ │ └─transformer_history_length_8.yaml
│ └─architecture_dynamic_box
│ │ └─cnn_history_length_1.yaml
│ │ └─cnn_history_length_4.yaml
│ │ └─cnn_history_length_8.yaml
│ │ └─mlp_history_length_1.yaml
│ │ └─mlp_history_length_4.yaml
│ │ └─mlp_history_length_8.yaml
│ │ └─rnn_history_length_1.yaml
│ │ └─rnn_history_length_4.yaml
│ │ └─rnn_history_length_8.yaml
│ │ └─transformer_history_length_1.yaml
│ │ └─transformer_history_length_4.yaml
│ │ └─transformer_history_length_8.yaml
│ └─model_based
│ │ └─dyna.yaml
│ │ └─mpc.yaml
│ └─generalization
│ │ └─num_world_50.yaml
│ │ └─num_world_5.yaml
│ │ └─num_world_10.yaml
│ │ └─num_world_100.yaml
│ │ └─num_world_250.yaml