Dynamic attention-based visual odometry framework (DAVO) is a learning-based VO method for estimating the ego-motion of a monocular camera. DAVO dynamically adjusts the attention weights on different semantic categories for different motion scenarios based on optical flow maps. These weighted semantic categories can then be used to generate attention maps that highlight the relative importance of different semantic regions in the input frames for pose estimation. To examine the proposed DAVO, we perform a number of experiments on the KITTI Visual Odometry and SLAM benchmark suite to quantitatively and qualitatively inspect the impact of the dynamically adjusted weights on the accuracy of the evaluated trajectories. Moreover, we design a set of ablation analyses to justify each of our design choices and to validate the effectiveness and advantages of DAVO. Our experiments on the KITTI dataset show that the proposed DAVO framework provides satisfactory performance in ego-motion estimation and delivers competitive performance compared to contemporary VO methods.
This codebase is tested on Ubuntu 16.04 with TensorFlow 1.13.1 and CUDA 10.0 (with cuDNN 7.5).
git clone https://github.com/BassyKuo/DAVO.git
Make sure you are using Python 3.6.
python -V # should be python 3.6
Install necessary packages from the requirement file.
pip install -r requirements.txt
If you do not have CUDA 10.0, please download the CUDA 10.0 toolkit from the official website or here, and set $CUDA_HOME to the path where you installed it.
export CUDA_HOME="<your_cuda_path>"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
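A quick way to confirm that the two exports above took effect is to inspect the environment from Python before launching TensorFlow. The following is only a minimal sketch: the helper name `cuda_env_problems` is hypothetical and not part of this repository, and it only checks that the variables are set consistently, not that the CUDA libraries actually exist on disk.

```python
import os


def cuda_env_problems(env=None):
    """Return a list of human-readable problems with the CUDA environment.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is hypothetical and only mirrors the two exports above.
    """
    env = os.environ if env is None else env
    problems = []

    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        return problems

    # The README adds $CUDA_HOME/lib64 to LD_LIBRARY_PATH.
    lib_dir = cuda_home.rstrip("/") + "/lib64"
    ld_paths = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in ld_paths:
        problems.append("{} is not in LD_LIBRARY_PATH".format(lib_dir))
    return problems


if __name__ == "__main__":
    for problem in cuda_env_problems():
        print("Warning:", problem)
```

An empty list means both variables are set as described above.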
Use the following scripts to get started quickly once you have set up the environment and the dataset.
version="v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
# Train the model w/ flipping images.
./run_training.sh ${version}
# Train the model w/ augmented images (including flipping, brightness, contrast, saturation, hue).
./run_training.sh ${version} --data_aug
➡️ Check here to see more available ${version} names.
export ckpt_dir="ckpt_dir/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh/"
export version="v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
export seq_name="03"
export ckpt_step="1500000"
export output_root="test_DAVO"
# Run the script to generate predicted poses of sequence ${seq_name}.
./run_inference.sh ${ckpt_dir} ${version} ${seq_name} ${ckpt_step} ${output_root}
# The result would be saved in ${output_root}/${version}--model-${ckpt_step}
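The naming convention in the comment above can be reproduced in Python when post-processing results. This is a sketch under stated assumptions: the function names are hypothetical, and the per-sequence file name `<seq>-pred_kitti_pose.txt` is inferred from the visualization commands later in this README rather than from the inference script itself.

```python
import posixpath


def result_dir(output_root, version, ckpt_step):
    """Directory written by run_inference.sh: <output_root>/<version>--model-<ckpt_step>."""
    return posixpath.join(output_root, "{}--model-{}".format(version, ckpt_step))


def pred_pose_file(output_root, version, ckpt_step, seq_name):
    """Assumed location of the predicted poses for one sequence."""
    return posixpath.join(
        result_dir(output_root, version, ckpt_step),
        "{}-pred_kitti_pose.txt".format(seq_name),
    )


if __name__ == "__main__":
    version = "v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
    print(pred_pose_file("test_DAVO", version, 1500000, "03"))
```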
➡️ To get started quickly, download the pretrained model from our Google Drive and set $ckpt_dir to "pretrain-ckpt".
cd kitti_benchmark/
export test_output_dir="../${output_root}/${version}--model-${ckpt_step}"
export save_name="${version}--model-${ckpt_step}"
# Use pose_kitti_eval.sh to run the KITTI Benchmark.
./pose_kitti_eval.sh ${test_output_dir} ${save_name}
➡️ Check here to see more information.
There are three types of inputs used in DAVO:
Please check here to see how to prepare them for DAVO.
Once the data are formatted following the above instructions, you are able to train the model with the following command:
python train.py \
--dataset_dir=$kitti_odom_dump \
--img_width=416 \
--img_height=128 \
--batch_size=4 \
--seq_length=3 \
--max_steps=310000 \
--save_freq=25000 \
--learning_rate=0.001 \
--pose_weight=0.1 \
--checkpoint_dir=./ckpt/${version} \
--version=${version}
python train.py \
--dataset_dir=./kitti_odom-dump/ --img_width=416 --img_height=128 --batch_size=4 \
--seq_length=3 --max_steps=310000 --save_freq=25000 --learning_rate=0.001 --pose_weight=0.1 \
--checkpoint_dir=./ckpt/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh \
--version=v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh
Note that all available ${version} names are defined here.
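When launching several training runs with different ${version} names, it can help to assemble the command programmatically. The sketch below only builds the argument list using the flags and default values shown in the commands above; the helper `build_train_command` and its `checkpoint_root` parameter are hypothetical and not part of this repository.

```python
def build_train_command(version, dataset_dir, checkpoint_root="./ckpt", **overrides):
    """Return the argument list for `python train.py` using the README's defaults.

    Keyword overrides must name one of the flags listed below.
    """
    flags = {
        "dataset_dir": dataset_dir,
        "img_width": 416,
        "img_height": 128,
        "batch_size": 4,
        "seq_length": 3,
        "max_steps": 310000,
        "save_freq": 25000,
        "learning_rate": 0.001,
        "pose_weight": 0.1,
    }
    unknown = set(overrides) - set(flags)
    if unknown:
        raise ValueError("unknown flags: {}".format(sorted(unknown)))
    flags.update(overrides)

    # Checkpoints are stored per version, as in the commands above.
    flags["checkpoint_dir"] = "{}/{}".format(checkpoint_root.rstrip("/"), version)
    flags["version"] = version
    return ["python", "train.py"] + ["--{}={}".format(k, v) for k, v in flags.items()]


if __name__ == "__main__":
    version = "v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
    print(" ".join(build_train_command(version, "./kitti_odom-dump/")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start training.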
To evaluate the pose estimation performance in this work, first use the following command to produce the estimated poses:
python test_kitti_pose.py \
--test_seq=$seq \
--concat_img_dir=./kitti_odom-dump/ \
--ckpt_file=${ckpt_file} \
--version=${version} \
--output_dir=${output_dir}
export VERSION="v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
export MODEL_NAME="$VERSION"
# SAVE_NAME matches the result directory used by the visualization commands.
export SAVE_NAME="${VERSION}--model-10"
python test_kitti_pose.py \
--test_seq=3 \
--concat_img_dir=./kitti_odom-dump/ \
--ckpt_file=./ckpt/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh/model-10 \
--version=$VERSION \
--output_dir=./test_DAVO/$SAVE_NAME
Then copy the prediction file to the kitti_benchmark/results/$MODEL_NAME/data/ folder and execute ./test_odometry_all $MODEL_NAME. Please check here to see how to do that, or use the quick script:
cd kitti_benchmark/
./pose_kitti_eval.sh ../test_DAVO/$SAVE_NAME $SAVE_NAME
➡️ Check here to see the evaluation results.
In our work, we use the evo tool to visualize trajectories against the reference poses of sequences 00 to 10:
for seq in {00..10..1} ; do
evo_traj kitti \
--ref kitti_benchmark/data/odometry/poses/${seq}.txt \
./test_DAVO/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh--model-10/${seq}-pred_kitti_pose.txt \
-p --plot_mode xz --save_plot plots/${figure_name}.png
done
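Note that ${figure_name} must be set before running the loop. As an alternative, the same evo_traj invocations can be generated from Python with an explicit figure name per sequence. This is a sketch only: the helper `evo_traj_command` and the `seq_<seq>` figure name are hypothetical choices, while the evo_traj arguments are copied from the loop above.

```python
def evo_traj_command(seq, pred_dir,
                     ref_dir="kitti_benchmark/data/odometry/poses",
                     plot_dir="plots"):
    """Build the evo_traj argument list for one KITTI sequence (e.g. "03")."""
    ref = "{}/{}.txt".format(ref_dir, seq)
    pred = "{}/{}-pred_kitti_pose.txt".format(pred_dir, seq)
    figure_name = "seq_{}".format(seq)  # hypothetical naming scheme
    return [
        "evo_traj", "kitti",
        "--ref", ref,
        pred,
        "-p", "--plot_mode", "xz",
        "--save_plot", "{}/{}.png".format(plot_dir, figure_name),
    ]


VERSION = "v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh"
PRED_DIR = "./test_DAVO/{}--model-10".format(VERSION)

# Sequences 00 to 10, zero-padded as in the shell loop.
commands = [evo_traj_command("{:02d}".format(i), PRED_DIR) for i in range(11)]

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
```

Each list can be executed with `subprocess.run(cmd, check=True)` once evo is installed.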
Because ground-truth poses are not available for the testing sequences 11 to 21, we compare our predictions against trajectories generated by ORB-SLAM2-S (the stereo version of ORB-SLAM2):
for seq in {11..21..1} ; do
evo_traj kitti \
--ref kitti_benchmark/data/odometry/poses_from_ORBSLAM2-S/${seq}-ORB-SLAM2-S.txt \
./test_DAVO/v1-decay100k-sharedNN-dilatedPoseNN-cnv6_128-segmask_all-se_flow-abs_flow-fc_tanh--model-10/${seq}-pred_kitti_pose.txt \
-p --plot_mode xz --save_plot plots/${figure_name}.png
done
Before you use the --save_plot argument to save a PNG file, please change the export format first:
evo_config set plot_export_format png
You can also change the trajectory colors:
evo_config set plot_seaborn_palette Dark2
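Before plotting, it can be useful to sanity-check a pose file. The KITTI odometry pose format that `evo_traj kitti` reads stores one pose per line as 12 values, the row-major 3x4 matrix [R | t]. The sketch below parses that format and measures the travelled distance; it assumes the prediction files follow the same format (they are passed to `evo_traj kitti` above), and the function names are hypothetical.

```python
import math


def parse_kitti_poses(text):
    """Parse KITTI-format pose text into a list of (x, y, z) camera positions."""
    positions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        values = [float(v) for v in line.split()]
        if len(values) != 12:
            raise ValueError(
                "line {}: expected 12 values, got {}".format(line_no, len(values))
            )
        # Row-major 3x4 matrix: the translation is the 4th column (indices 3, 7, 11).
        positions.append((values[3], values[7], values[11]))
    return positions


def path_length(positions):
    """Sum of Euclidean distances between consecutive positions."""
    return sum(
        math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))
        for p, q in zip(positions, positions[1:])
    )


if __name__ == "__main__":
    sample = "1 0 0 0 0 1 0 0 0 0 1 0\n1 0 0 3 0 1 0 0 0 0 1 4\n"
    poses = parse_kitti_poses(sample)
    print(len(poses), "poses, path length", path_length(poses))
```

To check a real file, read it with `open(path).read()` and pass the contents to `parse_kitti_poses`.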
Please feel free to contact us if you have any questions. 😄
We appreciate the great work and repositories along this direction, such as SfMLearner, GeoNet, and DeepMatchVO, as well as evaluation tools such as the KITTI VO/SLAM devkit and evo for KITTI full-sequence evaluation.