There are three types of inputs used in DAVO: image sequences, optical flows, and semantic segmentation labels. This section describes how to prepare each of them.
For KITTI, first download the KITTI odometry dataset and unzip it into the $kitti_raw_odom folder (for example, ./kitti_odom/). Then run the following commands:
# Set folder paths.
export kitti_raw_odom="./kitti_odom/"  # make sure kitti_odom/ includes sequences/ and poses/
export kitti_odom_dump="./kitti_odom_dump/"
# Dump the KITTI VO dataset.
python data/preprocess.py --dataset_dir=$kitti_raw_odom --dataset_name='kitti_odom' --dump_root=$kitti_odom_dump --seq_length=3 --img_width=416 --img_height=128 --num_threads=8
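Each dumped sample in $kitti_odom_dump/$seq_name/ is a horizontal concatenation of a 3-frame snippet (source0, target, source1), as the splitting code further below assumes, so its width is 3 x 416 = 1248. A minimal sanity check (a sketch; the frame name is just an example):

import scipy.misc

# A dumped sample stores [source0 | target | source1] side by side.
im = scipy.misc.imread("$kitti_odom_dump/$seq_name/000001.png")
h, w, c = im.shape
assert (h, c) == (128, 3) and w % 3 == 0, "expected a 128-pixel-high 3-frame strip"
print("single-frame size: %dx%d" % (w // 3, h))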
For Cityscapes, download the following packages: (1) leftImg8bit_sequence_trainvaltest.zip (324GB) and (2) camera_trainvaltest.zip (1.9MB), then unzip them into the $cityscapes_raw folder and run the following command:
# Dump the Cityscapes dataset.
python data/preprocess.py --dataset_dir=$cityscapes_raw --dataset_name='cityscapes' --dump_root=$cityscapes_odom_dump --seq_length=3 --img_width=416 --img_height=171 --num_threads=8
Note that for Cityscapes the img_height is set to 171: the bottom part of each frame, which contains the car logo, is cropped out, so the resulting image has height 128.
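To make the crop arithmetic concrete (an illustrative sketch, not the actual preprocessing code): each frame is resized to 416x171, then the bottom 43 rows containing the car logo are dropped, leaving 416x128.

import numpy as np

# Illustrative only: a stand-in for a frame already resized to 416x171.
resized = np.zeros((171, 416, 3), dtype=np.uint8)
cropped = resized[:128, :, :]  # keep the top 128 rows, drop the logo region
assert cropped.shape == (128, 416, 3)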
In DAVO, we use FlowNet2.0 (CVPR'2017) to generate optical flows for each input frame pair. Resources:
- PyTorch: https://github.com/NVIDIA/flownet2-pytorch
- TensorFlow: https://github.com/sampepose/flownet2-tf (used in the paper)
Two (or four) optical flows should be generated for each sample and saved to $kitti_odom_dump/$seq_name/<6-digit-id>-flownet2.npy:
- target -> source0 (i.e. the optical flow from 000001 to 000000)
- target -> source1 (i.e. the optical flow from 000001 to 000002)
- source0 -> target (i.e. the optical flow from 000000 to 000001) [optional]
- source1 -> target (i.e. the optical flow from 000002 to 000001) [optional]
For example:

import numpy as np
import scipy.misc

# ... (load $kitti_odom_dump/$seq_name/000001.png) ...
# Each dumped image is a horizontal concatenation [source0 | target | source1].
im = scipy.misc.imread("$kitti_odom_dump/$seq_name/000001.png")
h, w, c = im.shape
src0, tgt, src1 = im[:, :w//3, :], im[:, w//3:w//3*2, :], im[:, w//3*2:, :]

# ... (generate flows with a FlowNet2 model; see the repositories above) ...
src0_tgt = flownet2.predict(input_a=src0, input_b=tgt)
src1_tgt = flownet2.predict(input_a=src1, input_b=tgt)
tgt_src0 = flownet2.predict(input_a=tgt, input_b=src0)
tgt_src1 = flownet2.predict(input_a=tgt, input_b=src1)

# ... (save as an .npy file) ...
flows = [
    src0_tgt,  # dtype=np.float32, shape=(height, width, 2)
    src1_tgt,  # dtype=np.float32, shape=(height, width, 2)
    tgt_src0,  # dtype=np.float32, shape=(height, width, 2)
    tgt_src1,  # dtype=np.float32, shape=(height, width, 2)
]
all_flows = np.stack(flows)  # shape=(4, height, width, 2)
np.save("$kitti_odom_dump/$seq_name/000001-flownet2.npy", all_flows)
In DAVO, we use DeepLabv3+ (ECCV'2018) to generate semantic segmentation labels for each frame.
Three segmentation label maps should be generated for each sample and saved to $kitti_odom_dump/$seq_name/<6-digit-id>-seglabel.npy:
- source0 segmentation labels (i.e. the label map of 000000)
- target segmentation labels (i.e. the label map of 000001)
- source1 segmentation labels (i.e. the label map of 000002)
For example:

import numpy as np
import scipy.misc

# ... (load $kitti_odom_dump/$seq_name/000001.png) ...
# Each dumped image is a horizontal concatenation [source0 | target | source1].
im = scipy.misc.imread("$kitti_odom_dump/$seq_name/000001.png")
h, w, c = im.shape
src0, tgt, src1 = im[:, :w//3, :], im[:, w//3:w//3*2, :], im[:, w//3*2:, :]

# ... (generate segmentations with a DeepLabv3+ model) ...
src0_seg = deeplab.predict(src0)
tgt_seg = deeplab.predict(tgt)
src1_seg = deeplab.predict(src1)

# ... (save as an .npy file) ...
seglabels = [
    src0_seg,  # dtype=np.float32, shape=(height, width, 1)
    tgt_seg,   # dtype=np.float32, shape=(height, width, 1)
    src1_seg,  # dtype=np.float32, shape=(height, width, 1)
]
all_seglabels = np.stack(seglabels)  # shape=(3, height, width, 1)
np.save("$kitti_odom_dump/$seq_name/000001-seglabel.npy", all_seglabels)