This work builds on "Cross Modal Transformer: Towards Fast and Robust 3D Object Detection".
CMT is a robust, transformer-based 3D detector for end-to-end multi-modal 3D detection. CMTCoop extends this model to cooperative perception, performing deep multi-modal, multi-view feature fusion for 3D object detection. Extensive studies show that the proposed model achieves 97.3% mAP with multi-modal cooperative fusion (a +6.2% increase over vehicle-only perception) and 96.7% mAP with LiDAR-only cooperative perception (CMTCoop-L), which runs at near-real-time frame rates and yields a 2.1% gain over the current SoTA, BEVFusionCoop.
Docker provides an easy way to deal with package dependencies. Use the provided Dockerfile to build the image:
docker build . -t cmt-coop
Then run the image with the following command:
nvidia-docker run -it --rm \
--ipc=host --gpus all \
-v <Path_to_datasets>:/mnt/datasets \
-v <Path_to_pretrained_models>:/home/pretrained \
--name cmt-coop \
cmt-coop bash
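To confirm that the GPUs are visible inside the container (assuming PyTorch is already installed in the image), run a quick check:

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"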
Create a new environment with Anaconda or venv if required:
conda create -n cmt-coop python=3.8
conda activate cmt-coop
Install the following packages:
- Python == 3.8
- CUDA == 11.1
- pytorch == 1.9.1
- mmcv-full == 1.6.2
- mmdet == 2.28.2
- mmsegmentation == 0.30.0
- mmdet3d == 1.0.0rc6
- spconv-cu111 == 2.1.21
- flash-attn == 0.2.2
- pypcd
- open3d
Note that the repository was tested on the above versions, but may also work with later versions.
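As a minimal sanity check that the environment matches the tested versions, the following should print values consistent with the list above:

python -c "import torch; print(torch.__version__, torch.version.cuda)"  # expect 1.9.1 and 11.1
python -c "import mmcv, mmdet, mmdet3d; print(mmcv.__version__, mmdet.__version__, mmdet3d.__version__)"  # expect 1.6.2 2.28.2 1.0.0rc6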
Follow the mmdet3d instructions to process the nuScenes dataset. This is only required to reproduce the tests on the base CMT model.
The dataset links will be released soon.
Download the TUMTraf Dataset Development Kit and follow its instructions to split the TUMTraf intersection dataset into train and val sets. The TUMTraf cooperative dataset is already split into train and val sets.
${Root}
└── datasets
    ├── tumtraf_intersection_dataset
    │   ├── train
    │   └── val
    └── tumtraf_cooperative_dataset
        ├── train
        └── val
Finally, ensure that the dataset folder is soft-linked to the CMTCoop/data folder:
ln -s /path_to_data_folder CMTCoop/data
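To confirm the link points to the right location:

ls -l CMTCoop/data  # should show data -> /path_to_data_folder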
The TUMTraf datasets must be converted from the OpenLABEL format to be compatible with the mmdet3d framework.
Run the following script to prepare the intersection dataset:
python ./tools/create_data.py a9_nusc \
    --root-path /home/CMTCoop/data/tumtraf_intersection_dataset \
    --out-dir /home/CMTCoop/data/tumtraf_intersection_processed \
    --splits training,validation
After data preparation, you will be able to see the following directory structure:
├── data
│   ├── tumtraf_intersection_dataset
│   │   ├── train
│   │   └── val
│   └── tumtraf_intersection_processed
│       ├── a9_nusc_gt_database
│       ├── train
│       ├── val
│       ├── a9_nusc_infos_train.pkl
│       ├── a9_nusc_infos_val.pkl
│       └── a9_nusc_dbinfos_train.pkl
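To spot-check the conversion, the generated info files can be loaded directly (a sketch; the exact layout of the info dict follows the mmdet3d convention and may vary between versions):

python -c "import pickle; d = pickle.load(open('data/tumtraf_intersection_processed/a9_nusc_infos_train.pkl', 'rb')); print(type(d), list(d) if isinstance(d, dict) else len(d))"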
Run the following script to prepare the cooperative dataset:
python ./tools/create_data.py a9coop_nusc \
    --root-path /home/CMTCoop/data/tumtraf_cooperative_dataset \
    --out-dir /home/CMTCoop/data/tumtraf_cooperative_processed \
    --splits training,validation
After data preparation, you will be able to see the following directory structure:
├── data
│   ├── tumtraf_cooperative_dataset
│   │   ├── train
│   │   └── val
│   └── tumtraf_cooperative_processed
│       ├── a9_nusc_coop_gt_database
│       ├── train
│       ├── val
│       ├── a9_nusc_coop_infos_train.pkl
│       ├── a9_nusc_coop_infos_val.pkl
│       └── a9_nusc_coop_dbinfos_train.pkl
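The same spot-check as above applies here, using the a9_nusc_coop_* info files.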
# train (8 GPUs)
bash tools/dist_train.sh /path_to_your_config 8
# inference (8 GPUs)
bash tools/dist_test.sh /path_to_your_config /path_to_your_pth 8 --eval bbox
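For quick debugging on a single GPU, the non-distributed entry points should also work (a sketch, assuming this repository keeps the standard mmdet3d tools/train.py and tools/test.py):

# single-GPU (verify these scripts exist in this repo)
python tools/train.py /path_to_your_config
python tools/test.py /path_to_your_config /path_to_your_pth --eval bbox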
Results on the TUMTraf cooperative validation set. FPS is measured on a single RTX 3080 GPU.
| Domain | Modality | mAP BEV | mAP 3D Easy | mAP 3D Mod. | mAP 3D Hard | mAP 3D Avg. |
|---|---|---|---|---|---|---|
| Vehicle | Camera | 69.76 | 68.76 | 79.85 | 66.44 | 69.30 |
| Vehicle | LiDAR | 88.17 | 87.94 | 88.53 | 71.99 | 84.72 |
| Vehicle | Camera + LiDAR | 91.65 | 84.83 | 91.32 | 72.18 | 85.57 |
| Infra. | Camera | 71.89 | 70.86 | 80.38 | 58.72 | 71.66 |
| Infra. | LiDAR | 94.42 | 91.28 | 95.60 | 77.48 | 91.89 |
| Infra. | Camera + LiDAR | 96.09 | 91.94 | 95.15 | 82.35 | 92.16 |
| Coop. | Camera | 84.07 | 81.03 | 90.05 | 77.94 | 83.43 |
| Coop. | LiDAR | 96.68 | 92.18 | 96.77 | 82.20 | 93.43 |
| Coop. | Camera + LiDAR | 97.31 | 93.70 | 96.65 | 79.84 | 94.10 |
| Model | FOV | Modality | mAP 3D Easy | mAP 3D Mod. | mAP 3D Hard | mAP 3D Avg. |
|---|---|---|---|---|---|---|
| InfraDet3D | South 1 | LiDAR | 75.81 | 47.66 | 42.16 | 55.21 |
| BEVFusionCoop | South 1 | LiDAR | 76.24 | 48.23 | 35.19 | 69.47 |
| CMTCoop | South 1 | LiDAR | 80.62 | 64.46 | 50.41 | 72.68 |
| InfraDet3D | South 2 | LiDAR | 38.92 | 46.60 | 43.86 | 43.13 |
| BEVFusionCoop | South 2 | LiDAR | 74.97 | 55.55 | 39.96 | 69.94 |
| CMTCoop | South 2 | LiDAR | 79.34 | 60.81 | 45.53 | 70.31 |
| InfraDet3D | South 1 | Camera + LiDAR | 67.08 | 31.38 | 35.17 | 44.55 |
| BEVFusionCoop | South 1 | Camera + LiDAR | 75.68 | 45.63 | 45.63 | 66.75 |
| CMTCoop | South 1 | Camera + LiDAR | 80.86 | 61.37 | 45.32 | 70.65 |
| InfraDet3D | South 2 | Camera + LiDAR | 58.38 | 19.73 | 33.08 | 37.06 |
| BEVFusionCoop | South 2 | Camera + LiDAR | 74.73 | 53.46 | 41.96 | 66.89 |
| CMTCoop | South 2 | Camera + LiDAR | 78.92 | 52.67 | 39.76 | 67.21 |
Performance of the vehicle-only model (CMT) from the infrastructure perspective (left) and the vehicular perspective (right).
Performance of the cooperative model (CMTCoop, left) vs. the vehicle-only model (CMT, right) from the infrastructure perspective.
Refer to the following links for other resources related to this project:
Please consider citing the original work on CMT if you find this work helpful.