CoopDet3D: Deep Multi-Modal Cooperative 3D Object Detection of Traffic Participants Using Onboard and Roadside Sensors
Cooperative perception offers several benefits for enhancing the capabilities of autonomous vehicles and improving road safety. Using roadside sensors in addition to onboard sensors increases reliability and extends the sensor range. External sensors offer higher situational awareness for automated vehicles and prevent occlusions. We propose CoopDet3D, a cooperative multi-modal fusion model, and TUMTraf-V2X, a dataset for the cooperative 3D object detection and tracking task. Our dataset contains 2,000 labeled point clouds and 5,000 labeled images from five roadside and four onboard sensors. It includes 30k 3D boxes with track IDs and precise GPS and IMU data. We labeled nine categories and covered occlusion scenarios with challenging driving maneuvers, like traffic violations, near-miss events, overtaking, and U-turns. Through multiple experiments, we show that our CoopDet3D camera-LiDAR fusion model achieves an increase of +14.36 3D mAP compared to a vehicle camera-LiDAR fusion model. Finally, we make our dataset, model, labeling tool, and dev-kit publicly available: https://tum-traffic-dataset.github.io/tumtraf-v2x.
- 2024/02: 🏆 Paper accepted at CVPR'24: TUMTraf V2X Cooperative Perception Dataset
- 2023/11: First release of the CoopDet3D model (v1.0.0)
- Support vehicle-only, infrastructure-only, and cooperative modes
- Vehicle-only
- Infrastructure-only
- Cooperative
- Support camera-only, LiDAR-only, and camera-LiDAR fusion
- Camera-only
- LiDAR-only
- Camera-LiDAR fusion
- Support multiple camera backbones
- SwinT
- YOLOv8
- Support multiple LiDAR backbones
- VoxelNet (torchsparse)
- PointPillars
- Support offline, ROS, and shared memory operation
- Offline
- ROS
- Shared memory
- Live Test
- Export inference results to OpenLABEL format
- Inference to OpenLABEL
There are two versions of the TUMTraf V2X Cooperative Perception Dataset (Release R4) provided:
1.1. TUMTraf-V2X
1.2. TUMTraf-V2X-mini (half of the full dataset)
We train CoopDet3D on TUMTraf-V2X-mini and provide the results below.
Simply place the splits in a directory named tumtraf_v2x_cooperative_perception_dataset inside the data directory; you should then have a structure similar to this:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_v2x_cooperative_perception_dataset
| | ├── train
| | ├── val
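As a minimal sketch of creating this layout (assuming you run from the repository root and that the downloaded splits extract into train/ and val/ folders; the source paths below are placeholders):

```bash
# Hypothetical paths; replace them with the actual locations of the extracted splits.
mkdir -p data/tumtraf_v2x_cooperative_perception_dataset
mv /path/to/extracted/train data/tumtraf_v2x_cooperative_perception_dataset/
mv /path/to/extracted/val data/tumtraf_v2x_cooperative_perception_dataset/
```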
The TUMTraf Intersection Dataset (Release R2) can be downloaded below:
2.1 TUMTraf-I.
Then, download the TUMTraf Dataset Development Kit and follow the steps provided there to split the data into train and val sets.
Finally, place the train and val sets in a directory named tumtraf_i inside the data directory. You should then have a structure similar to this:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_i
| | ├── train
| | ├── val
The pre-trained CoopDet3D weights can be downloaded from here.
The weights for TUMTraf Intersection Dataset are named following this convention:
coopdet3d_tumtraf_i_[l/cl]_<LiDAR_backbone>_<camera_backbone>_<other_information>.pth
The weights for the TUMTraf V2X Cooperative Perception Dataset are named following this convention:
coopdet3d_[v/i/vi]_[c/l/cl]_<LiDAR_backbone>_<camera_backbone>_<other_information>.pth
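For example, reading the tokens according to these conventions, coopdet3d_vi_cl_pointpillars512_2xtestgrid_yolos_transfer_learning_best.pth (listed below) is a cooperative (vehicle + infrastructure) camera-LiDAR model with a PointPillars LiDAR backbone and a YOLOv8 camera backbone.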
Extract the files and place them in the weights directory.
Use these weights to get the best results from the tables below:
- TUMTraf Intersection Dataset: coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth
- TUMTraf V2X Cooperative Perception Dataset: coopdet3d_vi_cl_pointpillars512_2xtestgrid_yolos_transfer_learning_best.pth
The easiest way to deal with the prerequisites is to use the included Dockerfile. Make sure that nvidia-docker is installed on your machine. After that, execute the following command to build the docker image:
cd docker && docker build . -t coopdet3d
The docker can then be run with the following commands:
If you are only using the TUMTraf Intersection Dataset:
nvidia-docker run -it -v `pwd`/../data/tumtraf_i:/home/data/tumtraf_i -v <PATH_TO_COOPDET3D>:/home/coopdet3d --shm-size 16g coopdet3d /bin/bash
If you are only using the TUMTraf V2X Cooperative Perception Dataset:
nvidia-docker run -it -v `pwd`/../data/tumtraf_v2x_cooperative_perception_dataset:/home/data/tumtraf_v2x_cooperative_perception_dataset -v <PATH_TO_COOPDET3D>:/home/coopdet3d --shm-size 16g coopdet3d /bin/bash
If you are using both datasets:
nvidia-docker run -it -v `pwd`/../data/tumtraf_i:/home/data/tumtraf_i -v `pwd`/../data/tumtraf_v2x_cooperative_perception_dataset:/home/data/tumtraf_v2x_cooperative_perception_dataset -v <PATH_TO_COOPDET3D>:/home/coopdet3d --shm-size 16g coopdet3d /bin/bash
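If you use a recent Docker with the NVIDIA Container Toolkit instead of the legacy nvidia-docker wrapper, an equivalent invocation should work with --gpus all. A sketch for the cooperative dataset only (not an official, tested command):

```bash
docker run -it --gpus all \
  -v `pwd`/../data/tumtraf_v2x_cooperative_perception_dataset:/home/data/tumtraf_v2x_cooperative_perception_dataset \
  -v <PATH_TO_COOPDET3D>:/home/coopdet3d \
  --shm-size 16g coopdet3d /bin/bash
```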
We recommend running data preparation (instructions in the next section) outside the Docker container if possible. Note that the dataset directory should be an absolute path. Inside the container, run the following commands to install the codebase:
cd /home/coopdet3d
python setup.py develop
Finally, you can create a symbolic link /home/coopdet3d/data/tumtraf_i pointing to /home/data/tumtraf_i and a symbolic link /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset pointing to /home/data/tumtraf_v2x_cooperative_perception_dataset inside the container.
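A minimal sketch of those links (assuming the datasets are mounted under /home/data as in the docker run commands above; skip any link whose target directory already exists under /home/coopdet3d/data):

```bash
# Inside the container: link the mounted datasets into the locations the code expects.
mkdir -p /home/coopdet3d/data
ln -s /home/data/tumtraf_i /home/coopdet3d/data/tumtraf_i
ln -s /home/data/tumtraf_v2x_cooperative_perception_dataset /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset
```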
The code is built with the following libraries (a rough environment-setup sketch follows the list):
- Python >= 3.8, <3.9
- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)
- Pillow = 8.4.0 (see here)
- PyTorch >= 1.9, <= 1.10.2
- tqdm
- torchpack
- mmcv = 1.4.0
- mmdetection = 2.20.0
- nuscenes-dev-kit
- Latest versions of numba, torchsparse, pypcd, and Open3D
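Note that mmcv, mmdetection, and the nuScenes dev-kit correspond to the pip packages mmcv-full, mmdet, and nuscenes-devkit. As a rough, non-authoritative sketch (assuming Python 3.8, a CUDA build of PyTorch 1.10, and a system OpenMPI installation), an environment could be set up like this:

```bash
# Hypothetical environment setup; adjust PyTorch/CUDA builds and mmcv wheels to your system.
conda create -n coopdet3d python=3.8 -y
conda activate coopdet3d
pip install torch==1.10.2 torchvision==0.11.3           # pick the wheel matching your CUDA version
pip install Pillow==8.4.0 tqdm mpi4py==3.0.3 torchpack  # mpi4py needs a system OpenMPI (e.g. 4.0.4)
pip install mmcv-full==1.4.0 mmdet==2.20.0              # see the mmcv docs for prebuilt CUDA wheels
pip install nuscenes-devkit numba open3d
# torchsparse and pypcd are typically installed from their GitHub repositories.
```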
After installing these dependencies, run this command to install the codebase:
python setup.py develop
Finally, you can create a symbolic link /home/coopdet3d/data/tumtraf_i pointing to /home/data/tumtraf_i and a symbolic link /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset pointing to /home/data/tumtraf_v2x_cooperative_perception_dataset, as described in the Docker setup above.
Run this script for data preparation:
python ./tools/create_tumtraf_data.py --root-path /home/coopdet3d/data/tumtraf_i --out-dir /home/coopdet3d/data/tumtraf_i_processed --splits training,validation
After data preparation, you will be able to see the following directory structure:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_i
| | ├── train
| | ├── val
| ├── tumtraf_i_processed
│ │ ├── tumtraf_nusc_gt_database
| | ├── train
| | ├── val
│ │ ├── tumtraf_nusc_infos_train.pkl
│ │ ├── tumtraf_nusc_infos_val.pkl
│ │ ├── tumtraf_nusc_dbinfos_train.pkl
Run this script for data preparation:
python ./tools/create_tumtraf_v2x_data.py --root-path /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset --out-dir /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset_processed --splits training,validation
After data preparation, you will be able to see the following directory structure:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_v2x_cooperative_perception_dataset
| | ├── train
| | ├── val
| ├── tumtraf_v2x_cooperative_perception_dataset_processed
│ │ ├── tumtraf_v2x_nusc_gt_database
| | ├── train
| | ├── val
│ │ ├── tumtraf_v2x_nusc_infos_train.pkl
│ │ ├── tumtraf_v2x_nusc_infos_val.pkl
│ │ ├── tumtraf_v2x_nusc_dbinfos_train.pkl
NOTE 1: If you want to use a YOLOv8 .pth file from MMYOLO, please make sure its keys match this model. Convert the .pth checkpoint using the converter ./tools/convert_yolo_checkpoint.py.
NOTE 2: The paths to the pre-trained weights for YOLOv8 models are hardcoded in the config file, so change them there accordingly. This also means that when training models that use YOLOv8, the parameters --model.encoders.camera.backbone.init_cfg.checkpoint, --model.vehicle.fusion_model.encoders.camera.backbone.init_cfg.checkpoint, and --model.infrastructure.fusion_model.encoders.camera.backbone.init_cfg.checkpoint are optional.
NOTE 3: We trained our models on 3 GPUs (3 x RTX 3090), which is why the commands below use the prefix torchpack dist-run -np 3. Adjust -np to the number of GPUs available on your machine.
For training a camera-only model on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 3 python tools/train.py <PATH_TO_CONFIG_FILE> --model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH>
Example:
torchpack dist-run -np 3 python tools/train.py configs/tumtraf_i/det/centerhead/lssfpn/camera/256x704/yolov8/default.yaml
For training a LiDAR-only model on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 3 python tools/train.py <PATH_TO_CONFIG_FILE>
Example:
torchpack dist-run -np 3 python tools/train.py configs/tumtraf_i/det/transfusion/secfpn/lidar/pointpillars.yaml
For training a fusion model on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 3 python tools/train.py <PATH_TO_CONFIG_FILE> --model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --load_from <PATH_TO_PRETRAINED_LIDAR_PTH>
Example:
torchpack dist-run -np 3 python tools/train.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml --load_from weights/coopdet3d_tumtraf_i_l_pointpillars512_2x.pth
For training a camera-only model on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 3 python tools/train_coop.py <PATH_TO_CONFIG_FILE> --model.vehicle.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --model.infrastructure.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH>
Pass the pretrained camera checkpoint argument(s) corresponding to the type of model you want to train: vehicle-only, infrastructure-only, or cooperative (both).
Example:
torchpack dist-run -np 3 python tools/train_coop.py configs/tumtraf_v2x/det/centerhead/lssfpn/cooperative/camera/256x704/yolov8/default.yaml
For training a LiDAR-only model on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 3 python tools/train_coop.py <PATH_TO_CONFIG_FILE>
Example:
torchpack dist-run -np 3 python tools/train_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/lidar/pointpillars.yaml
For training a fusion model on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 3 python tools/train_coop.py <PATH_TO_CONFIG_FILE> --model.vehicle.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --model.infrastructure.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --load_from <PATH_TO_PRETRAINED_LIDAR_PTH>
Pass the pretrained camera checkpoint argument(s) corresponding to the type of model you want to train: vehicle-only, infrastructure-only, or cooperative (both).
Example:
torchpack dist-run -np 3 python tools/train_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml --load_from weights/coopdet3d_vi_l_pointpillars512_2x.pth
Note: please run tools/test.py or tools/test_coop.py separately after training to get the final evaluation metrics.
NOTE: This section will not work without the test set ground truth, which is not made public. To evaluate your model's mAPBEV, please send your config files and weights to the authors for evaluation!
For evaluation on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 1 python tools/test.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --eval bbox
Example:
torchpack dist-run -np 1 python tools/test.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --eval bbox
For evaluation on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 1 python tools/test_coop.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --eval bbox
Example:
torchpack dist-run -np 1 python tools/test_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_vi_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --eval bbox
Exporting to OpenLABEL format is needed to perform mAP3D evaluation or detection visualization using the scripts in the TUM Traffic dev-kit.
NOTE: You will not be able to evaluate your inference results using the dev-kit without the test set ground truth, which is not made public. To evaluate your model's mAP3D, please send your detection results to the authors for evaluation!
For TUMTraf Intersection Dataset:
torchpack dist-run -np 1 python tools/inference_to_openlabel.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_PTH_FILE> --split test --out-dir <PATH_TO_OPENLABEL_OUTPUT_FOLDER>
Example:
torchpack dist-run -np 1 python tools/inference_to_openlabel.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --split test --out-dir inference
For TUMTraf V2X Cooperative Perception Dataset:
torchpack dist-run -np 1 python scripts/cooperative_multimodal_3d_detection.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_CHECKPOINT_PTH> --split [train, val, test] --input_type hard_drive --save_detections_openlabel --output_folder_path_detections <PATH_TO_OPENLABEL_OUTPUT_FOLDER>
Example:
torchpack dist-run -np 1 python scripts/cooperative_multimodal_3d_detection.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/bevfusion_coop_vi_cl_pointpillars512_2x_yolos.pth --split test --input_type hard_drive --save_detections_openlabel --output_folder_path_detections inference
For TUMTraf Intersection Dataset:
torchpack dist-run -np 1 python tools/benchmark.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --log-interval 50
Example:
torchpack dist-run -np 1 python tools/benchmark.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --log-interval 50
For TUMTraf V2X Cooperative Perception Dataset:
torchpack dist-run -np 1 python tools/benchmark_coop.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --log-interval 10
Example:
torchpack dist-run -np 1 python tools/benchmark_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_vi_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --log-interval 10
For TUMTraf Intersection Dataset:
torchpack dist-run -np 1 python tools/visualize.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_PTH_FILE> --split test --mode pred --out-dir viz_tumtraf
Example:
torchpack dist-run -np 1 python tools/visualize.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --split test --mode pred --out-dir viz_tumtraf
For TUMTraf V2X Cooperative Perception Dataset:
torchpack dist-run -np 1 python tools/visualize_coop.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_PTH_FILE> --split test --mode pred --out-dir viz_tumtraf
Example:
torchpack dist-run -np 1 python tools/visualize_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/coopdet3d_vi_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --split test --mode pred --out-dir viz_tumtraf
For --split, you can also choose "train" or "val". For --mode, the other options are "gt" (ground truth) and "combo" (prediction and ground truth).
NOTE: Ground truth visualization on the test set will not work, since the provided test set does not include ground truth.
Evaluation Results (mAPBEV and mAP3D) of CoopDet3D on the TUMTraf V2X Cooperative Perception Dataset Test Set in South 2 FOV

Domain | Modality | mAPBEV | mAP3D Easy | mAP3D Mod. | mAP3D Hard | mAP3D Avg. |
---|---|---|---|---|---|---|
Vehicle | Camera | 46.83 | 31.47 | 37.82 | 30.77 | 30.36 |
Vehicle | LiDAR | 85.33 | 85.22 | 76.86 | 69.04 | 80.11 |
Vehicle | Camera + LiDAR | 84.90 | 77.60 | 72.08 | 73.12 | 76.40 |
Infra. | Camera | 61.98 | 31.19 | 46.73 | 40.42 | 35.04 |
Infra. | LiDAR | 92.86 | 86.17 | 88.07 | 75.73 | 84.88 |
Infra. | Camera + LiDAR | 92.92 | 87.99 | 89.09 | 81.69 | 87.01 |
Coop. | Camera | 68.94 | 45.41 | 42.76 | 57.83 | 45.74 |
Coop. | LiDAR | 93.93 | 92.63 | 78.06 | 73.95 | 85.86 |
Coop. | Camera + LiDAR | 94.22 | 93.42 | 88.17 | 79.94 | 90.76 |
Evaluation Results of Infrastructure-only CoopDet3D vs. InfraDet3D on TUMTraf Intersection Dataset Test Set
Model | FOV | Modality | mAP3D Easy | mAP3D Mod. | mAP3D Hard | mAP3D Avg. |
---|---|---|---|---|---|---|
InfraDet3D | South 1 | LiDAR | 75.81 | 47.66 | 42.16 | 55.21 |
CoopDet3D | South 1 | LiDAR | 76.24 | 48.23 | 35.19 | 69.47 |
InfraDet3D | South 2 | LiDAR | 38.92 | 46.60 | 43.86 | 43.13 |
CoopDet3D | South 2 | LiDAR | 74.97 | 55.55 | 39.96 | 69.94 |
InfraDet3D | South 1 | Camera + LiDAR | 67.08 | 31.38 | 35.17 | 44.55 |
CoopDet3D | South 1 | Camera + LiDAR | 75.68 | 45.63 | 45.63 | 66.75 |
InfraDet3D | South 2 | Camera + LiDAR | 58.38 | 19.73 | 33.08 | 37.06 |
CoopDet3D | South 2 | Camera + LiDAR | 74.73 | 53.46 | 41.96 | 66.89 |
The codebase is built upon BEVFusion with vehicle-infrastructure fusion inspired by the method proposed in PillarGrid.
@inproceedings{zimmer2024tumtrafv2x,
title={TUMTraf V2X Cooperative Perception Dataset},
author={Zimmer, Walter and Wardana, Gerhard Arya and Sritharan, Suren and Zhou, Xingcheng and Song, Rui and Knoll, Alois C.},
publisher={IEEE/CVF},
  booktitle={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
- The CoopDet3D model is released under the MIT license, as found in the license file.
- The TUM Traffic Dataset (TUMTraf) itself is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). By downloading the dataset you agree to the terms of this license.