CoopDet3D: Deep Multi-Modal Cooperative 3D Object Detection of Traffic Participants Using Onboard and Roadside Sensors
Cooperative perception offers several benefits for enhancing the capabilities of autonomous vehicles and improving road safety. Using roadside sensors in addition to onboard sensors increases reliability and extends the sensor range. External sensors offer higher situational awareness for automated vehicles and prevent occlusions. We propose CoopDet3D, a cooperative multi-modal fusion model, and TUMTraf-V2X, a dataset for the cooperative 3D object detection and tracking task. Our dataset contains 2,000 labeled point clouds and 5,000 labeled images from five roadside and four onboard sensors. It includes 30k 3D boxes with track IDs and precise GPS and IMU data. We labeled nine categories and covered occlusion scenarios with challenging driving maneuvers, like traffic violations, near-miss events, overtaking, and U-turns. Through multiple experiments, we show that our CoopDet3D camera-LiDAR fusion model achieves an increase of +14.36 3D mAP compared to a vehicle camera-LiDAR fusion model. Finally, we make our dataset, model, labeling tool, and dev-kit publicly available: https://tum-traffic-dataset.github.io/tumtraf-v2x.
- 2024/02: 🏆 Paper accepted at CVPR'24: TUMTraf V2X Cooperative Perception Dataset
- 2023/11: First release of the CoopDet3D model (v1.0.0)
- Support vehicle-only, infrastructure-only, and cooperative modes
- Vehicle-only
- Infrastructure-only
- Cooperative
- Support camera-only, LiDAR-only, and camera-LiDAR fusion
- Camera-only
- LiDAR-only
- Camera-LiDAR fusion
- Support multiple camera backbones
- SwinT
- YOLOv8
- Support multiple LiDAR backbones
- VoxelNet (torchsparse)
- PointPillars
- Support offline, ROS, and shared memory operation
- Offline
- ROS
- Shared memory
- Live Test
- Export inference results to OpenLABEL format
- Inference to OpenLABEL
There are two versions of the TUMTraf V2X Cooperative Perception Dataset (Release R4) provided:
1.1. TUMTraf-V2X
1.2. TUMTraf-V2X-mini (half of the full dataset)
We train CoopDet3D on TUMTraf-V2X-mini and provide the results below.
Simply place the splits in a directory named tumtraf_v2x_cooperative_perception_dataset inside the data directory; you should then have a structure similar to this:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_v2x_cooperative_perception_dataset
| | ├── train
| | ├── val
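As a minimal sketch of creating this layout (assuming you run from the repository root and that the downloaded splits extract into train/ and val/ folders; the source paths below are placeholders):

```bash
# Hypothetical paths; replace them with the actual locations of the extracted splits.
mkdir -p data/tumtraf_v2x_cooperative_perception_dataset
mv /path/to/extracted/train data/tumtraf_v2x_cooperative_perception_dataset/
mv /path/to/extracted/val data/tumtraf_v2x_cooperative_perception_dataset/
```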
The TUMTraf Intersection Dataset (Release R2) can be downloaded below:
2.1 TUMTraf-I.
Then, download the TUMTraf Dataset Development Kit and follow the steps provided there to split the data into train and val sets.
Finally, place the train and val sets in a directory named tumtraf_i inside the data directory. You should then have a structure similar to this:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_i
| | ├── train
| | ├── val
The pre-trained CoopDet3D weights can be downloaded from here.
The weights for TUMTraf Intersection Dataset are named following this convention:
coopdet3d_tumtraf_i_[l/cl]_<LiDAR_backbone>_<camera_backbone>_<other_information>.pth
The weights for the TUMTraf V2X Cooperative Perception Dataset are named following this convention:
coopdet3d_[v/i/vi]_[c/l/cl]_<LiDAR_backbone>_<camera_backbone>_<other_information>.pth
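For example, reading the tokens according to these conventions, coopdet3d_vi_cl_pointpillars512_2xtestgrid_yolos_transfer_learning_best.pth (listed below) is a cooperative (vehicle + infrastructure) camera-LiDAR model with a PointPillars LiDAR backbone and a YOLOv8 camera backbone.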
Extract the files and place them in the weights directory.
Use these weights to get the best results from the tables below:
- TUMTraf Intersection Dataset: coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth
- TUMTraf V2X Cooperative Perception Dataset: coopdet3d_vi_cl_pointpillars512_2xtestgrid_yolos_transfer_learning_best.pth
The easiest way to deal with the prerequisites is to use the included Dockerfile. Make sure that nvidia-docker is installed on your machine. After that, execute the following command to build the docker image:
cd docker && docker build . -t coopdet3d
The docker can then be run with the following commands:
If you are only using the TUMTraf Intersection Dataset:
nvidia-docker run -it -v `pwd`/../data/tumtraf_i:/home/data/tumtraf_i -v <PATH_TO_COOPDET3D>:/home/coopdet3d --shm-size 16g coopdet3d /bin/bash
If you are only using the TUMTraf V2X Cooperative Perception Dataset:
nvidia-docker run -it -v `pwd`/../data/tumtraf_v2x_cooperative_perception_dataset:/home/data/tumtraf_v2x_cooperative_perception_dataset -v <PATH_TO_COOPDET3D>:/home/coopdet3d --shm-size 16g coopdet3d /bin/bash
If you are using both datasets:
nvidia-docker run -it -v `pwd`/../data/tumtraf_i:/home/data/tumtraf_i -v `pwd`/../data/tumtraf_v2x_cooperative_perception_dataset:/home/data/tumtraf_v2x_cooperative_perception_dataset -v <PATH_TO_COOPDET3D>:/home/coopdet3d --shm-size 16g coopdet3d /bin/bash
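If you use a recent Docker with the NVIDIA Container Toolkit instead of the legacy nvidia-docker wrapper, an equivalent invocation should work with --gpus all. A sketch for the cooperative dataset only (not an official, tested command):

```bash
docker run -it --gpus all \
  -v `pwd`/../data/tumtraf_v2x_cooperative_perception_dataset:/home/data/tumtraf_v2x_cooperative_perception_dataset \
  -v <PATH_TO_COOPDET3D>:/home/coopdet3d \
  --shm-size 16g coopdet3d /bin/bash
```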
We recommend running data preparation (instructions in the next section) outside the Docker container if possible. Note that the dataset directory should be an absolute path. Inside the container, run the following commands to install the codebase:
cd /home/coopdet3d
python setup.py develop
Finally, you can create a symbolic link /home/coopdet3d/data/tumtraf_i pointing to /home/data/tumtraf_i and a symbolic link /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset pointing to /home/data/tumtraf_v2x_cooperative_perception_dataset inside the container.
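A minimal sketch of those links (assuming the datasets are mounted under /home/data as in the docker run commands above; skip any link whose target directory already exists under /home/coopdet3d/data):

```bash
# Inside the container: link the mounted datasets into the locations the code expects.
mkdir -p /home/coopdet3d/data
ln -s /home/data/tumtraf_i /home/coopdet3d/data/tumtraf_i
ln -s /home/data/tumtraf_v2x_cooperative_perception_dataset /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset
```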
The code is built with the following libraries (a rough environment-setup sketch follows the list):
- Python >= 3.8, <3.9
- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)
- Pillow = 8.4.0 (see here)
- PyTorch >= 1.9, <= 1.10.2
- tqdm
- torchpack
- mmcv = 1.4.0
- mmdetection = 2.20.0
- nuscenes-dev-kit
- Latest versions of numba, torchsparse, pypcd, and Open3D
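Note that mmcv, mmdetection, and the nuScenes dev-kit correspond to the pip packages mmcv-full, mmdet, and nuscenes-devkit. As a rough, non-authoritative sketch (assuming Python 3.8, a CUDA build of PyTorch 1.10, and a system OpenMPI installation), an environment could be set up like this:

```bash
# Hypothetical environment setup; adjust PyTorch/CUDA builds and mmcv wheels to your system.
conda create -n coopdet3d python=3.8 -y
conda activate coopdet3d
pip install torch==1.10.2 torchvision==0.11.3           # pick the wheel matching your CUDA version
pip install Pillow==8.4.0 tqdm mpi4py==3.0.3 torchpack  # mpi4py needs a system OpenMPI (e.g. 4.0.4)
pip install mmcv-full==1.4.0 mmdet==2.20.0              # see the mmcv docs for prebuilt CUDA wheels
pip install nuscenes-devkit numba open3d
# torchsparse and pypcd are typically installed from their GitHub repositories.
```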
After installing these dependencies, run this command to install the codebase:
python setup.py develop
Finally, you can create a symbolic link /home/coopdet3d/data/tumtraf_i pointing to /home/data/tumtraf_i and a symbolic link /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset pointing to /home/data/tumtraf_v2x_cooperative_perception_dataset, as described in the Docker setup above.
Run this script for data preparation:
python ./tools/create_tumtraf_data.py --root-path /home/coopdet3d/data/tumtraf_i --out-dir /home/coopdet3d/data/tumtraf_i_processed --splits training,validation
After data preparation, you will be able to see the following directory structure:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_i
| | ├── train
| | ├── val
| ├── tumtraf_i_processed
│ │ ├── tumtraf_nusc_gt_database
| | ├── train
| | ├── val
│ │ ├── tumtraf_nusc_infos_train.pkl
│ │ ├── tumtraf_nusc_infos_val.pkl
│ │ ├── tumtraf_nusc_dbinfos_train.pkl
Run this script for data preparation:
python ./tools/create_tumtraf_v2x_data.py --root-path /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset --out-dir /home/coopdet3d/data/tumtraf_v2x_cooperative_perception_dataset_processed --splits training,validation
After data preparation, you will be able to see the following directory structure:
coopdet3d
├── mmdet3d
├── tools
├── configs
├── data
│ ├── tumtraf_v2x_cooperative_perception_dataset
| | ├── train
| | ├── val
| ├── tumtraf_v2x_cooperative_perception_dataset_processed
│ │ ├── tumtraf_v2x_nusc_gt_database
| | ├── train
| | ├── val
│ │ ├── tumtraf_v2x_nusc_infos_train.pkl
│ │ ├── tumtraf_v2x_nusc_infos_val.pkl
│ │ ├── tumtraf_v2x_nusc_dbinfos_train.pkl
NOTE 1: If you want to use a YOLOv8 .pth file from MMYOLO, please make sure its keys match this model. Convert the .pth checkpoint using the converter ./tools/convert_yolo_checkpoint.py.
NOTE 2: The paths to the pre-trained weights for YOLOv8 models are hardcoded in the config file, so change them there accordingly. This also means that when training models that use YOLOv8, the parameters --model.encoders.camera.backbone.init_cfg.checkpoint, --model.vehicle.fusion_model.encoders.camera.backbone.init_cfg.checkpoint, and --model.infrastructure.fusion_model.encoders.camera.backbone.init_cfg.checkpoint are optional.
NOTE 3: We trained our models on 3 GPUs (3 x RTX 3090), which is why the commands below use the prefix torchpack dist-run -np 3. Adjust -np to the number of GPUs available on your machine.
For training a camera-only model on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 3 python tools/train.py <PATH_TO_CONFIG_FILE> --model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH>
Example:
torchpack dist-run -np 3 python tools/train.py configs/tumtraf_i/det/centerhead/lssfpn/camera/256x704/yolov8/default.yaml
For training a LiDAR-only model on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 3 python tools/train.py <PATH_TO_CONFIG_FILE>
Example:
torchpack dist-run -np 3 python tools/train.py configs/tumtraf_i/det/transfusion/secfpn/lidar/pointpillars.yaml
For training a fusion model on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 3 python tools/train.py <PATH_TO_CONFIG_FILE> --model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --load_from <PATH_TO_PRETRAINED_LIDAR_PTH>
Example:
torchpack dist-run -np 3 python tools/train.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml --load_from weights/coopdet3d_tumtraf_i_l_pointpillars512_2x.pth
For training a camera-only model on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 3 python tools/train_coop.py <PATH_TO_CONFIG_FILE> --model.vehicle.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --model.infrastructure.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH>
Pass the pretrained camera checkpoint argument(s) corresponding to the type of model you want to train: vehicle-only, infrastructure-only, or cooperative (both).
Example:
torchpack dist-run -np 3 python tools/train_coop.py configs/tumtraf_v2x/det/centerhead/lssfpn/cooperative/camera/256x704/yolov8/default.yaml
For training a LiDAR-only model on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 3 python tools/train_coop.py <PATH_TO_CONFIG_FILE>
Example:
torchpack dist-run -np 3 python tools/train_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/lidar/pointpillars.yaml
For training a fusion model on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 3 python tools/train_coop.py <PATH_TO_CONFIG_FILE> --model.vehicle.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --model.infrastructure.fusion_model.encoders.camera.backbone.init_cfg.checkpoint <PATH_TO_PRETRAINED_CAMERA_PTH> --load_from <PATH_TO_PRETRAINED_LIDAR_PTH>
Pass the pretrained camera checkpoint argument(s) corresponding to the type of model you want to train: vehicle-only, infrastructure-only, or cooperative (both).
Example:
torchpack dist-run -np 3 python tools/train_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml --load_from weights/coopdet3d_vi_l_pointpillars512_2x.pth
Note: please run tools/test.py or tools/test_coop.py separately after training to get the final evaluation metrics.
NOTE: This section will not work without the test set ground truth, which is not made public. To evaluate your model's mAPBEV, please send your config files and weights to the authors for evaluation!
For evaluation on the TUMTraf Intersection Dataset, run:
torchpack dist-run -np 1 python tools/test.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --eval bbox
Example:
torchpack dist-run -np 1 python tools/test.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --eval bbox
For evaluation on the TUMTraf V2X Cooperative Perception Dataset, run:
torchpack dist-run -np 1 python tools/test_coop.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --eval bbox
Example:
torchpack dist-run -np 1 python tools/test_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_vi_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --eval bbox
Exporting to OpenLABEL format is needed to perform mAP3D evaluation or detection visualization using the scripts in the TUM Traffic dev-kit.
NOTE: You will not be able to evaluate your inference results using the dev-kit without the test set ground truth, which is not made public. To evaluate your model's mAP3D, please send your detection results to the authors for evaluation!
For TUMTraf Intersection Dataset:
torchpack dist-run -np 1 python tools/inference_to_openlabel.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_PTH_FILE> --split test --out-dir <PATH_TO_OPENLABEL_OUTPUT_FOLDER>
Example:
torchpack dist-run -np 1 python tools/inference_to_openlabel.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --split test --out-dir inference
For TUMTraf V2X Cooperative Perception Dataset:
torchpack dist-run -np 1 python scripts/cooperative_multimodal_3d_detection.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_CHECKPOINT_PTH> --split [train, val, test] --input_type hard_drive --save_detections_openlabel --output_folder_path_detections <PATH_TO_OPENLABEL_OUTPUT_FOLDER>
Example:
torchpack dist-run -np 1 python scripts/cooperative_multimodal_3d_detection.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/bevfusion_coop_vi_cl_pointpillars512_2x_yolos.pth --split test --input_type hard_drive --save_detections_openlabel --output_folder_path_detections inference
For TUMTraf Intersection Dataset:
torchpack dist-run -np 1 python tools/benchmark.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --log-interval 50
Example:
torchpack dist-run -np 1 python tools/benchmark.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --log-interval 50
For TUMTraf V2X Cooperative Perception Dataset:
torchpack dist-run -np 1 python tools/benchmark_coop.py <PATH_TO_CONFIG_FILE> <PATH_TO_PTH_FILE> --log-interval 10
Example:
torchpack dist-run -np 1 python tools/benchmark_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml weights/coopdet3d_vi_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --log-interval 10
For TUMTraf Intersection Dataset:
torchpack dist-run -np 1 python tools/visualize.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_PTH_FILE> --split test --mode pred --out-dir viz_tumtraf
Example:
torchpack dist-run -np 1 python tools/visualize.py configs/tumtraf_i/det/transfusion/secfpn/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/coopdet3d_tumtraf_i_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --split test --mode pred --out-dir viz_tumtraf
For TUMTraf V2X Cooperative Perception Dataset:
torchpack dist-run -np 1 python tools/visualize_coop.py <PATH_TO_CONFIG_FILE> --checkpoint <PATH_TO_PTH_FILE> --split test --mode pred --out-dir viz_tumtraf
Example:
torchpack dist-run -np 1 python tools/visualize_coop.py configs/tumtraf_v2x/det/transfusion/secfpn/cooperative/camera+lidar/yolov8/pointpillars.yaml --checkpoint weights/coopdet3d_vi_cl_pointpillars512_2x_yolos_transfer_learning_best.pth --split test --mode pred --out-dir viz_tumtraf
For --split, you can also choose "train" or "val". For --mode, the other options are "gt" (ground truth) and "combo" (prediction and ground truth).
NOTE: Ground truth visualization on the test set will not work, since the provided test set does not include ground truth.
Evaluation Results (mAPBEV and mAP3D) of CoopDet3D on the TUMTraf V2X Cooperative Perception Dataset Test Set in South 2 FOV

Domain | Modality | mAPBEV | mAP3D Easy | mAP3D Mod. | mAP3D Hard | mAP3D Avg. |
---|---|---|---|---|---|---|
Vehicle | Camera | 46.83 | 31.47 | 37.82 | 30.77 | 30.36 |
Vehicle | LiDAR | 85.33 | 85.22 | 76.86 | 69.04 | 80.11 |
Vehicle | Camera + LiDAR | 84.90 | 77.60 | 72.08 | 73.12 | 76.40 |
Infra. | Camera | 61.98 | 31.19 | 46.73 | 40.42 | 35.04 |
Infra. | LiDAR | 92.86 | 86.17 | 88.07 | 75.73 | 84.88 |
Infra. | Camera + LiDAR | 92.92 | 87.99 | 89.09 | 81.69 | 87.01 |
Coop. | Camera | 68.94 | 45.41 | 42.76 | 57.83 | 45.74 |
Coop. | LiDAR | 93.93 | 92.63 | 78.06 | 73.95 | 85.86 |
Coop. | Camera + LiDAR | 94.22 | 93.42 | 88.17 | 79.94 | 90.76 |
Evaluation Results of Infrastructure-only CoopDet3D vs. InfraDet3D on TUMTraf Intersection Dataset Test Set
Model | FOV | Modality | mAP3D Easy | mAP3D Mod. | mAP3D Hard | mAP3D Avg. |
---|---|---|---|---|---|---|
InfraDet3D | South 1 | LiDAR | 75.81 | 47.66 | 42.16 | 55.21 |
CoopDet3D | South 1 | LiDAR | 76.24 | 48.23 | 35.19 | 69.47 |
InfraDet3D | South 2 | LiDAR | 38.92 | 46.60 | 43.86 | 43.13 |
CoopDet3D | South 2 | LiDAR | 74.97 | 55.55 | 39.96 | 69.94 |
InfraDet3D | South 1 | Camera + LiDAR | 67.08 | 31.38 | 35.17 | 44.55 |
CoopDet3D | South 1 | Camera + LiDAR | 75.68 | 45.63 | 45.63 | 66.75 |
InfraDet3D | South 2 | Camera + LiDAR | 58.38 | 19.73 | 33.08 | 37.06 |
CoopDet3D | South 2 | Camera + LiDAR | 74.73 | 53.46 | 41.96 | 66.89 |
The codebase is built upon BEVFusion with vehicle-infrastructure fusion inspired by the method proposed in PillarGrid.
@inproceedings{zimmer2024tumtrafv2x,
title={TUMTraf V2X Cooperative Perception Dataset},
author={Zimmer, Walter and Wardana, Gerhard Arya and Sritharan, Suren and Zhou, Xingcheng and Song, Rui and Knoll, Alois C.},
publisher={IEEE/CVF},
  booktitle={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
- The CoopDet3D model is released under the MIT license, as found in the license file.
- The TUM Traffic Dataset (TUMTraf) itself is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). By downloading the dataset you agree to the terms of this license.