TransCenter: Transformers with Dense Queries for Multiple-Object Tracking
Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda
[Paper] [Project]
If you find this code useful, please star the project and consider citing:
@misc{xu2021transcenter,
title={TransCenter: Transformers with Dense Queries for Multiple-Object Tracking},
author={Yihong Xu and Yutong Ban and Guillaume Delorme and Chuang Gan and Daniela Rus and Xavier Alameda-Pineda},
year={2021},
eprint={2103.15145},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
We provide two Singularity images (similar to Docker) containing all the packages needed for TransCenter:
- Install Singularity (version > 3.7.1): https://sylabs.io/guides/3.0/user-guide/installation.html#install-on-linux
- Download one of the singularity images:
pytorch1-5cuda10-1.sif: tested with Nvidia GTX TITAN; or
pytorch1-5cuda10-1_RTX.sif: tested with Nvidia RTX TITAN, Quadro RTX 8000, RTX 2080Ti, and Quadro RTX 4000.
- Launch a Singularity image:
singularity shell --nv --bind yourLocalPath:yourPathInsideImage YourSingularityImage.sif
- --bind: links a path inside the image to a local path, so that data on your local machine is visible inside the Singularity image;
- --nv: use the local Nvidia driver.
You can also build your own environment:
- We use Anaconda (4.9.2) to simplify package installation; you can download it here: https://www.anaconda.com/products/individual
- Create your conda environment with:
conda create --name <YourEnvName> --file requirements.txt
- TransCenter uses the deformable transformer from Deformable DETR, so the deformable attention modules must be compiled:
cd ./to_install/ops
sh ./make.sh
# unit test (all checks should print True)
python test.py
- TransCenter uses pytorch-liteflownet during tracking, which depends on correlation_package. Install it with:
cd ./to_install/correlation_package
python setup.py install
- The up-scale-and-merge module in TransCenter uses deformable convolutions (DCNv2); install them with:
cd ./to_install/DCNv2
./make.sh # build
python testcpu.py # run examples and gradient check on cpu
python testcuda.py # run examples and gradient check on gpu
See also the known issues at https://github.com/CharlesShang/DCNv2. If you run into CUDA issues with the third-party modules, try recompiling them on the GPU you use for training and testing. The dependencies are compatible with PyTorch 1.5 and CUDA 10.2.
MS COCO: we use only the person category for pretraining TransCenter. The filtering code is provided in ./data/coco_person.py; a minimal sketch of the idea follows the citation below.
@inproceedings{lin2014microsoft,
title={Microsoft coco: Common objects in context},
author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
booktitle={European conference on computer vision},
pages={740--755},
year={2014},
organization={Springer}
}
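For reference, here is a minimal sketch of the person-only filtering idea. It is not the exact ./data/coco_person.py script, and the annotation paths are examples assuming standard MS COCO files:

```python
# Hedged sketch of person-only COCO filtering (example paths, not the
# exact ./data/coco_person.py script).
import json

src = "annotations/instances_train2017.json"          # example input
dst = "annotations/instances_train2017_person.json"   # example output

with open(src) as f:
    coco = json.load(f)

# Keep only the "person" category and its annotations.
person_ids = {c["id"] for c in coco["categories"] if c["name"] == "person"}
coco["categories"] = [c for c in coco["categories"] if c["id"] in person_ids]
coco["annotations"] = [a for a in coco["annotations"]
                       if a["category_id"] in person_ids]

# Drop images that no longer contain any annotation.
kept = {a["image_id"] for a in coco["annotations"]}
coco["images"] = [im for im in coco["images"] if im["id"] in kept]

with open(dst, "w") as f:
    json.dump(coco, f)
```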
CrowdHuman: the CrowdHuman labels are converted to COCO format with ./data/convert_crowdhuman_to_coco.py; a sketch of the conversion follows the citation below.
@article{shao2018crowdhuman,
title={CrowdHuman: A Benchmark for Detecting Human in a Crowd},
author={Shao, Shuai and Zhao, Zijian and Li, Boxun and Xiao, Tete and Yu, Gang and Zhang, Xiangyu and Sun, Jian},
journal={arXiv preprint arXiv:1805.00123},
year={2018}
}
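For reference, a hedged sketch of the odgt-to-COCO conversion. The real script is ./data/convert_crowdhuman_to_coco.py; the file names and the choice of the full-body box ("fbox") here are assumptions:

```python
# Hedged sketch of CrowdHuman odgt -> COCO conversion (file names and the
# use of the full-body box "fbox" are assumptions; see the real script in
# ./data/convert_crowdhuman_to_coco.py).
import json

out = {"images": [], "annotations": [],
       "categories": [{"id": 1, "name": "person"}]}
ann_id = 0
with open("annotation_train.odgt") as f:
    for img_id, line in enumerate(f):
        rec = json.loads(line)  # one JSON record per line
        out["images"].append({"id": img_id,
                              "file_name": rec["ID"] + ".jpg"})
        for gt in rec["gtboxes"]:
            if gt["tag"] != "person":  # skip mask/ignore regions
                continue
            x, y, w, h = gt["fbox"]    # full-body box
            out["annotations"].append({
                "id": ann_id, "image_id": img_id, "category_id": 1,
                "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0,
            })
            ann_id += 1

with open("crowdhuman_train_coco.json", "w") as f:
    json.dump(out, f)
```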
MOT17: the MOT17 labels are converted to COCO format with ./data/convert_mot_to_coco.py; the MOT ground-truth format is sketched after the MOT20 citation below.
@article{milan2016mot16,
title={MOT16: A benchmark for multi-object tracking},
author={Milan, Anton and Leal-Taix{\'e}, Laura and Reid, Ian and Roth, Stefan and Schindler, Konrad},
journal={arXiv preprint arXiv:1603.00831},
year={2016}
}
MOT20: the MOT20 labels are converted to COCO format with ./data/convert_mot20_to_coco.py.
@article{dendorfer2020mot20,
title={Mot20: A benchmark for multi object tracking in crowded scenes},
author={Dendorfer, Patrick and Rezatofighi, Hamid and Milan, Anton and Shi, Javen and Cremers, Daniel and Reid, Ian and Roth, Stefan and Schindler, Konrad and Leal-Taix{\'e}, Laura},
journal={arXiv preprint arXiv:2003.09003},
year={2020}
}
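Both MOT converters read the standard MOTChallenge gt/gt.txt files. Below is a hedged sketch of parsing that format; the sequence path is an example, and the row layout follows the public MOTChallenge specification:

```python
# Hedged sketch of parsing MOTChallenge ground truth (example sequence
# path). Each gt.txt row is:
# frame, track_id, x, y, w, h, conf_flag, class, visibility
import csv

annotations = []
with open("MOT17/train/MOT17-02-SDP/gt/gt.txt") as f:
    for row in csv.reader(f):
        frame, track_id = int(row[0]), int(row[1])
        x, y, w, h = map(float, row[2:6])
        flag, cls = int(row[6]), int(row[7])
        if flag == 0 or cls != 1:  # keep only valid pedestrian boxes
            continue
        annotations.append({"image_id": frame, "track_id": track_id,
                            "category_id": 1, "bbox": [x, y, w, h],
                            "area": w * h})
print(len(annotations), "boxes kept")
```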
We also provide the filtered/converted labels:
MS COCO person labels: put the annotations folder inside cocoperson into your MS COCO dataset root folder.
CrowdHuman COCO-format labels: put the annotations folder inside crowdhuman into your CrowdHuman dataset root folder.
MOT17 COCO-format labels: put the annotations and annotations_onlySDP folders inside MOT17 into your MOT17 dataset root folder.
MOT20 COCO-format labels: put the annotations folder inside MOT20 into your MOT20 dataset root folder.
deformable transformer pretrained: the pretrained model from Deformable-DETR.
coco_pretrained: model trained on the COCO person dataset.
CH_pretrained: model pretrained on COCO person and fine-tuned on the CrowdHuman dataset.
MOT17_fromCoCo: model pretrained on COCO person and fine-tuned on the MOT17 train set.
MOT17_fromCH: model pretrained on CrowdHuman and fine-tuned on the MOT17 train set.
MOT20_fromCoCo: model pretrained on COCO person and fine-tuned on the MOT20 train set.
MOT20_fromCH: model pretrained on CrowdHuman and fine-tuned on the MOT20 train set.
Please put all the pretrained models into ./model_zoo (a minimal checkpoint-loading sketch follows).
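A minimal loading sketch, assuming the usual Deformable-DETR-style checkpoint layout where the weights sit under a "model" key (the model construction itself is not shown):

```python
# Hedged sketch: inspect and restore a downloaded checkpoint. The "model"
# key follows the Deformable-DETR checkpoint layout; fall back to the raw
# dict if it is absent.
import torch

ckpt = torch.load("./model_zoo/coco_pretrained.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)
print(f"{len(state_dict)} parameter tensors in the checkpoint")
# To restore into a built TransCenter model (construction not shown):
# missing, unexpected = model.load_state_dict(state_dict, strict=False)
```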
- Pretrain on the COCO person dataset:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=4 --use_env ./training/transcenter/main_coco_tracking.py --output_dir=./output/whole_coco --batch_size=4 --num_workers=20 --resume=./model_zoo/r50_deformable_detr-checkpoint.pth --pre_hm --tracking --data_dir=PathToCoCoDataset
- Pretrain on the CrowdHuman dataset:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=4 --use_env ./training/transcenter/main_crowdHuman_tracking.py --output_dir=./output/whole_ch_from_COCO --batch_size=4 --num_workers=20 --resume=./model_zoo/coco_pretrained.pth --pre_hm --tracking --data_dir=PathToCrowdHumanDataset
- Train MOT17 from the COCO-pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/transcenter/main_mot17_tracking.py --output_dir=./output/whole_MOT17_from_COCO --batch_size=2 --num_workers=20 --resume=./model_zoo/coco_pretrained.pth --pre_hm --tracking --same_aug_pre --image_blur_aug --data_dir=PathToMOT17dataset
- Train MOT17 from the CrowdHuman-pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/transcenter/main_mot17_tracking.py --output_dir=./output/whole_MOT17_from_CH --batch_size=2 --num_workers=20 --resume=./model_zoo/CH_pretrained.pth --pre_hm --tracking --same_aug_pre --image_blur_aug --data_dir=PathToMOT17dataset
- Train MOT20 from the COCO-pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/transcenter/main_mot20_tracking.py --output_dir=./output/whole_MOT20_from_COCO --batch_size=2 --num_workers=20 --resume=./model_zoo/coco_pretrained.pth --pre_hm --tracking --same_aug_pre --image_blur_aug --not_max_crop --data_dir=PathToMOT20dataset
- Train MOT20 from the CrowdHuman-pretrained model:
cd TransCenter_official
python -m torch.distributed.launch --nproc_per_node=2 --use_env ./training/transcenter/main_mot20_tracking.py --output_dir=./output/whole_MOT20_from_CH --batch_size=2 --num_workers=20 --resume=./model_zoo/CH_pretrained.pth --pre_hm --tracking --same_aug_pre --image_blur_aug --not_max_crop --data_dir=PathToMOT20dataset
Tips:
- If you encounter RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR on some GPUs, try setting torch.backends.cudnn.benchmark=False. In most cases, setting torch.backends.cudnn.benchmark=True is more memory-efficient.
- Depending on your environment and GPUs, you might experience MOTA jitter in your final models.
- You may see training noise during fine-tuning, especially when training on MOT17/MOT20 from well-pretrained models. You can reduce the learning rate to 1/10 of its value, apply early stopping, or increase the batch size on GPUs with more memory.
- If you run into GPU memory issues, lower the training and evaluation batch size in main_****.py, freeze the ResNet backbone, and start from our COCO/CH pretrained models. Sketches for the cuDNN setting and backbone freezing are shown below.
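Both tips are standard PyTorch; in the sketch below, the "backbone" attribute name is an assumption about the model object:

```python
# Standard PyTorch sketches for the tips above; the "backbone" attribute
# name is an assumption about the model object.
import torch

# Work around CUDNN_STATUS_INTERNAL_ERROR on some GPUs.
torch.backends.cudnn.benchmark = False

def freeze_backbone(model: torch.nn.Module) -> None:
    """Freeze the ResNet backbone to reduce memory use during fine-tuning."""
    for p in model.backbone.parameters():
        p.requires_grad = False
```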
Using Public detections:
- MOT17:
cd TransCenter_official
python ./tracking/transcenter/mot17_pub.py --data_dir=YourMOT17Path
- MOT20:
cd TransCenter_official
python ./tracking/transcenter/mot20_pub.py --data_dir=YourMOT20Path
Using Private detections:
- MOT17:
cd TransCenter_official
python ./tracking/transcenter/mot17_private.py --data_dir=YourMOT17Path
- MOT20:
cd TransCenter_official
python ./tracking/transcenter/mot20_private.py --data_dir=YourMOT20Path
Notes:
- We recently corrected an image-loading bug that affected certain images with an aspect ratio close to 1 (in MOT20), which improves MOT20 performance.
- You can test your own model by changing model_path inside mot17[20]_private[pub].py.
MOT17 public detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 68.8% | 79.9% | 61.4% | 22,860 | 149,188 | 4,102 |
CH | 71.9% | 81.4% | 62.3% | 17,378 | 137,008 | 4,046 |
MOT20 public detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 61.0% | 79.5% | 49.8% | 49,189 | 147,890 | 4,493 |
CH | 62.3% | 79.9% | 50.3% | 43,006 | 147,505 | 4,545 |
MOT17 private detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 70.0% | 79.6% | 62.1% | 28,119 | 136,722 | 4,647 |
CH | 73.2% | 81.1% | 62.2% | 23,112 | 123,738 | 4,614 |
MOT20 private detections:
Pretrained | MOTA | MOTP | IDF1 | FP | FN | IDS |
---|---|---|---|---|---|---|
CoCo | 60.6% | 79.5% | 49.6% | 52,332 | 146,809 | 4,604 |
CH | 61.9% | 79.9% | 50.4% | 45,895 | 146,347 | 4,653 |
Note:
- The results can be slightly different depending on the running environment.
- We might keep updating the results in the near future.
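For reference when reading the tables, MOTA combines the three reported error counts, normalized by the total number of ground-truth boxes (which the tables do not list):

```python
# MOTA from the reported error counts; num_gt is the total number of
# ground-truth boxes in the benchmark (not shown in the tables above).
def mota(fp: int, fn: int, ids: int, num_gt: int) -> float:
    return 1.0 - (fp + fn + ids) / num_gt
```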
The code for TransCenter is modified from, and the network pre-trained weights are obtained from, the following repositories:
- The person ReID network (./tracking/transcenter/model_zoo/ResNet_iter_25245.pth) is from Tracktor.
- The LiteFlowNet pretrained model (./tracking/transcenter/util/LiteFlownet/network-kitti.pytorch) is from pytorch-liteflownet and LiteFlowNet.
- The deformable transformer pretrained model (./model_zoo/r50_deformable_detr-checkpoint.pth) is from Deformable-DETR.
- The data format conversion code is modified from CenterTrack.
If you use this code, please also consider citing CenterTrack, Deformable-DETR, and Tracktor:
@inproceedings{zhou2020tracking,
title={Tracking Objects as Points},
author={Zhou, Xingyi and Koltun, Vladlen and Kr{\"a}henb{\"u}hl, Philipp},
booktitle={ECCV},
year={2020}
}
@InProceedings{tracktor_2019_ICCV,
author = {Bergmann, Philipp and Meinhardt, Tim and Leal{-}Taix{\'{e}}, Laura},
title = {Tracking Without Bells and Whistles},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}}
@article{zhu2020deformable,
title={Deformable DETR: Deformable Transformers for End-to-End Object Detection},
author={Zhu, Xizhou and Su, Weijie and Lu, Lewei and Li, Bin and Wang, Xiaogang and Dai, Jifeng},
journal={arXiv preprint arXiv:2010.04159},
year={2020}
}
Several modules are from:
MOT Metrics in Python: py-motmetrics (a toy usage example is shown after the LiteFlowNet citation below)
Soft-NMS: Soft-NMS
DETR: DETR
DCNv2: DCNv2
correlation_package: correlation_package
pytorch-liteflownet: pytorch-liteflownet
LiteFlowNet: LiteFlowNet
@InProceedings{hui18liteflownet,
author = {Tak-Wai Hui and Xiaoou Tang and Chen Change Loy},
title = {LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2018},
pages = {8981--8989},
}
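A toy py-motmetrics example (illustrative numbers, one frame) showing how metrics like those in the tables above are accumulated and computed:

```python
# Toy py-motmetrics example: accumulate one frame of GT/hypothesis
# matches, then compute MOTA/MOTP/IDF1 as in the tables above.
import motmetrics as mm
import numpy as np

acc = mm.MOTAccumulator(auto_id=True)
acc.update(
    [1, 2],                          # ground-truth ids in this frame
    ["a", "b"],                      # tracker hypothesis ids
    [[0.1, np.nan], [np.nan, 0.3]],  # pairwise distances (e.g. 1 - IoU)
)
mh = mm.metrics.create()
summary = mh.compute(acc, metrics=["mota", "motp", "idf1"], name="toy")
print(mm.io.render_summary(summary))
```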