VID-Trans-ReID

This is the official PyTorch implementation of our paper VID-Trans-ReID: Enhanced Video Transformers for Person Re-identification.

Tested using Python 3.7.x and PyTorch 1.8.0.

Architecture:

[architecture overview figure]

Abstract

"Video-based person Re-identification (Re-ID) has received increasing attention recently due to its important role within surveillance video analysis. Video-based Re-ID expands upon earlier image-based methods by extracting person features temporally across multiple video image frames. The key challenge within person Re-ID is extracting a robust feature representation that is invariant to the challenges of pose and illumination variation across multiple camera viewpoints. Whilst most contemporary methods use a CNN based methodology, recent advances in vision transformer (ViT) architectures boos fine-grained feature discrimination via the use of both multi-head attention without any loss of feature robustness. To specifically enable ViT architectures to effectively address the challenges of video person Re-ID, we propose two novel modules constructs, Tem- poral Clip Shift and Shuffled (TCSS) and Video Patch Part Feature (VPPF), that boost the robustness of the resultant Re-ID feature representation. Furthermore, we combine our proposed approach with current best practices spanning both image and video based Re-ID including camera view embedding. Our proposed approach outperforms existing state-of-the-art work on the MARS, PRID2011, and iLIDS-VID Re-ID benchmark datasets achieving 96.36%, 96.63%, 94.67% rank-1 accuracy respectively and achieving 90.25% mAP on MARS."

[A. Alsehaim, T.P. Breckon, In Proc. British Machine Vision Conference, BMVA, 2022] [Talk] [Poster]
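The TCSS construct described above redistributes patch-token information temporally across the frames of a clip. Below is a minimal, illustrative sketch of a TSM-style temporal shift followed by a frame shuffle; the tensor layout, shift fraction, and shuffle policy are our assumptions for illustration, not the paper's actual module.

```python
# Illustrative sketch only: a TSM-style temporal shift plus frame shuffle,
# loosely in the spirit of TCSS. Layout, shift fraction, and shuffle policy
# are assumptions, not the authors' implementation.
import torch

def temporal_shift_shuffle(tokens: torch.Tensor, shift_div: int = 4) -> torch.Tensor:
    """tokens: (batch, frames, patches, dim) patch embeddings for one clip."""
    b, t, p, d = tokens.shape
    fold = d // shift_div
    out = torch.zeros_like(tokens)
    out[:, 1:, :, :fold] = tokens[:, :-1, :, :fold]                  # shift some channels forward in time
    out[:, :-1, :, fold:2 * fold] = tokens[:, 1:, :, fold:2 * fold]  # shift others backward
    out[:, :, :, 2 * fold:] = tokens[:, :, :, 2 * fold:]             # leave the remainder untouched
    return out[:, torch.randperm(t)]                                 # shuffle frame order within the clip

clip = torch.randn(2, 4, 196, 768)          # 2 clips, 4 frames, 196 patches, ViT-Base dim
print(temporal_shift_shuffle(clip).shape)   # torch.Size([2, 4, 196, 768])
```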


Requirements

pip install -r requirements.txt

Getting Started

  1. Download the ImageNet pretrained transformer model: ViT_base (a quick load check is sketched after this list).
  2. Download the video person Re-ID datasets: MARS, PRID and iLIDS-VID.
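As an optional sanity check after step 1 (our suggestion, not a repo step), the downloaded checkpoint should load as a plain state dict with ViT-Base/16 shapes; the key name below follows the timm checkpoint layout, which is an assumption here.

```python
# Optional sanity check: confirm the ViT_base checkpoint loads and has the
# expected ViT-Base/16 patch-embedding shape.
import torch

state = torch.load('jx_vit_base_p16_224-80ecf9dd.pth', map_location='cpu')
print(len(state), 'tensors')
print(state['patch_embed.proj.weight'].shape)  # expected: torch.Size([768, 3, 16, 16])
```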

Train and Evaluate

Use the pre-trained ViT_base model to initialize the ViT backbone, then train the whole model.
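A minimal sketch of that initialize-then-fine-tune flow, using timm purely for illustration (an assumption on our part; the repository builds its own model inside VID_Trans_ReID.py):

```python
# Sketch only: load the ImageNet ViT weights into a ViT-Base/16 backbone,
# then switch to training mode for fine-tuning on the Re-ID data.
import timm
import torch

backbone = timm.create_model('vit_base_patch16_224', pretrained=False)
state = torch.load('jx_vit_base_p16_224-80ecf9dd.pth', map_location='cpu')
missing, unexpected = backbone.load_state_dict(state, strict=False)  # head keys may differ
backbone.train()  # the full VID-Trans-ReID model is then trained end to end
```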

MARS Dataset

python -u VID_Trans_ReID.py --Dataset_name 'Mars' --ViT_path 'jx_vit_base_p16_224-80ecf9dd.pth'

PRID Dataset

python -u VID_Trans_ReID.py --Dataset_name 'PRID' --ViT_path 'jx_vit_base_p16_224-80ecf9dd.pth'

iLIDS-VID Dataset

python -u VID_Trans_ReID.py --Dataset_name 'iLIDSVID' --ViT_path 'jx_vit_base_p16_224-80ecf9dd.pth'

Test

To test the model, you can use our model pretrained on the MARS dataset: download

python -u VID_Test.py --Dataset_name 'Mars' --model_path 'MarsMain_Model.pth'
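For reference, rank-1 accuracy (reported above alongside mAP) is the fraction of query clips whose nearest gallery clip shares their identity. A generic sketch with placeholder inputs, not the exact protocol in VID_Test.py:

```python
# Generic rank-1 computation for Re-ID evaluation; shapes and random inputs
# are placeholders. VID_Test.py implements the full MARS protocol.
import torch

def rank1(qf, gf, q_ids, g_ids):
    """qf: (nq, d) query features; gf: (ng, d) gallery features."""
    dist = torch.cdist(qf, gf)    # pairwise Euclidean distances
    nearest = dist.argmin(dim=1)  # closest gallery clip per query
    return (g_ids[nearest] == q_ids).float().mean().item()

qf, gf = torch.randn(5, 768), torch.randn(20, 768)
q_ids, g_ids = torch.randint(0, 4, (5,)), torch.randint(0, 4, (20,))
print(rank1(qf, gf, q_ids, g_ids))
```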

Acknowledgement

Thanks to Hao Luo; parts of the implementation are adapted from his repository.

Citation

If you are making use of this work in any way, please reference the following paper in any report, publication, presentation, software release or any other associated materials:

VID-Trans-ReID: Enhanced Video Transformers for Person Re-identification (A. Alsehaim, T.P. Breckon), In Proc. British Machine Vision Conference, BMVA, 2022.

@inproceedings{alsehaim22vidtransreid,
 author = {Alsehaim, A. and Breckon, T.P.},
 title = {VID-Trans-ReID: Enhanced Video Transformers for Person Re-identification},
 booktitle = {Proc. British Machine Vision Conference},
 year = {2022},
 month = {November},
 publisher = {BMVA},
 url = {https://breckon.org/toby/publications/papers/alsehaim22vidtransreid.pdf}
}
