This is an official implementation of the paper "Group Contextualization for Video Recognition", which has been accepted by CVPR 2022. Paper link
- Released this V1 version (the version used in the paper) to the public.
The code is built with the following libraries:
- PyTorch >= 1.7, torchvision
- tensorboardx
For video data pre-processing, you may need ffmpeg.
For GC-TSN, GC-GST, and GC-TSM, videos must first be extracted into frames for all datasets (Kinetics-400, Something-Something V1 and V2, Diving48, and EGTEA Gaze+), following the TSN repo. For GC-TDN, data processing follows the backbone TDN work: the short edge of each video is resized to 320px, and the mp4 files are decoded directly during training/evaluation. A frame-extraction sketch is given below.
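As a minimal sketch of the frame-extraction step (assuming a TSN-style layout with JPEG frames named img_%05d.jpg; the helper below is illustrative and not part of this repo):

```python
import subprocess
from pathlib import Path

def extract_frames(video: Path, out_dir: Path, short_side: int = 256) -> None:
    """Decode one video into JPEG frames with ffmpeg.

    `short_side=256` is a common TSN-style choice, not mandated by this README;
    for GC-TDN the videos themselves are resized to a 320px short edge and
    decoded directly instead of being dumped to frames.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep the aspect ratio while forcing the shorter edge to `short_side`.
    vf = (f"scale='if(gt(iw,ih),-2,{short_side})'"
          f":'if(gt(iw,ih),{short_side},-2)'")
    subprocess.run(
        ["ffmpeg", "-i", str(video), "-vf", vf, "-q:v", "2",
         str(out_dir / "img_%05d.jpg")],
        check=True,
    )
```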
The GC-TSN, GC-TSM, GC-GST, and GC-TDN code is based on the TSN, TSM, GST, and TDN codebases, respectively.
Here we provide some of the pretrained models. Results on Kinetics-400:
Model | Frame * view * clip | Top-1 Acc. | Top-5 Acc. | Checkpoint |
---|---|---|---|---|
GC-TSN ResNet50 | 8 * 1 * 10 | 75.2% | 92.1% | link |
GC-TSM ResNet50 | 8 * 1 * 10 | 75.4% | 91.9% | link |
GC-TSM ResNet50 | 16 * 1 * 10 | 76.7% | 92.9% | link |
GC-TSM ResNet50 | 16 * 3 * 10 | 77.1% | 92.9% | |
GC-TDN ResNet50 | 8 * 3 * 10 | 77.3% | 93.2% | link |
GC-TDN ResNet50 | 16 * 3 * 10 | 78.8% | 93.8% | link |
GC-TDN ResNet50 | (8+16) * 3 * 10 | 79.6% | 94.1% | |
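The "Frame * view * clip" column denotes the multi-view test protocol: frames sampled per clip × spatial views (crops) × temporal clips per video, with the per-view scores fused into one prediction. A minimal sketch of the usual fusion (assuming softmax-then-average, which is common practice but not spelled out in this README):

```python
import torch

def fuse_views(logits: torch.Tensor) -> torch.Tensor:
    """Fuse per-view class scores for one video.

    logits: (num_views * num_clips, num_classes) raw scores from the network.
    Returns a single (num_classes,) score vector used for top-1/top-5 accuracy.
    """
    return logits.softmax(dim=1).mean(dim=0)

# "8 * 3 * 10" means 8 frames per clip, 3 spatial views, 10 temporal clips:
# 3 * 10 = 30 forward passes per video on Kinetics-400 (400 classes).
scores = fuse_views(torch.randn(3 * 10, 400))
top5 = scores.topk(5).indices
```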
Something-Something V1 and V2 are highly temporal-related datasets. We report performance at 224×224 resolution. Results on Something-Something V1:
Model | Frame * view * clip | Top-1 Acc. | Top-5 Acc. | Checkpoint |
---|---|---|---|---|
GC-GST ResNet50 | 8 * 1 * 2 | 48.8% | 78.5% | link |
GC-GST ResNet50 | 16 * 1 * 2 | 50.4% | 79.4% | link |
GC-GST ResNet50 | (8+16) * 1 * 2 | 52.5% | 81.3% | |
GC-TSN ResNet50 | 8 * 1 * 2 | 49.7% | 78.2% | link |
GC-TSN ResNet50 | 16 * 1 * 2 | 51.3% | 80.0% | link |
GC-TSN ResNet50 | (8+16) * 1 * 2 | 53.7% | 81.8% | |
GC-TSM ResNet50 | 8 * 1 * 2 | 51.1% | 79.4% | link |
GC-TSM ResNet50 | 16 * 1 * 2 | 53.1% | 81.2% | link |
GC-TSM ResNet50 | (8+16) * 1 * 2 | 55.0% | 82.6% | |
GC-TSM ResNet50 | (8+16) * 3 * 2 | 55.3% | 82.7% | |
GC-TDN ResNet50 | 8 * 1 * 1 | 53.7% | 82.2% | link |
GC-TDN ResNet50 | 16 * 1 * 1 | 55.0% | 82.3% | link |
GC-TDN ResNet50 | (8+16) * 1 * 1 | 56.4% | 84.0% | |
Results on Something-Something V2:
Model | Frame * view * clip | Top-1 Acc. | Top-5 Acc. | Checkpoint |
---|---|---|---|---|
GC-GST ResNet50 | 8 * 1 * 2 | 61.9% | 87.8% | link |
GC-GST ResNet50 | 16 * 1 * 2 | 63.3% | 88.5% | link |
GC-GST ResNet50 | (8+16) * 1 * 2 | 65.0% | 89.5% | |
GC-TSN ResNet50 | 8 * 1 * 2 | 62.4% | 87.9% | link |
GC-TSN ResNet50 | 16 * 1 * 2 | 64.8% | 89.4% | link |
GC-TSN ResNet50 | (8+16) * 1 * 2 | 66.3% | 90.3% | |
GC-TSM ResNet50 | 8 * 1 * 2 | 63.0% | 88.4% | link |
GC-TSM ResNet50 | 16 * 1 * 2 | 64.9% | 89.7% | link |
GC-TSM ResNet50 | (8+16) * 1 * 2 | 66.7% | 90.6% | |
GC-TSM ResNet50 | (8+16) * 3 * 2 | 67.5% | 90.9% | |
GC-TDN ResNet50 | 8 * 1 * 1 | 64.9% | 89.7% | link |
GC-TDN ResNet50 | 16 * 1 * 1 | 65.9% | 90.0% | link |
GC-TDN ResNet50 | (8+16) * 1 * 1 | 67.8% | 91.2% | |
Results on Diving48:
Model | Frame * view * clip | Top-1 Acc. | Checkpoint |
---|---|---|---|
GC-GST ResNet50 | 16 * 1 * 1 | 82.5% | link |
GC-TSN ResNet50 | 16 * 1 * 1 | 86.8% | link |
GC-TSM ResNet50 | 16 * 1 * 1 | 87.2% | link |
GC-TDN ResNet50 | 16 * 1 * 1 | 87.6% | link |
Results on EGTEA Gaze+ (Top-1 accuracy on the three splits):
Model | Frame * view * clip | Split1 | Split2 | Split3 |
---|---|---|---|---|
GC-GST ResNet50 | 8 * 1 * 1 | 65.5% | 61.6% | 60.6% |
GC-TSN ResNet50 | 8 * 1 * 1 | 66.4% | 64.6% | 61.4% |
GC-TSM ResNet50 | 8 * 1 * 1 | 66.5% | 66.1% | 62.6% |
GC-TDN ResNet50 | 8 * 1 * 1 | 65.0% | 61.8% | 61.0% |
For the different backbones, please use the corresponding training script, e.g., 'train_tsn.sh' for GC-TSN (used in the same way as in the original TSN codebase).
For the TSN/TSM/GST backbones, use the test script test_models_tsntsmgst_gc.py by running 'sh bash_test_tsntsmgst_gc.sh'. Change the import "from ops_tsntsmgst.models_tsn import VideoNet" (line 19 of test_models_tsntsmgst_gc.py) to the model you are testing, as sketched below.
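For instance (only models_tsn is confirmed by this README; the models_tsm and models_gst module names below follow the same naming pattern and are assumptions, so check the ops_tsntsmgst folder for the exact file names):

```python
# Line 19 of test_models_tsntsmgst_gc.py: keep exactly one import, matching
# the backbone under test.
from ops_tsntsmgst.models_tsn import VideoNet    # GC-TSN
# from ops_tsntsmgst.models_tsm import VideoNet  # GC-TSM (assumed name)
# from ops_tsntsmgst.models_gst import VideoNet  # GC-GST (assumed name)
```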
For the TDN backbone, please use its official test file; see https://github.com/MCG-NJU/TDN.
The GC code is jointly written and owned by Dr. Yanbin Hao and Dr. Hao Zhang. If you find this work useful, please cite:
@inproceedings{gc2022,
  title={Group Contextualization for Video Recognition},
  author={Hao, Yanbin and Zhang, Hao and Ngo, Chong-Wah and He, Xiangnan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
}
Thanks to the following GitHub projects: