Model Zoo

Action Recognition

For action recognition, models are trained on Kinetics-400 unless otherwise specified. The version of Kinetics-400 we used contains 240436 training videos and 19796 testing videos. We also train TSN on UCF101, initialized with ImageNet-pretrained weights, and provide transfer learning results on UCF101 and HMDB51 for some algorithms. Models marked with * are converted from other repos (including VMZ and kinetics_i3d); the others were trained by us.

For data preprocessing, we find that resizing the short edge of each video to 256px is generally a better choice than resizing to a fixed width and height of 340x256, since the aspect ratio is preserved. Most of our Kinetics-400 models are trained on videos whose short edges are resized to 256px. However, some legacy Kinetics-400 models are trained on videos resized to a fixed width and height (340x256). We use the mark $^{340\times256}$ to indicate that a model is legacy.
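The difference between the two resizing strategies can be illustrated with a minimal sketch. The function names below (`short_edge_resize`, `aspect_ratio_change`) are hypothetical helpers written for this illustration, not part of this repository's API, and the rounding rule is an assumption; the actual preprocessing scripts may round differently.

```python
from typing import Tuple


def short_edge_resize(width: int, height: int, short_edge: int = 256) -> Tuple[int, int]:
    """Return (new_width, new_height) so that the shorter side equals `short_edge`.

    The aspect ratio is preserved up to integer rounding (assumed rounding rule).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = short_edge / min(width, height)
    return round(width * scale), round(height * scale)


def aspect_ratio_change(width: int, height: int, new_width: int, new_height: int) -> float:
    """Ratio of the new aspect ratio to the original one (1.0 means no distortion)."""
    return (new_width / new_height) / (width / height)


if __name__ == "__main__":
    w, h = 1280, 720  # a 16:9 source video, chosen only as an example

    # Short-edge resize: the shorter side becomes 256, the other side scales with it.
    se_w, se_h = short_edge_resize(w, h)
    print("short-edge:", (se_w, se_h), aspect_ratio_change(w, h, se_w, se_h))

    # Legacy fixed resize: every video becomes 340x256, whatever its original shape.
    print("fixed 340x256:", (340, 256), aspect_ratio_change(w, h, 340, 256))
```

For a 16:9 source, the fixed 340x256 target squeezes the frame horizontally by roughly a quarter, whereas the short-edge resize keeps the shape essentially unchanged.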

If you cannot reproduce our test results because of a dataset mismatch, please submit a request at get validation data.

TSN

Kinetics

| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RGB | ImageNet | ResNet50 | 3seg | 70.6 | 89.4 | model$^{340\times256}$ |

UCF101

| Modality | Pretrained | Backbone | Input | Top-1 | Download |
| --- | --- | --- | --- | --- | --- |
| RGB | ImageNet | BNInception | 3seg | 86.4 | model |
| TV-L1 | ImageNet | BNInception | 3seg | 87.7 | model |

C3D

Sports-1M

| Modality | Pretrained | Backbone | Input | Top-1 | Download |
| --- | --- | --- | --- | --- | --- |
| RGB | None | C3D | 16x1 | N/A | model* |

* Converted from C3D-v1.0 in Caffe and TGAN in Chainer.

UCF101

| Modality | Pretrained | Backbone | Input | Top-1 | Download |
| --- | --- | --- | --- | --- | --- |
| RGB | Sports-1M | C3D | 16x1 | 82.26 | model* |

* Converted from C3D-v1.0 in Caffe and TGAN in Chainer.

I3D

| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RGB | ImageNet | Inception-V1 | 64x1 | 71.1 | 89.3 | model* |
| RGB | ImageNet | ResNet50 | 32x2 | 72.9 | 90.8 | model$^{340\times256}$ |
| Flow | ImageNet | Inception-V1 | 64x1 | 63.4 | 84.9 | model* |
| Two-Stream | ImageNet | Inception-V1 | 64x1 | 74.2 | 91.3 | / |

* Converted from kinetics_i3d in TensorFlow.

SlowOnly

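| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RGB | None | ResNet50 | 4x16 | 72.9 | 90.9 | model |
| RGB | ImageNet | ResNet50 | 4x16 | 73.8 | 90.9 | model |
| RGB | None | ResNet50 | 8x8 | 74.8 | 91.9 | model |
| RGB | ImageNet | ResNet50 | 8x8 | 75.7 | 92.2 | model |
| RGB | None | ResNet101 | 8x8 | 76.5 | 92.7 | model |
| RGB | ImageNet | ResNet101 | 8x8 | 76.8 | 92.8 | model |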

SlowFast

| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RGB | None | ResNet50 | 4x16 | 75.4 | 92.1 | model |
| RGB | ImageNet | ResNet50 | 4x16 | 75.9 | 92.3 | model |

R(2+1)D

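| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RGB | None | ResNet34 | 8x8 | 63.7 | 85.9 | model |
| RGB | IG-65M | ResNet34 | 8x8 | 74.4 | 91.7 | model |
| RGB | None | ResNet34 | 32x2 | 71.8 | 90.4 | model |
| RGB | IG-65M | ResNet34 | 32x2 | 80.3 | 94.7 | model |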

CSN

| Modality | Pretrained | Backbone | Input | Top-1 | Top-5 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RGB | IG-65M | irCSN-152 | 32x2 | 82.6 | 95.7 | model* |
| RGB | IG-65M | ipCSN-152 | 32x2 | 82.7 | 95.6 | model* |

OmniSource

| Modality | Pretrained | Backbone | Input | Top-1 (Baseline / OmniSource ($\Delta$)) | Top-5 (Baseline / OmniSource ($\Delta$)) | Download |
| --- | --- | --- | --- | --- | --- | --- |
| RGB | ImageNet | ResNet50 | 3seg | 70.6 / 73.6 (+ 3.0) | 89.4 / 91.0 (+ 1.6) | Baseline$^{340\times256}$ / OmniSource$^{340\times256}$ |
| RGB | IG-1B | ResNet50 | 3seg | 73.1 / 75.7 (+ 2.6) | 90.4 / 91.9 (+ 1.5) | Baseline / OmniSource |
| RGB | Scratch | ResNet50 | 4x16 | 72.9 / 76.8 (+ 3.9) | 90.9 / 92.5 (+ 1.6) | Baseline / OmniSource |
| RGB | Scratch | ResNet101 | 8x8 | 76.5 / 80.4 (+ 3.9) | 92.7 / 94.4 (+ 1.7) | Baseline / OmniSource |

Transfer Learning

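| Model | Modality | Pretrained | Backbone | Input | UCF101 | HMDB51 | Download (split1) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I3D | RGB | Kinetics | I3D | 64x1 | 94.8 | 72.6 | UCF101 / HMDB51 |
| I3D | Flow | Kinetics | I3D | 64x1 | 96.6 | 79.2 | UCF101 / HMDB51 |
| I3D | TwoStream | Kinetics | I3D | 64x1 | 97.8 | 80.8 | / |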

Action Detection

For action detection, we release models trained on THUMOS14.

SSN

| Modality | Pretrained | Backbone | mAP@0.10 | mAP@0.20 | mAP@0.30 | mAP@0.40 | mAP@0.50 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RGB | ImageNet | BNInception | 43.09% | 37.95% | 32.56% | 25.71% | 18.33% | model |

Spatial Temporal Action Detection

For spatial temporal action detection, we release models trained on AVA.

| Modality | Model | Pretrained | Backbone | mAP@0.5 | Download |
| --- | --- | --- | --- | --- | --- |
| RGB | Fast-RCNN | Kinetics | NL-I3D R50 | 21.2 | model |