From 2b56a75d166c619f71019e3d1bb1c4aedafe7a90 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=AB=A5=E6=B9=9B?= Date: Mon, 8 Aug 2022 17:40:33 +0800 Subject: [PATCH] update README.md and MODEL_ZOO.md --- MODEL_ZOO.md | 10 +++++----- README.md | 15 ++++++++------- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/MODEL_ZOO.md b/MODEL_ZOO.md index d80b22a..c17fae4 100644 --- a/MODEL_ZOO.md +++ b/MODEL_ZOO.md @@ -4,17 +4,17 @@ | Method | Extra Data | Backbone | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 | | :------: | :--------: | :------: | :---: | :-----: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---: | :---: | -| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/pretrain.sh)/[log](https://drive.google.com/file/d/1kP3_-465jCL7PRNFq1JcAghPo2BONRWY/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1JfrhN144Hdg7we213H1WxwR3lGYOlmIn/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/finetune.sh)/[log](https://drive.google.com/file/d/1lI9qtgrTUw9Fi96-2WkB8aJu3iyPyTxA/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/18EEgdXY9347yK3Yb28O-GxFMbk41F6Ne/view?usp=sharing)
(w/o repeated aug) | 79.4 | 94.1 | -| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | same as above | TODO | 80.4 | 94.4 | -| VideoMAE | ***no*** | ViT-B | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1ftVHzzCupEGV4bCHC5JWIUsEwOEeAQcg/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1tEhLyskjb755TJ65ptsrafUG2llSwQE1/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/154ygeIO5TwFa5I76908RmkiuroCnHHNr/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1MzwteHH-1yuMnFb8vRBQDvngV1Zl-d3z/view?usp=sharing) | 80.9 | 94.7 | -| VideoMAE | ***no*** | ViT-L | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1X7WBzn_yG4lDWuvBMBBgrtgqDLZVHrc2/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1qLOXWb_MGEvaI7tvuAe94CV7S2HXRwT3/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/1SRKgFfAoVoSgwqqijQbaG8c88UC4GY9v/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1jX1CiqxSkCfc94y8FRW1YGHy-GNvHCuD/view?usp=sharing) | 84.7 | 96.5 | +| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/pretrain.sh)/[log](https://drive.google.com/file/d/1kP3_-465jCL7PRNFq1JcAghPo2BONRWY/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1JfrhN144Hdg7we213H1WxwR3lGYOlmIn/view?usp=sharing) | 
[script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/finetune.sh)/[log](https://drive.google.com/file/d/1JOJzhlCujgpsjjth0J49k5EwBNxy76xt/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/18EEgdXY9347yK3Yb28O-GxFMbk41F6Ne/view?usp=sharing)
(w/o repeated aug) | 80.0 | 94.4 | +| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | same as above | TODO | 81.0 | 94.8 | +| VideoMAE | ***no*** | ViT-B | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1ftVHzzCupEGV4bCHC5JWIUsEwOEeAQcg/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1tEhLyskjb755TJ65ptsrafUG2llSwQE1/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/1fYXtL2y2ZTMxDtTRqoUOe6leVmdVI5HH/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1MzwteHH-1yuMnFb8vRBQDvngV1Zl-d3z/view?usp=sharing) | 81.5 | 95.1 | +| VideoMAE | ***no*** | ViT-L | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1X7WBzn_yG4lDWuvBMBBgrtgqDLZVHrc2/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1qLOXWb_MGEvaI7tvuAe94CV7S2HXRwT3/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/1Doqx6zDQEMnMyPvDdz2knG385o0sZn3f/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1jX1CiqxSkCfc94y8FRW1YGHy-GNvHCuD/view?usp=sharing) | 85.2 | 96.8 | ### Something-Something V2 | Method | Extra Data | Backbone | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 | | :------: | :--------: | :------: | :---: | :-----: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---: | :---: | | VideoMAE | ***no*** | ViT-B | 800 | 16x2x3 | 
[script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/pretrain.sh)/[log](https://drive.google.com/file/d/1eGS18rKvbgEJ3nbsXxokkMSwNGxxoX48/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/181hLvyrrPW2IOGA46fkxdJk0tNLIgdB2/view?usp=sharing) | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/finetune.sh)/[log](https://drive.google.com/file/d/1jYAHPcs7zt_QMPM2D_geEWoWrf3yHox8/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1xZCiaPF4w7lYmLt5o1D5tIZyDdLtJAvH/view?usp=sharing)
(w/o repeated aug) | 69.6 | 92.0 | -| VideoMAE | ***no*** | ViT-B | 2400 | 16x2x3 | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/pretrain.sh)/[log](https://drive.google.com/file/d/148nURgfcIFBQd3IQH5YhJ9dTwNCc2jkU/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1I18dY_7rSalGL8fPWV82c0-foRUDzJJk/view?usp=sharing) | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/finetune.sh)/[log](https://drive.google.com/file/d/1IRme58NHRTfcfdy1wfdph9AZMQT8zKv5/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1dt_59tBIyzdZd5Ecr22lTtzs_64MOZkT/view?usp=sharing) | 70.6 | 92.6 | +| VideoMAE | ***no*** | ViT-B | 2400 | 16x2x3 | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/pretrain.sh)/[log](https://drive.google.com/file/d/148nURgfcIFBQd3IQH5YhJ9dTwNCc2jkU/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1I18dY_7rSalGL8fPWV82c0-foRUDzJJk/view?usp=sharing) | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/finetune.sh)/[log](https://drive.google.com/file/d/15TPBiUl_K2Q_9l6J41G_vf-2lovVLEHM/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1dt_59tBIyzdZd5Ecr22lTtzs_64MOZkT/view?usp=sharing) | 70.8 | 92.4 | ### Note: diff --git a/README.md b/README.md index d92b84d..00e363e 100644 --- a/README.md +++ b/README.md @@ -12,6 +12,7 @@ > [Zhan Tong](https://github.com/yztongzhan), [Yibing Song](https://ybsong00.github.io/), [Jue Wang](https://juewang725.github.io/), [Limin Wang](http://wanglimin.github.io/)
Nanjing University, Tencent AI Lab ## 📰 News +**[2022.8.8]** We have fixed a bug 🐛 in this [commit](https://github.com/MCG-NJU/VideoMAE/commit/2254c5eeeff30cda700622d8a24f14403eda4038) and the performance on Kinetics-400 can be improved by about 0.5%😮. Thank @JerryFlymi for help.
**[2022.7.7]** We have updated new results on downstream AVA 2.2 benckmark. Please refer to our [paper](https://arxiv.org/abs/2203.12602) for details.
**[2022.4.24]** Code and pre-trained models are available now! Please leave a star⭐️ for our best efforts.😆
**[2022.4.15]** The **[LICENSE](https://github.com/MCG-NJU/VideoMAE/blob/main/LICENSE)** of this project has been upgraded to CC-BY-NC 4.0.
**[2022.3.24]** ~~Code and pre-trained models will be released here.~~ Welcome to **watch** this repository for the latest updates. @@ -36,17 +37,17 @@ VideoMAE works well for video datasets of different scales and can achieve **85. | Method | Extra Data | Backbone | Resolution | #Frames x Clips x Crops | Top-1 | Top-5 | | :------: | :--------: | :------: | :--------: | :---------------------: | :---: | :---: | -| VideoMAE | ***no*** | ViT-B | 224x224 | 16x2x3 | 70.6 | 92.6 | -| VideoMAE | ***no*** | ViT-L | 224x224 | 16x2x3 | 74.2 | 94.7 | -| VideoMAE | ***no*** | ViT-L | 224x224 | 32x1x3 | 75.3 | 95.2 | +| VideoMAE | ***no*** | ViT-B | 224x224 | 16x2x3 | 70.8 | 92.4 | +| VideoMAE | ***no*** | ViT-L | 224x224 | 16x2x3 | 74.3 | 94.6 | +| VideoMAE | ***no*** | ViT-L | 224x224 | 32x1x3 | 75.4 | 95.2 | ### ✨ Kinetics-400 | Method | Extra Data | Backbone | Resolution | #Frames x Clips x Crops | Top-1 | Top-5 | | :------: | :--------: | :------: | :--------: | :---------------------: | :---: | :---: | -| VideoMAE | ***no*** | ViT-B | 224x224 | 16x5x3 | 80.9 | 94.7 | -| VideoMAE | ***no*** | ViT-L | 224x224 | 16x5x3 | 84.7 | 96.5 | -| VideoMAE | ***no*** | ViT-L | 320x320 | 32x5x3 | 85.8 | 97.1 | +| VideoMAE | ***no*** | ViT-B | 224x224 | 16x5x3 | 81.5 | 95.1 | +| VideoMAE | ***no*** | ViT-L | 224x224 | 16x5x3 | 85.2 | 96.8 | +| VideoMAE | ***no*** | ViT-L | 320x320 | 32x5x3 | 86.1 | 97.3 | ### ✨ AVA 2.2 @@ -96,7 +97,7 @@ Zhan Tong: tongzhan@smail.nju.edu.cn ## 👍 Acknowledgements -Thanks to [Ziteng Gao](https://sebgao.github.io/), Lei Chen and [Chongjian Ge](https://chongjiange.github.io/) for their kindly support.
+Thanks to [Ziteng Gao](https://sebgao.github.io/), Lei Chen, [Chongjian Ge](https://chongjiange.github.io/), and [Zhiyu Zhao](https://github.com/JerryFlymi) for their kindly support.
This project is built upon [MAE-pytorch](https://github.com/pengzhiliang/MAE-pytorch) and [BEiT](https://github.com/microsoft/unilm/tree/master/beit). Thanks to the contributors of these great codebases. ## 🔒 License