diff --git a/MODEL_ZOO.md b/MODEL_ZOO.md
index d80b22a..c17fae4 100644
--- a/MODEL_ZOO.md
+++ b/MODEL_ZOO.md
@@ -4,17 +4,17 @@
| Method | Extra Data | Backbone | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
| :------: | :--------: | :------: | :---: | :-----: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---: | :---: |
-| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/pretrain.sh)/[log](https://drive.google.com/file/d/1kP3_-465jCL7PRNFq1JcAghPo2BONRWY/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1JfrhN144Hdg7we213H1WxwR3lGYOlmIn/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/finetune.sh)/[log](https://drive.google.com/file/d/1lI9qtgrTUw9Fi96-2WkB8aJu3iyPyTxA/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/18EEgdXY9347yK3Yb28O-GxFMbk41F6Ne/view?usp=sharing) (w/o repeated aug) | 79.4 | 94.1 |
-| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | same as above | TODO | 80.4 | 94.4 |
-| VideoMAE | ***no*** | ViT-B | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1ftVHzzCupEGV4bCHC5JWIUsEwOEeAQcg/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1tEhLyskjb755TJ65ptsrafUG2llSwQE1/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/154ygeIO5TwFa5I76908RmkiuroCnHHNr/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1MzwteHH-1yuMnFb8vRBQDvngV1Zl-d3z/view?usp=sharing) | 80.9 | 94.7 |
-| VideoMAE | ***no*** | ViT-L | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1X7WBzn_yG4lDWuvBMBBgrtgqDLZVHrc2/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1qLOXWb_MGEvaI7tvuAe94CV7S2HXRwT3/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/1SRKgFfAoVoSgwqqijQbaG8c88UC4GY9v/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1jX1CiqxSkCfc94y8FRW1YGHy-GNvHCuD/view?usp=sharing) | 84.7 | 96.5 |
+| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/pretrain.sh)/[log](https://drive.google.com/file/d/1kP3_-465jCL7PRNFq1JcAghPo2BONRWY/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1JfrhN144Hdg7we213H1WxwR3lGYOlmIn/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/finetune.sh)/[log](https://drive.google.com/file/d/1JOJzhlCujgpsjjth0J49k5EwBNxy76xt/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/18EEgdXY9347yK3Yb28O-GxFMbk41F6Ne/view?usp=sharing) (w/o repeated aug) | 80.0 | 94.4 |
+| VideoMAE | ***no*** | ViT-B | 800 | 16x5x3 | same as above | TODO | 81.0 | 94.8 |
+| VideoMAE | ***no*** | ViT-B | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1ftVHzzCupEGV4bCHC5JWIUsEwOEeAQcg/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1tEhLyskjb755TJ65ptsrafUG2llSwQE1/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/1fYXtL2y2ZTMxDtTRqoUOe6leVmdVI5HH/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1MzwteHH-1yuMnFb8vRBQDvngV1Zl-d3z/view?usp=sharing) | 81.5 | 95.1 |
+| VideoMAE | ***no*** | ViT-L | 1600 | 16x5x3 | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/pretrain.sh)/[log](https://drive.google.com/file/d/1X7WBzn_yG4lDWuvBMBBgrtgqDLZVHrc2/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1qLOXWb_MGEvaI7tvuAe94CV7S2HXRwT3/view?usp=sharing) | [script](scripts/kinetics/videomae_vit_large_patch16_224_tubemasking_ratio_0.9_epoch_1600/finetune.sh)/[log](https://drive.google.com/file/d/1Doqx6zDQEMnMyPvDdz2knG385o0sZn3f/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1jX1CiqxSkCfc94y8FRW1YGHy-GNvHCuD/view?usp=sharing) | 85.2 | 96.8 |
### Something-Something V2
| Method | Extra Data | Backbone | Epoch | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
| :------: | :--------: | :------: | :---: | :-----: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---: | :---: |
| VideoMAE | ***no*** | ViT-B | 800 | 16x2x3 | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/pretrain.sh)/[log](https://drive.google.com/file/d/1eGS18rKvbgEJ3nbsXxokkMSwNGxxoX48/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/181hLvyrrPW2IOGA46fkxdJk0tNLIgdB2/view?usp=sharing) | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_800/finetune.sh)/[log](https://drive.google.com/file/d/1jYAHPcs7zt_QMPM2D_geEWoWrf3yHox8/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1xZCiaPF4w7lYmLt5o1D5tIZyDdLtJAvH/view?usp=sharing) (w/o repeated aug) | 69.6 | 92.0 |
-| VideoMAE | ***no*** | ViT-B | 2400 | 16x2x3 | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/pretrain.sh)/[log](https://drive.google.com/file/d/148nURgfcIFBQd3IQH5YhJ9dTwNCc2jkU/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1I18dY_7rSalGL8fPWV82c0-foRUDzJJk/view?usp=sharing) | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/finetune.sh)/[log](https://drive.google.com/file/d/1IRme58NHRTfcfdy1wfdph9AZMQT8zKv5/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1dt_59tBIyzdZd5Ecr22lTtzs_64MOZkT/view?usp=sharing) | 70.6 | 92.6 |
+| VideoMAE | ***no*** | ViT-B | 2400 | 16x2x3 | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/pretrain.sh)/[log](https://drive.google.com/file/d/148nURgfcIFBQd3IQH5YhJ9dTwNCc2jkU/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1I18dY_7rSalGL8fPWV82c0-foRUDzJJk/view?usp=sharing) | [script](scripts/ssv2/videomae_vit_base_patch16_224_tubemasking_ratio_0.9_epoch_2400/finetune.sh)/[log](https://drive.google.com/file/d/15TPBiUl_K2Q_9l6J41G_vf-2lovVLEHM/view?usp=sharing)/[checkpoint](https://drive.google.com/file/d/1dt_59tBIyzdZd5Ecr22lTtzs_64MOZkT/view?usp=sharing) | 70.8 | 92.4 |
### Note:
diff --git a/README.md b/README.md
index d92b84d..00e363e 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,7 @@
> [Zhan Tong](https://github.com/yztongzhan), [Yibing Song](https://ybsong00.github.io/), [Jue Wang](https://juewang725.github.io/), [Limin Wang](http://wanglimin.github.io/)
Nanjing University, Tencent AI Lab
## 📰 News
+**[2022.8.8]** We have fixed a bug in this [commit](https://github.com/MCG-NJU/VideoMAE/commit/2254c5eeeff30cda700622d8a24f14403eda4038), which improves performance on Kinetics-400 by about 0.5%. Thanks to @JerryFlymi for the help.
**[2022.7.7]** We have added new results on the downstream AVA 2.2 benchmark. Please refer to our [paper](https://arxiv.org/abs/2203.12602) for details.
**[2022.4.24]** Code and pre-trained models are available now! Please leave a star ⭐️ for our best efforts.
**[2022.4.15]** The **[LICENSE](https://github.com/MCG-NJU/VideoMAE/blob/main/LICENSE)** of this project has been upgraded to CC-BY-NC 4.0.
**[2022.3.24]** ~~Code and pre-trained models will be released here.~~ Welcome to **watch** this repository for the latest updates.
@@ -36,17 +37,17 @@ VideoMAE works well for video datasets of different scales and can achieve **85.
| Method | Extra Data | Backbone | Resolution | #Frames x Clips x Crops | Top-1 | Top-5 |
| :------: | :--------: | :------: | :--------: | :---------------------: | :---: | :---: |
-| VideoMAE | ***no*** | ViT-B | 224x224 | 16x2x3 | 70.6 | 92.6 |
-| VideoMAE | ***no*** | ViT-L | 224x224 | 16x2x3 | 74.2 | 94.7 |
-| VideoMAE | ***no*** | ViT-L | 224x224 | 32x1x3 | 75.3 | 95.2 |
+| VideoMAE | ***no*** | ViT-B | 224x224 | 16x2x3 | 70.8 | 92.4 |
+| VideoMAE | ***no*** | ViT-L | 224x224 | 16x2x3 | 74.3 | 94.6 |
+| VideoMAE | ***no*** | ViT-L | 224x224 | 32x1x3 | 75.4 | 95.2 |
### ✨ Kinetics-400
| Method | Extra Data | Backbone | Resolution | #Frames x Clips x Crops | Top-1 | Top-5 |
| :------: | :--------: | :------: | :--------: | :---------------------: | :---: | :---: |
-| VideoMAE | ***no*** | ViT-B | 224x224 | 16x5x3 | 80.9 | 94.7 |
-| VideoMAE | ***no*** | ViT-L | 224x224 | 16x5x3 | 84.7 | 96.5 |
-| VideoMAE | ***no*** | ViT-L | 320x320 | 32x5x3 | 85.8 | 97.1 |
+| VideoMAE | ***no*** | ViT-B | 224x224 | 16x5x3 | 81.5 | 95.1 |
+| VideoMAE | ***no*** | ViT-L | 224x224 | 16x5x3 | 85.2 | 96.8 |
+| VideoMAE | ***no*** | ViT-L | 320x320 | 32x5x3 | 86.1 | 97.3 |
### ✨ AVA 2.2
@@ -96,7 +97,7 @@ Zhan Tong: tongzhan@smail.nju.edu.cn
## Acknowledgements
-Thanks to [Ziteng Gao](https://sebgao.github.io/), Lei Chen and [Chongjian Ge](https://chongjiange.github.io/) for their kind support.
+Thanks to [Ziteng Gao](https://sebgao.github.io/), Lei Chen, [Chongjian Ge](https://chongjiange.github.io/), and [Zhiyu Zhao](https://github.com/JerryFlymi) for their kind support.
This project is built upon [MAE-pytorch](https://github.com/pengzhiliang/MAE-pytorch) and [BEiT](https://github.com/microsoft/unilm/tree/master/beit). Thanks to the contributors of these great codebases.
## License