Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Xiang Wang, Yiliang Lv, Changxin Gao, Nong Sang
[Paper].
[2022-11] Code is available!
This repo is a modification of the TAdaConv repo.
Requirements:
- Python>=3.6
- torch>=1.5
- torchvision (version corresponding with torch)
- simplejson==3.11.1
- decord>=0.6.0
- pyyaml
- einops
- oss2
- psutil
- tqdm
- pandas
Optional requirements:
- fvcore (for flops calculation)
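As a quick sanity check that the environment is set up, the following convenience snippet (not part of the repo) verifies that each required package imports:

```python
# Convenience environment check (not part of the repo).
import importlib

# Note: pyyaml imports as "yaml".
required = ["torch", "torchvision", "simplejson", "decord", "yaml",
            "einops", "oss2", "psutil", "tqdm", "pandas"]
optional = ["fvcore"]  # only needed for FLOPs calculation

for name in required + optional:
    try:
        mod = importlib.import_module(name)
        print(f"{name}: OK ({getattr(mod, '__version__', 'unknown version')})")
    except ImportError:
        tag = "optional" if name in optional else "REQUIRED"
        print(f"{name}: MISSING ({tag})")
```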
The general pipeline for using this repo consists of installation, data preparation, and running; see GUIDELINES.md for details.
You can download the Video-MAE pre-trained checkpoints from here.
Next, please use this simple Python script to convert the pre-trained checkpoints so they are compatible with our codebase.
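The linked script is authoritative; purely as an illustration, here is a minimal sketch of what such a checkpoint conversion typically looks like. The filenames, the `'model'` wrapper key, and the `encoder.` prefix are assumptions for illustration, not the repo's actual mapping:

```python
# Illustrative sketch of a checkpoint conversion; the key layout below
# ('model' wrapper, 'encoder.' prefix) is an assumption -- use the
# repo's linked script for the real mapping.
import torch

ckpt = torch.load("videomae_pretrain.pth", map_location="cpu")  # hypothetical filename
state_dict = ckpt.get("model", ckpt)  # unwrap if the weights are nested

converted = {}
for key, value in state_dict.items():
    # Example remapping: strip a pre-training prefix so keys match the
    # fine-tuning model's parameter names.
    new_key = key[len("encoder."):] if key.startswith("encoder.") else key
    converted[new_key] = value

torch.save({"model_state": converted}, "videomae_converted.pth")
```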
Then modify TRAIN.CHECKPOINT_FILE_PATH to point to the converted checkpoint for fine-tuning.
For detailed explanations of the approach itself, please refer to the paper.
For an example run, set DATA_ROOT_DIR, ANNO_DIR, TRAIN.CHECKPOINT_FILE_PATH, and OUTPUT_DIR in configs/projects/mar/ft-ssv2/vit_base_50%.yaml, then run the following command for training:
```shell
python tools/run_net.py --cfg configs/projects/mar/ft-ssv2/vit_base_50%.yaml
```
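If you prefer to fill in these fields programmatically rather than editing the YAML by hand, a minimal sketch using pyyaml (already in the requirements) could look like this. The paths are placeholders, and the exact nesting of the keys in the config is an assumption:

```python
# Sketch: set the config fields with pyyaml. All paths are placeholders,
# and the key nesting is assumed -- check the actual YAML layout.
import yaml

cfg_path = "configs/projects/mar/ft-ssv2/vit_base_50%.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["DATA_ROOT_DIR"] = "/path/to/videos"                        # placeholder
cfg["ANNO_DIR"] = "/path/to/annotations"                        # placeholder
cfg.setdefault("TRAIN", {})["CHECKPOINT_FILE_PATH"] = "videomae_converted.pth"
cfg["OUTPUT_DIR"] = "/path/to/output"                           # placeholder

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```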
If you find MAR useful for your research, please consider citing the paper as follows:
```
@article{qing2022mar,
  title={{MAR}: Masked autoencoders for efficient action recognition},
  author={Qing, Zhiwu and Zhang, Shiwei and Huang, Ziyuan and Wang, Xiang and Wang, Yuehuan and Lv, Yiliang and Gao, Changxin and Sang, Nong},
  journal={arXiv preprint arXiv:2207.11660},
  year={2022}
}
```