This repo is the official implementation of "Adaptive Frequency Filters As Efficient Global Token Mixers" by Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, and Baining Guo.
AFFNet is a lightweight neural network designed for efficient deployment on mobile devices. It achieves better accuracy-efficiency trade-offs than other lightweight network designs across a wide range of visual tasks, including visual recognition and dense prediction. AFFNet, AFFNet-T, and AFFNet-ET achieve 79.8%, 77.0%, and 73.0% top-1 accuracy on the ImageNet-1K dataset, respectively.
name | size | acc@1(%) | #params | FLOPs | download |
---|---|---|---|---|---|
AFFNet-ET | 256 | 73.0 | 1.4M | 0.4G | model/log/config |
AFFNet-T | 256 | 77.0 | 2.6M | 0.8G | model/log/config |
AFFNet | 256 | 79.8 | 5.5M | 1.5G | model/log/config |
name | size | mIOU(%) | #params | download |
---|---|---|---|---|
AFFNet-ET + deeplab | 256 | 33.0 | 2.2M | model/log/config |
AFFNet-T + deeplab | 256 | 36.9 | 3.5M | model/log/config |
AFFNet + deeplab | 256 | 38.4 | 6.9M | model/log/config |
name | size | mIOU(%) | #params | download |
---|---|---|---|---|
AFFNet-ET + deeplab | 256 | 76.1 | 2.2M | model/log/config |
AFFNet-T + deeplab | 256 | 77.8 | 3.5M | model/log/config |
AFFNet + deeplab | 256 | 80.5 | 6.9M | model/log/config |
- Clone the repository:
git clone https://github.com/microsoft/TokenMixers.git
cd TokenMixers/AFFNet/
- Prepare the base environment. We use Ubuntu 20, Python 3.8, and CUDA 11.5; 8 A100 GPUs are used for training and evaluation.
- Install required packages:
conda create -fyn AFFNet python=3.8
conda activate AFFNet
python -m pip install wandb ptflops einops
python -m pip install -r requirements.txt
python -m pip install psutil torchstat tqdm
python -m pip install --upgrade fvcore
python -m pip install complexPyTorch
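After installation, a quick check that the packages named in the commands above can be found on the import path may save a failed launch. The sketch below is not part of this repository: the helper name `missing_packages` is hypothetical, and it assumes each package's import name matches its pip name, which may not hold in every environment.

```python
import importlib.util

# Import names are assumed to match the pip names used in the install commands.
REQUIRED = [
    "wandb", "ptflops", "einops", "psutil",
    "torchstat", "tqdm", "fvcore", "complexPyTorch",
]


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

The check only looks the modules up; it does not import them, so it says nothing about version compatibility or CUDA support.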
Download the standard ImageNet-1K dataset from http://image-net.org, the ADE20K dataset from https://groups.csail.mit.edu/vision/datasets/ADE20K/, and the VOC dataset from http://host.robots.ox.ac.uk/pascal/VOC/, then organize the data as follows:
Dataset_Root
├── ImageNet
│ ├── train
│ │ ├── n01440764
│ │ │ ├── n01440764_10026.JPEG
│ │ │ ├── n01440764_10027.JPEG
│ │ │ ├── ...
│ │ ├── ...
│ ├── val
│ │ ├── n02093754
│ │ │ ├── ILSVRC2012_val_00000832.JPEG
│ │ │ ├── ILSVRC2012_val_00003267.JPEG
│ │ │ ├── ...
│ │ ├── ...
├── ADEChallengeData2016
│ ├── annotations
│ ├── images
│ ├── objectinfo150.txt
│ ├── sceneCategories.txt
├── VOCdevkit
│ ├── rec_data
│ ├── VOC2007
│ ├── VOC2012
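Before training, it can help to confirm that the dataset root matches the layout above. The following is a minimal sketch rather than a script shipped with this repo: the function name `missing_dirs` and the list `EXPECTED_DIRS` are hypothetical, and only the directory names come from the tree.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_DIRS = [
    "ImageNet/train",
    "ImageNet/val",
    "ADEChallengeData2016/annotations",
    "ADEChallengeData2016/images",
    "VOCdevkit",
]


def missing_dirs(dataset_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    import sys

    # Fall back to the placeholder name used in the tree if no path is given.
    root = sys.argv[1] if len(sys.argv) > 1 else "Dataset_Root"
    for rel in missing_dirs(root):
        print("missing:", rel)
```

An empty result means every listed directory exists; the check does not inspect the files inside them.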
Run the following command to train the model on a node with 8 A100 GPUs:
python main_train.py --log-wandb --common.config-file <config_path> --common.results-loc <save_path>
Replace `<config_path>` with the path to the config file (available from the download links) and `<save_path>` with the directory in which to save the model and log files.
Run the following command to evaluate the model on a node with 8 A100 GPUs:
python main_eval.py --common.config-file <config_path> --common.results-loc <save_path> --model.classification.pretrained <model_path>
Replace `<config_path>` with the path to the config file (available from the download links), `<save_path>` with the directory in which to save the model and log files, and `<model_path>` with the path to the pretrained model (also available from the download links).
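When launching several runs, assembling the training and evaluation commands programmatically avoids retyping the flags. The sketch below is illustrative only: `build_command` is a hypothetical helper, not part of this repo, and it uses only the script names and flags shown in the commands above.

```python
import shlex


def build_command(script, config_path, save_path, model_path=None, log_wandb=False):
    """Assemble the argument list for main_train.py or main_eval.py."""
    cmd = ["python", script]
    if log_wandb:
        cmd.append("--log-wandb")
    cmd += ["--common.config-file", config_path, "--common.results-loc", save_path]
    if model_path is not None:
        # Only the evaluation command above passes a pretrained checkpoint.
        cmd += ["--model.classification.pretrained", model_path]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations before running.
    train = build_command("main_train.py", "<config_path>", "<save_path>", log_wandb=True)
    evaluate = build_command(
        "main_eval.py", "<config_path>", "<save_path>", model_path="<model_path>"
    )
    print(shlex.join(train))
    print(shlex.join(evaluate))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside `TokenMixers/AFFNet/`; keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.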
If you find this code and work useful, please consider citing the following paper and starring this repo. Thank you very much!
@inproceedings{huang2023adaptive,
title={Adaptive Frequency Filters As Efficient Global Token Mixers},
author={Huang, Zhipeng and Zhang, Zhizheng and Lan, Cuiling and Zha, Zheng-Jun and Lu, Yan and Guo, Baining},
booktitle={ICCV},
year={2023}
}
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.