-
Notifications
You must be signed in to change notification settings - Fork 158
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Oops, forgot FlexiViT's readme, sorry! (#25)
- Loading branch information
1 parent
26c7bc8
commit b00544b
Showing
1 changed file
with
64 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
# FlexiViT: One Model for All Patch Sizes | ||
*by Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic* | ||
|
||
## Introduction | ||
We publish all pre-trained FlexiViT models, and configurations for training | ||
those, as well as training logs for one run. | ||
|
||
Please read the main [big_vision README](/README.md) to learn how to run | ||
configs, and remember that each config file contains an example invocation in | ||
the top-level comment. | ||
|
||
## Pre-trained paper models | ||
|
||
Here are the models that we used as backbones in the paper. See Tables in the | ||
appendix of the paper for expected scores at various patch-sizes and on various | ||
datasets. | ||
|
||
First, the recommended models we used for all experiments. | ||
Remember that the input is 240px, not 224px: | ||
|
||
| Dataset | Model | Download link | Notes | | ||
| :--- | :---: | :---: | :---: | | ||
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k.npz) | 1200ep version | | ||
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k.npz) | 1200ep version | | ||
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k.npz) | 1200ep version | | ||
| ImageNet-21k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i21k_300ep.npz) | 300ep version. 1000ep version below is better but was not used in the paper for fair comparison to baselines. | | ||
| ImageNet-21k | ViT-B/16 | [link](https://storage.googleapis.com/big_vision/flexivit/vit_b16_i21k_300ep.npz) | Apples-to-apples non-flexi baseline used throughout the paper. | | ||
| ImageNet-21k | ViT-B/30 | [link](https://storage.googleapis.com/big_vision/flexivit/vit_b30_i21k_300ep.npz) | Apples-to-apples non-flexi baseline used throughout the paper. | | ||
|
||
These models can be used directly in our codebase by specifying | ||
`model_name = "proj.flexi.vit"` and `model_init = "FlexiViT-L i1k"` for example. | ||
See the file `models/proj/flexi/vit.py` for more names. | ||
|
||
*Important detail:* When further re-using these models with a flexible patch | ||
size, it is recommended to keep the patch-embedding parameter buffer at its | ||
original size, and change patch-size on the fly using pi-resize, as opposed to | ||
changing the parameter buffer's size at load-time. | ||
For re-using the models with a fixed patch size, either way is fine. | ||
(The reason is that it is impossible to chain multiple resizes without loss, | ||
eg doing 32->8->32 does not result in the original weights.) | ||
|
||
Second, the list of all released models for completeness: | ||
|
||
| Dataset | Model | Download link | Notes | | ||
| :--- | :---: | :---: | :---: | | ||
| ImageNet-21k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i21k_1000ep.npz) | 1000ep version. Should be the best available -B model. | | ||
| ImageNet-21k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i21k_90ep.npz) | 90ep version | | ||
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k_600ep.npz) | 600ep version | | ||
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k_300ep.npz) | 300ep version | | ||
| ImageNet-1k | FlexiViT-L | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_l_i1k_90ep.npz) | 90ep version | | ||
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k_600ep.npz) | 600ep version | | ||
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k_300ep.npz) | 300ep version | | ||
| ImageNet-1k | FlexiViT-B | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_b_i1k_90ep.npz) | 90ep version | | ||
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k_600ep.npz) | 600ep version | | ||
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k_300ep.npz) | 300ep version | | ||
| ImageNet-1k | FlexiViT-S | [link](https://storage.googleapis.com/big_vision/flexivit/flexivit_s_i1k_90ep.npz) | 90ep version | | ||
|
||
## Results | ||
|
||
We provide full training logs for a run with this public code on Cloud that | ||
reproduces the FlexiViT-S 90ep on i1k results: | ||
- [metrics](https://storage.googleapis.com/big_vision/flexivit/deit3_i1k_s_90ep_12-15_2254/big_vision_metrics.txt) | ||
- [config](https://storage.googleapis.com/big_vision/flexivit/deit3_i1k_s_90ep_12-15_2254/config.json) | ||
- or `gs://big_vision/flexivit/deit3_i1k_s_90ep_12-15_2254`. |