Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

"Scaling up prompt learning on ImageNet-21K achieves SOTA on 21 downstream datasets."

Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun

🚀 News

(Jul 11, 2023)
- Inference demo for object detection in Jupyter.
(May 31, 2023)
- Inference demo for image classification in Google Colab.
(Mar 22, 2023)
- Code for prompt pre-training (POMP) on ImageNet-21K, plus cross-dataset and cross-task evaluation.
- Checkpoints of pre-trained POMP prompts, segmentation backbones, and detection backbones.

Highlights

Main Contributions

- We introduce POMP, a prompt pre-training method that is the first to enable prompt learning on large-scale datasets such as ImageNet-21K, which has over twenty thousand classes.
- POMP is memory- and computation-efficient. Compared with previous methods such as CoOp, it achieves comparable accuracy on ImageNet-1K with only 19% of the GPU memory and 50% of the training time.
- POMP achieves new state-of-the-art results on a variety of open-vocabulary visual recognition datasets and tasks.
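In CoOp-style prompt learning, a text prompt is encoded for every candidate class at each training step, so text-encoder cost grows with the number of classes; with over twenty thousand classes this quickly dominates GPU memory. A common way to bound that cost is to contrast each image against only a sampled subset of classes that always contains its label. The plain-Python sketch below illustrates that general idea only. It is an assumption for illustration, not this repository's implementation: the helpers `sample_classes` and `sampled_cross_entropy`, the `score_fn` stand-in for image-text similarity, and the default temperature are all hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def sample_classes(label: int, num_classes: int, k: int, rng: random.Random) -> List[int]:
    """Hypothetical helper: the ground-truth label followed by k distinct negative class ids."""
    if not 0 <= k < num_classes:
        raise ValueError("k must satisfy 0 <= k < num_classes")
    # Draw k + 1 distinct ids so that at least k remain after discarding the label.
    drawn = rng.sample(range(num_classes), k + 1)
    negatives = [c for c in drawn if c != label][:k]
    return [label] + negatives


def sampled_cross_entropy(
    score_fn: Callable[[int], float],
    label: int,
    num_classes: int,
    k: int,
    rng: random.Random,
    temperature: float = 0.01,  # assumed value, for illustration only
) -> Tuple[float, List[int]]:
    """Hypothetical helper: softmax cross-entropy over the label plus k sampled negatives.

    score_fn(c) stands in for the image-text similarity of class c. Only the
    k + 1 sampled classes are scored, so the per-step cost does not grow with
    num_classes.
    """
    class_ids = sample_classes(label, num_classes, k, rng)
    logits = [score_fn(c) / temperature for c in class_ids]
    # Numerically stable log-sum-exp over the sampled logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # The ground-truth class sits at index 0 of class_ids.
    return log_z - logits[0], class_ids


if __name__ == "__main__":
    rng = random.Random(0)
    num_classes = 21_000  # illustrative, roughly the "over twenty-thousand classes" scale
    loss, used = sampled_cross_entropy(
        lambda c: 0.3 if c == 42 else 0.2, label=42, num_classes=num_classes, k=15, rng=rng
    )
    print(f"scored {len(used)} of {num_classes} classes, loss = {loss:.4f}")
```

Because the label is always kept at index 0, the result is still a proper softmax cross-entropy over the sampled set; the trade-off is a noisier estimate of the full-vocabulary loss, so the choice of `k` balances cost against fidelity.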

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

Data preparation

Please follow the instructions at DATASETS.md to prepare all datasets.

Pre-trained Models

Please follow the instructions at MODELS.md to prepare all pre-trained models.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluating, and reproducing the results.

Contact

If you have any questions, please feel free to create an issue on this repository.

Citation

If you find this code useful for your research, please consider citing:

@article{ren2023pomp,
  title={Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition},
  author={Ren, Shuhuai and Zhang, Aston and Zhu, Yi and Zhang, Shuai and Zheng, Shuai and Li, Mu and Smola, Alex and Sun, Xu},
  journal={arXiv preprint arXiv:2304.04704},
  year={2023}
}

Acknowledgements

Our code is based on the CoOp, MaPLe, Dassl, Detic, and ZSSeg repositories. We thank the authors for releasing their code.
