Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun
- (Jul 11, 2023)
  - Inference demo for object detection in Jupyter.
- (May 31, 2023)
  - Inference demo for image classification in Google Colab.
- (Mar 22, 2023)
  - Code for prompt pre-training (POMP) on ImageNet-21K, cross-dataset and cross-task evaluation.
  - Checkpoints of pre-trained POMP prompts, segmentation backbones, and detection backbones.
- We introduce POMP, a prompt pre-training method that is the first to enable prompt learning on large-scale datasets like ImageNet-21K with over twenty-thousand classes.
- POMP is memory- and computation-efficient. Compared with previous methods like CoOp, it achieves comparable accuracy on ImageNet-1K with only 19% of the GPU memory and 50% of the training time.
- POMP achieves new state-of-the-art results on various open-vocabulary visual recognition datasets and tasks.
For installation and other package requirements, please follow the instructions detailed in INSTALL.md.
Please follow the instructions in DATASETS.md to prepare all datasets.
Please follow the instructions in MODELS.md to prepare all pre-trained models.
Please refer to RUN.md for detailed instructions on training, evaluation, and reproducing the results.
If you have any questions, please feel free to create an issue on this repository.
If you find this code useful for your research, please consider citing:
@article{ren2023pomp,
title={Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition},
author={Ren, Shuhuai and Zhang, Aston and Zhu, Yi and Zhang, Shuai and Zheng, Shuai and Li, Mu and Smola, Alex and Sun, Xu},
journal={arXiv preprint arXiv:2304.04704},
year={2023}
}
Our code is based on the CoOp, MaPLe, Dassl, Detic, and ZSSeg repositories. We thank the authors for releasing their code.