Large Language Models are Good Prompt Learners for Low-Shot Image Classification
Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu, Haidong Zhu and Ram Nevatia
Official implementation of Large Language Models are Good Prompt Learners for Low-Shot Image Classification (CVPR 2024). We build our model on Python 3.11 and PyTorch 2.2.0. To prepare the environment, please follow the instructions below.
- Create a conda environment and install the requirements:

  ```bash
  conda create -n llamp python=3.11 pip
  ```

- Enter the environment:

  ```bash
  conda activate llamp
  ```

- Install the requirements:

  ```bash
  pip install -r requirements.txt
  ```

- Install DASSL from this repo (a typical install is sketched below).
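If the linked repo is the commonly used Dassl.pytorch, a typical development install looks like this (a sketch under that assumption; follow the linked repo's own instructions if they differ):

```bash
# Clone DASSL and install it in development mode inside the llamp environment.
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch
pip install -r requirements.txt
python setup.py develop
cd ..
```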
Please follow this link to prepare the datasets. The datasets should be organized as follows:
```
$DATA/
├── imagenet/
├── caltech-101/
├── oxford_pets/
├── stanford_cars/
...
```
After downloading the data, set the `DATA_FOLDER` variable in `flags.py` to your data path.
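For example, the relevant line in `flags.py` might look like the following (the exact contents of `flags.py` are not reproduced here, so treat this as a hypothetical sketch):

```python
# flags.py -- point DATA_FOLDER at the directory that holds
# imagenet/, caltech-101/, oxford_pets/, stanford_cars/, etc.
DATA_FOLDER = "/path/to/your/data"
```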
For LLaMA-2 weights, please visit this link to obtain access directly from Meta.
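If you use the Hugging Face distribution of LLaMA-2, one way to fetch the weights once your access request is approved is shown below (the `meta-llama/Llama-2-7b-hf` repo id and local directory are assumptions; this repo may expect a different checkpoint or layout):

```bash
# Authenticate with a Hugging Face token that has LLaMA-2 access,
# then download the weights to a local directory.
huggingface-cli login
huggingface-cli download meta-llama/Llama-2-7b-hf --local-dir llama-2-7b-hf
```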
You can download the preprocessed metadata from here, or run the following command to preprocess the data:

```bash
PYTHONPATH='.' tools/run_feature_extraction_all.sh
```
After you obtain the preprocessed metadata, please organize them as follows:
```
$DATA/
├── imagenet/
│   ├── release_past_key_value.pt
│   ├── release_clip_text_embeddings.pt
├── caltech-101/
│   ├── release_past_key_value.pt
│   ├── release_clip_text_embeddings.pt
...
```
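As a quick sanity check, you can confirm that the metadata files deserialize (a minimal sketch; the path below is hypothetical and the internal structure of the files is not documented here):

```bash
# Verify that both .pt files load without errors.
python - <<'EOF'
import torch
for name in ["release_past_key_value.pt", "release_clip_text_embeddings.pt"]:
    obj = torch.load(f"/path/to/data/imagenet/{name}", map_location="cpu")
    print(name, type(obj))
EOF
```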
We provide LLaMP checkpoints of all 11 datasets for the base-to-novel generalization benchmark. They can be downloaded from here. After downloading the checkpoints, please organize them as follows:
```
checkpoints/
├── imagenet/
│   ├── release/
│   │   ├── *.t7
├── caltech-101/
├── oxford_pets/
├── stanford_cars/
...
```
To evaluate the model, run the following command:

```bash
CUDA_VISIBLE_DEVICES=0 TOKENIZERS_PARALLELISM=False deepspeed test_llamp.py --deepspeed_config deepspeed_config/zero2_a100_40g.json --naive_decoding --freeze_decoder_kv --freeze_decoder_ffn --visual_prompting --dataset $DATASET --logpath $LOGPATH
```

where `$DATASET` is the dataset name and `$LOGPATH` is the path where the checkpoints are saved. `$DATASET` should be one of the following: `ImageNet`, `Caltech101`, `OxfordPets`, `StanfordCars`, `FGVCAircraft`, `OxfordFlowers`, `DescribableTextures`, `Food101`, `SUN397`, `UCF101`, `EuroSAT`.
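For example, to evaluate the released Caltech-101 checkpoint with the directory layout above (the exact `--logpath` value is an assumption; point it at wherever you placed the checkpoints):

```bash
CUDA_VISIBLE_DEVICES=0 TOKENIZERS_PARALLELISM=False deepspeed test_llamp.py \
    --deepspeed_config deepspeed_config/zero2_a100_40g.json \
    --naive_decoding --freeze_decoder_kv --freeze_decoder_ffn \
    --visual_prompting \
    --dataset Caltech101 \
    --logpath checkpoints/caltech-101/release
```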
Please run

```bash
bash scripts/launch/launch.sh $DATASET $SEED
```

to launch training, where `$DATASET` is the dataset name and `$SEED` is the random seed, chosen from 1, 2, and 3.
`$DATASET` takes the same values as in the evaluation section above.
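For instance, to train on ImageNet with seed 1:

```bash
bash scripts/launch/launch.sh ImageNet 1
```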
If you find LLaMP useful in your research, please consider citing:
```bibtex
@InProceedings{Zheng_2024_Large,
    title     = {Large Language Models are Good Prompt Learners for Low-Shot Image Classification},
    author    = {Zheng, Zhaoheng and Wei, Jingmin and Hu, Xuefeng and Zhu, Haidong and Nevatia, Ram},
    booktitle = {CVPR},
    year      = {2024},
}
```