We require all our datasets to be under ./data in the project root folder. The ./data folder should look like this:
data/
|–– ucf101/
|–– cifar10/
|–– cifar100/
|–– caltech101/
|–– caltech256/
|–– imagenet/
|–– sun397/
|–– fgvcaircraft/
|–– birdsnap/
|–– stanfordcars/
|–– cub/
|–– flowers102/
|–– food101/
|–– oxfordpets/
|–– dtd/
|–– eurosat/
|–– imagenet-sketch/
|–– imagenet-r/
|–– country211/
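As a quick sanity check, the layout above can be verified with a short script. This is a minimal sketch: the helper name `missing_datasets` is hypothetical and not part of this repository, and the folder names are copied from the tree above.

```python
from pathlib import Path

# Folder names copied from the tree above.
EXPECTED_DATASETS = [
    "ucf101", "cifar10", "cifar100", "caltech101", "caltech256", "imagenet",
    "sun397", "fgvcaircraft", "birdsnap", "stanfordcars", "cub", "flowers102",
    "food101", "oxfordpets", "dtd", "eurosat", "imagenet-sketch", "imagenet-r",
    "country211",
]


def missing_datasets(data_root="./data"):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    # is_dir() follows symbolic links, so linked datasets count as present.
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    print("All dataset folders found." if not missing else f"Missing: {missing}")
```

Because the check follows symbolic links, it should also work when some datasets live on an external device.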
If you need to store your datasets on an external device, or have already downloaded them to another location, you can create symbolic links inside ./data pointing to the correct dataset location using:
ln -s /path/to/existing/dataset ./data/dataset
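The same link can be created from Python, which may be convenient inside a setup script. This is a sketch under stated assumptions: `link_dataset` is a hypothetical helper, not something this repository provides, and it refuses to overwrite an existing entry under ./data.

```python
import os
from pathlib import Path


def link_dataset(existing_dir, name, data_root="./data"):
    """Create data_root/name as a symbolic link to existing_dir."""
    target = Path(existing_dir).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"No dataset directory at {target}")
    link = Path(data_root) / name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    # Equivalent to: ln -s <target> <data_root>/<name>
    os.symlink(target, link, target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/existing/dataset", "dataset")` mirrors the `ln -s` command above.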
We detail the steps to prepare each dataset below. To ensure reproducibility and consistency with prior work, we use the CoOp val/test splits where possible. Where this is not possible, we provide our own val and test splits. For ImageNet, ImageNet-Sketch, ImageNet-R, CIFAR-10 and CIFAR-100, following previous work, we use the test set as the validation set.
- Create a folder named ucf101/ under ./data.
- Download the zip file UCF-101-midframes.zip from here and extract it to ./data/ucf101/. This zip file contains the extracted middle video frames.
- Download split_zhou_UCF101.json from this link and put it under ./data/ucf101.
The directory structure should look like
ucf101/
|–– UCF-101-midframes/
|–– split_zhou_UCF101.json
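Once both items are in place, the split file can be inspected to confirm that it matches the extracted frames. The sketch below assumes the CoOp-style layout, namely a JSON object with "train", "val" and "test" keys whose entries are [relative image path, integer label, class name]; that layout is an assumption and is not stated in this README. `read_split` is a hypothetical helper.

```python
import json
from pathlib import Path


def read_split(split_path, image_dir):
    """Load a CoOp-style split file into {split: [(image_path, label, classname)]}."""
    with open(split_path, "r") as f:
        raw = json.load(f)
    image_dir = Path(image_dir)
    split = {}
    for name in ("train", "val", "test"):
        # Assumed entry layout: [relative_image_path, integer_label, class_name].
        split[name] = [
            (str(image_dir / rel_path), int(label), classname)
            for rel_path, label, classname in raw[name]
        ]
    return split
```

A call such as `read_split("./data/ucf101/split_zhou_UCF101.json", "./data/ucf101/UCF-101-midframes")` would then return the three splits, and the same reader should apply to the other split files if they share this layout.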
- Create a folder named cifar10/ under ./data.
- The dataloader script will automatically download the CIFAR-10 dataset to this directory using the PyTorch dataloader.
The directory structure should look like
cifar10/
|–– cifar-10-batches-py
|–– cifar-10-python.tar.gz
- Create a folder named cifar100/ under ./data.
- The dataloader script will automatically download the CIFAR-100 dataset to this directory using the PyTorch dataloader.
The directory structure should look like
cifar100/
|–– cifar-100-python
|–– cifar-100-python.tar.gz
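For reference, an automatic CIFAR download of this kind can be sketched with torchvision. This assumes the download goes through `torchvision.datasets.CIFAR10` and `torchvision.datasets.CIFAR100`, which this README does not state explicitly; `load_cifar_test_split` is a hypothetical helper and not the repository's dataloader script.

```python
from pathlib import Path


def load_cifar_test_split(root, num_classes=10):
    """Download (if needed) and return the CIFAR test split via torchvision."""
    # Imported here so the rest of a script still works without torchvision.
    from torchvision import datasets

    dataset_cls = {10: datasets.CIFAR10, 100: datasets.CIFAR100}[num_classes]
    Path(root).mkdir(parents=True, exist_ok=True)
    # train=False selects the official test set; download=True fetches the
    # archive into `root` only when it is not already present.
    return dataset_cls(root=root, train=False, download=True)
```

Calling `load_cifar_test_split("./data/cifar100", num_classes=100)` should leave the archive and the extracted folder under ./data/cifar100, matching the structure shown above.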
- Create a folder named caltech101/ under ./data.
- Download 101_ObjectCategories.tar.gz from http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz and extract the file under ./data/caltech101.
- Download split_zhou_Caltech101.json from this link and put it under ./data/caltech101.
The directory structure should look like
caltech101/
|–– 101_ObjectCategories/
|–– split_zhou_Caltech101.json
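The extraction step can be scripted with the standard library. The sketch below uses `shutil.unpack_archive`, which infers the format from the file extension; `extract_archive` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .zip, .tar, .tar.gz, or .tgz archive into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil infers the archive format from the file extension.
    shutil.unpack_archive(str(archive_path), str(dest))
    # Return the top-level entries so they can be compared with the expected layout.
    return sorted(p.name for p in dest.iterdir())
```

For example, `extract_archive("101_ObjectCategories.tar.gz", "./data/caltech101")` would be expected to produce the 101_ObjectCategories/ folder shown above. Only unpack archives obtained from the official sources listed in this document.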
- Create a folder named caltech256/ under ./data.
- Download 256_ObjectCategories.tar from https://data.caltech.edu/records/nyy15-4j048/files/256_ObjectCategories.tar and extract the file under ./data/caltech256.
- Download split_Caltech256.json from this link and put it under ./data/caltech256.
The directory structure should look like
caltech256/
|–– 256_ObjectCategories/
|–– split_Caltech256.json
- Create a folder named imagenet/ under ./data.
- Download the dataset from the official website and extract the training and validation sets to ./data/imagenet.
The directory structure should look like
imagenet/
|–– train/ # contains 1,000 folders like n01440764, n01443537, etc.
|–– val/
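Since an incomplete ImageNet extraction is easy to miss, the folder count can be checked programmatically. This is a minimal sketch: the helper names are hypothetical, and the pattern assumes class folders are WordNet IDs of the form shown above (the letter n followed by eight digits).

```python
import re
from pathlib import Path

# WordNet IDs such as n01440764: the letter "n" followed by eight digits.
WNID_PATTERN = re.compile(r"^n\d{8}$")


def list_class_folders(split_dir):
    """Return the sorted WordNet-ID folder names found directly under split_dir."""
    return sorted(
        p.name for p in Path(split_dir).iterdir()
        if p.is_dir() and WNID_PATTERN.match(p.name)
    )


def check_imagenet_train(root="./data/imagenet", expected_classes=1000):
    """Raise ValueError if train/ does not hold the expected number of class folders."""
    found = list_class_folders(Path(root) / "train")
    if len(found) != expected_classes:
        raise ValueError(f"Expected {expected_classes} class folders, found {len(found)}")
    return found
```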
- Create a folder named sun397/ under ./data.
- Download the images from http://vision.princeton.edu/projects/2010/SUN/SUN397.tar.gz.
- Download the partitions from https://vision.princeton.edu/projects/2010/SUN/download/Partitions.zip.
- Extract these files under ./data/sun397/.
- Download split_zhou_SUN397.json from this link and put it under ./data/sun397.
The directory structure should look like
sun397/
|–– SUN397/
|–– split_zhou_SUN397.json
|–– ... # a bunch of .txt files
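When several files have to be fetched, as with the images and partitions above, a small download helper avoids repeating transfers that already completed. The sketch below uses only the standard library; `download_if_missing` is a hypothetical helper, and whether the listed URLs are still reachable is not something this sketch can guarantee.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_if_missing(url, dest_dir):
    """Download url into dest_dir unless a file of the same name already exists."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the last component of the URL path as the local file name.
    filename = Path(urlparse(url).path).name
    target = dest_dir / filename
    if not target.exists():
        # Download to a temporary name first so an interrupted transfer
        # is not mistaken for a complete file on the next run.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, str(partial))
        partial.replace(target)
    return target
```

For example, `download_if_missing("https://vision.princeton.edu/projects/2010/SUN/download/Partitions.zip", "./data/sun397")` would place Partitions.zip under ./data/sun397, ready to be extracted.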
- Download the data from https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz.
- Extract fgvc-aircraft-2013b.tar.gz and keep only data/.
- Move data/ to ./data and rename the folder to fgvcaircraft/.
The directory structure should look like
fgvcaircraft/
|–– images/
|–– ... # a bunch of .txt files
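The move-and-rename step can be done in one call. The sketch below is a hypothetical helper, `move_and_rename`, that refuses to overwrite an existing destination, because `shutil.move` would otherwise place the source inside an existing folder instead of renaming it.

```python
import shutil
from pathlib import Path


def move_and_rename(src_dir, data_root, new_name):
    """Move src_dir into data_root under new_name, e.g. data/ -> ./data/fgvcaircraft/."""
    src = Path(src_dir)
    dest = Path(data_root) / new_name
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    if dest.exists():
        # shutil.move would otherwise nest src inside an existing destination folder.
        raise FileExistsError(f"{dest} already exists")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```

Assuming the archive was extracted into the current directory, `move_and_rename("fgvc-aircraft-2013b/data", "./data", "fgvcaircraft")` would yield the structure shown above.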
- Download the data from http://thomasberg.org/datasets/birdsnap/1.1/birdsnap.tgz.
- Extract birdsnap.tgz and ensure that it contains the get-birdsnap.py script.
- Run the get-birdsnap.py script, which creates a folder named download.
- Move download/ to ./data and rename the folder to birdsnap/.
- Download split_Birdsnap.json from this link and put it under ./data/birdsnap.
The directory structure should look like
birdsnap/
|–– images/
|–– temp/
|–– split_Birdsnap.json
|–– ... # a bunch of .txt files
- Create a folder named stanfordcars/ under ./data.
- Download the train images from http://ai.stanford.edu/~jkrause/car196/cars_train.tgz.
- Download the test images from http://ai.stanford.edu/~jkrause/car196/cars_test.tgz.
- Download the train labels from https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz.
- Download the test labels from http://ai.stanford.edu/~jkrause/car196/cars_test_annos_withlabels.mat.
- Download split_zhou_StanfordCars.json from this link and put it under ./data/stanfordcars.
The directory structure should look like
stanfordcars/
|–– cars_test/
|–– cars_test_annos_withlabels.mat
|–– cars_train/
|–– devkit/
|–– split_zhou_StanfordCars.json
- Download the data from https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz.
- Extract CUB_200_2011.tgz and keep only the CUB_200_2011/ subfolder inside the extracted folder.
- Move CUB_200_2011/ to ./data and rename the folder to cub.
- Download split_CUB.json from this link and put it under ./data/cub.
The directory structure should look like
cub/
|–– images/
|–– parts/
|–– attributes/
|–– split_CUB.json
|–– ... # a bunch of .txt files
- Create a folder named flowers102/ under ./data.
- Download the images and labels from https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz and https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat respectively.
- Download cat_to_name.json from here and put it under ./data/flowers102.
- Download split_zhou_OxfordFlowers.json from here and put it under ./data/flowers102.
The directory structure should look like
flowers102/
|–– cat_to_name.json
|–– imagelabels.mat
|–– jpg/
|–– split_zhou_OxfordFlowers.json
- Download the dataset from https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/ and extract the file food-101.tar.gz under ./data, resulting in a folder named ./data/food-101/.
- Rename ./data/food-101 to ./data/food101.
- Download split_zhou_Food101.json from here and put it under ./data/food101.
The directory structure should look like
food101/
|–– images/
|–– license_agreement.txt
|–– meta/
|–– README.txt
|–– split_zhou_Food101.json
- Create a folder named oxfordpets/ under ./data.
- Download the images from https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz.
- Download the annotations from https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz.
- Download split_zhou_OxfordPets.json from this link and put it under ./data/oxfordpets.
The directory structure should look like
oxfordpets/
|–– images/
|–– annotations/
|–– split_zhou_OxfordPets.json
- Download the dataset from https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz and extract it to ./data. This should lead to ./data/dtd/.
- Download split_zhou_DescribableTextures.json from this link and put it under ./data/dtd.
The directory structure should look like
dtd/
|–– images/
|–– imdb/
|–– labels/
|–– split_zhou_DescribableTextures.json
- Create a folder named eurosat/ under ./data.
- Download the dataset from http://madm.dfki.de/files/sentinel/EuroSAT.zip and extract it to ./data/eurosat/.
- Download split_zhou_EuroSAT.json from here and put it under ./data/eurosat.
The directory structure should look like
eurosat/
|–– 2750/
|–– split_zhou_EuroSAT.json
- Download the dataset from https://github.com/HaohanWang/ImageNet-Sketch.
- Extract the dataset to ./data/imagenet-sketch.
- Download classnames.txt to ./data/imagenet-sketch/ from this link. The class names are copied from CLIP.
The directory structure should look like
imagenet-sketch/
|–– images/ # contains 1,000 folders whose names have the format of n*
|–– classnames.txt
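To check that classnames.txt lines up with the image folders, the file can be parsed into a mapping from folder name to class name. The sketch below assumes each line holds a folder name, a space, and then the class name (for example "n01440764 tench"), as in the CoOp reader; that format is an assumption, and `read_classnames` is a hypothetical helper.

```python
def read_classnames(text_file):
    """Map folder names (e.g. n01440764) to class names, preserving file order."""
    classnames = {}
    with open(text_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # The first token is the folder name; the rest is the class name,
            # which may itself contain spaces.
            folder, _, classname = line.partition(" ")
            classnames[folder] = classname
    return classnames
```

Comparing the keys of this mapping with the folder names under images/ is a quick way to spot a mismatched or truncated download.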
- Create a folder named imagenet-r/ under ./data.
- Download the dataset from https://github.com/hendrycks/imagenet-r and extract it to ./data/imagenet-r/.
- Copy ./data/imagenet-sketch/classnames.txt to ./data/imagenet-r/.
The directory structure should look like
imagenet-r/
|–– imagenet-r/ # contains 200 folders whose names have the format of n*
|–– classnames.txt
- Create a folder named country211/ under ./data.
- Download the dataset following the instructions in https://github.com/openai/CLIP/blob/main/data/country211.md and extract it under ./data/country211.
- Download the metadata text file from here and put it under ./data/country211.
- Download the metadata Python script from here and put it under ./data/country211.
The directory structure should look like
country211/
|–– test
|–– train
|–– valid
|–– country-iso-mapping.txt
|–– country_iso_mapping.py
This README has been adapted from the amazing READMEs prepared by: