Setting up the datasets

All datasets must be placed under ./data in the project root folder. The ./data folder should look like this:

data/
|–– ucf101/
|–– cifar10/
|–– cifar100/
|–– caltech101/
|–– caltech256/
|–– imagenet/
|–– sun397/
|–– fgvcaircraft/
|–– birdsnap/
|–– stanfordcars/
|–– cub/
|–– flowers102/
|–– food101/
|–– oxfordpets/
|–– dtd/
|–– eurosat/
|–– imagenet-sketch/
|–– imagenet-r/
|–– country211/
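A quick way to see which of these folders are still missing is a small check like the one below. This is only a sketch and not part of the repository: the helper name missing_datasets and the constant EXPECTED_DATASETS are hypothetical, and the folder names are copied from the listing above.

```python
from pathlib import Path

# Folder names copied from the listing above.
EXPECTED_DATASETS = [
    "ucf101", "cifar10", "cifar100", "caltech101", "caltech256",
    "imagenet", "sun397", "fgvcaircraft", "birdsnap", "stanfordcars",
    "cub", "flowers102", "food101", "oxfordpets", "dtd", "eurosat",
    "imagenet-sketch", "imagenet-r", "country211",
]


def missing_datasets(data_root="./data"):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    # is_dir() follows symbolic links, so linked datasets count as present.
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only looks at top-level folder names; it does not validate the contents of each dataset.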

If you need to store your datasets on an external device, or have already downloaded them to another location, create symbolic links inside ./data that point to the actual dataset locations:

ln -s /path/to/existing/dataset ./data/dataset
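When several datasets need linking, the same step can be scripted. The sketch below mirrors the ln -s command using Python's standard library; the function name link_dataset and its arguments are hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def link_dataset(existing_path, name, data_root="./data"):
    """Create data_root/name as a symbolic link to existing_path."""
    source = Path(existing_path).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"No dataset directory at {source}")
    root = Path(data_root)
    root.mkdir(parents=True, exist_ok=True)
    link = root / name
    # Refuse to overwrite an existing folder or a (possibly broken) link.
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"{link} already exists")
    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, link_dataset("/path/to/existing/dataset", "dataset") has the same effect as the command above. The function deliberately refuses to replace anything that already exists under ./data.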

We detail the steps to prepare each dataset below. To ensure reproducibility and consistency with prior work, we use the CoOp val/test splits where possible. Where this is not possible, we provide our own val and test splits. For ImageNet, ImageNet-Sketch, ImageNet-R, CIFAR-10 and CIFAR-100, we follow previous works and use the test set as the validation set.

UCF101

  • Create a folder named ucf101/ under ./data.
  • Download the zip file UCF-101-midframes.zip from here and extract it to ./data/ucf101/. This zip file contains the extracted middle video frames.
  • Download split_zhou_UCF101.json from this link and put it under ./data/ucf101.

The directory structure should look like

ucf101/
|–– UCF-101-midframes/
|–– split_zhou_UCF101.json
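To sanity-check a downloaded split file, it can help to count the entries in each split. The sketch below assumes the file is a JSON object whose values are lists of items (for CoOp-style splits the keys are typically "train", "val" and "test", but that layout is an assumption here and is not stated in this document). The function name split_sizes is hypothetical.

```python
import json


def split_sizes(split_path):
    """Return the number of entries per top-level key of a split file."""
    with open(split_path, "r", encoding="utf-8") as f:
        split = json.load(f)
    # Assumption: the file is a JSON object whose values are lists of items.
    return {key: len(items) for key, items in split.items()}
```

For example, split_sizes("./data/ucf101/split_zhou_UCF101.json") would report how many items each split contains, which makes a truncated or wrong download easy to spot.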

CIFAR-10

  • Create a folder named cifar10/ under ./data.
  • The dataloader script will automatically download the CIFAR-10 dataset to this directory using the PyTorch dataloader.

The directory structure should look like

cifar10/
|–– cifar-10-batches-py
|–– cifar-10-python.tar.gz
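If you prefer to fetch CIFAR-10 ahead of time, the download can be triggered manually. The sketch below assumes the repository's dataloader relies on torchvision.datasets.CIFAR10, which this document does not confirm; the function names download_cifar10 and cifar10_is_present are hypothetical.

```python
from pathlib import Path


def download_cifar10(root="./data/cifar10"):
    """Fetch the CIFAR-10 train and test sets into root via torchvision."""
    # Imported lazily so the path helper below works without torchvision.
    from torchvision.datasets import CIFAR10

    train = CIFAR10(root=root, train=True, download=True)
    test = CIFAR10(root=root, train=False, download=True)
    return train, test


def cifar10_is_present(root="./data/cifar10"):
    """Check for the extracted folder shown in the listing above."""
    return (Path(root) / "cifar-10-batches-py").is_dir()
```

Calling download_cifar10() requires network access and an installed torchvision; cifar10_is_present() only inspects the local folder.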

CIFAR-100

  • Create a folder named cifar100/ under ./data.
  • The dataloader script will automatically download the CIFAR-100 dataset to this directory using the PyTorch dataloader.

The directory structure should look like

cifar100/
|–– cifar-100-python
|–– cifar-100-python.tar.gz

Caltech101

The directory structure should look like

caltech101/
|–– 101_ObjectCategories/
|–– split_zhou_Caltech101.json

Caltech256

The directory structure should look like

caltech256/
|–– 256_ObjectCategories/
|–– split_Caltech256.json

ImageNet

  • Create a folder named imagenet/ under ./data.
  • Download the dataset from the official website and extract the training and validation sets to ./data/imagenet.

The directory structure should look like

imagenet/
|–– train/ # contains 1,000 folders like n01440764, n01443537, etc.
|–– val/

SUN397

The directory structure should look like

sun397/
|–– SUN397/
|–– split_zhou_SUN397.json
|–– ... # a bunch of .txt files

FGVCAircraft

The directory structure should look like

fgvcaircraft/
|–– images/
|–– ... # a bunch of .txt files

Birdsnap

  • Download the data from http://thomasberg.org/datasets/birdsnap/1.1/birdsnap.tgz.
  • Extract birdsnap.tgz and ensure that it contains the get-birdsnap.py script.
  • Run the get-birdsnap.py script, which creates a folder named download.
  • Move download/ to ./data and rename the folder to birdsnap/.
  • Download split_Birdsnap.json from this link and put it under ./data/birdsnap.

The directory structure should look like

birdsnap/
|–– images/
|–– temp/
|–– split_Birdsnap.json
|–– ... # a bunch of .txt files
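The move-and-rename step for Birdsnap can be scripted as follows. This is a sketch only: it assumes get-birdsnap.py has already been run and has produced the download/ folder, and the function name install_birdsnap is hypothetical.

```python
import shutil
from pathlib import Path


def install_birdsnap(download_dir, data_root="./data"):
    """Move the script's download/ folder to data_root/birdsnap."""
    source = Path(download_dir)
    target = Path(data_root) / "birdsnap"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} not found; run get-birdsnap.py first")
    # Avoid nesting download/ inside an existing birdsnap/ folder.
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

The split file split_Birdsnap.json still has to be downloaded separately and placed in the resulting folder.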

StanfordCars

The directory structure should look like

stanfordcars/
|–– cars_test/
|–– cars_test_annos_withlabels.mat
|–– cars_train/
|–– devkit/
|–– split_zhou_StanfordCars.json

CUB

The directory structure should look like

cub/
|–– images/
|–– parts/
|–– attributes/
|–– split_CUB.json
|–– ... # a bunch of .txt files

Flowers102

The directory structure should look like

flowers102/
|–– cat_to_name.json
|–– imagelabels.mat
|–– jpg/
|–– split_zhou_OxfordFlowers.json

Food101

The directory structure should look like

food101/
|–– images/
|–– license_agreement.txt
|–– meta/
|–– README.txt
|–– split_zhou_Food101.json

OxfordPets

The directory structure should look like

oxfordpets/
|–– images/
|–– annotations/
|–– split_zhou_OxfordPets.json

DTD

The directory structure should look like

dtd/
|–– images/
|–– imdb/
|–– labels/
|–– split_zhou_DescribableTextures.json

EuroSAT

The directory structure should look like

eurosat/
|–– 2750/
|–– split_zhou_EuroSAT.json

ImageNet-Sketch

The directory structure should look like

imagenet-sketch/
|–– images/ # contains 1,000 folders whose names have the format of n*
|–– classnames.txt

ImageNet-R

The directory structure should look like

imagenet-r/
|–– imagenet-r/ # contains 200 folders whose names have the format of n*
|–– classnames.txt

Country211

The directory structure should look like

country211/
|–– test
|–– train
|–– valid
|–– country-iso-mapping.txt
|–– country_iso_mapping.py

Acknowledgements

This README has been adapted from the amazing READMEs prepared by: