dataset_prepare.md


Setup

Create two folders, imagenet_info and text_info, in the project root directory:

/path/to/DeCLIP/
├── docs/
├── experiments/
├── linklink/
├── prototype/
├── text_info/
├── imagenet_info/
...
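
As a minimal sketch of the step above, the two folders can be created from Python. The helper name `create_info_dirs` is hypothetical and not part of DeCLIP; it only assumes that it is given the project root (for example `/path/to/DeCLIP/`).

```python
from pathlib import Path


def create_info_dirs(project_root):
    """Create imagenet_info/ and text_info/ under project_root if missing.

    Hypothetical helper for illustration; not part of the DeCLIP codebase.
    """
    root = Path(project_root)
    created = []
    for name in ("imagenet_info", "text_info"):
        path = root / name
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Run from the project root directory.
    for p in create_info_dirs("."):
        print(p)
```

Running it a second time is harmless, since existing folders are left untouched.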

Pretrain Dataset

YFCC15M Setup

  1. Download our YFCC15M label file (Google Drive) and put it in the imagenet_info directory.

  2. Download the image data. There are two ways to do this:

  • Download by label: crawl the images directly from the URLs in the label file.
  • Filter by label: download the official YFCC100M data, then prepare the YFCC15M subset metadata pickle using the label file.
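
The "download by label" option could look roughly like the sketch below. The layout of the label file is not described here, so the code assumes, purely for illustration, that each line is a JSON object with hypothetical keys `url` and `filename`; the function names `plan_downloads` and `download_all` are likewise hypothetical. Adapt the parsing to the actual label file before use.

```python
import json
import urllib.request
from pathlib import Path


def plan_downloads(label_lines, out_dir):
    """Map each label record to a (url, destination path) pair.

    ASSUMPTION: each non-empty line is a JSON object with the
    hypothetical keys "url" and "filename". The real label file
    may be laid out differently.
    """
    out = Path(out_dir)
    plan = []
    for line in label_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        plan.append((record["url"], out / record["filename"]))
    return plan


def download_all(plan, timeout=10):
    """Fetch every planned URL, skipping files that already exist.

    Returns the URLs that failed so they can be retried later.
    """
    failed = []
    for url, dest in plan:
        if dest.exists():
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                dest.write_bytes(resp.read())
        except (OSError, ValueError):
            # Network errors and malformed URLs are recorded, not fatal.
            failed.append(url)
    return failed
```

Separating the planning step from the fetching step lets the URL list be inspected, or split across workers, before any network traffic starts.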

Text

  1. Download our vocab file for the text encoder (Google Drive).
  2. Put it in the text_info directory.

Downstream Dataset

ImageNet Setup

  1. Download the official ImageNet dataset.
  2. Download our ImageNet validation label file (Google Drive).
  3. Put it in the imagenet_info directory.