We provide download scripts for each of the pretraining datasets we used (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics, and SynthCI-30M) in their respective folders. For the Mayilvahanan et al. experiment, we have released the exact sample indices from LAION-400M that are included in the final dataset here, with the authors' permission. We thank the authors again for their great work!
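As a rough illustration of how a released list of sample indices can be used to reconstruct a subset, the sketch below keeps only the samples whose position appears in the list. It is an assumption-laden example, not the repository's own loader: the helper name `select_by_indices` is hypothetical, and it assumes the indices are 0-based positions into the LAION-400M sample ordering, which the released file may or may not follow.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def select_by_indices(samples: Iterable[T], keep: Iterable[int]) -> Iterator[T]:
    """Yield the samples whose position is listed in `keep`.

    Hypothetical helper: assumes `keep` holds 0-based positions into `samples`.
    Output follows the order of `samples`, not the order of `keep`.
    """
    keep_set = set(keep)  # set membership keeps each lookup O(1)
    for position, sample in enumerate(samples):
        if position in keep_set:
            yield sample


# Toy stand-ins for the full sample list and the released indices.
all_samples = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
released_indices = [2, 0]
subset = list(select_by_indices(all_samples, released_indices))
```

Because the function streams over `samples`, it also works with a generator that reads metadata shards one at a time, so the full list never needs to be held in memory.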
For the zero-shot classification experiments, please follow the data download setup from the SuS-X GitHub repository. For the retrieval experiments, we directly use the splits provided on Hugging Face: flickr1k and coco. For the text-to-image generation experiments, we use the datasets from HEIM. For more details on these evaluations, please see the src/text_to_image_experiments folder.