DreamLIP-DATA

license: cc-by-4.0
task_categories:
- text-to-image
- zero-shot-classification
language:
- en
size_categories:
- 10M<n<100M

Dataset Card for DreamLIP-30M

Raw/Long/Short Caption	Huggingface Dataset
CC3M+YFCC15M+CC12M	Link

Dataset Description

Homepage: DreamLIP homepage
Repository: DreamLIP repository
Paper: DreamLIP: Language-Image Pre-training with Long Captions

Dataset Summary

DreamLIP-Long-Captions is a dataset of ~30M image annotations in the form of detailed long captions. In contrast with the curated style of other synthetic image caption annotations, DreamLIP-30M uses pre-trained multimodal large language models (MLLMs) to obtain detailed descriptions with an average length of 247. More precisely, the detailed descriptions are generated by asking ShareGPT4V, InstructBLIP, or LLaVA-1.5 the question "Describe the image in detail". We also provide a generated short caption for each image, obtained with the prompt "Describe the image in one sentence". The wording of the question used for detailed long captions has little impact on the diversity of the answers, so we can obtain comprehensive captions for each image.
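To make the captioning setup above concrete, here is a minimal Python sketch. It is not the authors' pipeline: the `CaptionRecord` layout, its field names, the `build_requests` and `average_word_count` helpers, and the example URLs are all hypothetical, and it assumes caption length is measured in whitespace-separated words, which the summary does not specify. Only the two prompt strings are taken from the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

# The two prompts quoted in the summary.
LONG_PROMPT = "Describe the image in detail"
SHORT_PROMPT = "Describe the image in one sentence"


@dataclass
class CaptionRecord:
    """Hypothetical record layout; the released files may use other field names."""
    url: str            # image URL (the images themselves are not distributed)
    raw_caption: str    # original alt-text style caption
    long_caption: str   # answer to LONG_PROMPT
    short_caption: str  # answer to SHORT_PROMPT


def build_requests(url: str) -> List[Dict[str, str]]:
    """Pair one image URL with both prompts, one request per caption type."""
    return [
        {"url": url, "caption_type": "long", "prompt": LONG_PROMPT},
        {"url": url, "caption_type": "short", "prompt": SHORT_PROMPT},
    ]


def average_word_count(captions: List[str]) -> float:
    """Mean caption length, ASSUMING whitespace-separated words as the unit."""
    return mean(len(caption.split()) for caption in captions)


# Made-up records for illustration only; these are not dataset samples.
records = [
    CaptionRecord(
        url="https://example.com/a.jpg",
        raw_caption="a dog",
        long_caption="A brown dog runs across a grassy field under a clear sky.",
        short_caption="A dog runs in a field.",
    ),
    CaptionRecord(
        url="https://example.com/b.jpg",
        raw_caption="coffee",
        long_caption="Two people sit at a wooden table with cups of coffee.",
        short_caption="Two people drink coffee.",
    ),
]

print(build_requests(records[0].url))
print(average_word_count([r.long_caption for r in records]))
```

Running the same length statistic over the released long captions would be one way to check the reported average, provided the unit assumption matches the one used by the authors.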

Additional Information

Dataset Curators

Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen and Yujun Shen.

Licensing Information

We distribute the image URLs with long captions under a standard Creative Commons CC-BY-4.0 license. The individual images remain under their own copyrights.

Citation Information

@inproceedings{DreamLIP,
  title={DreamLIP: Language-Image Pre-training with Long Captions},
  author={Zheng, Kecheng and Zhang, Yifei and Wu, Wei and Lu, Fan and Ma, Shuailei and Jin, Xin and Chen, Wei and Shen, Yujun},
  booktitle={ECCV},
  year={2024}
}

Acknowledgements

This dataset is based on CC3M; we thank its authors for their work. We also thank the authors of InstructBLIP, ShareGPT4V, and LLaVA for the pretrained models.
