- Mar. 24, 2024. The inference code has been released.
- Nov. 28, 2023. ParaPrompts-400 and ParaImage-3k have been released.
- Nov. 15, 2023. Repository initialized.
ParaDiffusion is an information-enriched diffusion model for the paragraph-to-image generation task. It explores transferring the extensive semantic comprehension capabilities of large language models to image generation. At its core, a large language model (e.g., Llama V2) encodes long-form text, followed by LoRA fine-tuning to align the text and image feature spaces for generation. To support training for long-text semantic alignment, we also propose ParaImage, a high-quality dataset of paragraph-image pairs.
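The LoRA fine-tuning mentioned above follows the general low-rank adaptation idea: a frozen weight matrix W is augmented with a trainable product B A of rank r, scaled by alpha / r. The framework-free sketch below illustrates only that generic mechanism with toy sizes. The class name, dimensions, rank, and initialization values are hypothetical choices for illustration and do not describe where or how ParaDiffusion applies LoRA in its actual code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical toy layer: frozen weight W plus a trainable low-rank update B @ A.

    Output: y = W x + (alpha / r) * B (A x), with A of shape r x d_in and B of shape d_out x r.
    Only A and B would be updated during fine-tuning; W stays frozen.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float) -> None:
        self.weight = weight  # frozen, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # Small deterministic values stand in for the usual random init of A;
        # B starts at zero, so the layer initially behaves exactly like W.
        self.A: Matrix = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def merged_weight(self) -> Matrix:
        """Fold the low-rank update into W for inference: W + scale * (B @ A)."""
        return [
            [w + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(self.rank))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.weight)
        ]


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy frozen weight, d_out=3, d_in=2
layer = LoRALinear(W, rank=1, alpha=2.0)
print(layer([2.0, 3.0]))  # B starts at zero, so this equals W x: [2.0, 3.0, 5.0]
```

Because only A and B are trained, the number of trainable parameters is r * (d_in + d_out) instead of d_in * d_out, and `merged_weight` shows how the update can be folded back into W after training.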
- Python >= 3.10 (Anaconda or Miniconda recommended)
- PyTorch >= 1.13.0 (CUDA 11.7 build, e.g. 1.13.0+cu117)
- diffusers == 0.20.0
git clone https://github.com/weijiawu/ParaDiffusion
cd ParaDiffusion
conda create -n ParaDiffusion python=3.10
conda activate ParaDiffusion
pip install -r requirements.txt
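After installation, the version requirements listed above can be checked programmatically. This is a stdlib-only sketch, not a script shipped with the repository; it only reads installed package metadata, and the simple version parser is an assumption that drops local suffixes such as `+cu117` and treats pre-release tags loosely.

```python
import sys
from importlib import metadata
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.13.0+cu117' into (1, 13, 0): drop the local '+...' suffix, keep leading digits."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> List[str]:
    """Compare the running environment with the listed requirements; return any problems."""
    problems: List[str] = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} does not satisfy >= 3.10")
    torch_v = installed("torch")
    if torch_v is None or parse_version(torch_v) < (1, 13, 0):
        problems.append(f"torch {torch_v} does not satisfy >= 1.13.0")
    diffusers_v = installed("diffusers")
    if diffusers_v is None or parse_version(diffusers_v) != (0, 20, 0):
        problems.append(f"diffusers {diffusers_v} does not satisfy == 0.20.0")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the listed requirements."]:
        print(line)
```

Run it inside the activated ParaDiffusion environment (for example saved as a hypothetical check_env.py); an empty problem list means the installed versions match the requirements above.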
Download the pretrained ParaDiffusion weights:
mkdir -p weight
cd weight
# download the weight of ParaDiffusion to ./weight
git lfs install
git clone https://huggingface.co/weijiawu/ParaDiffusion
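If Git LFS is not available, the same Hugging Face repository can likely be fetched with the `huggingface_hub` library's `snapshot_download`. This is a suggested alternative, not part of the official instructions; it assumes `huggingface_hub` is installed (pip install huggingface_hub) and reproduces the ./weight/ParaDiffusion layout that the git clone above creates. The helper names are hypothetical.

```python
from pathlib import Path

REPO_ID = "weijiawu/ParaDiffusion"  # same repository as the git clone URL above


def default_weight_dir(root: str = ".") -> Path:
    """Mirror the layout produced by `cd weight && git clone ...`: <root>/weight/ParaDiffusion."""
    return Path(root) / "weight" / "ParaDiffusion"


def download_weights(root: str = ".") -> Path:
    """Download all files of the Hugging Face repo into the weight directory."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = default_weight_dir(root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling download_weights() from the repository root requires network access and places the files where the git clone route would.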
We provide two sets of UNet weights; choose the corresponding one for testing and inference. Then run the demo from the repository root (the download commands above leave you in ./weight):
python demo.py
The proposed ParaImage dataset consists of two main parts:
(a) ParaImage-Big: high-quality images with generated captions, primarily used for paragraph-image alignment learning in Stage 2.
(b) ParaImage-Small: aesthetic images with manually written long-form descriptions, primarily used for quality-tuning in Stage 3.
ParaImage-Small consists of a few thousand high-quality images carefully selected from LAION-Aesthetics according to common principles of photography and then professionally annotated by skilled annotators.
ParaImage-Small can be downloaded from Google Drive.
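The two-part split above amounts to a simple stage-to-dataset schedule. The sketch below records it as a small Python config; the class, field, and function names are hypothetical, and Stage 1 is deliberately left out because this section does not say what data it uses.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StageConfig:
    dataset: str  # ParaImage subset used in this stage
    purpose: str  # what the stage is meant to achieve


# Only Stages 2 and 3 are described here; Stage 1 is omitted rather than guessed.
TRAINING_STAGES: Dict[int, StageConfig] = {
    2: StageConfig(dataset="ParaImage-Big", purpose="paragraph-image alignment learning"),
    3: StageConfig(dataset="ParaImage-Small", purpose="quality-tuning"),
}


def dataset_for_stage(stage: int) -> str:
    """Look up which ParaImage subset a training stage uses."""
    if stage not in TRAINING_STAGES:
        raise KeyError(f"No ParaImage subset documented for stage {stage}")
    return TRAINING_STAGES[stage].dataset
```

Keeping the schedule in one frozen mapping makes the intended ordering explicit: broad alignment on the large, generated-caption subset first, then quality-tuning on the small, manually annotated subset.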
✏️ New Evaluation Prompts: ParaPrompts-400
Existing test prompts focus on short text-to-image generation and overlook paragraph-to-image generation. We therefore introduce ParaPrompts, a new evaluation set of 400 long-text descriptions.
Previous test prompts mostly covered text alignment for prompts of 0-25 words, whereas ParaPrompts extends evaluation to long-text alignment with prompts of 100 words or more.
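To see where a prompt set falls relative to these length ranges, prompts can be bucketed by word count. This is a minimal sketch: whitespace word counting is an assumption (the section does not say how ParaPrompts measures length), and the intermediate "medium" bucket is added only so every prompt gets a label.

```python
from collections import Counter
from typing import Dict, Iterable


def word_count(prompt: str) -> int:
    """Count whitespace-separated words, a simple proxy for prompt length."""
    return len(prompt.split())


def length_bucket(prompt: str) -> str:
    """Place a prompt into the length ranges mentioned above."""
    n = word_count(prompt)
    if n <= 25:
        return "short (0-25 words)"
    if n >= 100:
        return "long (100+ words)"
    return "medium (26-99 words)"


def summarize(prompts: Iterable[str]) -> Dict[str, int]:
    """Count how many prompts fall into each length bucket."""
    return dict(Counter(length_bucket(p) for p in prompts))
```

For example, summarize(["a cat on a mat"]) returns {"short (0-25 words)": 1}, while a 120-word paragraph lands in the long bucket that ParaPrompts targets.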
@misc{wu2023paradiffusion,
title={Paragraph-to-Image Generation with Information-Enriched Diffusion Model},
author={Weijia Wu and Zhuang Li and Yefei He and Mike Zheng Shou and Chunhua Shen and Lele Cheng and Yan Li and Tingting Gao and Di Zhang and Zhongyuan Wang},
year={2023},
eprint={2311.14284},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
- Thanks to Diffusers for the wonderful work and codebase.