Code for paper: "Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models"
A suitable conda environment named AMC can be created and activated with:
conda env create -f environment.yaml
conda activate AMC
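As a quick sanity check (this assumes environment.yaml installs PyTorch and diffusers, which the training and inference scripts rely on), you can verify the install and confirm that CUDA is visible:

python -c "import torch, diffusers; print(torch.__version__, diffusers.__version__, torch.cuda.is_available())"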
First, download the COCO dataset from the official COCO website. We use COCO2014 in the paper. Then, process the data with this script:
python coco_preprocess.py \
--coco_image_path /YOUR/COCO/PATH/train2014 \
--coco_caption_file /YOUR/COCO/PATH/annotations/captions_train2014.json \
--coco_instance_file /YOUR/COCO/PATH/annotations/instances_train2014.json \
--output_dir /YOUR/DATA/PATH
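If you want to inspect the processed shards before training, here is a minimal sketch, assuming the preprocessing script writes standard WebDataset .tar files (the shard pattern and sample keys are placeholders for whatever ends up in /YOUR/DATA/PATH):

import webdataset as wds

# Placeholder shard pattern; adjust it to the .tar files actually produced
shards = "/YOUR/DATA/PATH/{00000..00009}.tar"

# Decode images with PIL and print the keys of the first sample to see
# which fields (image, caption, boxes, ...) each record carries
dataset = wds.WebDataset(shards).decode("pil")
for sample in dataset:
    print(sorted(sample.keys()))
    break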
Before training, you need to edit the configuration in train_boxnet.sh:
- ROOT_DIR: the directory where all results are saved.
- webdataset_base_urls: the processed data shards, e.g. /YOUR/DATA/PATH/{xxx-xxx}.tar
- model_path: path to the Stable Diffusion v1-5 checkpoint
You can train the BoxNet with this script:
sh train_boxnet.sh $NODE_NUM $CURRENT_NODE_RANK $GPUS_PER_NODE
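For example, a single-node run with 8 GPUs (node count 1, node rank 0, 8 GPUs per node; substitute the values for your own setup) would be launched as:

sh train_boxnet.sh 1 0 8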
With a trained BoxNet, you can run text-to-image synthesis with:
python test_pipeline_onestage.py \
--stable_model_path /stable-diffusion-v1-5/checkpoint \
--boxnet_model_path /TRAINED/BOXNET/CKPT \
--output_dir /YOUR/SAVE/DIR
All the test prompts are saved in test_prompts.json.
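To iterate over those prompts programmatically, a minimal sketch is below; it only assumes the file is valid JSON and does not rely on a particular schema (the exact structure is defined by test_prompts.json in the repo):

import json

# Load the bundled prompt file
with open("test_prompts.json") as f:
    prompts = json.load(f)

# Print a rough summary and the first entry, whatever its shape
print(type(prompts).__name__, len(prompts))
print(prompts[0] if isinstance(prompts, list) else next(iter(prompts.items())))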
- Release data preparation code
- Release inference code
- Release training code
- Release demo
- Release checkpoint
This implementation is based on the diffusers library, the Fengshenbang-LM codebase, and the DETR codebase.