
# Attention Refocusing

[Website] [Demo]

This is the official implementation of the paper "Grounded Text-to-Image Synthesis with Attention Refocusing".

*(Demo video: intro_small.mp4)*

## Setup

```bash
conda create --name ldm_layout python==3.8.0
conda activate ldm_layout
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
pip install git+https://github.com/CompVis/taming-transformers.git
pip install git+https://github.com/openai/CLIP.git
```
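
After installation, an optional sanity check confirms that the CUDA build of PyTorch is active:

```python
# Optional sanity check: verify that PyTorch was installed with CUDA support.
import torch

print(torch.__version__)          # should report a CUDA build, e.g. "...+cu117"
print(torch.cuda.is_available())  # True if a GPU is visible
```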

## Inference

*(Teaser figure)*

Download the GLIGEN model checkpoint and put it in `gligen_checkpoints`.

Run with the HRS/DrawBench prompts:

```bash
python guide_gligen.py --ckpt [model_checkpoint] --file_save [save_path] \
                       --type [category] --box_pickle [saved_boxes] --use_gpt4
```

where:

- `--ckpt`: path to the GLIGEN checkpoint
- `--file_save`: path to save the generated images
- `--type`: the category to test (options include `counting`, `spatial`, `color`, `size`)
- `--box_pickle`: path to save the layout generated by GPT-4
- `--use_gpt4`: whether to use GPT-4 to generate the layout (a sketch of this step follows below). If you use GPT-4, set your API key first:

  ```bash
  export OPENAI_API_KEY='your-api-key'
  ```
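
For reference, the `--use_gpt4` path boils down to asking GPT-4 for a list of labeled bounding boxes for the prompt. The sketch below illustrates that pattern; the prompt wording, the response parsing, and the use of the legacy `openai` ChatCompletion API are illustrative assumptions, not the exact code in `guide_gligen.py`:

```python
# Illustrative sketch of GPT-4 layout generation; see guide_gligen.py
# for the actual prompt template and parsing used by this repo.
import ast
import os

import openai  # legacy (<1.0) API style assumed here

openai.api_key = os.environ["OPENAI_API_KEY"]

caption = "three apples on a wooden table"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            f"Give a bounding box for each object in: '{caption}'. "
            "Answer only with a Python list of (label, [x0, y0, x1, y1]) "
            "tuples, with coordinates in a 512x512 image."
        ),
    }],
)
boxes = ast.literal_eval(response["choices"][0]["message"]["content"])
print(boxes)  # e.g. [('apple', [30, 260, 150, 380]), ...]
```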

For instance, to generate images according to the layouts and prompts of the `counting` category:

```bash
python guide_gligen.py --ckpt gligen_checkpoints/diffusion_pytorch_model.bin --file_save counting_500 \
                       --type counting --box_pickle ../data_evaluate_LLM/gpt_generated_box/counting.p
```
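
The resulting `--box_pickle` file can be inspected with standard `pickle`; the internal structure of the stored layouts is an assumption here, so print one entry to see what is actually saved:

```python
# Peek at the saved GPT-4 layouts; the internal structure is an
# assumption -- print one entry to inspect it.
import pickle

with open("../data_evaluate_LLM/gpt_generated_box/counting.p", "rb") as f:
    layouts = pickle.load(f)

print(type(layouts), len(layouts))
sample = next(iter(layouts.items())) if isinstance(layouts, dict) else layouts[0]
print(sample)
```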

To run with your own text prompts:

```bash
export OPENAI_API_KEY='your-api-key'
python inference.py --ckpt gligen_checkpoints/diffusion_pytorch_model.bin
```

We provide the layouts generated by GPT-4 for the HRS benchmark (HRS boxes) and for DrawBench (DrawBench boxes).
We also provide the images generated by our method and by the baselines, including Stable Diffusion, Attend-and-Excite, MultiDiffusion, Layout-guidance, and GLIGEN, here.

## Evaluation

To set up the environment, download the detector models, and run the evaluation for each category, see the evaluation instructions.
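
As a rough picture of what a counting evaluation involves, the sketch below counts detections of a class with an off-the-shelf torchvision detector and compares the count against the prompt. The detector choice, score threshold, and matching rule are placeholders, not the HRS pipeline:

```python
# Illustrative counting check; the actual evaluation uses the
# detectors and thresholds described in the evaluation instructions.
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

WEIGHTS = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
COCO_NAMES = WEIGHTS.meta["categories"]
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=WEIGHTS).eval()

def count_objects(image_path: str, label: str, score_thresh: float = 0.5) -> int:
    """Count detections of `label` scoring above `score_thresh`."""
    img = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        out = model([img])[0]
    names = [COCO_NAMES[i] for i in out["labels"].tolist()]
    return sum(n == label and s >= score_thresh
               for n, s in zip(names, out["scores"].tolist()))

# For a prompt like "three apples", the image counts as correct when
# count_objects(path, "apple") == 3.
```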

## Attention-refocusing with other baselines

ControlNet + attention-refocusing

## Acknowledgments

This project is built on the following resources:

- GLIGEN: our code builds upon the foundational work provided by GLIGEN.
- HRS: the evaluation component of our project is adapted from HRS.
