TL;DR: Embed a unique individual into a pre-trained diffusion model with:
✅ Single image personalization in a few minutes
✅ Fine-grained attribute control with background preservation
✅ Generation and interaction with other (new person) concepts
✅ Realistic composition of two faces with high quality identity preservation and selective attribute control
- 2024/07/15: Code released!
Our code is mainly based on CelebBasis. Additionally, it uses the following repositories: Prompt-Mixing for delayed identity injection, LoRA for efficient fine-tuning, and GroundedSAM to obtain an initial layout for two-person generation. To set up our environment, please run:
conda env create -f environment.yaml
conda activate sd
python -m pip install git+https://github.com/cloneofsimo/lora.git
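After installation, a quick sanity check can confirm that PyTorch sees a GPU and that the LoRA package is importable (a minimal sketch; we assume cloneofsimo/lora installs as the lora_diffusion module):

```python
# sanity_check.py -- minimal environment check (assumes the "sd" conda env above)
import torch
import lora_diffusion  # installed from cloneofsimo/lora

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("lora_diffusion loaded from:", lora_diffusion.__file__)
```

Next, download the following pretrained models and weights: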
- GroundedSAM
- Stable Diffusion 2.1
- CosFace R100 for computing the face Identity Loss
- Encoder4Editing (E4E).
- PIPNet for face preprocessing (align and crop). PIPNet weights can be downloaded from this link (provided by @justindujardin) or our Baidu Yun Drive with extraction code: ygss. Please copy epoch59.pth and FaceBoxesV2.pth to PreciseControl/evaluation/face_align/PIPNet/weights/.
- Mapper weights wt_mapper: download and copy them under the logs directory.
Copy the pretrained weights to the './weights' folder; the directory structure is shown below:
PreciseControl/
|-- weights/
    |-- glint360k_cosface_r100_fp16_0.1/
        |-- backbone.pth (249MB)
    |-- encoder/
        |-- e4e_ffhq_encode.pt (~1.1GB)
    |-- shape_predictor_68_face_landmarks.dat
    |-- v2-1_512-ema-pruned.ckpt (~5.0GB)
    |-- model_ir_se50.pt
    |-- sam_vit_b_01ec64.pth (for multi-person)
    |-- groundingdino_swint_ogc.pth (for multi-person)
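Before training, it can help to verify that every expected checkpoint is in place. A minimal sketch mirroring the tree above:

```python
# check_weights.py -- verify the pretrained checkpoints listed above exist
import os

REQUIRED = [
    "weights/glint360k_cosface_r100_fp16_0.1/backbone.pth",
    "weights/encoder/e4e_ffhq_encode.pt",
    "weights/shape_predictor_68_face_landmarks.dat",
    "weights/v2-1_512-ema-pruned.ckpt",
    "weights/model_ir_se50.pt",
    "weights/sam_vit_b_01ec64.pth",         # multi-person only
    "weights/groundingdino_swint_ogc.pth",  # multi-person only
]

for path in REQUIRED:
    print(f"[{'ok' if os.path.isfile(path) else 'MISSING'}] {path}")
```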
To make the face recognition model work as expected, given an image of a person, we first align and crop the face following the FFHQ-Dataset convention.
Put your input images in ./aug_images/comparision and run the command below with the output path set to ./aug_images/comparision/edited/. This aligns and crops the input images as required by e4e and saves them in the format required for LoRA fine-tuning. It also saves the aligned images in ./aug_images/lora_finetune_comparision_data/, with each image in the folder structure required by the fine-tuning dataloader.
bash ./00_align_face.sh ./aug_images/comparision ./aug_images/comparision/edited/
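For reference, a simplified eye-based alignment in the spirit of the FFHQ crop can be sketched with dlib's 68-point landmarks (using the shape_predictor_68_face_landmarks.dat from the weights folder). This is only an illustrative approximation, not the PIPNet pipeline the script above uses; the input filename is a placeholder:

```python
# align_sketch.py -- simplified FFHQ-style align & crop (illustrative only,
# not the PIPNet pipeline used by 00_align_face.sh)
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("weights/shape_predictor_68_face_landmarks.dat")

def align_face(path, out_size=512):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rect = detector(gray, 1)[0]  # assumes at least one detected face
    shape = predictor(gray, rect)
    lm = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])

    # eye centers from the 68-point scheme (points 36-41 and 42-47)
    eye_l, eye_r = lm[36:42].mean(axis=0), lm[42:48].mean(axis=0)

    # rotate about the landmark centroid so the eye line is horizontal
    angle = np.degrees(np.arctan2(eye_r[1] - eye_l[1], eye_r[0] - eye_l[0]))
    cx, cy = lm.mean(axis=0)
    M = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, 1.0)
    rot = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

    # square crop around the landmarks with margin (no bounds padding here)
    size = int(2.0 * max(lm[:, 0].ptp(), lm[:, 1].ptp()))
    x0, y0 = int(cx - size / 2), int(cy - size / 2)
    crop = rot[max(y0, 0):y0 + size, max(x0, 0):x0 + size]
    return cv2.resize(crop, (out_size, out_size))

cv2.imwrite("aligned.jpg", align_face("./aug_images/comparision/person.jpg"))
```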
For example, we provide some faces in ./aug_images/comparision/
The training config file is ./configs/stable-diffusion/aigc_id_for_lora.yaml. The most important settings are listed below. The id_name folder should have the following structure, which the alignment command above takes care of:
id_name (e.g. cook)
|-- 0000/
    |-- img.jpg
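If you need to build this layout by hand (e.g. when skipping the alignment script), a small sketch like the following arranges a flat folder of aligned images into the dataloader format; the source and destination paths are assumptions:

```python
# make_id_folders.py -- arrange aligned images into id_name/NNNN/img.jpg
import os
import shutil

src = "./aug_images/comparision/edited"                   # aligned images (flat folder)
dst = "./aug_images/lora_finetune_comparision_data/cook"  # id_name folder

for i, name in enumerate(sorted(os.listdir(src))):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    sub = os.path.join(dst, f"{i:04d}")  # 0000/, 0001/, ...
    os.makedirs(sub, exist_ok=True)
    shutil.copy(os.path.join(src, name), os.path.join(sub, "img.jpg"))
```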
Important Data Settings
data:
  params:
    batch_size: 2  # we use batch_size 2
    train:
      target: ldm.data.face_id.FFhq_dataset
      params:
        root_dir: "absolute path to the id_name folder, e.g. /data/.../id_name"
        split: train
        use_aug: False
        image_size: 512
        limit_dataset_size: -1
        use_data_interpolation: False
        percentage_of_synthetic_data: 0.1
        lora_finetuning: True
        multiple_samples: True
    validation:
      target: ldm.data.face_id.FFhq_dataset
      params:
        root_dir: "absolute path to the id_name folder, e.g. /data/.../id_name"
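The LDM codebase parses these files with OmegaConf, so a quick way to confirm your edits load correctly is (a minimal sketch):

```python
# check_config.py -- confirm the training config parses and print key settings
from omegaconf import OmegaConf

cfg = OmegaConf.load("./configs/stable-diffusion/aigc_id_for_lora.yaml")
print("batch_size:", cfg.data.params.batch_size)
print("train root_dir:", cfg.data.params.train.params.root_dir)
```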
Important Training Settings
lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 20
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 50
        max_images: 8
        increase_log_steps: False
  trainer:
    benchmark: True
    max_steps: 50
    accumulate_grad_batches: 8
Reduce accumulate_grad_batches according to available GPU memory; for a lower value, increase max_steps proportionally so the model still sees roughly the same number of images.
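As a quick sanity check on that trade-off, simple arithmetic with the default values above:

```python
# effective batch size and total images seen with the defaults above
batch_size = 2
accumulate_grad_batches = 8
max_steps = 50

effective_batch = batch_size * accumulate_grad_batches  # 16 images per optimizer step
images_seen = effective_batch * max_steps               # 800 images in total

# e.g. dropping accumulation to 4 halves the effective batch,
# so doubling max_steps to 100 keeps images_seen at 800
```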
Training
# bash ./01_start_lora_finetuning.sh --model weights --folder_name_to_save_output
bash ./01_start_lora_finetuning.sh "./weights/v2-1_512-ema-pruned.ckpt" "id_name"
Consequently, a project folder named id_name is generated under ./logs.
Edit the prompt file ./infer_images/example_prompt_1.txt, where sks denotes the first identity. For better identity preservation, increase the lora_scale parameter, though this may reduce text editability.
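Conceptually, the LoRA scale blends the fine-tuned low-rank update into the frozen weights, which is why raising it strengthens identity at the cost of editability. A minimal sketch of the idea (illustrative shapes, not the repo's actual implementation):

```python
# lora_scale_sketch.py -- how a LoRA scale blends the low-rank update
import torch

d, r = 320, 4                 # layer width and LoRA rank (illustrative)
W0 = torch.randn(d, d)        # frozen pre-trained weight
A = torch.randn(r, d) * 0.01  # trained low-rank factors
B = torch.randn(d, r) * 0.01

def effective_weight(scale):
    # scale = 0 -> original model; larger scale -> stronger identity, weaker edits
    return W0 + scale * (B @ A)

W_eff = effective_weight(0.2)  # 0.2 is the lora_scale passed to 02_start_test.sh
```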
Testing
# bash ./02_start_test.sh sd_weights_path text_prompt_path --logs_folder_name "0 0 0 0" --is_lora_weight_used --batch_size --lora_iteration --lora_scale --image_name
bash ./02_start_test.sh "./weights/v2-1_512-ema-pruned.ckpt" "./infer_images/example_prompt_1.txt" id_name "0 0 0 0" True 4 49 0.2 image_name.jpg
The generated images are saved under ./outputs/id_name/.
Edit the prompt file ./infer_images/example_prompt.txt, where sks denotes the first identity. image_name.jpg should be present inside ./aug_images/comparision/edited/; otherwise you have to change the root dir in the code manually. The available edits are stored in the all_delta_w_dict.json file; check its keys and pass one as attr_name. Some attributes present are: smile, beard, bang, age70, gender, eyeglasses, yellow (asian), black, eyesclose, white.
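To list every available attribute key before picking an attr_name, a minimal sketch (adjust the path to wherever all_delta_w_dict.json lives in your checkout):

```python
# list_edits.py -- print the attribute keys available in all_delta_w_dict.json
import json

with open("all_delta_w_dict.json") as f:
    delta_w = json.load(f)

print("\n".join(sorted(delta_w.keys())))
```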
Testing attr edit
# bash ./02_start_test_pmm.sh sd_weights_path text_prompt_path --logs_folder_name "0 0 0 0" --is_lora_weight_used --batch_size (use 1) --lora_iteration --lora_scale --image_name --edit_attr_name
bash ./02_start_test_pmm.sh "./weights/v2-1_512-ema-pruned.ckpt" "./infer_images/example_prompt_1.txt" id_name "0 0 0 0" True 1 49 0.2 image_name.jpg attr_name
This will generate a GIF and a set of images at different edit strengths.
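If you want to reassemble the GIF from the saved frames yourself, a minimal sketch with Pillow (the frame glob is an assumption; adjust it to the actual output filenames):

```python
# make_gif.py -- assemble edit-strength frames into a GIF with Pillow
import glob
from PIL import Image

frames = [Image.open(p) for p in sorted(glob.glob("./outputs/id_name/*.png"))]
frames[0].save(
    "edit_strength.gif",
    save_all=True,
    append_images=frames[1:],
    duration=200,  # ms per frame
    loop=0,        # loop forever
)
```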
- [x] Release code
- [ ] Multiple person generation
@inproceedings{rishubh2024precisecontrol,
  title={PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control},
  author={Rishubh Parihar and Sachidanand VS and Sabariswaran Mani and Tejan Karmali and R. Venkatesh Babu},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024},
}