This repo contains the code for PreciseControl project [ECCV'24]


PreciseControl : Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control

   

1 VAL IISc     2 IIT Kharagpur     3 Avataar.ai    

TL;DR: Embedding a unique individual into the pre-trained diffusion model with:

✅ Single image personalization in a few minutes
✅ Fine-grained attribute control with background preservation
✅ Generate and interact with other (new person) concepts     
✅ Realistic composition of two faces with high quality identity preservation and selective attribute control


Updates

  • 2024/07/15: Code released!

How It Works

Setup

Our code is mainly based on CelebBasis. Additionally, it uses the following repositories: Prompt-Mixing for delayed identity injection, LoRA for efficient finetuning, and GroundedSAM to obtain an initial layout for two-person generation. To set up the environment, run:

conda env create -f environment.yaml
conda activate sd
python -m pip install git+https://github.com/cloneofsimo/lora.git

Pretrained weights:

Copy the pretrained weights to the ./weights folder; the expected directory structure is shown below:

PreciseControl/
  |-- weights/
      |--glint360k_cosface_r100_fp16_0.1/
          |-- backbone.pth (249MB)
      |--encoder/
          |-- e4e_ffhq_encode.pt(~1.1GB)
          |-- shape_predictor_68_face_landmarks.dat
      |-- v2-1_512-ema-pruned.ckpt (~5.0GB)
      |-- model_ir_se50.pt
      |-- sam_vit_b_01ec64.pth (for multi person)
      |-- groundingdino_swint_ogc.pth (for multi person)
      
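Before training or inference, it can save time to confirm the checkpoints are actually in place. A minimal sketch (the `check_weights` helper is ours, not part of the repo; it checks only the core single-person weights listed above):

```shell
# check_weights DIR: report any missing checkpoint files under DIR.
# File names are taken from the directory layout above.
check_weights() {
    dir="$1"
    missing=0
    for f in \
        "glint360k_cosface_r100_fp16_0.1/backbone.pth" \
        "encoder/e4e_ffhq_encode.pt" \
        "encoder/shape_predictor_68_face_landmarks.dat" \
        "v2-1_512-ema-pruned.ckpt" \
        "model_ir_se50.pt"
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return $missing
}
```

Run `check_weights ./weights` from the repo root; a non-zero exit status means at least one file still needs to be downloaded.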

Usage

0. Face Alignment

To make the Face Recognition model work as expected, given an image of a person, we first align and crop the face following FFHQ-Dataset.

Put your input images in ./aug_images/comparision and run the command below with the output path ./aug_images/comparision/edited/. This aligns and crops the input images as required by e4e and saves them in the format needed for LoRA finetuning. It also saves the aligned images under ./aug_images/lora_finetune_comparision_data/, with each image placed in the folder structure expected by the finetuning dataloader.

bash ./00_align_face.sh ./aug_images/comparision ./aug_images/comparision/edited/

For example, we provide some faces in ./aug_images/comparision/.

1. Personalization

The training config file is ./configs/stable-diffusion/aigc_id_for_lora.yaml. The most important settings are listed below. The id_name folder should have the following structure, which the alignment command above takes care of:

id_name(eg: cook)
  |-- 0000/
      |-- img.jpg

Important Data Settings

data:
  params:
    batch_size: 2  # We use batch_size 2
    train:
      target: ldm.data.face_id.FFhq_dataset 
      params:
root_dir: "absolute path to the id_name folder"  # e.g. /data/.../id_name
        split: train
        use_aug: False
        image_size: 512
        limit_dataset_size: -1
        use_data_interpolation: False
        percentage_of_synthetic_data: 0.1
        lora_finetuning: True
        multiple_samples: True
    validation:
      target: ldm.data.face_id.FFhq_dataset
      params:
root_dir: "absolute path to the id_name folder"  # e.g. /data/.../id_name

Important Training Settings

lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 20
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 50
        max_images: 8
        increase_log_steps: False

  trainer:
    benchmark: True
    max_steps: 50
    accumulate_grad_batches: 8

Reduce accumulate_grad_batches according to available GPU memory; for lower values, increase max_steps proportionally.
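One simple heuristic for "proportionally" is to keep max_steps × accumulate_grad_batches constant, so finetuning sees the same total number of batches. A sketch (the helper name is ours):

```shell
# scale_max_steps BASE_STEPS BASE_ACCUM NEW_ACCUM: suggest a new max_steps
# that keeps max_steps * accumulate_grad_batches constant.
scale_max_steps() {
    base_steps="$1"; base_accum="$2"; new_accum="$3"
    echo $(( base_steps * base_accum / new_accum ))
}

# With the defaults above (max_steps=50, accumulate_grad_batches=8),
# halving accumulation to 4 suggests max_steps=100:
scale_max_steps 50 8 4   # prints 100
```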

Training

# bash ./01_start_lora_finetuning.sh --model weights --folder_name_to_save_output
bash ./01_start_lora_finetuning.sh "./weights/v2-1_512-ema-pruned.ckpt" "id_name"

Consequently, a project folder named id_name is generated under ./logs.

2. Generation

Edit the prompt file ./infer_images/example_prompt_1.txt, where sks denotes the first identity. To get stronger identity preservation, increase the lora_scale parameter, but note that this may reduce text editability.

Testing

# bash ./02_start_test.sh sd_weights_path text_prompt_path --logs_folder_name "0 0 0 0" --is_lora_weight_used --batch_size --lora_iteration --lora_scale --image_name
bash ./02_start_test.sh "./weights/v2-1_512-ema-pruned.ckpt" "./infer_images/example_prompt_1.txt" id_name "0 0 0 0" True 4 49 0.2 image_name.jpg 

The generated images are under ./outputs/id_name/.

3. Attribute Edit

Edit the prompt file ./infer_images/example_prompt.txt, where sks denotes the first identity. image_name.jpg should be present inside ./aug_images/comparision/edited/; otherwise you have to change the root dir in the code manually. Several edits are available in the all_delta_w_dict.json file; check its keys and pass one as attr_name. Some attributes present are: smile, beard, bang, age70, gender, eyeglasses, yellow (asian), black, eyesclose, white.
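To see the full set of valid attr_name values rather than relying on the partial list above, you can print the keys directly. A sketch, assuming all_delta_w_dict.json is a flat JSON object whose top-level keys are the attribute names (the helper name is ours):

```shell
# list_attrs JSON_FILE: print the top-level keys of the delta-w dictionary,
# one per line, sorted alphabetically.
list_attrs() {
    python3 -c 'import json,sys; print("\n".join(sorted(json.load(open(sys.argv[1])))))' "$1"
}
```

Run `list_attrs all_delta_w_dict.json` from wherever the file lives in the repo.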

Testing attr edit

# bash ./02_start_test_pmm.sh sd_weights_path text_prompt_path --logs_folder_name "0 0 0 0" --(whether to add lora weights) --batch_size(use 1) --lora_it --lora_scale --image_name --edit_attr_name
bash ./02_start_test_pmm.sh "./weights/v2-1_512-ema-pruned.ckpt" "./infer_images/example_prompt_1.txt" id_name "0 0 0 0" True 1 49 0.2 image_name.jpg attr_name

This will generate a GIF and a set of images at different edit strengths.

TODO

  • release code
  • multiple person generation

BibTex

@inproceedings{rishubh2024precisecontrol,
      title={PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control},
      author={Rishubh Parihar and Sachidanand VS and Sabariswaran Mani and Tejan Karmali and R. Venkatesh Babu},
      booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
      year={2024},
}
