TL;DR: Embed a unique individual into a pre-trained diffusion model with:
✅ Single image personalization in a few minutes
✅ Fine-grained attribute control with background preservation
✅ Generation and interaction with other (new person) concepts
✅ Realistic composition of two faces with high quality identity preservation and selective attribute control
- 2024/07/15: Code released!
Our code is mainly based on CelebBasis. Additionally, it uses the following repositories: Prompt-Mixing for delayed identity injection, LoRA for efficient fine-tuning, and GroundedSAM to obtain an initial layout for two-person generation. To set up our environment, please run:
conda env create -f environment.yaml
conda activate sd
python -m pip install git+https://github.com/cloneofsimo/lora.git
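After installation, a quick sanity check can confirm that PyTorch sees a GPU and that the LoRA package is importable (a minimal sketch; we assume cloneofsimo/lora installs as the lora_diffusion module):

```python
# sanity_check.py -- minimal environment check (assumes the "sd" conda env above)
import torch
import lora_diffusion  # installed from cloneofsimo/lora

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("lora_diffusion loaded from:", lora_diffusion.__file__)
```

Next, download the following pretrained models and weights: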
- GroundedSAM
- Stable Diffusion 2.1
- CosFace R100 for computing the face Identity Loss
- Encoder4Editing (E4E).
- PIPNet for face preprocessing (align and crop). PIPNet weights can be downloaded from this link (provided by @justindujardin) or our Baidu Yun Drive with extraction code: ygss. Please copy epoch59.pth and FaceBoxesV2.pth to PreciseControl/evaluation/face_align/PIPNet/weights/.
- Mapper weights wt_mapper: download and copy them under the logs directory.
Copy the pretrained weights to the './weights' folder; the directory structure is shown below:
PreciseControl/
|-- weights/
    |-- glint360k_cosface_r100_fp16_0.1/
        |-- backbone.pth (249MB)
    |-- encoder/
        |-- e4e_ffhq_encode.pt (~1.1GB)
    |-- shape_predictor_68_face_landmarks.dat
    |-- v2-1_512-ema-pruned.ckpt (~5.0GB)
    |-- model_ir_se50.pt
    |-- sam_vit_b_01ec64.pth (for multi-person)
    |-- groundingdino_swint_ogc.pth (for multi-person)
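Before training, it can help to verify that every expected checkpoint is in place. A minimal sketch mirroring the tree above:

```python
# check_weights.py -- verify the pretrained checkpoints listed above exist
import os

REQUIRED = [
    "weights/glint360k_cosface_r100_fp16_0.1/backbone.pth",
    "weights/encoder/e4e_ffhq_encode.pt",
    "weights/shape_predictor_68_face_landmarks.dat",
    "weights/v2-1_512-ema-pruned.ckpt",
    "weights/model_ir_se50.pt",
    "weights/sam_vit_b_01ec64.pth",         # multi-person only
    "weights/groundingdino_swint_ogc.pth",  # multi-person only
]

for path in REQUIRED:
    print(f"[{'ok' if os.path.isfile(path) else 'MISSING'}] {path}")
```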
To make the face recognition model work as expected, given an image of a person, we first align and crop the face following the FFHQ-Dataset convention.
Put your input images in ./aug_images/comparision and run the command below with the output path set to ./aug_images/comparision/edited/. This aligns and crops the input images as required by e4e and saves them in the format required for LoRA fine-tuning. It also saves the aligned images in ./aug_images/lora_finetune_comparision_data/, with each image in the folder structure required by the fine-tuning dataloader.
bash ./00_align_face.sh ./aug_images/comparision ./aug_images/comparision/edited/
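For reference, a simplified eye-based alignment in the spirit of the FFHQ crop can be sketched with dlib's 68-point landmarks (using the shape_predictor_68_face_landmarks.dat from the weights folder). This is only an illustrative approximation, not the PIPNet pipeline the script above uses; the input filename is a placeholder:

```python
# align_sketch.py -- simplified FFHQ-style align & crop (illustrative only,
# not the PIPNet pipeline used by 00_align_face.sh)
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("weights/shape_predictor_68_face_landmarks.dat")

def align_face(path, out_size=512):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rect = detector(gray, 1)[0]  # assumes at least one detected face
    shape = predictor(gray, rect)
    lm = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])

    # eye centers from the 68-point scheme (points 36-41 and 42-47)
    eye_l, eye_r = lm[36:42].mean(axis=0), lm[42:48].mean(axis=0)

    # rotate about the landmark centroid so the eye line is horizontal
    angle = np.degrees(np.arctan2(eye_r[1] - eye_l[1], eye_r[0] - eye_l[0]))
    cx, cy = lm.mean(axis=0)
    M = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, 1.0)
    rot = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

    # square crop around the landmarks with margin (no bounds padding here)
    size = int(2.0 * max(lm[:, 0].ptp(), lm[:, 1].ptp()))
    x0, y0 = int(cx - size / 2), int(cy - size / 2)
    crop = rot[max(y0, 0):y0 + size, max(x0, 0):x0 + size]
    return cv2.resize(crop, (out_size, out_size))

cv2.imwrite("aligned.jpg", align_face("./aug_images/comparision/person.jpg"))
```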
For example, we provide some faces in ./aug_images/comparision/
The training config file is ./configs/stable-diffusion/aigc_id_for_lora.yaml. The most important settings are listed below. The id_name folder should have the following structure, which the alignment command above takes care of:
id_name (e.g. cook)
|-- 0000/
    |-- img.jpg
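If you need to build this layout by hand (e.g. when skipping the alignment script), a small sketch like the following arranges a flat folder of aligned images into the dataloader format; the source and destination paths are assumptions:

```python
# make_id_folders.py -- arrange aligned images into id_name/NNNN/img.jpg
import os
import shutil

src = "./aug_images/comparision/edited"                   # aligned images (flat folder)
dst = "./aug_images/lora_finetune_comparision_data/cook"  # id_name folder

for i, name in enumerate(sorted(os.listdir(src))):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    sub = os.path.join(dst, f"{i:04d}")  # 0000/, 0001/, ...
    os.makedirs(sub, exist_ok=True)
    shutil.copy(os.path.join(src, name), os.path.join(sub, "img.jpg"))
```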
Important Data Settings
data:
  params:
    batch_size: 2  # we use batch_size 2
    train:
      target: ldm.data.face_id.FFhq_dataset
      params:
        root_dir: "absolute path to the id_name folder, e.g. /data/.../id_name"
        split: train
        use_aug: False
        image_size: 512
        limit_dataset_size: -1
        use_data_interpolation: False
        percentage_of_synthetic_data: 0.1
        lora_finetuning: True
        multiple_samples: True
    validation:
      target: ldm.data.face_id.FFhq_dataset
      params:
        root_dir: "absolute path to the id_name folder, e.g. /data/.../id_name"
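The LDM codebase parses these files with OmegaConf, so a quick way to confirm your edits load correctly is (a minimal sketch):

```python
# check_config.py -- confirm the training config parses and print key settings
from omegaconf import OmegaConf

cfg = OmegaConf.load("./configs/stable-diffusion/aigc_id_for_lora.yaml")
print("batch_size:", cfg.data.params.batch_size)
print("train root_dir:", cfg.data.params.train.params.root_dir)
```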
Important Training Settings
lightning:
  modelcheckpoint:
    params:
      every_n_train_steps: 20
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 50
        max_images: 8
        increase_log_steps: False
  trainer:
    benchmark: True
    max_steps: 50
    accumulate_grad_batches: 8
Reduce accumulate_grad_batches according to available GPU memory; for a lower value, increase max_steps proportionally so the model still sees roughly the same number of images.
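As a quick sanity check on that trade-off, simple arithmetic with the default values above:

```python
# effective batch size and total images seen with the defaults above
batch_size = 2
accumulate_grad_batches = 8
max_steps = 50

effective_batch = batch_size * accumulate_grad_batches  # 16 images per optimizer step
images_seen = effective_batch * max_steps               # 800 images in total

# e.g. dropping accumulation to 4 halves the effective batch,
# so doubling max_steps to 100 keeps images_seen at 800
```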
Training
# bash ./01_start_lora_finetuning.sh --model weights --folder_name_to_save_output
bash ./01_start_lora_finetuning.sh "./weights/v2-1_512-ema-pruned.ckpt" "id_name"
Consequently, a project folder named id_name is generated under ./logs.
Edit the prompt file ./infer_images/example_prompt_1.txt, where sks denotes the first identity. For better identity preservation, increase the lora_scale parameter, though this may reduce text editability.
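Conceptually, the LoRA scale blends the fine-tuned low-rank update into the frozen weights, which is why raising it strengthens identity at the cost of editability. A minimal sketch of the idea (illustrative shapes, not the repo's actual implementation):

```python
# lora_scale_sketch.py -- how a LoRA scale blends the low-rank update
import torch

d, r = 320, 4                 # layer width and LoRA rank (illustrative)
W0 = torch.randn(d, d)        # frozen pre-trained weight
A = torch.randn(r, d) * 0.01  # trained low-rank factors
B = torch.randn(d, r) * 0.01

def effective_weight(scale):
    # scale = 0 -> original model; larger scale -> stronger identity, weaker edits
    return W0 + scale * (B @ A)

W_eff = effective_weight(0.2)  # 0.2 is the lora_scale passed to 02_start_test.sh
```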
Testing
# bash ./02_start_test.sh sd_weights_path text_prompt_path --logs_folder_name "0 0 0 0" --is_lora_weight_used --batch_size --lora_iteration --lora_scale --image_name
bash ./02_start_test.sh "./weights/v2-1_512-ema-pruned.ckpt" "./infer_images/example_prompt_1.txt" id_name "0 0 0 0" True 4 49 0.2 image_name.jpg
The generated images are saved under ./outputs/id_name/.
Edit the prompt file ./infer_images/example_prompt.txt, where sks denotes the first identity. image_name.jpg should be present inside ./aug_images/comparision/edited/; otherwise you have to change the root dir in the code manually. The available edits are stored in the all_delta_w_dict.json file; check its keys and pass one as attr_name. Some attributes present are: smile, beard, bang, age70, gender, eyeglasses, yellow (asian), black, eyesclose, white.
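To list every available attribute key before picking an attr_name, a minimal sketch (adjust the path to wherever all_delta_w_dict.json lives in your checkout):

```python
# list_edits.py -- print the attribute keys available in all_delta_w_dict.json
import json

with open("all_delta_w_dict.json") as f:
    delta_w = json.load(f)

print("\n".join(sorted(delta_w.keys())))
```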
Testing attr edit
# bash ./02_start_test_pmm.sh sd_weights_path text_prompt_path --logs_folder_name "0 0 0 0" --is_lora_weight_used --batch_size (use 1) --lora_iteration --lora_scale --image_name --edit_attr_name
bash ./02_start_test_pmm.sh "./weights/v2-1_512-ema-pruned.ckpt" "./infer_images/example_prompt_1.txt" id_name "0 0 0 0" True 1 49 0.2 image_name.jpg attr_name
This will generate a GIF and a set of images at different edit strengths.
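If you want to reassemble the GIF from the saved frames yourself, a minimal sketch with Pillow (the frame glob is an assumption; adjust it to the actual output filenames):

```python
# make_gif.py -- assemble edit-strength frames into a GIF with Pillow
import glob
from PIL import Image

frames = [Image.open(p) for p in sorted(glob.glob("./outputs/id_name/*.png"))]
frames[0].save(
    "edit_strength.gif",
    save_all=True,
    append_images=frames[1:],
    duration=200,  # ms per frame
    loop=0,        # loop forever
)
```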
- [x] Release code
- [ ] Multiple person generation
@inproceedings{rishubh2024precisecontrol,
  title={PreciseControl: Enhancing Text-to-Image Diffusion Models with Fine-Grained Attribute Control},
  author={Rishubh Parihar and Sachidanand VS and Sabariswaran Mani and Tejan Karmali and R. Venkatesh Babu},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024},
}