MasterWeaver

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

With a single reference image, MasterWeaver can generate photo-realistic personalized images with diverse clothing, accessories, facial attributes, and actions in various contexts.

Method

(a) Training pipeline of our MasterWeaver. To improve editability while maintaining identity fidelity, we propose an editing direction loss for training. Additionally, we construct a face-augmented dataset to facilitate disentangled identity learning, which further improves editability. (b) Framework of our MasterWeaver. It adopts an encoder to extract identity features and uses them, together with the text, to steer personalized image generation through cross-attention.
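
Since the code is built on IP-Adapter, the framework in (b) can be pictured as an IP-Adapter-style decoupled cross-attention: the identity tokens from the encoder get their own key/value projections, and their attention output is added to the ordinary text cross-attention output. The NumPy sketch below illustrates that assumption only; the function and parameter names (`decoupled_cross_attention`, `W_k_id`, `W_v_id`, `id_scale`) and all dimensions are hypothetical, and the actual MasterWeaver module may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the key axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: q (Nq, d), k (Nk, d), v (Nk, dv) -> (Nq, dv).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_cross_attention(hidden, text_tokens, id_tokens, params, id_scale=1.0):
    """Hypothetical IP-Adapter-style block (an assumption, not the official module).

    The image latents (queries) attend separately to the text tokens and to the
    identity tokens, each with its own key/value projections; the identity
    branch is scaled by id_scale and added to the text branch.
    """
    q = hidden @ params["W_q"]
    out_text = attention(q, text_tokens @ params["W_k"], text_tokens @ params["W_v"])
    out_id = attention(q, id_tokens @ params["W_k_id"], id_tokens @ params["W_v_id"])
    return out_text + id_scale * out_id

# Illustrative shapes: 64 latent positions (dim 320), 77 text tokens and
# 4 identity tokens (dim 768), projected to a 40-dim attention space.
rng = np.random.default_rng(0)
params = {name: rng.standard_normal(shape) for name, shape in {
    "W_q": (320, 40), "W_k": (768, 40), "W_v": (768, 40),
    "W_k_id": (768, 40), "W_v_id": (768, 40)}.items()}
hidden = rng.standard_normal((64, 320))
text = rng.standard_normal((77, 768))
ids = rng.standard_normal((4, 768))
out = decoupled_cross_attention(hidden, text, ids, params)
```

Setting `id_scale=0.0` reduces the block to plain text cross-attention, which is why this design lets identity conditioning be added to a frozen T2I model without replacing its text pathway.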

By feeding in paired text prompts that denote an editing operation, e.g., (a photo of a woman, a photo of a smiling woman), we identify the editing direction in the feature space of the diffusion model. We then align the editing direction of MasterWeaver with that of the original T2I model, which improves text controllability without affecting identity.
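
A minimal NumPy sketch of the idea, assuming that an editing direction is the difference between the features produced for the edited prompt and for the source prompt, and that the two directions are compared with a cosine distance on flattened features. The function name, the choice of features, and the distance are assumptions; the loss in the paper may be formulated differently.

```python
import numpy as np

def editing_direction_loss(f_src_ref, f_edit_ref, f_src_pers, f_edit_pers, eps=1e-8):
    """Hypothetical editing direction loss (cosine-distance variant).

    f_src_ref / f_edit_ref: features of the original T2I model for the source
    prompt (e.g. "a photo of a woman") and the edited prompt
    (e.g. "a photo of a smiling woman").
    f_src_pers / f_edit_pers: the same features from the personalized model,
    conditioned on the reference identity. All inputs share one shape.
    """
    d_ref = (np.asarray(f_edit_ref, dtype=float) - np.asarray(f_src_ref, dtype=float)).ravel()
    d_pers = (np.asarray(f_edit_pers, dtype=float) - np.asarray(f_src_pers, dtype=float)).ravel()
    cos = d_ref @ d_pers / (np.linalg.norm(d_ref) * np.linalg.norm(d_pers) + eps)
    # 0 when the directions agree, 1 when orthogonal, 2 when opposite.
    return float(1.0 - cos)

# The personalized features carry an identity-dependent offset ([5, 5]),
# but the editing direction matches the original model, so the loss is ~0.
loss = editing_direction_loss([0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0])
```

Because only the difference between the two prompts enters the loss, an identity offset shared by both prompts is not penalized; only a change in how the edit moves the features is.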

Getting Started

Environment Setup

git clone https://github.com/csyxwei/MasterWeaver.git
cd MasterWeaver
conda create -n masterweaver python=3.9
conda activate masterweaver
pip install -r requirements.txt
pip install dlib==19.24.0

Inference

Download the dlib model and the face parsing model, and place them in the ./pretrained directory.

Download our pretrained model and save it to the ./pretrained directory.

Then, run the following command to perform inference:

# (optional) use a Hugging Face mirror for model downloads
# export HF_ENDPOINT=https://hf-mirror.com
python inference.py

We also provide a Gradio demo. To launch it, run:

# (optional) use a Hugging Face mirror for model downloads
# export HF_ENDPOINT=https://hf-mirror.com
python gradio_app.py

Training

Please first prepare the dataset following the instructions.

After that, we train the first-stage model with the following command:

# (optional) use a Hugging Face mirror for model downloads
# export HF_ENDPOINT="https://hf-mirror.com"
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR='/path/to/filtered_laion_faces/'
accelerate launch --num_processes 4 --multi_gpu --mixed_precision "no" train_masterweaver_stage1.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --image_encoder_path="openai/clip-vit-large-patch14" \
  --data_root_path=$DATA_DIR \
  --mixed_precision="no" \
  --resolution=512 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=100000 \
  --learning_rate=1e-06 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --dataloader_num_workers=16 \
  --output_dir="./adapter_experiments/masterweaver-stage1" \
  --save_steps=2000 \
  --vis_steps=200
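
With these settings, each optimizer step sees 4 processes × 4 images per GPU × 4 accumulation steps = 64 images. Assuming `--scale_lr` follows the convention of the diffusers example training scripts (multiplying the base learning rate by gradient accumulation steps × per-device batch size × number of processes), the learning rate actually used is 64 × 1e-6 = 6.4e-5. The helper below is a hypothetical illustration of that arithmetic; check `train_masterweaver_stage1.py` to confirm how `--scale_lr` is applied.

```python
def effective_batch_and_lr(num_processes, train_batch_size, grad_accum_steps,
                           base_lr, scale_lr=True):
    """Return (images per optimizer step, learning rate actually used).

    Assumes --scale_lr multiplies the base rate by the effective batch size,
    as in the diffusers example training scripts.
    """
    effective_batch = num_processes * train_batch_size * grad_accum_steps
    lr = base_lr * effective_batch if scale_lr else base_lr
    return effective_batch, lr

# Values taken from the stage-1 command above (stage 2 uses the same ones).
batch, lr = effective_batch_and_lr(num_processes=4, train_batch_size=4,
                                   grad_accum_steps=4, base_lr=1e-6)
print(f"{batch} images per optimizer step, learning rate {lr:.1e}")
```

When training on fewer GPUs, the effective batch and, with `--scale_lr`, the learning rate shrink proportionally, so `--gradient_accumulation_steps` can be raised to compensate.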

Then, we fine-tune the first-stage model using the editing direction loss and the face-augmented dataset:

# (optional) use a Hugging Face mirror for model downloads
# export HF_ENDPOINT="https://hf-mirror.com"
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR='/path/to/filtered_laion_faces/'
accelerate launch --num_processes 4 --multi_gpu --mixed_precision "no" train_masterweaver_stage2.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --image_encoder_path="openai/clip-vit-large-patch14" \
  --adapter_path="./adapter_experiments/masterweaver-stage1/adapter_100000.pt" \
  --data_root_path=$DATA_DIR \
  --mixed_precision="no" \
  --resolution=512 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=100000 \
  --learning_rate=1e-06 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --lambda_edit=0.02 \
  --dataloader_num_workers=16 \
  --output_dir="./adapter_experiments/masterweaver-stage2" \
  --save_steps=2000 \
  --vis_steps=200
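
In stage 2, `--lambda_edit=0.02` presumably weights the editing direction loss against the standard denoising loss. A minimal sketch of that combination, assuming a plain weighted sum and an MSE loss on the predicted noise as in typical Stable Diffusion fine-tuning; the function name and this exact form are assumptions, so see `train_masterweaver_stage2.py` for the actual objective.

```python
import numpy as np

def stage2_loss(noise_pred, noise_target, edit_loss, lambda_edit=0.02):
    """Hypothetical stage-2 objective: denoising MSE + lambda_edit * editing loss."""
    denoise = np.mean((np.asarray(noise_pred, dtype=float)
                       - np.asarray(noise_target, dtype=float)) ** 2)
    return float(denoise + lambda_edit * edit_loss)

# Denoising MSE of 1.0 plus 0.02 * 0.5 from the editing term.
total = stage2_loss([1.0, 1.0], [0.0, 0.0], edit_loss=0.5)
```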

Citation

@inproceedings{wei2024masterweaver,
  title={MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation},
  author={Wei, Yuxiang and Ji, Zhilong and Bai, Jinfeng and Zhang, Hongzhi and Zhang, Lei and Zuo, Wangmeng},
  booktitle={European Conference on Computer Vision},
  year={2024}
}

Acknowledgements

This code is built on diffusers and IP-Adapter. We thank the authors for sharing their code.
