DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Yujun Shi Chuhui Xue Jiachun Pan Wenqing Zhang Vincent Y. F. Tan Song Bai

Disclaimer

This is a research project, NOT a commercial product.

News and Update

[Sept 3rd] v0.1.0 Release.
- Enable dragging diffusion-generated images.
- Introduce a new guidance mechanism that greatly improves the quality of dragging results (inspired by MasaCtrl).
- Enable dragging images with arbitrary aspect ratios.
- Add support for DPM++ Solver (for generated images).
[July 18th] v0.0.1 Release.
- Integrate LoRA training into the user interface. There is no need to use the training script; everything can be done conveniently in the UI!
- Optimize the user interface layout.
- Enable using a better VAE for eyes and faces (see this).
[July 8th] v0.0.0 Release.
- Implement the basic functionality of DragDiffusion.

Installation

It is recommended to run our code on an NVIDIA GPU under Linux; we have not yet tested other configurations. Currently, our method requires around 14 GB of GPU memory. We will continue to optimize memory efficiency.

To install the required libraries, simply run the following command:

conda env create -f environment.yaml
conda activate dragdiff
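Because the method needs roughly 14 GB of GPU memory, it can help to check your card before launching the UI. The script below is a minimal sketch and is not part of the repository; it assumes PyTorch is available in the dragdiff environment, and the helper name has_enough_gpu_memory is hypothetical.

```python
# Hypothetical pre-flight check; not part of the DragDiffusion repository.
REQUIRED_GB = 14  # approximate requirement stated in the Installation section


def has_enough_gpu_memory(total_bytes: int, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if a device with `total_bytes` of memory meets the requirement."""
    return total_bytes / 1024**3 >= required_gb


def main() -> None:
    try:
        import torch  # assumed to be installed by environment.yaml
    except ImportError:
        print("PyTorch is not installed; activate the dragdiff environment first.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device found; DragDiffusion has only been tested on NVIDIA GPUs.")
        return
    props = torch.cuda.get_device_properties(0)
    verdict = "OK" if has_enough_gpu_memory(props.total_memory) else "likely insufficient"
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB ({verdict})")


if __name__ == "__main__":
    main()
```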

Run DragDiffusion

To begin, run the following in the command line to start the Gradio user interface:

python3 drag_ui.py

You may check our GIF above, which demonstrates how to use the UI step by step.

The workflow consists of the following steps:

Step 1: train a LoRA

Drop your input image into the left-most box.
Enter a prompt describing the image in the "prompt" field.
Click the "Train LoRA" button to train a LoRA given the input image
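For readers unfamiliar with LoRA, the idea behind Step 1 is that the pretrained weight W stays frozen while a low-rank update B·A is learned, so an adapted layer computes y = W x + (alpha / r) · B (A x). The pure-Python sketch below illustrates only this forward computation on a tiny layer; it is not the repository's training code (which is adapted from a diffusers example), and the alpha / r scaling follows the common LoRA convention rather than anything stated in this README.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x), with W frozen and A, B trainable.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.
    """
    r = len(A)                         # LoRA rank
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank path, rank at most r
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy example: a 2x3 layer with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]   # 1 x 3
B = [[0.5], [0.0]]      # 2 x 1
x = [1.0, 2.0, 3.0]
print(lora_forward(W, A, B, x, alpha=1.0))
```

Initializing B to zero (the usual LoRA convention) makes the adapted layer start out identical to the pretrained one, so training only moves it as far as the input image requires.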

Step 2: do "drag" editing

Draw a mask in the left-most box to specify the editable areas.
Click handle and target points in the middle box. Also, you may reset all points by clicking "Undo point".
Click the "Run" button to run our algorithm. Edited results will be displayed in the right-most box.
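Conceptually, each handle point is pulled toward its target in small steps along the unit direction (target - handle) / ||target - handle||, and editing stops once every handle is close enough to its target or the step budget (n_pix_step, in the Drag Parameters table) runs out. The toy simulation below illustrates only this stopping logic. It assumes each handle moves exactly one pixel per step, whereas the real method moves it by optimizing the diffusion latent and then re-locating the handle via point tracking; the 1-pixel step and the reach_threshold value are illustrative assumptions.

```python
import math


def unit_direction(handle, target):
    """Unit vector pointing from handle to target (zero if they coincide)."""
    dx, dy = target[0] - handle[0], target[1] - handle[1]
    dist = math.hypot(dx, dy)
    return (0.0, 0.0) if dist == 0 else (dx / dist, dy / dist)


def simulate_drag(handles, targets, n_pix_step, reach_threshold=1.0, step=1.0):
    """Idealized drag: move each handle `step` pixels toward its target per iteration.

    Returns (final_handles, iterations_used). Stops early once all handles are
    within `reach_threshold` of their targets, or after `n_pix_step` iterations.
    """
    handles = [tuple(map(float, h)) for h in handles]
    for it in range(n_pix_step):
        if all(math.dist(h, t) <= reach_threshold for h, t in zip(handles, targets)):
            return handles, it
        new_handles = []
        for h, t in zip(handles, targets):
            d = min(step, math.dist(h, t))  # never overshoot the target
            ux, uy = unit_direction(h, t)
            new_handles.append((h[0] + d * ux, h[1] + d * uy))
        handles = new_handles
    return handles, n_pix_step


final, used = simulate_drag([(0, 0)], [(5, 0)], n_pix_step=50)
print(final, used)
```

If the handle has not reached the target when the budget is exhausted, the function returns early with the partial result, which mirrors the advice to increase n_pix_step in that situation.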

Explanation of parameters in the user interface:

General Parameters

Parameter	Explanation
prompt	The prompt describing the user's input image (used to train the LoRA and to conduct "drag" editing).
lora_path	The directory where the trained LoRA will be saved.

Algorithm Parameters

These parameters are collapsed by default as we normally do not have to tune them. Here are the explanations:

Base Model Config

Parameter	Explanation
Diffusion Model Path	The path to the diffusion model. By default, we use "runwayml/stable-diffusion-v1-5". We will add support for more models in the future.
VAE Choice	The choice of VAE. There are currently two options: "default", which uses the original VAE, and "stabilityai/sd-vae-ft-mse", which can improve results on images with human eyes and faces (see explanation).
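These two settings map naturally onto the Hugging Face diffusers API. The sketch below shows one way to load Stable Diffusion 1.5 and optionally swap in the fine-tuned VAE. It assumes diffusers is installed, the function names are hypothetical, and it is not the code path drag_ui.py actually uses. Note that from_pretrained accepts either a Hub model ID or a local directory, which is what makes the local-model workaround under "Common Issues and Solutions" possible.

```python
DEFAULT_MODEL = "runwayml/stable-diffusion-v1-5"
FT_MSE_VAE = "stabilityai/sd-vae-ft-mse"


def resolve_vae(vae_choice: str):
    """Map the UI's "VAE Choice" value to a VAE repo ID, or None for the built-in VAE."""
    if vae_choice == "default":
        return None
    return vae_choice  # e.g. "stabilityai/sd-vae-ft-mse"


def load_pipeline(model_path: str = DEFAULT_MODEL, vae_choice: str = "default"):
    """Load a Stable Diffusion pipeline, optionally replacing its VAE (hypothetical helper)."""
    from diffusers import AutoencoderKL, StableDiffusionPipeline

    kwargs = {}
    vae_id = resolve_vae(vae_choice)
    if vae_id is not None:
        # Load the replacement VAE separately and hand it to the pipeline.
        kwargs["vae"] = AutoencoderKL.from_pretrained(vae_id)
    # model_path may be a Hub ID or a local directory under local_pretrained_models.
    return StableDiffusionPipeline.from_pretrained(model_path, **kwargs).to("cuda")
```

Importing diffusers inside the function keeps resolve_vae usable without the heavy dependency, for example when validating UI input.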

Drag Parameters

Parameter	Explanation
n_pix_step	Maximum number of steps of motion supervision. Increase this if handle points have not been "dragged" to desired position.
lam	The regularization coefficient that keeps the unmasked region unchanged. Increase this value if the unmasked region has changed more than desired (do not have to tune in most cases).
n_actual_inference_step	Number of DDIM inversion steps performed (do not have to tune in most cases).
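To make the role of lam concrete: in the DragDiffusion paper, the regularization term penalizes changes to the latent outside the user mask, in slightly simplified notation lam · ||(z - z_orig) ⊙ (1 - M)||_1, where M is 1 inside the editable region. The pure-Python sketch below computes only this term on flattened toy values to show how the mask and lam interact; the actual loss is computed on tensors and added to the motion-supervision term.

```python
def masked_l1_regularization(z, z_orig, mask, lam):
    """lam * sum(|z - z_orig| * (1 - mask)) over flattened latent values.

    mask[i] is 1 where editing is allowed and 0 where content should stay fixed,
    so only changes outside the mask are penalized.
    """
    if not (len(z) == len(z_orig) == len(mask)):
        raise ValueError("z, z_orig and mask must have the same length")
    return lam * sum(abs(a - b) * (1 - m) for a, b, m in zip(z, z_orig, mask))


z_orig = [0.0, 0.0, 0.0, 0.0]
z = [0.5, -0.5, 0.25, 0.0]   # latent after some drag updates
mask = [1, 1, 0, 0]          # first two entries are editable
print(masked_l1_regularization(z, z_orig, mask, lam=0.1))
```

Raising lam makes the same change outside the mask more costly, which is why increasing it helps when the unmasked region drifts.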

LoRA Parameters

Parameter	Explanation
LoRA training steps	Number of LoRA training steps (do not have to tune in most cases).
LoRA learning rate	Learning rate of LoRA (do not have to tune in most cases).
LoRA rank	Rank of the LoRA (do not have to tune in most cases).
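A quick calculation shows how small a LoRA is relative to the layer it adapts: a rank-r adapter on a d_out × d_in weight adds only r · (d_in + d_out) trainable parameters. The sketch below compares this with the full weight; the 320 × 320 layer size is an illustrative assumption, not a value taken from this repository.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by a LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix W (bias ignored)."""
    return d_in * d_out


for rank in (4, 16, 64):
    added = lora_param_count(320, 320, rank)
    ratio = added / full_param_count(320, 320)
    print(f"rank {rank:>2}: {added:>6} params ({ratio:.1%} of the 320x320 weight)")
```

The count grows linearly with the rank, so higher ranks give the adapter more capacity at a proportionally higher training cost.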

License

Code related to the DragDiffusion algorithm is under Apache 2.0 license.

BibTeX

@article{shi2023dragdiffusion,
  title={DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing},
  author={Shi, Yujun and Xue, Chuhui and Pan, Jiachun and Zhang, Wenqing and Tan, Vincent YF and Bai, Song},
  journal={arXiv preprint arXiv:2306.14435},
  year={2023}
}

Contact

For any questions on this project, please contact Yujun ([email protected])

Acknowledgement

This work is inspired by the amazing DragGAN. The LoRA training code is modified from an example in diffusers. Image samples are collected from Unsplash, Pexels, and Pixabay. Finally, a huge shout-out to all the amazing open-source diffusion models and libraries.

Common Issues and Solutions

For users having trouble loading models from Hugging Face due to internet constraints, please 1) follow these links and download the model into the directory "local_pretrained_models"; 2) run "drag_ui.py" and select the directory of your pretrained model in "Algorithm Parameters -> Base Model Config -> Diffusion Model Path".
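On a machine that can reach the Hugging Face Hub (possibly a different one, after which the folder is copied over), one way to populate local_pretrained_models is huggingface_hub's snapshot_download. The sketch below is an alternative to the links mentioned above, under stated assumptions: huggingface_hub is installed, and the subdirectory naming scheme in the hypothetical helper local_dir_for is our own choice. The resulting folder is what you would select as the Diffusion Model Path.

```python
from pathlib import Path

LOCAL_ROOT = Path("local_pretrained_models")  # directory named in the README


def local_dir_for(repo_id: str, root: Path = LOCAL_ROOT) -> Path:
    """Hypothetical naming scheme: 'org/name' -> root / 'org--name'."""
    return root / repo_id.replace("/", "--")


def download_model(repo_id: str = "runwayml/stable-diffusion-v1-5") -> Path:
    """Download a full model snapshot into local_pretrained_models (needs network access)."""
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

After download_model() finishes, point "Diffusion Model Path" at the returned directory instead of the Hub ID.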
