arXiv | Project Page | Hugging Face
Yuseung Lee, Kunho Kim, Hyunjin Kim, Minhyuk Sung
- (2024.04.21) Added evaluation code in eval_code/.
- (2023.12.28) Added code for loop-closed generation (--loop_closure).
- (2023.09.22) Released the Hugging Face demo for SyncDiffusion.
- (2023.09.22) 🎉 SyncDiffusion is accepted to NeurIPS 2023!
- (2023.09.18) Added the code for conditional generation based on ControlNet (Zhang et al.). Code is at SyncControlNet/.
- (2023.08.14) Released the code for SyncDiffusion.
This repository contains the official implementation of SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions.
SyncDiffusion is a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss. More results can be viewed on our project page.
The remarkable capabilities of pretrained image diffusion models have been utilized not only for generating fixed-size images but also for creating panoramas. However, naive stitching of multiple images often results in visible seams. Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows and averaging latent features in overlapping regions. However, these approaches, which focus on seamless montage generation, often yield incoherent outputs by blending different scenes within a single image. To overcome this limitation, we propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss. Specifically, we compute the gradient of the perceptual loss using the predicted denoised images at each denoising step, providing meaningful guidance for achieving coherent montages. Our experimental results demonstrate that our method produces significantly more coherent outputs compared to previous methods (66.35% vs. 33.65% in our user study) while still maintaining fidelity (as assessed by GIQA) and compatibility with the input prompt (as measured by CLIP score).
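As a rough illustration of the idea of synchronizing windows through gradient descent on a similarity loss, the toy sketch below pulls several window latents toward an anchor window. Everything in it is an assumption made for illustration and is not taken from this repository: the latents are short lists of floats, `predict_denoised` simply returns the latent instead of a real denoised-image prediction, a squared L2 distance stands in for the perceptual loss, and the names `sync_step`, `anchor_index`, and `step_size` are hypothetical.

```python
# Toy illustration only (not the repository's implementation).
# Assumptions: latents are plain lists of floats, the "predicted denoised
# image" is the latent itself, and squared L2 replaces the perceptual loss.

def predict_denoised(latent):
    """Stand-in for the predicted denoised image at the current step."""
    return list(latent)

def similarity_loss(a, b):
    """Squared L2 distance, a placeholder for a perceptual similarity loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loss_gradient(latent, anchor_pred):
    """Analytic gradient of the placeholder loss with respect to the latent."""
    return [2.0 * (x - y) for x, y in zip(latent, anchor_pred)]

def sync_step(latents, anchor_index, step_size):
    """One gradient-descent step that pulls every window toward the anchor."""
    anchor_pred = predict_denoised(latents[anchor_index])
    updated = []
    for i, latent in enumerate(latents):
        if i == anchor_index:
            # The anchor window itself is left unchanged.
            updated.append(list(latent))
            continue
        grad = loss_gradient(latent, anchor_pred)
        updated.append([x - step_size * g for x, g in zip(latent, grad)])
    return updated

latents = [[0.0, 0.0], [1.0, -1.0], [4.0, 2.0]]
loss_before = sum(similarity_loss(l, latents[0]) for l in latents)
latents_after = sync_step(latents, anchor_index=0, step_size=0.1)
loss_after = sum(similarity_loss(l, latents_after[0]) for l in latents_after)
print(loss_before, loss_after)
```

In this toy setting the summed loss against the anchor drops after a single step, which mirrors the intended effect: windows drift toward a shared appearance while each still runs its own denoising process.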
Our code is tested with Python 3.9, CUDA 11.3 and PyTorch 1.12.1.
First, clone our repository:
git clone https://github.com/KAIST-Geometric-AI-Group/SyncDiffusion.git
cd SyncDiffusion
Then you can either create a new conda environment:
conda env create -f environment.yaml
conda activate syncdiffusion
or install essential packages into an existing environment:
pip install -r requirements.txt
We provide simple demo code for SyncDiffusion at notebooks/syncdiffusion_demo.ipynb.
You can also generate a panorama with SyncDiffusion by:
sh sample_syncdiffusion.sh
We provide a Hugging Face Demo where you can directly run SyncDiffusion with custom prompts. You can also run a Gradio Demo for SyncDiffusion locally, which gives you easy control over the hyperparameters. First, install Gradio with
pip install gradio
and run the demo with
python gradio_syncdiffusion.py
- We have observed that w=20 is a suitable weight value for SyncDiffusion in terms of image coherence and quality. However, you can freely test different weights by changing the sync_weight parameter.
- For computational efficiency, you can set sync_thres = N so that SyncDiffusion computes the gradient for only the first N steps of the sampling process. The figure below shows the results of N = 0, 3, 5, 50.
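The effect of the threshold can be pictured with a small sketch. It assumes a 50-step sampling process and a constant weight while synchronization is active; both are assumptions for illustration, and the helper `sync_weights` is hypothetical rather than a function from this repository, which may schedule the weight differently.

```python
# Hypothetical helper (not from the repository): per-step gradient weight
# under the assumption that the weight stays constant while active.

def sync_weights(num_steps, sync_weight, sync_thres):
    """Return sync_weight for the first sync_thres steps and 0.0 afterwards."""
    return [sync_weight if step < sync_thres else 0.0 for step in range(num_steps)]

for n in (0, 3, 5, 50):
    # num_steps=50 is an assumed sampling length.
    weights = sync_weights(num_steps=50, sync_weight=20.0, sync_thres=n)
    active = sum(1 for w in weights if w > 0)
    print(f"sync_thres={n}: gradient computed on {active} of {len(weights)} steps")
```

Under these assumptions, N = 0 disables the gradient step entirely, while N = 50 applies it at every sampling step, so smaller values trade some synchronization for lower compute.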
If you find our work useful, please consider citing:
@article{lee2023syncdiffusion,
title={SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions},
author={Yuseung Lee and Kunho Kim and Hyunjin Kim and Minhyuk Sung},
journal={arXiv preprint arXiv:2306.05178},
year={2023}
}
Our code is heavily based on the official implementation of MultiDiffusion. We borrowed the GitHub template from SALAD.