Official implementation of the paper Collaborative Score Distillation for Consistent Visual Editing (NeurIPS 2023).
Subin Kim*1,
Kyungmin Lee*1,
June Suk Choi1,
Jongheon Jeong1,
Kihyuk Sohn2,
Jinwoo Shin1.
1KAIST, 2Google Research
paper | project page | arXiv
TL;DR: Consistent zero-shot visual synthesis across various and complex visual modalities
Install the required packages listed below:
conda create -n csd python=3.8
conda activate csd
pip install torch==2.0.1 torchvision==0.15.2
pip install diffusers==0.20.0 transformers accelerate mediapy
# for consistency decoder
pip install git+https://github.com/openai/consistencydecoder.git
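After installing, a quick stdlib-only check (a hypothetical snippet, not part of this repo) can confirm that the required packages resolve in the `csd` environment:

```python
import importlib.util

def missing_packages(names=("torch", "torchvision", "diffusers", "transformers")):
    """Return the packages from `names` that cannot be imported.

    Uses importlib.util.find_spec, which returns None for a package
    that is not installed in the current environment.
    """
    return [n for n in names if importlib.util.find_spec(n) is None]

# Example: print anything still missing before running the editing scripts.
if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", missing if missing else "none")
```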
Run the following script with a single GPU.
python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/river.jpg' \
--batch=4 --tgt_prompt='turn into van gogh style painting' \
--guidance_scale=7.5 --image_guidance_scale=5
python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/sheeps.jpg' \
--batch=4 --tgt_prompt='turn the sheeps into wolves' \
--guidance_scale=7.5 --image_guidance_scale=5
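The '--svgd' flag corresponds to the Stein Variational Gradient Descent update at the core of CSD, which couples the per-sample diffusion scores in a batch through an RBF kernel so that the images are edited consistently. A minimal NumPy sketch of a generic SVGD step, under the usual median-heuristic bandwidth (the function names and shapes here are illustrative, not the repo's exact implementation):

```python
import numpy as np

def svgd_update(x, scores):
    """Generic SVGD step.

    x:      (B, D) flattened samples (e.g. latents)
    scores: (B, D) per-sample score estimates
    Returns the (B, D) coupled update direction.
    """
    diff = x[:, None, :] - x[None, :, :]             # diff[i, j] = x_i - x_j
    sq = (diff ** 2).sum(-1)                         # pairwise squared distances
    # Median heuristic for the RBF bandwidth (an illustrative choice).
    h = np.sqrt(0.5 * np.median(sq) / np.log(len(x) + 1)) + 1e-8
    k = np.exp(-sq / (2 * h ** 2))                   # RBF kernel, (B, B)
    drive = k @ scores                               # kernel-weighted score sharing
    repulse = (diff * k[..., None]).sum(1) / h ** 2  # repulsive term keeps samples diverse
    return (drive + repulse) / len(x)
```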
To edit high-resolution images, we encode and decode patch-wise. To do so, add '--stride_vae':
python csdedit_image.py --device=0 --svgd --fp16 --stride=16 \
--save_path='output/' --data_path='data/michelangelo.jpeg' \
--batch=8 --tgt_prompt='Re-imagine people are in galaxy' \
--guidance_scale=15 --image_guidance_scale=5 --stride_vae --lr=4.0
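Patch-wise encoding/decoding slides a window over the image and averages the outputs on overlapping regions, which keeps memory bounded at high resolution. A hypothetical sketch of that overlap-averaging pattern (the helper and its parameters are illustrative; the repo's actual stride logic may differ):

```python
import numpy as np

def patchwise_apply(img, fn, patch=64, stride=32):
    """Apply `fn` to overlapping square patches of `img` and average overlaps.

    img: array of shape (H, W) or (H, W, C); fn maps a patch to a same-shape patch.
    """
    H, W = img.shape[:2]
    out = np.zeros_like(img, dtype=float)
    count = np.zeros((H, W) + (1,) * (img.ndim - 2))  # per-pixel overlap counter
    for top in range(0, max(H - patch, 0) + 1, stride):
        for left in range(0, max(W - patch, 0) + 1, stride):
            sl = (slice(top, top + patch), slice(left, left + patch))
            out[sl] += fn(img[sl])
            count[sl] += 1
    return out / np.maximum(count, 1)  # average where patches overlapped
```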
To edit the image with region-wise prompts while ensuring smooth transitions between patches with different instructions, do the following:
python csdedit_image_region.py --device 0 --svgd --fp16 \
--save_path 'output/' --data_path 'data/vienna.jpg' \
--tgt_prompt 'turn into sunny weather' 'turn into cloudy weather' 'turn into rainy weather' 'turn into snowy weather' \
--stride 16 --batch 4 --guidance_scale 15 --image_guidance_scale 5
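Blending regions edited under different prompts requires per-region weights that cross-fade at the boundaries so no seams appear. A hypothetical NumPy sketch of such weights over the image width, using linear cross-fades (the helper and its parameters are illustrative, not the repo's exact blending):

```python
import numpy as np

def region_weights(width, n_regions, overlap):
    """Per-region blending weights over `width` pixels.

    Regions are equal-width spans with a linear cross-fade of `overlap`
    pixels at each interior boundary; each column of weights sums to 1.
    """
    edges = np.linspace(0, width, n_regions + 1)
    x = np.arange(width) + 0.5  # pixel centers
    w = np.zeros((n_regions, width))
    for i in range(n_regions):
        # Ramp up after the left edge (except for the first region) ...
        left = np.clip((x - (edges[i] - overlap / 2)) / overlap, 0, 1) if i > 0 else np.ones(width)
        # ... and ramp down before the right edge (except for the last region).
        right = np.clip(((edges[i + 1] + overlap / 2) - x) / overlap, 0, 1) if i < n_regions - 1 else np.ones(width)
        w[i] = left * right
    return w / w.sum(0)  # normalize so columns sum to exactly 1
```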
python csdedit_video.py --device 0 --svgd --fp16 \
--save_path 'output/break/' --data_path 'data/break' \
--tgt_prompt="Change the color of his T-shirt to yellow" \
--guidance_scale=7.5 --image_guidance_scale=1.5 --lr=0.5 \
    --rows 2 --cols 12 --num_steps 100
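The '--rows' and '--cols' flags suggest the video frames are arranged into a rows-by-cols grid so they can be handled jointly. A hypothetical tiling helper (not from the repo) illustrating that arrangement:

```python
import numpy as np

def frames_to_grid(frames, rows, cols):
    """Tile the first rows*cols frames, each (H, W) or (H, W, C),
    into a single (rows*H, cols*W[, C]) image."""
    H, W = frames[0].shape[:2]
    grid = np.zeros((rows * H, cols * W) + frames[0].shape[2:], frames[0].dtype)
    for idx, frame in enumerate(frames[:rows * cols]):
        r, c = divmod(idx, cols)  # row-major placement
        grid[r * H:(r + 1) * H, c * W:(c + 1) * W] = frame
    return grid
```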
One can obtain 3D editing results by following the Instruct-NeRF2NeRF codebase with a few lines of adaptation, in particular replacing its InstructPix2Pix score-distillation update with that of CSD-Edit.
@inproceedings{kim2023collaborative,
title={Collaborative score distillation for consistent visual editing},
author={Kim, Subin and Lee, Kyungmin and Choi, June Suk and Jeong, Jongheon and Sohn, Kihyuk and Shin, Jinwoo},
booktitle={Advances in Neural Information Processing Systems},
year={2023},
}