```bash
conda create -n pixedit python==3.9.0
conda activate pixedit
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# Install git, if not already available
conda install anaconda::git
git clone https://github.com/dair-iitd/PixEdit
cd PixEdit
pip install -r requirements.txt
pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu118
# Install git-lfs, if not already available
conda install anaconda::git-lfs
```
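After installation, an optional sanity check can confirm that the pinned builds import cleanly and that CUDA is visible; the expected version strings in the comments simply mirror the pins above.

```python
# Optional sanity check for the environment set up above.
import torch
import torchvision
import xformers

print(torch.__version__)          # expected: 2.1.0 (or 2.1.0+cu118)
print(torchvision.__version__)    # expected: 0.16.0
print(xformers.__version__)       # expected: 0.0.22.post4
print(torch.cuda.is_available())  # should print True on a CUDA 11.8 machine
```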
# SDXL-VAE, T5 checkpoints
git lfs install
git clone https://huggingface.co/PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers
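To verify the download completed, you can try loading the weights. This is a minimal sketch assuming the checkpoint follows the usual diffusers layout with `vae`, `tokenizer`, and `text_encoder` subfolders; adjust the subfolder names if your copy differs.

```python
# Minimal load test for the SDXL-VAE and T5 checkpoints (layout assumed).
from diffusers import AutoencoderKL
from transformers import T5EncoderModel, T5Tokenizer

root = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers"
vae = AutoencoderKL.from_pretrained(root, subfolder="vae")
tokenizer = T5Tokenizer.from_pretrained(root, subfolder="tokenizer")
text_encoder = T5EncoderModel.from_pretrained(root, subfolder="text_encoder")
print("VAE scaling factor:", vae.config.scaling_factor)
print("T5 hidden size:", text_encoder.config.d_model)
```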
```bash
git lfs install
# We use only the real image editing pairs
git clone https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit-Part2-3
cd SEED-Data-Edit-Part2-3/multi_turn_editing/images
# Reassemble the split archive and extract the images
cat multi_turn.tar.gz.part-* > multi_turn.tar.gz
tar -xvf multi_turn.tar.gz
```
Follow the instructions in the AURORA repository to set up the AURORA training data.
We require all datasets to be in the format expected by PixArt-$\Sigma$; the prepared .json files for both the SEED-Edit and AURORA datasets are provided here. You can additionally use the following command to convert any dataset of your choice into the required format.
```bash
python tools/convert_data_pixedit.py [params] images_path output_path
```
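For reference, the sketch below shows the general shape of a PixArt-style `data_info.json` that such a converter might produce for editing pairs. The source/target keys are illustrative assumptions, not the converter's actual schema; inspect `tools/convert_data_pixedit.py` for the real field names.

```python
# A sketch of a PixArt-style data_info.json for editing pairs.
# The keys below are illustrative assumptions, NOT the converter's
# actual schema; check tools/convert_data_pixedit.py for the real keys.
import json

records = [
    {
        "source_path": "images/0001_src.png",  # image before the edit (assumed key)
        "path": "images/0001_tgt.png",         # image after the edit (assumed key)
        "prompt": "make the sky blue",         # edit instruction
        "height": 512,
        "width": 512,
        "ratio": 1.0,
    }
]

with open("data_info.json", "w") as f:
    json.dump(records, f, indent=2)
```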
We performed all training on an 8xA100 server. Set `--nproc_per_node` according to your configuration.
```bash
python -m torch.distributed.launch --nproc_per_node=8 \
    --master_port=12345 train_scripts/train.py \
    configs/pixart_simga_config/editing_at_512.py \
    --load-from output/pretrained_models/PixArt-Sigma-XL-2-512-MS.pth \
    --work-dir output/run1 --report_to wandb --tracker_project_name PixEdit
```
Download the v1 trained checkpoint PixEdit-v1.pth (also available on 🤗) and place it in the `ckpt` folder.
```bash
python edit_image.py <image_path> <edit_instruction>
```
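For example: `python edit_image.py examples/room.png "make the walls blue"` (the image path and instruction here are illustrative, not files shipped with the repo; any local image and free-form edit instruction should work).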
- Release Training and Inference Code.
- Release PixEdit-v1.
- Release PixEdit-v2.
- Thanks to PixArt-$\Sigma$ for their wonderful codebase!
If you find this repository useful, please consider giving it a star ⭐ and a citation.
```bibtex
@misc{goswami2024grapegenerateplaneditframeworkcompositional,
      title={GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis},
      author={Ashish Goswami and Satyam Kumar Modi and Santhosh Rishi Deshineni and Harman Singh and Prathosh A. P and Parag Singla},
      year={2024},
      eprint={2412.06089},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.06089},
}
```