This repository contains an implementation of the methods described in *Differentially Private Latent Diffusion Models*. The code is based on a public implementation of Latent Diffusion Models, available here (commit a506df5).
This project uses Conda as its package management tool, which can be downloaded here. Once installed, clone the repository. The remainder of this document assumes the project is stored in a directory called `DP-LDM`.
Important: We strongly recommend using the Mamba solver for Conda as it dramatically speeds up environment creation.
```
cd DP-LDM/
conda env create -f environment.yaml
conda activate ldm
```
Once you have chosen a public/private dataset pair, there are three steps to training your own differentially private latent diffusion model. In each step, you will need to create a configuration file that specifies the hyperparameters of each model. Example config files can be found in `DP-LDM/configs/`.
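These configs follow the Latent Diffusion `target`/`params` layout. The fragment below is an illustrative sketch only (the field values are hypothetical); consult the actual files in `DP-LDM/configs/` for the real schema:

```yaml
# Illustrative only -- see DP-LDM/configs/ for real examples
model:
  base_learning_rate: 4.5e-6
  target: ldm.models.autoencoder.AutoencoderKL   # class to instantiate
  params:
    embed_dim: 4                                 # latent channels
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 16
```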
Step 1: Autoencoder Pretraining
```
CUDA_VISIBLE_DEVICES=0 python main.py --base <path to autoencoder yaml> -t --gpus 0,
```
Step 2: LDM Pretraining
```
CUDA_VISIBLE_DEVICES=0 python main.py --base <path to dm yaml> -t --gpus 0,
```
Step 3: Private Fine-tuning
Important: Due to implementation constraints, this step can only be run on a single GPU, specified by the `--accelerator gpu` command line argument.
```
CUDA_VISIBLE_DEVICES=0 python main.py \
    --base <path to fine-tune yaml> \
    -t \
    --gpus 0, \
    --accelerator gpu
```
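Conceptually, the private fine-tuning step uses DP-SGD: each example's gradient is clipped to a fixed L2 norm, the clipped gradients are averaged, and Gaussian noise is added before the update. A minimal numpy sketch of one such update (illustrative only; `dp_sgd_step` and its signature are not from this repo):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr, rng):
    # Clip each per-example gradient to L2 norm <= clip_norm
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    # Average, then add Gaussian noise calibrated to the clipping norm
    noisy_grad = np.mean(clipped, axis=0) + rng.normal(
        0.0, noise_mult * clip_norm / len(clipped), size=params.shape)
    return params - lr * noisy_grad
```

The clipping bound limits any single example's influence on the update, which is what makes the added noise yield a differential privacy guarantee.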
To sample from class-conditional models (e.g. MNIST, FMNIST, CIFAR10):
```
python sampling/cond_sampling_test.py \
    -y path/to/config.yaml \
    -ckpt path/to/checkpoint.ckpt \
    -c 0 1 2 3 4 5 6 7 8 9
```
To sample from unconditional models (e.g. CelebA):
```
python sampling/unonditional_sampling.py \
    --yaml path/to/config.yaml \
    --ckpt path/to/checkpoint.ckpt
```
We evaluated our models using two metrics. Code for both is available in the repository. For both methods, first follow the section above to generate sufficiently many samples from your model.
For MNIST, to compute the accuracy, the command is:
```
python scripts/dpdm_downstreaming_classifier_mnist.py \
    --train path/to/generated_train_images.pt \
    --test path/to/real_test_images.pt
```
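The downstream metric is ordinary classification accuracy: a classifier is trained on the generated images and evaluated on held-out real test images. The metric itself reduces to the following (hypothetical helper name, stdlib only):

```python
def accuracy(predictions, labels):
    # Fraction of predicted labels that match the ground truth
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```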
We also provide a script that combines sampling and accuracy computation:
```
python scripts/mnist_sampling_and_acc.py \
    --yaml path/to/config.yaml \
    --ckpt path/to/checkpoint.ckpt
```
To sample from text-conditioned models:

```
python txt2img.py \
    --yaml path/to/config.yaml \
    --ckpt path/to/checkpoint.ckpt \
    --n_samples 30000 \
    --outname txt2img_samples.pt
```
First, compute Inception network statistics for the real dataset:
```
python fid/compute_dataset_stats.py \
    --dataset ldm.data.celeba.CelebATrain \
    --args size:32 \
    --output celeba_train_stats.npz
```
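The stats file summarizes the dataset as the mean and covariance of its Inception features. Conceptually, that reduces to the following (the real script first extracts Inception-v3 activations, which this numpy sketch skips):

```python
import numpy as np

def feature_stats(features):
    """Gaussian statistics of an (N, D) feature matrix, as used for FID."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)  # (D, D) sample covariance
    return mu, sigma
```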
Next, compute the statistics for the generated samples:
```
python fid/compute_samples_stats.py \
    --samples celeba32_samples.pt \
    --output celeba_samples_stats.npz
```
Finally, compute FID:
```
python fid/compute_fid.py \
    --path1 celeba_train_stats.npz \
    --path2 celeba_samples_stats.npz
```
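For reference, FID is the Fréchet distance between the two Gaussians summarized by the stats files: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A numpy-only sketch of that formula (illustrative; the repo's script computes the matrix square root more robustly):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # Squared distance between the feature means
    diff = mu1 - mu2
    # Matrix square root of sigma1 @ sigma2 via eigendecomposition
    # (valid when the product is diagonalizable with nonnegative eigenvalues)
    eigvals, eigvecs = np.linalg.eig(sigma1 @ sigma2)
    sqrt_prod = (eigvecs * np.sqrt(np.maximum(eigvals.real, 0.0))) @ np.linalg.inv(eigvecs)
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * sqrt_prod).real)
```

Identical statistics give a distance of zero; larger values indicate the generated feature distribution drifts further from the real one.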
We built our code on top of the Latent Diffusion repository. Thanks to the authors for open-sourcing their code! We also borrow techniques from Transferring Pretrained Diffusion Probabilistic Models, and would like to thank the authors for privately sending us their code before making it public.
- Moved the implementation of the `DDPM` class to a new file `ddpm_base.py`
- Moved callbacks from `main.py` to `callbacks/*.py`
- Added `glob.escape` to log folder parsing to support special characters
- Changed name of checkpoint created on exception from `last.ckpt` to `on_exception.ckpt`
- Changed name of checkpoint created on signal from `last.ckpt` to `on_signal.ckpt`