Official Code Repository for the paper Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations (ICML 2022).
🔴UPDATE: We provide a separate code repo for GDSS using Graph Transformer here!
In this repository, we implement the Graph Diffusion via the System of SDEs (GDSS).
- We propose a novel score-based generative model for graphs that overcomes the limitation of previous generative methods, by introducing a diffusion process for graphs that can generate node features and adjacency simultaneously via the system of SDEs.
- We derive novel training objectives to estimate the gradient of the joint log-density for the proposed diffusion process and further introduce an efficient integrator to solve the proposed system of SDEs.
- We validate our method on both synthetic and real-world graph generation tasks, on which it outperforms existing graph generative models.
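To make the first two points concrete, here is a sketch of what a system of SDEs over node features and adjacency can look like. The notation is an assumption introduced for illustration and does not appear elsewhere in this README: X_t and A_t are the node features and adjacency at time t, f_{1,t}, f_{2,t} are drift terms, g_{1,t}, g_{2,t} are scalar diffusion coefficients, and w_1, w_2 are independent Wiener processes. With independent noise on each component, the standard reverse-time SDE result gives:

```latex
\begin{aligned}
\text{forward:}\quad
  & \mathrm{d}X_t = f_{1,t}(X_t)\,\mathrm{d}t + g_{1,t}\,\mathrm{d}w_1,
  \qquad
  \mathrm{d}A_t = f_{2,t}(A_t)\,\mathrm{d}t + g_{2,t}\,\mathrm{d}w_2 \\
\text{reverse:}\quad
  & \mathrm{d}X_t = \left[ f_{1,t}(X_t) - g_{1,t}^{2}\,\nabla_{X_t} \log p_t(X_t, A_t) \right] \mathrm{d}\bar{t}
    + g_{1,t}\,\mathrm{d}\bar{w}_1 \\
  & \mathrm{d}A_t = \left[ f_{2,t}(A_t) - g_{2,t}^{2}\,\nabla_{A_t} \log p_t(X_t, A_t) \right] \mathrm{d}\bar{t}
    + g_{2,t}\,\mathrm{d}\bar{w}_2
\end{aligned}
```

The two partial scores, the gradients of log p_t(X_t, A_t) with respect to X_t and A_t, are the gradients of the joint log-density that the score models are trained to estimate. Both depend on X_t and A_t together, which is what couples the two equations into one system.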
GDSS is built in Python 3.7.0 and PyTorch 1.10.1. Please use the following command to install the requirements:
pip install -r requirements.txt
For molecule generation, additionally run the following command:
conda install -c conda-forge rdkit=2020.09.1.0
We provide four generic graph datasets (Ego-small, Community_small, ENZYMES, and Grid) and two molecular graph datasets (QM9 and ZINC250k).
We additionally provide the commands for generating generic graph datasets as follows:
python data/data_generators.py --dataset ${dataset_name}
To preprocess the molecular graph datasets for training models, run the following command:
python data/preprocess.py --dataset ${dataset_name}
python data/preprocess_for_nspdk.py --dataset ${dataset_name}
For the evaluation of generic graph generation tasks, run the following command to compile the ORCA program (see http://www.biolab.si/supp/orca/orca.html):
cd evaluation/orca
g++ -O2 -std=c++11 -o orca orca.cpp
The configurations are provided in the config/ directory in YAML format.
Hyperparameters used in the experiments are specified in Appendix C of our paper.
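As a quick sanity check before training or sampling, the following minimal sketch lists the YAML files found in a config directory. The helper name `available_configs` is hypothetical and not part of this repository. It also assumes that the value passed to `--config` is the file name without the `.yaml` extension, which the examples in this README suggest but do not state.

```python
from pathlib import Path


def available_configs(config_dir="config"):
    """Return the sorted names (file stems) of all *.yaml files in config_dir.

    Assumption: a config named `foo` is stored as `<config_dir>/foo.yaml`.
    """
    return sorted(p.stem for p in Path(config_dir).glob("*.yaml"))


if __name__ == "__main__":
    # Prints one config name per line; prints nothing if the directory is missing.
    for name in available_configs():
        print(name)
```

Running this from the repository root should print the names that can be tried with `--config`, under the assumption above.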
We provide the commands for the following tasks: Generic Graph Generation and Molecule Generation.
To train the score models, first modify config/${dataset}.yaml accordingly, then run the following command:
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type train --config ${train_config} --seed ${seed}
For example:
CUDA_VISIBLE_DEVICES=0 python main.py --type train --config community_small --seed 42
and
CUDA_VISIBLE_DEVICES=0,1 python main.py --type train --config zinc250k --seed 42
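To repeat training over several seeds, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_train_command` and `format_command` are hypothetical, and the seeds 43 and 44 are arbitrary illustrations. Only the flags `--type`, `--config`, and `--seed` and the `CUDA_VISIBLE_DEVICES` variable come from the commands shown in this README.

```python
import os
import shlex


def build_train_command(config, seed, gpu_ids):
    """Return (argv, env) for one training run of main.py."""
    argv = ["python", "main.py", "--type", "train",
            "--config", config, "--seed", str(seed)]
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return argv, env


def format_command(argv, env):
    """Render the run as a single shell line, for logging or copy-paste."""
    prefix = "CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"]
    return prefix + " " + " ".join(shlex.quote(a) for a in argv)


if __name__ == "__main__":
    for seed in (42, 43, 44):  # hypothetical seed sweep
        argv, env = build_train_command("community_small", seed, [0])
        print(format_command(argv, env))
        # To launch for real: subprocess.run(argv, env=env, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a config name ever contains unusual characters.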
To generate graphs using the trained score models, run the following command.
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type sample --config sample_qm9
or
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type sample --config sample_zinc250k
We provide checkpoints of the pretrained models used in the main experiments in the checkpoints/ directory:
ego_small/gdss_ego_small.pth
community_small/gdss_community_small.pth
ENZYMES/gdss_enzymes.pth
grid/gdss_grid.pth
QM9/gdss_qm9.pth
ZINC250k/gdss_zinc250k.pth
We also provide a checkpoint of an improved GDSS that uses GMH blocks instead of GCN blocks (i.e., ScoreNetworkX_GMH instead of ScoreNetworkX). The numbers of training epochs are 800 and 1000, with snr set to 0.2 and scale_eps set to 0.8.
ZINC250k/gdss_zinc250k_v2.pth
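Before sampling, it can help to confirm that the checkpoint files listed above are actually present, for example after a partial clone. The following is a minimal sketch using only the standard library. The function name `missing_checkpoints` is hypothetical and not part of the repository; the relative paths are copied from the list above.

```python
from pathlib import Path

# Relative paths as listed in this README, under the checkpoints/ directory.
CHECKPOINTS = [
    "ego_small/gdss_ego_small.pth",
    "community_small/gdss_community_small.pth",
    "ENZYMES/gdss_enzymes.pth",
    "grid/gdss_grid.pth",
    "QM9/gdss_qm9.pth",
    "ZINC250k/gdss_zinc250k.pth",
    "ZINC250k/gdss_zinc250k_v2.pth",
]


def missing_checkpoints(root="checkpoints"):
    """Return the listed checkpoint paths that are not files under root."""
    root = Path(root)
    return [rel for rel in CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed checkpoints found.")
```

The check only looks at file presence; it does not load the files or verify their contents.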
If you found the provided code and our paper useful in your work, we kindly request that you cite our work.
@article{jo2022GDSS,
author = {Jaehyeong Jo and
Seul Lee and
Sung Ju Hwang},
title = {Score-based Generative Modeling of Graphs via the System of Stochastic
Differential Equations},
journal = {arXiv:2202.02514},
year = {2022},
url = {https://arxiv.org/abs/2202.02514}
}