This software package implements the Crystal Graph Convolutional Neural Network (CGCNN), which takes an arbitrary crystal structure as input and predicts material properties.
The package provides two major functions:
- Train a CGCNN model with a customized dataset.
- Predict material properties of new crystals with a pre-trained CGCNN model.
The following paper describes the details of the CGCNN framework. Please cite it if you use CGCNN:
@article{PhysRevLett.120.145301,
title = {Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
author = {Xie, Tian and Grossman, Jeffrey C.},
journal = {Phys. Rev. Lett.},
volume = {120},
issue = {14},
pages = {145301},
numpages = {6},
year = {2018},
month = {Apr},
publisher = {American Physical Society},
doi = {10.1103/PhysRevLett.120.145301},
url = {https://link.aps.org/doi/10.1103/PhysRevLett.120.145301}
}
This package requires:
- PyTorch
- scikit-learn
- pymatgen
If you are new to Python, the easiest way to install the prerequisites is via conda. After installing conda, run the following commands to create a new environment named cgcnn and install all prerequisites:
conda upgrade conda
conda create -n cgcnn python=3.6 scikit-learn pytorch=0.3.1 torchvision pymatgen -c pytorch -c matsci
Note: PyTorch introduced breaking changes in v0.4.0, so this code only works with versions up to v0.3.1.
This creates a conda environment for running CGCNN. Before using CGCNN, activate the environment by:
source activate cgcnn
Then, in the cgcnn directory, you can test whether all the prerequisites are installed properly by running:
python main.py -h
python predict.py -h
This should display the help messages for main.py and predict.py. If you see no error messages, the prerequisites are installed properly.
After you have finished using CGCNN, exit the environment by:
source deactivate
To input crystal structures to CGCNN, you will need to define a customized dataset. Note that this is required for both training and predicting.
Before defining a customized dataset, you will need:
- CIF files recording the structure of the crystals that you are interested in
- The target properties for each crystal (not needed for predicting, but you still need to put some arbitrary numbers in id_prop.csv)
You can create a customized dataset by creating a directory root_dir with the following files:
- id_prop.csv: a CSV file with two columns. The first column records a unique ID for each crystal, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.)
- atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.
- ID.cif: a CIF file that records the crystal structure, where ID is the unique ID for the crystal.
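For a prediction-only dataset, the placeholder id_prop.csv could be generated from the CIF files already in the directory. The following is a minimal sketch, not part of CGCNN: the function name write_placeholder_id_prop is hypothetical, and it assumes that id_prop.csv has no header row and that each ID is the CIF file name without its extension.

```python
import csv
from pathlib import Path


def write_placeholder_id_prop(root_dir, placeholder=0.0):
    """Write id_prop.csv listing every CIF file in root_dir with a dummy target.

    Hypothetical helper, not part of CGCNN. Returns the list of IDs written.
    """
    root = Path(root_dir)
    # Assumption: the ID of a crystal is its CIF file name without ".cif".
    ids = sorted(p.stem for p in root.glob("*.cif"))
    with open(root / "id_prop.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for crystal_id in ids:
            # Two columns, no header row (assumed): ID, placeholder target.
            writer.writerow([crystal_id, placeholder])
    return ids
```

Calling write_placeholder_id_prop("root_dir") overwrites any existing id_prop.csv in that directory, so it is only suitable when no real target values are stored there.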
The structure of root_dir should be:
root_dir
├── id_prop.csv
├── atom_init.json
├── id0.cif
├── id1.cif
├── ...
There are two examples of customized datasets in the repository: data/sample-regression for regression and data/sample-classification for classification.
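Before training or predicting, it can help to confirm that a directory follows the layout above. The following is a minimal sketch, not part of CGCNN: check_dataset is a hypothetical helper, and it assumes id_prop.csv has exactly two columns and no header row.

```python
import csv
from pathlib import Path


def check_dataset(root_dir):
    """Return a list of problems found in a CGCNN-style dataset directory.

    Hypothetical helper, not part of CGCNN. An empty list means the expected
    files are all present.
    """
    root = Path(root_dir)
    problems = []
    for required in ("id_prop.csv", "atom_init.json"):
        if not (root / required).is_file():
            problems.append(f"missing {required}")
    id_prop = root / "id_prop.csv"
    if id_prop.is_file():
        with open(id_prop, newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue  # ignore blank lines
                if len(row) != 2:
                    problems.append(f"expected 2 columns, got {row}")
                    continue
                # Each ID in the first column needs a matching ID.cif file.
                if not (root / f"{row[0]}.cif").is_file():
                    problems.append(f"missing {row[0]}.cif")
    return problems
```

The check only looks at file names and column counts. It does not parse the CIF files or the JSON content.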
For advanced PyTorch users
The above method of creating a customized dataset uses the CIFData class in cgcnn.data. If you want a more flexible way to input crystal structures, PyTorch has a great tutorial for writing your own dataset class.
Before training a new CGCNN model, you will need to:
- Define a customized dataset at root_dir to store the structure-property relations of interest.
Then, in the cgcnn directory, you can train a CGCNN model for your customized dataset by:
python main.py root_dir
You can set the number of training, validation, and test data with the options --train-size, --val-size, and --test-size. For instance, data/sample-regression has 10 data points in total. You can train a model by:
python main.py --train-size 6 --val-size 2 --test-size 2 data/sample-regression
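For a larger dataset, the three sizes can be derived from the number of rows in id_prop.csv. The following is a minimal sketch, not part of CGCNN: split_sizes is a hypothetical helper, the 60/20/20 default is only an illustrative choice, and it assumes id_prop.csv has no header row.

```python
import csv


def split_sizes(id_prop_path, train_frac=0.6, val_frac=0.2):
    """Return (train, val, test) counts that add up to the number of rows.

    Hypothetical helper, not part of CGCNN.
    """
    with open(id_prop_path, newline="") as f:
        # Count non-blank rows; assumes there is no header row.
        total = sum(1 for row in csv.reader(f) if row)
    train = int(total * train_frac)
    val = int(total * val_frac)
    # Whatever is left over goes to the test set, so the sizes sum to total.
    test = total - train - val
    return train, val, test
```

The returned counts could then be passed to --train-size, --val-size, and --test-size.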
You can also train a classification model with the option --task classification. For instance, you can use data/sample-classification by:
python main.py --task classification --train-size 5 --val-size 2 --test-size 3 data/sample-classification
After training, you will get three files in the cgcnn directory:
- model_best.pth.tar: stores the CGCNN model with the best validation accuracy.
- checkpoint.pth.tar: stores the CGCNN model at the last epoch.
- test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set.
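For a regression model, the results file can be summarized with a single error figure. The following is a minimal sketch, not part of CGCNN: mean_absolute_error is a hypothetical helper, and it assumes each row of test_results.csv holds the ID, target value, and predicted value in that order, with no header row.

```python
import csv


def mean_absolute_error(results_path):
    """Compute the mean absolute error over the rows of a results file.

    Hypothetical helper, not part of CGCNN. Assumes rows of the form
    ID, target value, predicted value, with no header row.
    """
    errors = []
    with open(results_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # ignore blank lines
            target, predicted = float(row[1]), float(row[2])
            errors.append(abs(target - predicted))
    if not errors:
        raise ValueError("no rows found in " + str(results_path))
    return sum(errors) / len(errors)
```

This figure is only meaningful when the second column holds real target values, not placeholders.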
Before predicting the material properties, you will need to:
- Define a customized dataset at root_dir for all the crystal structures that you want to predict.
- Obtain a pre-trained CGCNN model named pre-trained.pth.tar.
Then, in the cgcnn directory, you can predict the properties of the crystals in root_dir:
python predict.py pre-trained.pth.tar root_dir
For instance, you can predict the formation energies of the crystals in data/sample-regression:
python predict.py pre-trained/formation-energy-per-atom.pth.tar data/sample-regression
You can also predict whether the crystals in data/sample-classification are metals (1) or semiconductors (0):
python predict.py pre-trained/semi-metal-classification.pth.tar data/sample-classification
Note that for classification, each predicted value in test_results.csv is a probability between 0 and 1 that the crystal can be classified as 1 (metal in the above example).
After predicting, you will get one file in the cgcnn directory:
- test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set. Here the target value is just the number that you set in id_prop.csv while defining the dataset, so it carries no meaning.
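For a classification model, the predicted probabilities can be turned into class labels by thresholding. The following is a minimal sketch, not part of CGCNN: predicted_labels is a hypothetical helper, the 0.5 threshold is an illustrative choice, and it assumes each row of test_results.csv holds the ID, target value, and predicted probability in that order, with no header row.

```python
import csv


def predicted_labels(results_path, threshold=0.5):
    """Map each crystal ID to a 0/1 label by thresholding its probability.

    Hypothetical helper, not part of CGCNN. The second column (the
    placeholder target) is ignored.
    """
    labels = {}
    with open(results_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # ignore blank lines
            crystal_id, probability = row[0], float(row[2])
            # Label 1 when the probability of class 1 reaches the threshold.
            labels[crystal_id] = 1 if probability >= threshold else 0
    return labels
```

With the metal/semiconductor example above, a label of 1 would correspond to a metal and 0 to a semiconductor.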
This software was primarily written by Tian Xie, who was advised by Prof. Jeffrey Grossman.
CGCNN is released under the MIT License.