This repository contains code related to the automation of modeling efforts for the PINK project, focusing on training Graph Neural Networks (GNNs) for molecular property prediction from SMILES.

ntua-unit-of-control-and-informatics/gnn-molecular-modelling

Latest commit: fbcd3af · Jun 28, 2024


Automated Graph Neural Network Training for Molecular Property Prediction from SMILES

[Official Python implementation]

This repository contains code for automating the modelling efforts of the PINK project.

Project Leader: Haralambos Sarimveis (hsarimv@central.ntua.gr)

Contributors: Giannis Pitoskas (jpitoskas@gmail.com),

Project Directory Structure

Project_dir/
    ├── data/
    ├── experiments/
    ├── models/
    └── src/

Source Code

The src/ directory contains the source code for training Graph Neural Networks (GNNs) using SMILES representations of molecules. For more detailed information about the source code and its usage, please refer to the internal README file located inside the src/ directory.

Models Implementation

The models/ directory contains configurable class implementations of several types of graph neural networks (GNNs).

These classes let users experiment with different architectures and hyperparameters to suit their specific needs.

Data Directory

The project expects a data/ directory in its root directory. This directory stores the datasets for the different molecular properties (endpoints).

Each property is organized into its own subdirectory, and dataset files follow a consistent naming convention:

  • Subdirectory Format: Dataset subdirectories should follow the format data/{property}/
  • Naming Convention: Dataset files should follow the format {property}_dataset.csv

An example is given below:

data/
├── propertyA/
│   └── propertyA_dataset.csv
├── propertyB/
│   └── propertyB_dataset.csv
│
└── ...
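
The layout above can be checked programmatically. The following is a minimal sketch, not part of the repository: the helper names `dataset_path` and `list_properties` are hypothetical and only illustrate the data/{property}/{property}_dataset.csv convention using the standard library.

```python
from pathlib import Path


def dataset_path(data_root, prop):
    """Build the expected CSV path: {data_root}/{property}/{property}_dataset.csv."""
    return Path(data_root) / prop / f"{prop}_dataset.csv"


def list_properties(data_root):
    """Return the sorted property names whose subdirectory holds a correctly named dataset file."""
    root = Path(data_root)
    if not root.is_dir():
        return []
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and dataset_path(root, d.name).is_file()
    )


if __name__ == "__main__":
    # Hypothetical usage: list the properties found under ./data
    for prop in list_properties("data"):
        print(prop, "->", dataset_path("data", prop))
```

A subdirectory whose CSV file does not follow the naming convention is skipped by `list_properties`, which makes misnamed files easy to spot before training.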

Experiments Directory

The experiments/ directory is where training logs and metadata are stored. For more detailed information about the contents of this directory, please refer to the internal README file located inside the src/ directory.

Python Packages

Requirements

The project requires the following Python version:

  • Python: Version 3.10.9 or higher

The project requires the following Python packages:

  • NumPy: Version 1.24.1
  • Pandas: Version 1.5.3
  • Torch: Version 2.0.0+cu117
  • Torch Geometric: Version 2.4.0
  • Tqdm: Version 4.64.1
  • Scikit-learn: Version 1.4.1
  • RDKit: Version 2022.9.5
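
To compare an existing environment against the versions listed above, a short standard-library script can help. This is a sketch under assumptions: the distribution names in `EXPECTED` (for example `torch-geometric` and `scikit-learn`) are assumed to match the names the packages were installed under, and `check_versions` is a hypothetical helper, not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in this README; distribution names are assumptions.
EXPECTED = {
    "numpy": "1.24.1",
    "pandas": "1.5.3",
    "torch": "2.0.0+cu117",
    "torch-geometric": "2.4.0",
    "tqdm": "4.64.1",
    "scikit-learn": "1.4.1",
    "rdkit": "2022.9.5",
}


def check_versions(expected=EXPECTED):
    """Map each package name to (expected, installed); installed is None if not found."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    # The README asks for Python 3.10.9 or higher.
    print("Python OK:", sys.version_info >= (3, 10, 9))
    for name, (wanted, installed) in check_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

The script only reports differences and does not install anything, so it is safe to run in any environment.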
