ToolBoxSF - Robustly interrogating machine learning based scoring functions: what are they learning?

Introduction

This repository contains the raw code and built singularity containers for the scoring function platform and toolbox described in the preprint "Robustly interrogating machine learning based scoring functions: what are they learning?" by Guy Durant, Fergus Boyles, Kris Birchall, Brian Marsden, and Charlotte M. Deane.

Installation

The singularity containers for each scoring function and baseline model in the paper, the benchmark data for any docked or modified poses, and the training scripts can be found at the following link:

Zenodo

The singularity containers can also be built from scratch using the .def files provided in the singularity_defs folder with the following command:

singularity build <SINGULARITY_CONTAINER_NAME>.sif <SINGULARITY_RECIPE_FILE>.def
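Building every recipe one at a time can be scripted. The sketch below assumes each container is named after its recipe (foo.def becomes foo.sif), which is our convention rather than one the repository prescribes; `build_commands` and `build_all` are hypothetical helpers. Depending on your singularity installation, building from a definition file may also require sudo or the --fakeroot option.

```python
import subprocess
from pathlib import Path


def build_commands(defs_dir, out_dir):
    """Return one `singularity build` command per .def recipe in defs_dir.

    Assumption: each image is named after its recipe (foo.def -> foo.sif).
    """
    commands = []
    for recipe in sorted(Path(defs_dir).glob("*.def")):
        image = Path(out_dir) / f"{recipe.stem}.sif"
        commands.append(["singularity", "build", str(image), str(recipe)])
    return commands


def build_all(defs_dir="singularity_defs", out_dir="."):
    """Build every recipe sequentially; check=True stops at the first failure."""
    for cmd in build_commands(defs_dir, out_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `build_all("singularity_defs", "containers")` would write one .sif per recipe into an existing containers folder.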

The original code can also be accessed from the model_repos folder.

Note that the PDBbind crystal structures are not provided for download; instead, they must be generated by running the scripts described in the Data processing section below.

Data processing

PDBbind 2020 General data can be downloaded from the following link:

PDBBind Website

To process the raw files as described in the paper, use the following command, which requires the latest version of RDKit and tqdm to be installed:

python scripts/pdbbind_processing.py --pdbbind_dir <RAW_PDBBIND_DATA_DIR> --output_dir <OUTPUT_DIR>
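Before processing, it can help to confirm that the raw download is complete. This is a minimal sketch assuming the usual PDBbind layout of one folder per complex named by its PDB code and containing <code>_protein.pdb and <code>_ligand.sdf; we have not checked which files pdbbind_processing.py actually reads, and `find_incomplete_complexes` is a hypothetical helper.

```python
from pathlib import Path

# Assumed file names inside each raw PDBbind complex folder; adjust if your
# download is laid out differently.
EXPECTED_SUFFIXES = ("_protein.pdb", "_ligand.sdf")


def find_incomplete_complexes(pdbbind_dir):
    """Return {pdb_code: [missing file names]} for complex folders lacking files."""
    missing = {}
    for entry in sorted(Path(pdbbind_dir).iterdir()):
        # Complex folders are named by a 4-character PDB code; anything else
        # (e.g. an index/ or readme/ folder) is skipped.
        if not entry.is_dir() or len(entry.name) != 4:
            continue
        absent = [entry.name + suffix for suffix in EXPECTED_SUFFIXES
                  if not (entry / (entry.name + suffix)).is_file()]
        if absent:
            missing[entry.name] = absent
    return missing
```

Calling `find_incomplete_complexes("<RAW_PDBBIND_DATA_DIR>")` returns an empty dict when every complex folder has both files.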

Usage

We recommend using the singularity containers to run the scoring functions; each container should be downloaded into its own separate folder. Running training and/or validation creates a data folder containing the computed features, saved models and results.

N.B. PointVS requires its pre-trained pose-classification weights (48_compact__0 at ) to be in the same folder as the PointVS container.

The following commands can be used to run the scoring functions:

For training:

singularity exec --nv --home $(dirname $PWD) <SINGULARITY_CONTAINER_NAME>.sif bash toolboxsf --train --csv_file ../toolboxsf_training_csvs/casf_2016_train.csv --data_dir ../pdbbind_2020_general --model_name <MODEL_NAME>

For validation:

singularity exec --nv --home $(dirname $PWD) <SINGULARITY_CONTAINER_NAME>.sif bash toolboxsf --predict --val_csv_file ../toolboxsf_benchmarks/csv_files/casf_2016_test.csv --val_data_dir ../pdbbind_2020_general --model_name <TRAINED_MODEL_NAME>
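When training or validating several models in turn, the commands above can be assembled programmatically. This sketch only reproduces the flags shown in the two commands; `toolboxsf_command` and `run` are hypothetical helpers, not part of the repository, and should be called from the folder containing the container, as with the shell commands.

```python
import os
import shlex
import subprocess


def toolboxsf_command(container, model_name, mode, csv_file, data_dir):
    """Assemble the `singularity exec ... toolboxsf` command from the README.

    mode is "train" or "predict"; the flag names are copied from the README.
    """
    home = os.path.dirname(os.getcwd())  # equivalent of $(dirname $PWD)
    cmd = ["singularity", "exec", "--nv", "--home", home, container, "bash", "toolboxsf"]
    if mode == "train":
        cmd += ["--train", "--csv_file", csv_file, "--data_dir", data_dir]
    elif mode == "predict":
        cmd += ["--predict", "--val_csv_file", csv_file, "--val_data_dir", data_dir]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd += ["--model_name", model_name]
    return cmd


def run(cmd):
    """Print the command in shell syntax, then execute it, stopping on failure."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run(toolboxsf_command("<SINGULARITY_CONTAINER_NAME>.sif", "<MODEL_NAME>", "train", "../toolboxsf_training_csvs/casf_2016_train.csv", "../pdbbind_2020_general"))` is equivalent to the training command above.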

Note: OnionNet-2, SIGN, Pafnucy and PointVS require GPUs to run; all other models/containers can be run on CPUs.
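A quick pre-flight check can catch a GPU-only model being launched on a CPU node. This is a rough sketch: it assumes the model labels match the names above (container filenames may differ) and treats the presence of nvidia-smi on PATH as a proxy for a usable NVIDIA GPU, which does not guarantee the driver works inside the container.

```python
import shutil

# Models listed in the README as requiring a GPU (labels, not container filenames).
GPU_MODELS = {"OnionNet-2", "SIGN", "Pafnucy", "PointVS"}


def needs_gpu(model):
    return model in GPU_MODELS


def gpu_visible():
    """Rough check: is the NVIDIA driver utility on PATH?"""
    return shutil.which("nvidia-smi") is not None


def check_can_run(model):
    """Raise before launching a GPU-only model on a machine without nvidia-smi."""
    if needs_gpu(model) and not gpu_visible():
        raise RuntimeError(f"{model} requires a GPU, but nvidia-smi was not found")
```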

The result CSV files produced (found in data/results) can be analysed using the following command:

python scripts/bootstrap_pearsonsr.py --filename <RESULT_CSV_FILE>
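The script name suggests a bootstrapped Pearson's r. We have not reproduced bootstrap_pearsonsr.py itself; the following is a generic percentile-bootstrap sketch for illustration. Because the result CSV column names are not documented here, it takes plain lists of experimental and predicted pK values, and `bootstrap_pearson_ci` is a hypothetical helper.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_pearson_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Return (r, (lower, upper)): r on the full data plus a percentile bootstrap CI.

    Each bootstrap replicate resamples protein-ligand pairs with replacement.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(set(y_true)) < 2 or len(set(y_pred)) < 2:
        raise ValueError("need at least two distinct values in each input")
    rng = random.Random(seed)
    n = len(y_true)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [y_true[i] for i in idx]
        ys = [y_pred[i] for i in idx]
        # Skip degenerate resamples in which either variable is constant.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        rs.append(pearson_r(xs, ys))
    rs.sort()
    lower = rs[int((alpha / 2) * n_boot)]
    upper = rs[int((1 - alpha / 2) * n_boot) - 1]
    return pearson_r(y_true, y_pred), (lower, upper)
```

The percentile interval is the simplest bootstrap CI; the repository's script may use a different number of resamples or interval method.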

Creating training/test splits

The training and test splits used in the paper can be found in the toolboxsf_training_csvs and toolboxsf_benchmarks/csv_files folders, respectively (both zipped in the Zenodo archive). To create new training/test splits, create CSV files with the following columns:

key: Unique key for each protein-ligand pair (typically a PDB code)
pk: Binding affinity as a pK value
protein: Relative path to protein structure PDB file (when combined with the data_dir argument, should point to the full path)
ligand: Relative path to ligand structure SDF file (when combined with the data_dir argument, should point to the full path)

The data_dir argument should point to the top-level directory containing both the protein and ligand structure files.
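A split CSV in this format can be written with the standard csv module. This is a minimal sketch: `write_split` and `kd_to_pk` are hypothetical helpers, the pK conversion assumes the affinity is a Kd (or Ki) in mol/L so that pK = -log10(K) (e.g. 1 nM gives pK 9), and the <key>/<key>_protein.pdb path layout in the test data is only an example of paths relative to data_dir.

```python
import csv
import math
from pathlib import Path

FIELDS = ["key", "pk", "protein", "ligand"]


def kd_to_pk(kd_molar):
    """Convert a dissociation/inhibition constant in mol/L to pK = -log10(K)."""
    return -math.log10(kd_molar)


def write_split(rows, csv_path, data_dir=None):
    """Write a split CSV with the columns key, pk, protein and ligand.

    rows: iterable of dicts with exactly the keys in FIELDS. If data_dir is
    given, every protein/ligand path is checked relative to it before writing.
    """
    rows = list(rows)
    if data_dir is not None:
        for row in rows:
            for col in ("protein", "ligand"):
                path = Path(data_dir) / row[col]
                if not path.is_file():
                    raise FileNotFoundError(f"{row['key']}: {path} not found")
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Checking paths against data_dir before writing catches mismatches between the CSV and the data directory early, rather than partway through feature computation.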
