We recommend creating a conda environment to manage the dependencies (an Anaconda installation is assumed). First, clone this repository:
```
git clone https://github.com/gitter-lab/active-learning-drug-discovery.git
cd active-learning-drug-discovery
```
Set up the `active_learning_dd` conda environment using the `conda_env.yml` file:

```
conda env create -f conda_env.yml
conda activate active_learning_dd
```
If you do not want GPU support, you can replace `conda_env.yml` with `conda_cpu_env.yml`.
Finally, install `active_learning_dd` with `pip`:

```
pip install -e .
```
Now check that the installation is working correctly by running the sample data test:

```
cd chtc_runners
python sample_data_runner.py \
    --pipeline_params_json_file=../param_configs/sample_data_config.json \
    --hyperparams_json_file=../param_configs/experiment_PstP_hyperparams/sampled_hyparams/ClusterBasedWCSelector_609.json \
    --iter_max=5 \
    --no-precompute_dissimilarity_matrix \
    --initial_dataset_file=../datasets/sample_data/training_data/iter_0.csv.gz
```
You should see the following final line of output:

```
Finished testing sample dataset. Verified that hashed selection matches stored hash.
```
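Before launching longer runs, it can help to peek at what the pipeline and hyperparameter configurations contain. The sketch below simply loads the two JSON files passed to the test command above and prints their top-level keys; it assumes each file holds a single JSON object and is run from `chtc_runners/`, as in the test.

```python
import json

# Paths as used in the sample data test above (relative to chtc_runners/).
pipeline_params_path = "../param_configs/sample_data_config.json"
hyperparams_path = (
    "../param_configs/experiment_PstP_hyperparams/"
    "sampled_hyparams/ClusterBasedWCSelector_609.json"
)

# Print the top-level keys of each config to see which pipeline settings
# and hyperparameters a run will use.
for path in (pipeline_params_path, hyperparams_path):
    with open(path) as f:
        config = json.load(f)
    print(path)
    for key in config:
        print("  ", key)
```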
The datasets used in this study are the PriA-SSB target, 107 PubChem BioAssay targets, and the PstP target.
The datasets will be uploaded to Zenodo in the near future.
The repository also contains a small dataset for testing: `datasets/sample_data/`.
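The sample data files are compressed CSVs that pandas can read directly. A minimal sketch for a quick look, assuming it is run from the repository root (the column names are whatever the dataset defines, not assumed here):

```python
import pandas as pd

# Initial training data used by the sample data test above.
df = pd.read_csv("datasets/sample_data/training_data/iter_0.csv.gz")

print(df.shape)             # number of compounds and columns
print(df.columns.tolist())  # column names as defined by the dataset
print(df.head())
```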
The `active_learning_dd` subdirectory contains the main codebase for the iterative batched screening components. Consult the README in that subdirectory for details.
The `param_configs/` subdirectory contains JSON config files for strategies and experiments used in the thesis document. Consult the README in that subdirectory for details.
This subdirectory contains Jupyter notebooks that preprocess the datasets, debug methods, analyze the results, and produce result images.
The `chtc_runners/` subdirectory contains runner scripts for the experiments in the thesis document. `chtc_runners/simulation_runner.py` can be used as a starting template for your own runner script, and `chtc_runners/simulation_utils.py` contains helper functions for pre- and post-processing iteration selections for retrospective experiments. Consult the README in that subdirectory for details.
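If you write your own runner, the command-line interface of the sample data test above is a reasonable template. The following is a minimal, hypothetical skeleton, not code from this repository: the argument names mirror the test invocation, and the loop body is a placeholder for whatever selection logic you plug in.

```python
import argparse
import json


def main():
    parser = argparse.ArgumentParser(
        description="Illustrative skeleton for a custom retrospective screening runner."
    )
    parser.add_argument("--pipeline_params_json_file", required=True)
    parser.add_argument("--hyperparams_json_file", required=True)
    parser.add_argument("--iter_max", type=int, default=5)
    parser.add_argument("--initial_dataset_file", required=True)
    parser.add_argument("--precompute_dissimilarity_matrix",
                        dest="precompute_dissimilarity_matrix", action="store_true")
    parser.add_argument("--no-precompute_dissimilarity_matrix",
                        dest="precompute_dissimilarity_matrix", action="store_false")
    parser.set_defaults(precompute_dissimilarity_matrix=False)
    args = parser.parse_args()

    with open(args.pipeline_params_json_file) as f:
        pipeline_params = json.load(f)
    with open(args.hyperparams_json_file) as f:
        hyperparams = json.load(f)

    for iter_num in range(args.iter_max):
        # Placeholder: call into active_learning_dd here to select the next
        # batch, score it, and append it to the training data.
        print(f"Iteration {iter_num}: would select the next batch using "
              f"{len(hyperparams)} hyperparameter entries.")


if __name__ == "__main__":
    main()
```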
The following strategies are currently implemented in `active_learning_dd/next_batch_selector/` (see the thesis document and the hyperparameter examples in `param_configs/`):
- `ClusterBasedWeightSelector` (CBWS): assigns exploitation-exploration weights to every cluster, splits the budget between exploitation and exploration, then selects compounds from the most exploitable clusters, followed by the most explorable clusters.
- `ClusterBasedRandom`: randomly samples a cluster, then randomly samples compounds from within it.
- `InstanceBasedRandom`: randomly samples compounds from the pool.
- `ClusterBasedDissimilar`: samples clusters dissimilarly according to a dissimilarity measure, which is fingerprint based by default.
- `InstanceBasedDissimilar`: samples compounds dissimilarly from the pool.
- `MABSelector`: an Upper Confidence Bound (UCB) style approach from Multi-Armed Bandits (MAB). Assigns every cluster an upper-bound reward estimate that combines an exploitation term and an exploration term, then samples the clusters with the highest estimates (a toy scoring sketch follows this list).
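To make the exploitation-exploration trade-off behind a UCB-style selector concrete, here is a toy scoring sketch. It is not the repository's implementation: the per-cluster statistics and the exploration constant `c` are made up for illustration.

```python
import math

# Toy per-cluster statistics: (number of compounds tested, number of hits).
cluster_stats = {
    "cluster_a": (20, 6),
    "cluster_b": (5, 1),
    "cluster_c": (1, 0),
}

total_tested = sum(n for n, _ in cluster_stats.values())
c = 1.0  # exploration constant; larger values favor rarely tested clusters


def ucb_score(n_tested, n_hits):
    exploitation = n_hits / n_tested  # observed hit rate of the cluster
    exploration = c * math.sqrt(math.log(total_tested) / n_tested)
    return exploitation + exploration


# Rank clusters by their upper-bound reward estimate, highest first.
# Rarely tested clusters get a large exploration bonus, so they can
# outrank clusters with a higher observed hit rate.
ranked = sorted(cluster_stats, key=lambda k: ucb_score(*cluster_stats[k]), reverse=True)
print(ranked)
```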