Add support for 3D RNA Graphs (#165)
* add RNA atomic edge funcs

* add RNA 3D constants

* add bpRNA parser

* add Ryan's Nussinov algo implementation

* add radius of gyration feature

* add positional encoding of sequence

* update docstrings and add interface subgraphs

* docstring

* rm unused import

* add rich as a base requirement

* update version string

* black

* add additional RNA tests

* update rna docs

* finalise RNA tutorial with 3D structures

* merge

* add edge funcs

* add RNA graphs

* add rna subgraphs

* add rna features

* update rna constants

* resolve circular import

* move edge funcs to submodule

* make subgraphs editable downstream

* set node size multiplier to a saner default

* add smiles test file

* use rich-click for cli, minor linting

* add distance edges

* add RNA visualisation

* add nussinov test

* use rich-click instead of click

* add rna test structure

* skip nussinov cell to avoid timeout

* simplify af2 structure retrieval #168

* add plddt colouring for AF2

* linting

* bugfixes for AF2

* add edge distance func

* add graph feats to docs

* add fully connected and distance window edges

* remove erroneous import

* add molecule features

* add conformer generation and fragment graphs

* update top level init with version and loguru logging

* update changelog

* black, isort

* add loguru as dependency

* fix edge construction funcs

* fix rdkit util tests

* fix tests

* fix tests

* try converting everything to tensor

* fix edge distance func

* fix conversion

* add zinc & chembl utils

* add molecule modelling tutorial.ipynb

* add smilite dependency

* fix notebook test

* remove tests for molecule tutorial

* update requirements

* isort

* add sequence homology splitting

* update docs

* rollback nx version

* fix blast

* fix blast

* add antibody tutorial

* update notebook for docs

* fix dataset readme tables

* add notebooks

* update notebooks and docs

* remove antibody_dev example

* add line graph

* update rna tests

* update CLI

* update ML utils

* add dynamics features

* add degree oh

* fix naming

* add naming changes to RNA

* update molecule utils

* update chain graph tutorial

* update intersphinx mappings

* isort

* black

* fix test

* skip inconsistent mol test

* skip zinc tests

* add skip to ppi graph test if HGNC unavailable

* remove debug cell

* pin nx version

* remove nx version pin

* pin mpl chord dependency

* proper HETATM handling

* add pyg visualisation

* resolve dataframe handling

* update to tutorial for df handling

* restore plddt vis

* pin scipy version for plots

* unpin scipy dependency

* skip API calls in tutorials

* update changelog

* update version to 1.5.0rc1
a-r-j authored Jun 30, 2022
1 parent 11407a1 commit f9ebad1
Showing 87 changed files with 745,405 additions and 20,719 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/build.yaml
@@ -58,6 +58,8 @@ jobs:
run: conda install -c salilab dssp
- name: Install PyTorch
run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
- name: Install BLAST
run: sudo apt install ncbi-blast+
- name: Install Graphein
run: pip install -e .
- name: Install Extras
5 changes: 3 additions & 2 deletions .requirements/base.in
@@ -1,8 +1,8 @@
-biopandas
+biopandas>=0.4.1
 biopython
 bioservices
-click
 deepdiff
+loguru
 matplotlib>=3.4.3
 multipledispatch
 networkx
@@ -11,6 +11,7 @@ pandas
 plotly
 pydantic
 rich
+rich-click
 seaborn
 pyyaml>=5.1,<6.*
 scikit-learn
6 changes: 4 additions & 2 deletions .requirements/extras.in
@@ -1,5 +1,7 @@
 biovec
 propy3
 pyaaisc
-mpl_chord_diagram
-rdkit-pypi
+mpl_chord_diagram==0.3.2
+rdkit-pypi
+selfies
+smilite
38 changes: 36 additions & 2 deletions CHANGELOG.md
@@ -1,14 +1,48 @@
### 1.5.0 - UNRELEASED

* [Patch] - [#178](https://github.com/a-r-j/graphein/pull/178) Fixes [#171](https://github.com/a-r-j/graphein/pull/171) and optimizes `graphein.protein.features.nodes.dssp`. Contribution by @avivko.
#### Protein

* [Feature] - #165 adds support for direct AF2 graph construction.
* [Feature] - #165 adds support for selecting model indices from PDB files.
* [Feature] - #165 adds support for extracting interface subgraphs from complexes.
* [Feature] - #165 adds support for computing the radius of gyration of a structure.
* [Feature] - #165 adds support for adding distances to protein edges.
* [Feature] - #165 adds support for fully connected edges in protein graphs.
* [Feature] - #165 adds support for distance window-based edges for protein graphs.
* [Feature] - #165 adds support for transformer-like positional encoding of protein sequences.
* [Feature] - #165 adds support for plddt-like colouring of AF2 graphs
* [Feature] - #165 adds support for plotting PyG Data object (e.g. for logging to WandB).
* [Feature] - [#170](https://github.com/a-r-j/graphein/pull/170) Adds support for viewing edges in `graphein.protein.visualisation.asteroid_plot`. Contribution by @avivko.
* [Feature] - #163 Adds support for conformer generation for SMILES inputs to molecule graph construction.
* [Patch] - [#178](https://github.com/a-r-j/graphein/pull/178) Fixes [#171](https://github.com/a-r-j/graphein/pull/171) and optimizes `graphein.protein.features.nodes.dssp`. Contribution by @avivko.
* [Patch] - [#174](https://github.com/a-r-j/graphein/pull/174) prevents insertions always being removed. Resolves [#173](https://github.com/a-r-j/graphein/issues/173). Contribution by @OliverT1.
* [Patch] - #165 Refactors HETATM selections.
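The radius of gyration mentioned in the list above is the root-mean-square distance of a structure's points from their centroid. A minimal, unweighted sketch follows; it is not Graphein's implementation, and the function name `radius_of_gyration` and the plain-tuple coordinate format are assumptions made for illustration (a mass-weighted variant would weight each term by atomic mass).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord]) -> float:
    """Unweighted radius of gyration: RMS distance of points from their centroid.

    Hypothetical helper for illustration; every point carries equal weight.
    """
    if not coords:
        raise ValueError("coords must contain at least one point")
    n = len(coords)
    # Centroid: per-axis mean of all coordinates.
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    # Mean squared distance of each point from the centroid.
    mean_sq = sum(
        sum((c[i] - centroid[i]) ** 2 for i in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)
```

Two points placed symmetrically one unit either side of the origin give a radius of gyration of 1, which is a quick sanity check for any implementation.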

#### Molecules

* [Feature] - #165 adds additional graph-level molecule features.
* [Feature] - #165 adds support for generating conformers (and 3D graphs) from SMILES inputs
* [Feature] - #163 Adds support for molecule graph generation from an RDKit.Chem.Mol input.
* [Feature] - #163 Adds support for multiprocess molecule graph construction.

#### RNA

* [Feature] - #165 adds support for 3D RNA graph construction.
* [Feature] - #165 adds support for generating RNA SS from sequence using the Nussinov Algorithm.
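For readers unfamiliar with the Nussinov algorithm named above: it is a dynamic programme that maximises the number of non-crossing base pairs in a sequence. The sketch below is a generic textbook version, not the implementation added in this commit. The function name `nussinov`, the `min_loop` parameter, and the decision to allow G-U wobble pairs are all assumptions of this example.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _score(dp, seq, i, j, k):
    """Pairs gained by pairing k with j inside the interval [i, j]."""
    left = dp[i][k - 1] if k > i else 0
    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
    return left + 1 + inner


def nussinov(seq: str, min_loop: int = 0) -> str:
    """Return a dot-bracket structure with the maximum number of base pairs.

    min_loop is the minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return ""
    # dp[i][j] = maximum number of pairs in seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is left unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, _score(dp, seq, i, j, k))
            dp[i][j] = best

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and _score(dp, seq, i, j, k) == dp[i][j]:
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return "".join(structure)
```

The table fill is cubic in sequence length, which is consistent with the commit message's note about skipping the Nussinov notebook cell to avoid a timeout on long inputs.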

#### Changes

* [Patch] - #163 uses tqdm.contrib.process_map instead of multiprocessing.Pool.map to provide progress bars in multiprocessing.
* [Fix] - #165 makes returned subgraphs editable objects rather than views
* [Fix] - #165 fixes global logging set to "debug".
* [Fix] - #165 uses rich progress for protein graph construction.
* [Fix] - #165 sets saner default for node size in 3d plotly plots
* [Dependency] - #165 Changes CLI to use rich-click instead of click for prettier formatting.
* [Package] - #165 Adds support for logging with loguru and rich
* [Package] - Sets the minimum BioPandas version to 0.4.1 to support additional parsing features.

#### Breaking Changes

* #165 adds RNA SS edges into graphein.protein.edges.base_pairing
* #163 changes separate filetype input paths to `graphein.molecule.graphs.construct_graph`. Interface is simplified to simply `path="some/path.extension"` instead of separate inputs like `mol2_path=...` and `sdf_path=...`.
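The second breaking change replaces per-format keyword arguments with a single `path` whose extension selects the parser. One way such dispatch could look is sketched below. This is not the code in `graphein.molecule.graphs`: the names `load_molecule`, `READERS`, `_read_mol2`, and `_read_sdf` are hypothetical, and the readers return placeholder strings rather than parsed molecules.

```python
from pathlib import Path
from typing import Callable, Dict


# Hypothetical readers standing in for real file parsers.
def _read_mol2(path: Path) -> str:
    return f"mol2:{path.name}"


def _read_sdf(path: Path) -> str:
    return f"sdf:{path.name}"


# Map each supported file extension to the reader that handles it.
READERS: Dict[str, Callable[[Path], str]] = {
    ".mol2": _read_mol2,
    ".sdf": _read_sdf,
}


def load_molecule(path: str) -> str:
    """Pick a reader from the file extension instead of a per-format argument."""
    p = Path(path)
    try:
        reader = READERS[p.suffix.lower()]
    except KeyError:
        supported = ", ".join(sorted(READERS))
        raise ValueError(
            f"Unsupported extension {p.suffix!r}; expected one of: {supported}"
        ) from None
    return reader(p)
```

Callers that previously passed `mol2_path=...` or `sdf_path=...` would pass the same value as `path=...` under this scheme; supporting a new format means adding one entry to the mapping.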

### 1.4.0 - UNRELEASED
4 changes: 4 additions & 0 deletions Dockerfile
@@ -8,6 +8,10 @@ RUN apt-get update \
RUN apt-get update && apt-get install -y iputils-ping && apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Install BLAST
RUN apt-get update && apt-get install -y ncbi-blast+ && apt-get clean \
&& rm -rf /var/lib/apt/lists/*

ENV CONDA_ALWAYS_YES=true


2 changes: 2 additions & 0 deletions datasets/README.md
@@ -40,6 +40,7 @@ These datasets are unions of structural protein-ligand / protein-metal / protein
### [PROTEINS_NUCLEOTIDES](https://github.com/a-r-j/graphein/tree/master/datasets/proteins_nucleotides)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/a-r-j/graphein/blob/master/datasets/proteins_nucleotides/parse_dataset.ipynb)

|PDB Ligand |Ligand Name| # Proteins | # Residues | Dataset |
|---|---|---|---|---|
|[ATP](https://www.rcsb.org/ligand/ATP)| Adenosine Triphosphate | 313 | 127493 | [ATP313](https://github.com/a-r-j/graphein/blob/master/datasets/proteins_nucleotides/ATP.csv) |
@@ -54,6 +55,7 @@ These datasets are unions of structural protein-ligand / protein-metal / protein
### [PROTEINS_METAL](https://github.com/a-r-j/graphein/tree/master/datasets/proteins_metal)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/a-r-j/graphein/blob/master/datasets/proteins_metal/parse_dataset.ipynb)

|PDB Ligand |Ligand Name| # Proteins | # Residues | Dataset |
|---|---|---|---|---|
|[Fe](https://www.rcsb.org/ligand/FE)| Iron | 215 | 69779 | [Fe215](https://github.com/a-r-j/graphein/blob/master/datasets/proteins_metal/FE.csv) |
6 changes: 5 additions & 1 deletion docs/source/conf.py
@@ -29,7 +29,7 @@
author = "Arian Jamasb"

# The full version, including alpha/beta/rc tags
release = "1.4.0"
release = "1.5.0rc1"


# -- General configuration ---------------------------------------------------
@@ -71,8 +71,12 @@
"xarray": ("https://xarray.pydata.org/en/stable/", None),
"pandas": ("https://pandas.pydata.org/docs/", None),
"scikit-learn": ("https://scikit-learn.org/stable/", None),
"sklearn": ("https://scikit-learn.org/stable/", None),
"scipy": ("https://docs.scipy.org/doc/scipy/reference/", None),
"Sphinx": ("https://www.sphinx-doc.org/en/stable/", None),
"networkx": ("https://networkx.github.io/documentation/stable/", None),
"nx": ("https://networkx.github.io/documentation/stable/", None),
"torch": ("https://pytorch.org/docs/master/", None),
}

mathjax_path = "https://cdn.jsdelivr.net/npm/mathjax@2/MathJax.js?config=TeX-AMS-MML_HTMLorMML"
1 change: 1 addition & 0 deletions docs/source/dataset_readme.rst
@@ -0,0 +1 @@
.. mdinclude:: ../../datasets/README.md
1 change: 1 addition & 0 deletions docs/source/datasets.rst
@@ -2,6 +2,7 @@
Datasets
==============

.. include:: dataset_readme.rst


Summaries
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -46,6 +46,7 @@ The repository can be found at `a-r-j/graphein <https://www.github.com/a-r-j/gra
:caption: Machine Learning

datasets
ml_examples

.. toctree::
:glob:
22 changes: 22 additions & 0 deletions docs/source/ml_examples.rst
@@ -0,0 +1,22 @@
==============
Examples
==============


Proteins
---------
.. toctree::
:maxdepth: 2
:glob:

notebooks/tdc_developability.nblink


Molecules
----------
.. toctree::
:maxdepth: 2
:glob:

notebooks/molecule_model_tutorial_tox.nblink
notebooks/splitting_a_dataset.nblink
15 changes: 15 additions & 0 deletions docs/source/modules/graphein.ml.rst
@@ -10,3 +10,18 @@ Diffusion Matrices
-------------------
.. automodule:: graphein.ml.diffusion
:members:

Datasets
------------
.. automodule:: graphein.ml.datasets.torch_geometric_dataset
:members:

Dataset Splitting
-----------------
.. automodule:: graphein.ml.clustering
:members:

Utils
-------
.. automodule:: graphein.ml.utils
:members:
17 changes: 17 additions & 0 deletions docs/source/modules/graphein.molecule.rst
@@ -48,6 +48,23 @@ Visualisation
.. automodule:: graphein.molecule.visualisation
:members:


Utils
------
Utils
^^^^^^
.. automodule:: graphein.molecule.utils

ZINC
^^^^^
.. automodule:: graphein.molecule.zinc
:members:

ChEMBL
^^^^^^^^
.. automodule:: graphein.molecule.chembl
:members:

Constants
----------
.. automodule:: graphein.molecule.atoms
5 changes: 5 additions & 0 deletions docs/source/modules/graphein.protein.rst
@@ -42,6 +42,11 @@ Node
.. automodule:: graphein.protein.features.nodes.geometry
:members:

Graph
^^^^^
.. automodule:: graphein.protein.features.graph.structure
:members:

Sequence
^^^^^^^^^
.. automodule:: graphein.protein.features.sequence.embeddings
37 changes: 36 additions & 1 deletion docs/source/modules/graphein.rna.rst
@@ -1,21 +1,56 @@
graphein.rna
================

Config
------
.. automodule:: graphein.rna.config
:members:

Graphs
------
.. automodule:: graphein.rna.graphs
:members:

Edges
-----
.. automodule:: graphein.rna.edges
Atomic
^^^^^^^
.. automodule:: graphein.rna.edges.atomic
:members:

Base Pairing
^^^^^^^^^^^^^
.. automodule:: graphein.rna.edges.base_pairing
:members:

Distance
^^^^^^^^^
.. automodule:: graphein.rna.edges.distance
:members:

Features
---------
Node
^^^^^
.. automodule:: graphein.rna.features.atom
:members:

Subgraphs
-----------
.. automodule:: graphein.rna.subgraphs
:members:


Visualisation
--------------
.. automodule:: graphein.rna.visualisation
:members:

Utils
-------
.. automodule:: graphein.rna.utils
:members:

Constants
---------
.. automodule:: graphein.rna.constants
3 changes: 2 additions & 1 deletion docs/source/molecule_notebooks.rst
@@ -5,4 +5,5 @@ Molecules
:maxdepth: 2
:glob:

notebooks/molecule_tutorial.nblink
notebooks/molecule_tutorial.nblink
notebooks/molecules_from_zinc_and_chembl.nblink
3 changes: 3 additions & 0 deletions docs/source/notebooks/molecule_model_tutorial_tox.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../notebooks/molecule_model_tutorial_tox.ipynb"
}
3 changes: 3 additions & 0 deletions docs/source/notebooks/molecules_from_zinc_and_chembl.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../notebooks/molecules_from_zinc_and_chembl.ipynb"
}
3 changes: 3 additions & 0 deletions docs/source/notebooks/splitting_a_dataset.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../notebooks/splitting_a_dataset.ipynb"
}
3 changes: 3 additions & 0 deletions docs/source/notebooks/tdc_developability.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../notebooks/tdc_developability.ipynb"
}
20 changes: 12 additions & 8 deletions graphein/__init__.py
@@ -1,18 +1,22 @@
from graphein.utils.utils import *

# Graphein
# Author: Arian Jamasb <[email protected]>
# License: BSD 3 clause
# Code Repository: https://github.com/a-r-j/graphein
from .protein import *
from .rna import *
from .testing import *
from loguru import logger
from rich.logging import RichHandler

# from ._version import get_versions
from graphein.utils.utils import *

# from .protein import *
# from .rna import *
from .testing import *

__author__ = "Arian Jamasb <[email protected]>"
__version__ = "1.5.0rc1"


__version__ = "1.4.0" # get_versions()["version"]
# del get_versions
logger.configure(
handlers=[
{"sink": RichHandler(rich_tracebacks=True), "format": "{message}"}
]
)
9 changes: 3 additions & 6 deletions graphein/cli.py
@@ -1,8 +1,8 @@
 """Command line interface for graphein."""
 import pathlib

-import click
 import networkx as nx
+import rich_click as click

 from graphein import __version__
 from graphein.protein.graphs import construct_graph
@@ -37,14 +37,11 @@
 )
 def main(config_path, pdb_path, output_path):
     """Build the graphs and save them in output dir."""
-    config = None
-    if config_path:
-        config = parse_config(path=config_path)
-
+    config = parse_config(path=config_path) if config_path else None
     if pdb_path.is_file():
         pdb_paths = [pdb_path]
     elif pdb_path.is_dir():
-        pdb_paths = [pdb for pdb in pdb_path.glob("*.pdb")]
+        pdb_paths = list(pdb_path.glob("*.pdb"))
     else:
         raise NotImplementedError(
             "Given PDB path needs to point to either a pdb file or a directory with pdb files."
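The file-or-directory handling in `main` can be exercised on its own. The sketch below pulls that logic into a standalone helper; the name `resolve_pdb_paths` is hypothetical and does not exist in `graphein/cli.py`. Two deliberate differences from the hunk, both assumptions of this example: the results are sorted so the order is deterministic, and a missing path raises `FileNotFoundError` rather than `NotImplementedError`.

```python
import pathlib
import tempfile
from typing import List


def resolve_pdb_paths(pdb_path: pathlib.Path) -> List[pathlib.Path]:
    """Return the PDB files named by a path that is either a file or a directory."""
    if pdb_path.is_file():
        return [pdb_path]
    if pdb_path.is_dir():
        # Path.glob returns a lazy generator; sorted() materialises it in a stable order.
        return sorted(pdb_path.glob("*.pdb"))
    raise FileNotFoundError(
        f"{pdb_path} is neither a PDB file nor a directory of PDB files."
    )


# Usage: build a throwaway directory and collect only the .pdb files in it.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for name in ("1abc.pdb", "2xyz.pdb", "notes.txt"):
        (root / name).touch()
    found = [p.name for p in resolve_pdb_paths(root)]
```

Because `glob` is lazy, wrapping it in `list(...)` (as the commit does) or `sorted(...)` (as here) is what makes the result reusable across multiple passes.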
2 changes: 2 additions & 0 deletions graphein/ml/__init__.py
@@ -1,4 +1,6 @@
from .clustering import *
from .conversion import GraphFormatConvertor
from .utils import add_labels_to_graph

try:
from .datasets import (