pyMolNetEnhancer is a Python module that integrates chemical class and substructure information within mass spectral molecular networks created through the Global Natural Products Social Molecular Networking (GNPS) platform. An analogous R package is available at https://github.com/madeleineernst/RMolNetEnhancer.
- Installation
- Map MS2LDA substructural information to mass spectral molecular networks (classical)
- Map MS2LDA substructural information to mass spectral molecular networks (feature based)
- Map chemical class information to mass spectral molecular networks
- Map chemical class and MS2LDA substructural information to mass spectral molecular networks
- Dependencies
- Main citation
- Other citations
- License
Install pyMolNetEnhancer with:
pip install pyMolNetEnhancer
In order to map substructural information to a mass spectral molecular network you need to:
- Create a molecular network through the Global Natural Products Social Molecular Networking (GNPS) platform
- Create an LDA experiment on http://ms2lda.org/ using the MGF file of clustered spectra downloaded from GNPS.
Then execute the code in Example_notebooks/Mass2Motifs_2_Network_Classical.ipynb line by line. The only things you need to specify are:
- Your GNPS job ID
- Your MS2LDA job ID
Note: Depending on the size of the MS2LDA results file, a server connection timeout may occur while it is retrieved automatically. In that case, you may download the file manually from http://ms2lda.org/.
- User-defined parameters for mapping the Mass2Motifs onto the network
prob: minimum probability score for a Mass2Motif to be included. Default is 0.01.
overlap: minimum overlap score for a Mass2Motif to be included. Default is 0.3.
Important: The probability and overlap thresholds can also be set within the ms2lda.org app, under the Experimental Options tab. Doing so is advisable when inspecting results in the web app. Note that the summary table contains motif-document relations that are already filtered by the thresholds set in the web app.
top: the number of most frequently shared motifs to show per molecular family (network component index). Default is 5.
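To make the three parameters concrete, here is a minimal plain-Python sketch of what threshold filtering and "top shared motifs" ranking amount to. It is not pyMolNetEnhancer's code: the tuple layout (node id, motif, probability, overlap), the inclusive `>=` comparison, and the helper names `filter_relations` and `top_shared_motifs` are hypothetical, and the example values are made up.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (node id, motif name, probability, overlap)
Relation = Tuple[str, str, float, float]


def filter_relations(relations: Iterable[Relation],
                     prob: float = 0.01,
                     overlap: float = 0.3) -> List[Relation]:
    """Keep relations whose probability and overlap both reach the minimum thresholds."""
    return [r for r in relations if r[2] >= prob and r[3] >= overlap]


def top_shared_motifs(edges: Iterable[Tuple[str, str]],
                      motifs_by_node: Dict[str, Set[str]],
                      top: int = 5) -> List[str]:
    """Rank motifs by how many edges (within one molecular family) connect two nodes that both contain them."""
    counts: Counter = Counter()
    for a, b in edges:
        shared = motifs_by_node.get(a, set()) & motifs_by_node.get(b, set())
        counts.update(shared)
    return [motif for motif, _ in counts.most_common(top)]


# Made-up relations for a single molecular family with nodes 1, 2 and 3
relations = [("1", "motif_10", 0.5, 0.6), ("2", "motif_10", 0.4, 0.35),
             ("2", "motif_7", 0.005, 0.9), ("3", "motif_10", 0.2, 0.1)]
kept = filter_relations(relations)
motifs_by_node: Dict[str, Set[str]] = {}
for node, motif, _, _ in kept:
    motifs_by_node.setdefault(node, set()).add(motif)
print(top_shared_motifs([("1", "2"), ("2", "3")], motifs_by_node))
```

Raising `prob` or `overlap` removes relations before any ranking happens, so fewer motifs can end up shared between connected nodes.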
To visualize the results, import the .graphml output file into Cytoscape. To color edges by the Mass2Motifs shared between nodes, select 'Stroke Color' in the 'Edge' tab on the left, then choose 'interaction' as Column and 'Discrete Mapping' as Mapping Type.
To color nodes by the most shared Mass2Motifs per molecular family (network component index), select 'Image/Chart' in the 'Node' tab on the left and select the Mass2Motifs shown in 'TopSharedMotifs' in the Edge Table.
Alternatively, the edge and node output files can be loaded into Cytoscape separately. To do so, import the 'Mass2Motifs_Edges_Classical.tsv' output file as a network, selecting column 'CLUSTERID1' as Source Node, column 'interact' as Interaction Type and column 'CLUSTERID2' as Target Node.
Then import the 'Mass2Motifs_Nodes_Classical.tsv' output file as a table.
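The edge table can also be inspected outside Cytoscape. The sketch below reads a tab-separated edge file with Python's `csv` module and counts edges per value of the 'interact' column. The column names come from the instructions above; the row values in `example` are placeholders rather than real pyMolNetEnhancer output, and `read_edges` and `count_interactions` are hypothetical helper names.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def read_edges(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated edge table into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_interactions(rows: List[Dict[str, str]]) -> Counter:
    """Count how many edges carry each value of the 'interact' column."""
    return Counter(row["interact"] for row in rows)


# Placeholder content; for a real file use:
#   with open("Mass2Motifs_Edges_Classical.tsv", newline="") as fh:
#       rows = read_edges(fh)
example = "CLUSTERID1\tinteract\tCLUSTERID2\n1\tcosine\t2\n2\tmotif_10\t3\n"
rows = read_edges(io.StringIO(example))
print(count_interactions(rows))
```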
In order to map substructural information to a mass spectral molecular network created through the feature based workflow you need to:
- Create a feature based molecular network through the Global Natural Products Social Molecular Networking (GNPS) platform
- Create an LDA experiment on http://ms2lda.org/ using the MGF file created within MZmine (see GNPS documentation)
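Before uploading the MGF file to ms2lda.org, it can help to sanity-check it. Below is a minimal standard-library sketch of an MGF reader. It assumes the common layout of `BEGIN IONS` / `END IONS` blocks with `KEY=value` header lines and whitespace-separated m/z and intensity peak lines. The example spectrum is made up, and `read_mgf` is a hypothetical helper, not part of pyMolNetEnhancer.

```python
import io
from typing import Any, Dict, List, Optional, TextIO


def read_mgf(handle: TextIO) -> List[Dict[str, Any]]:
    """Minimal MGF reader returning one dict per BEGIN IONS ... END IONS block."""
    spectra: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as PEPMASS=301.14 or SCANS=1
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z and intensity separated by whitespace
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


# Made-up example content; use open("your_file.mgf") for a real file.
example = """BEGIN IONS
FEATURE_ID=1
PEPMASS=301.14
SCANS=1
MSLEVEL=2
101.06 250.0
145.05 1200.0
END IONS
"""
spectra = read_mgf(io.StringIO(example))
print(len(spectra), "spectra;", len(spectra[0]["peaks"]), "peaks in the first one")
```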
Then execute the code in Example_notebooks/Mass2Motifs_2_Network_FeatureBased.ipynb line by line. The only things you need to specify are:
- Your GNPS job ID
- Your MS2LDA job ID
Note: Depending on the size of the MS2LDA results file, a server connection timeout may occur while it is retrieved automatically. In that case, you may download the file manually from http://ms2lda.org/.
- User-defined parameters for mapping the Mass2Motifs onto the network
prob: minimum probability score for a Mass2Motif to be included. Default is 0.01.
overlap: minimum overlap score for a Mass2Motif to be included. Default is 0.3.
Important: The probability and overlap thresholds can also be set within the ms2lda.org app, under the Experimental Options tab. Doing so is advisable when inspecting results in the web app. Note that the summary table contains motif-document relations that are already filtered by the thresholds set in the web app.
top: the number of most frequently shared motifs to show per molecular family (network component index). Default is 5.
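A "molecular family" here is a connected component of the network, and the network component index labels which component a node belongs to. The sketch below derives such an index from an edge list using union-find. It is an illustration under assumptions: GNPS supplies its own component index, and this sketch neither reproduces that numbering nor covers nodes without edges. `component_index` is a hypothetical helper.

```python
from typing import Dict, Hashable, Iterable, Tuple


def component_index(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Assign each node a molecular-family index (connected component) via union-find."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    # Number components 1, 2, ... in order of first appearance
    labels: Dict[Hashable, int] = {}
    result: Dict[Hashable, int] = {}
    for node in parent:
        root = find(node)
        labels.setdefault(root, len(labels) + 1)
        result[node] = labels[root]
    return result


print(component_index([(1, 2), (2, 3), (4, 5)]))
```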
To visualize the results, import the .graphml output file into Cytoscape. To color edges by the Mass2Motifs shared between nodes, select 'Stroke Color' in the 'Edge' tab on the left, then choose 'interaction' as Column and 'Discrete Mapping' as Mapping Type.
To color nodes by the most shared Mass2Motifs per molecular family (network component index), select 'Image/Chart' in the 'Node' tab on the left and select the Mass2Motifs shown in 'TopSharedMotifs' in the Edge Table.
Alternatively, the edge and node output files can be loaded into Cytoscape separately. To do so, import the 'Mass2Motifs_Edges_Classical.tsv' output file as a network, selecting column 'CLUSTERID1' as Source Node, column 'interact' as Interaction Type and column 'CLUSTERID2' as Target Node.
Then import the 'Mass2Motifs_Nodes_Classical.tsv' output file as a table.
In order to map chemical class information to a mass spectral molecular network you need to:
- Create a molecular network using the classical or feature based workflow through the Global Natural Products Social Molecular Networking (GNPS) platform
- Perform in silico structure annotation using Network Annotation Propagation (NAP), DEREPLICATOR or any other in silico structure annotation tool of your choice
Then execute the code in Example_notebooks/ChemicalClasses_2_Network_Classical.ipynb or Example_notebooks/ChemicalClasses_2_Network_FeatureBased.ipynb line by line. The only things you need to specify are:
You can specify as many in silico annotation outputs as you wish. If you import results from tools other than NAP or DEREPLICATOR, make sure that your input file is tab-separated and includes a column named 'Scan' containing numeric identifiers that match the numeric node identifiers in the GNPS network, and a column named 'SMILES' containing SMILES structures. Include all results as DataFrames in the list 'matches'; the object 'gnpslib' contains all GNPS library hits:
matches = [gnpslib, nap, derep]
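For results from other tools, a quick format check before adding them to 'matches' can save a failed run. This standard-library sketch verifies only the two requirements stated above: a tab-separated file with a 'Scan' column of numeric identifiers and a 'SMILES' column. It does not check that each Scan exists in your network. `load_annotations` is a hypothetical helper, and the example rows are made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_annotations(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated annotation table and check for the 'Scan' and 'SMILES' columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"Scan", "SMILES"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    rows: List[Dict[str, str]] = []
    for row in reader:
        # Scan must be a numeric identifier matching a GNPS node identifier
        scan = (row["Scan"] or "").strip()
        if not scan.isdigit():
            raise ValueError(f"non-numeric Scan identifier: {row['Scan']!r}")
        rows.append(row)
    return rows


example = "Scan\tSMILES\n12\tCCO\n57\tc1ccccc1O\n"
annotations = load_annotations(io.StringIO(example))
print(len(annotations), "annotations loaded")
```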
In this notebook, ChemAxon's molconvert is used to convert SMILES to InChIKeys. A platform-independent version of ChemAxon's Marvin, which includes molconvert, can be downloaded from the ChemAxon website. Make sure molconvert is installed and add its directory to the PATH environment variable:
import os
path = '/Applications/MarvinSuite/bin/'
os.environ['PATH'] += os.pathsep + path  # os.pathsep is ':' on macOS/Linux and ';' on Windows
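If the notebook cell is run more than once, the line above appends the directory again each time. This sketch avoids duplicate entries and then uses `shutil.which` to confirm that molconvert is actually reachable. The helper names `add_to_path` and `locate_molconvert` are hypothetical, and the Marvin directory is the macOS example path from above.

```python
import os
import shutil
from typing import Optional


def add_to_path(directory: str) -> None:
    """Append a directory to PATH for this Python process, skipping duplicates."""
    current = os.environ.get("PATH", "")
    if directory in current.split(os.pathsep):
        return  # already present; avoid duplicate entries on notebook re-runs
    os.environ["PATH"] = current + os.pathsep + directory if current else directory


def locate_molconvert() -> Optional[str]:
    """Return the full path to molconvert if it is found on PATH, else None."""
    return shutil.which("molconvert")


add_to_path("/Applications/MarvinSuite/bin/")
if locate_molconvert() is None:
    print("molconvert not found; check the Marvin installation directory")
```

The change only affects the running Python process (and any subprocesses it starts), not your shell.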
To visualize the results, import the .graphml output file into Cytoscape. To color nodes by chemical subclass, select 'Fill Color' in the 'Node' tab on the left, then choose 'CF_subclass' as Column and 'Discrete Mapping' as Mapping Type.
To color nodes by chemical subclass score, select 'Fill Color' in the 'Node' tab on the left, then choose 'CF_subclass_score' as Column and 'Continuous Mapping' as Mapping Type.
All columns related to chemical class information are prefixed with 'CF_', and chemical class information at other levels of the chemical taxonomy can be mapped analogously (e.g. CF_superclass, CF_superclass_score, CF_class, etc.). The .txt output file can also be imported as a table into an existing network in Cytoscape.
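Cytoscape performs these mappings itself. The plain-Python sketch below only illustrates the difference between them and is not how Cytoscape implements them. A discrete mapping gives each distinct category (such as a CF_subclass value) its own color. A continuous mapping interpolates a color from a numeric value (such as CF_subclass_score), which this sketch assumes lies between 0 and 1 and clamps otherwise. The category names and colors are placeholders.

```python
from typing import Dict, Iterable, Sequence


def discrete_mapping(values: Iterable[str], palette: Sequence[str]) -> Dict[str, str]:
    """Give each distinct category its own color, cycling through the palette."""
    mapping: Dict[str, str] = {}
    for value in values:
        if value not in mapping:
            mapping[value] = palette[len(mapping) % len(palette)]
    return mapping


def continuous_mapping(score: float, low: str = "#ffffff", high: str = "#0000ff") -> str:
    """Linearly interpolate between two hex colors; score is clamped to [0, 1]."""
    t = min(max(score, 0.0), 1.0)
    lo = [int(low[i:i + 2], 16) for i in (1, 3, 5)]
    hi = [int(high[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(a + (b - a) * t) for a, b in zip(lo, hi)]
    return "#" + "".join(f"{c:02x}" for c in mixed)


print(discrete_mapping(["subclass_A", "subclass_B", "subclass_A"], ["red", "blue"]))
print(continuous_mapping(0.5))
```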
To map both chemical class and MS2LDA substructural information to a mass spectral molecular network, follow the steps described in Map MS2LDA substructural information to mass spectral molecular networks (classical) and Map chemical class information to mass spectral molecular networks for classical molecular networking, or the steps described in Map MS2LDA substructural information to mass spectral molecular networks (feature based) and Map chemical class information to mass spectral molecular networks for feature based molecular networking. To create a graphml file containing both Mass2Motif and chemical class information, run:
import networkx as nx
from pyMolNetEnhancer import *  # provides make_classyfire_graphml
graphML_classy = make_classyfire_graphml(MG, final)
nx.write_graphml(graphML_classy, "Motif_ChemicalClass_Network_Classical.graphml", infer_numeric_types=True)
Here, 'MG' is the network with mapped Mass2Motifs and 'final' is the dataframe created when mapping chemical class information. An example is shown in Example_notebooks/Mass2Motifs_2_Network_Classical.ipynb and Example_notebooks/Mass2Motifs_2_Network_FeatureBased.ipynb. To visualize the network in Cytoscape, proceed as described in Map MS2LDA substructural information to mass spectral molecular networks and Map chemical class information to mass spectral molecular networks, using the classical or feature based sections as appropriate.
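Conceptually, combining the two results means attaching the 'CF_' columns of 'final' to the matching nodes of 'MG'. The plain-Python sketch below illustrates that idea, with dictionaries standing in for the graph's node attributes. It is not the implementation of make_classyfire_graphml. The helper name `add_chemical_class_attributes` and the key column 'node_id' are hypothetical placeholders, and the example values are made up.

```python
from typing import Any, Dict, List


def add_chemical_class_attributes(
    node_attrs: Dict[str, Dict[str, Any]],
    table: List[Dict[str, Any]],
    key: str = "node_id",  # hypothetical name of the column holding node identifiers
    prefix: str = "CF_",
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of node_attrs with every CF_* column of table attached to matching nodes."""
    merged = {node: dict(attrs) for node, attrs in node_attrs.items()}
    for row in table:
        node = str(row[key])
        if node not in merged:
            continue  # rows for nodes absent from the network are ignored
        for column, value in row.items():
            if column.startswith(prefix):
                merged[node][column] = value
    return merged


nodes = {"1": {"motifs": "motif_10"}, "2": {}}
classes = [{"node_id": 1, "CF_subclass": "subclass_A", "CF_subclass_score": 0.8}]
print(add_chemical_class_attributes(nodes, classes))
```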
python 3.6.5, collections 0.6.1, csv 1.0, functools, joblib 0.13.0, json 2.0.9, multiprocessing, networkx 2.1, operator, os, pandas 0.22.0, rdkit, re 2.2.1, requests 2.18.4, sys, time
https://www.biorxiv.org/content/10.1101/654459v1
https://github.com/madeleineernst/pyMolNetEnhancer
MolNetEnhancer uses molecular networking through GNPS:
Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; et al. Sharing and Community Curation of Mass Spectrometry Data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 2016, 34 (8), 828–837.
https://www.nature.com/articles/nbt.3597
MolNetEnhancer uses untargeted substructure exploration through MS2LDA:
van der Hooft, J. J. J.; Wandy, J.; Barrett, M. P.; Burgess, K. E. V.; Rogers, S. Topic modeling for untargeted substructure exploration in metabolomics. PNAS 2016, 113 (48), 13738–13743.
https://www.pnas.org/content/113/48/13738
MolNetEnhancer uses Network Annotation Propagation (NAP):
da Silva, R. R.; Wang, M.; Nothias, L.-F.; van der Hooft, J. J. J.; Caraballo-Rodríguez, A. M.; Fox, E.; Balunas, M. J.; Klassen, J. L.; Lopes, N. P.; Dorrestein, P. C. Propagating Annotations of Molecular Networks Using in Silico Fragmentation. PLoS Comput. Biol. 2018, 14 (4), e1006089.
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006089
MolNetEnhancer uses DEREPLICATOR:
Mohimani, H.; Gurevich, A.; Mikheenko, A.; Garg, N.; Nothias, L.-F.; Ninomiya, A.; Takada, K.; Dorrestein, P. C.; Pevzner, P. A. Dereplication of peptidic natural products through database search of mass spectra. Nat. Chem. Biol. 2017, 13, 30–37.
https://www.nature.com/articles/nchembio.2219
MolNetEnhancer uses automated chemical classification through ClassyFire:
Feunang, Y. D.; Eisner, R.; Knox, C.; Chepelev, L.; Hastings, J.; Owen, G.; Fahy, E.; Steinbeck, C.; Subramanian, S.; Bolton, E.; Greiner, R.; Wishart, D. S. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 2016, 8, 61.
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-016-0174-y
This repository is available under the license at https://github.com/madeleineernst/pyMolNetEnhancer/blob/master/LICENSE