Biochemical and Chemical Similarity Networks
### Building Biochemical and Chemical Similarity Networks

The easiest way to follow along with the example is to take a look at the accessory files:
- PowerPoint presentation detailing all the steps
- examples of an edge list, a node attributes file, and a Cytoscape network file
or optionally download the full tutorial HERE.
Take a look at the PowerPoint presentation (see above), follow along with the calculations done in R (see below), and finally visualize the network in Cytoscape.
```r
source("http://pastebin.com/raw.php?i=Y0YYEBia")
```
Take a look at some chemical identifiers here.
This example uses PubChem compound identifiers (CIDs).
```r
# PubChem CIDs = cids
cids        # overview
nrow(cids)  # how many
str(cids)   # structure; we want numeric
cids <- as.numeric(as.character(unlist(cids))) # hack to break factor: go through character so the values, not the level codes, are kept
```
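The conversion above goes through `as.character()` because calling `as.numeric()` directly on a factor returns the internal level codes rather than the printed values. A minimal sketch with made-up identifiers (the data frame `toy` is hypothetical and not part of the tutorial data):

```r
# Hypothetical example: a one-column data frame whose CIDs were read in as a factor
toy <- data.frame(CID = factor(c("5793", "311", "1060")))

# Direct conversion returns the factor's level codes, not the CIDs
as.numeric(toy$CID)

# Converting to character first recovers the actual values
toy.cids <- as.numeric(as.character(unlist(toy)))
toy.cids
```

If the identifiers are already stored as numbers or character strings, the same call still works, so it is a safe step to run either way.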
Based on KEGG reactant pairs (RPAIRS)
```r
# make an edge list based on CIDs from KEGG reactant pairs
KEGG.edge.list <- CID.to.KEGG.pairs(cid = cids, database = get.KEGG.pairs(), lookup = get.CID.KEGG.pairs())
head(KEGG.edge.list)
dim(KEGG.edge.list) # a two-column list with CID-to-CID connections based on KEGG RPAIRS

# how did I get this?
# 1) convert from CID to KEGG using get.CID.KEGG.pairs(), which is a table stored at https://gist.github.com/dgrapov/4964546
# 2) get KEGG RPAIRS using get.KEGG.pairs(), which is a table stored at https://gist.github.com/dgrapov/4964564
# 3) return CID pairs
```
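The three steps listed above amount to a pair of table lookups. The sketch below illustrates the idea in base R with invented identifiers. The tables `lookup` and `pairs`, their column names, and all IDs are assumptions for illustration and do not reflect the actual layout of the tables returned by `get.CID.KEGG.pairs()` or `get.KEGG.pairs()`.

```r
# Hypothetical lookup table: PubChem CID -> KEGG compound ID (made-up IDs)
lookup <- data.frame(CID  = c(101, 102, 103),
                     KEGG = c("K1", "K2", "K3"),
                     stringsAsFactors = FALSE)

# Hypothetical reactant-pair table: one row per KEGG-to-KEGG pair
pairs <- data.frame(source = c("K1", "K2", "K2"),
                    target = c("K2", "K3", "K9"),
                    stringsAsFactors = FALSE)

# Translate both ends of each pair back to CIDs (NA if a compound is not in the lookup)
edges <- data.frame(source = lookup$CID[match(pairs$source, lookup$KEGG)],
                    target = lookup$CID[match(pairs$target, lookup$KEGG)])

# Keep only the pairs where both compounds belong to the query set
toy.KEGG.edges <- edges[complete.cases(edges), ]
toy.KEGG.edges
```

The pair involving `K9` is dropped because that compound has no CID in the query set, which is why the resulting edge list can have fewer compounds than were supplied.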
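Based on structural similarity (Tanimoto coefficient)
```r
# make an edge list based on structural similarity between CIDs
tanimoto.edges <- CID.to.tanimoto(cids = cids, cut.off = .7, parallel = FALSE)
head(tanimoto.edges)

# how did I get this?
# 1) use the R package ChemmineR to query PubChem PUG for molecular fingerprints
# 2) calculate the similarity coefficient
# 3) return edges with similarity above cut.off
```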
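To make steps 2 and 3 concrete, here is a minimal base R sketch of the Tanimoto coefficient on binary fingerprints, followed by the cut-off filter. The fingerprints in `fp` are invented six-bit vectors rather than real PubChem fingerprints, and the helper `tanimoto()` is a hypothetical stand-in, not the code used inside `CID.to.tanimoto()`.

```r
# Hypothetical binary fingerprints: rows are compounds (named by made-up CID),
# columns are structural features (1 = present, 0 = absent)
fp <- rbind("101" = c(1, 1, 0, 1, 0, 1),
            "102" = c(1, 1, 0, 1, 1, 1),
            "103" = c(0, 0, 1, 0, 1, 0))

# Tanimoto coefficient: bits set in both fingerprints / bits set in either one
tanimoto <- function(a, b) sum(a & b) / sum(a | b)

# Score every unique pair of compounds
idx <- t(combn(nrow(fp), 2))
toy.sim <- data.frame(source = rownames(fp)[idx[, 1]],
                      target = rownames(fp)[idx[, 2]],
                      value  = apply(idx, 1, function(i) tanimoto(fp[i[1], ], fp[i[2], ])),
                      stringsAsFactors = FALSE)

# Keep only the pairs whose similarity is above the cut-off
toy.tanimoto.edges <- toy.sim[toy.sim$value > 0.7, ]
toy.tanimoto.edges
```

Raising the cut-off gives a sparser network of closely related structures, while lowering it adds edges between less similar compounds.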
After a little formatting, combine the KEGG and Tanimoto edge lists into a single edge list.
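One possible way to do this formatting is sketched below. The toy tables stand in for `KEGG.edge.list` and `tanimoto.edges`; their column names, the `type` label, and the constant weight given to KEGG edges are all assumptions, since the actual column layout of the two objects is not shown here.

```r
# Hypothetical edge lists standing in for KEGG.edge.list and tanimoto.edges
kegg.toy     <- data.frame(source = c(101, 102), target = c(102, 103))
tanimoto.toy <- data.frame(source = 101, target = 102, value = 0.8)

# Give both tables the same columns, plus a label for the kind of relationship
kegg.toy$type     <- "KEGG"
kegg.toy$value    <- 1           # assumption: biochemical edges get a constant weight
tanimoto.toy$type <- "Tanimoto"

# Stack the two tables into one edge list
cols <- c("source", "target", "type", "value")
toy.edge.list <- rbind(kegg.toy[, cols], tanimoto.toy[, cols])
toy.edge.list
```

Keeping an edge-type column makes it possible to style the two kinds of edges differently once the network is in Cytoscape.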
Now import this edge list and a node attributes table into Cytoscape to make an amazing network.
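Cytoscape can read delimited text tables, so one simple route is to write both tables to tab-delimited files with `write.table()`. In this sketch the table contents, column names, and file names are hypothetical, and a temporary directory is used so that nothing is written to the working directory.

```r
# Hypothetical combined edge list and node attributes table
toy.edge.list <- data.frame(source = c(101, 102, 101),
                            target = c(102, 103, 102),
                            type   = c("KEGG", "KEGG", "Tanimoto"))
toy.nodes <- data.frame(id   = c(101, 102, 103),
                        name = c("compound A", "compound B", "compound C"))

# Write tab-delimited text files without quotes or row names
edge.file <- file.path(tempdir(), "edge_list.txt")
node.file <- file.path(tempdir(), "node_attributes.txt")
write.table(toy.edge.list, edge.file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(toy.nodes,     node.file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The identifier column in the node attributes table should use the same values as the source and target columns of the edge list so that attributes can be matched to nodes on import.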
Here is an example of a network connected based on chemical relationships (green edges) and structural similarities (gray edges). The network displays the results from a multivariate classification model that discriminates between two groups; the groups' individual values for key factors are shown as box plots within the nodes.