From 50e99340343d4efd36096d462b019ed778485869 Mon Sep 17 00:00:00 2001 From: sonali Date: Mon, 14 Aug 2023 16:47:41 +0530 Subject: [PATCH 1/7] add bib file, add KEGGgraph.Rmd --- vignettes/KEGGgraph.Rmd | 495 ++++++++++++++++++++++++++++++++++++++++ vignettes/KEGGgraph.bib | 59 +++++ 2 files changed, 554 insertions(+) create mode 100644 vignettes/KEGGgraph.Rmd create mode 100644 vignettes/KEGGgraph.bib diff --git a/vignettes/KEGGgraph.Rmd b/vignettes/KEGGgraph.Rmd new file mode 100644 index 0000000..8a47da3 --- /dev/null +++ b/vignettes/KEGGgraph.Rmd @@ -0,0 +1,495 @@ +--- +title: "KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor" +author: + - name: "Jitao David Zhang" + - name: "Stefan Wiemann" +date: "`r format(Sys.time(), '%B %d, %Y')`" +output: + BiocStyle::html_document +bibliography: KEGGgraph.bib +reference-section-title: References +link-citations: yes +abstract: > + We demonstrate the capabilities of the `r Biocpkg("KEGGgraph")` package, an interface between KEGG pathways and graph model in R as well as a collection of tools for these graphs. Superior to preceding approaches, `r Biocpkg("KEGGgraph")` maintains the pathway topology and allows further analysis or dissection of pathway graphs. It parses the regularly updated KGML (KEGG XML) files into graph models maintaining all essential pathway attributes. +vignette: > + %\VignetteIndexEntry{ KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor } + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +# Introduction + +Since its first introduction in 1995, KEGG PATHWAY has been widely used as a +reference knowledge base for understanding biological pathways and functions of +cellular processes. The knowledge from KEGG has proven of great value by +numerous work in a wide range of fields [@Kanehisa08]. + +Pathways are stored and presented as graphs on the KEGG server side, where nodes +are molecules (protein, compound, etc) and edges represent relation types +between the nodes, e.g. activation or phosphorylation. The graph nature raised +our interest to investigate them with powerful graph tools implemented in **R** +and **Bioconductor** [@Gentleman04], including `r Biocpkg("graph")`, +`r Biocpkg("RBGL")` and `r Biocpkg("Rgraphviz")` [@Carey05]. While it is barely +possible to query the graph characteristics by manual parsing, a native and +straightforward client-side tool is currently missing to parse pathways and +analyze them consequently with tools for graph in **R**. + +To address this problem, we developed the open-source software package +`r Biocpkg("KEGGgraph")`, an interface between KEGG pathway and graph object as +well as a collection of tools to analyze, dissect and visualize these graphs. + +The package requires KGML (KEGG XML) files, which can be downloaded from KEGG +REST web service ((https://rest.kegg.jp/)) without license permission for +academic purposes. To demonstrate the functionality, in 'extdata/' sub-directory +of `r Biocpkg("KEGGgraph")` we have pre-installed several KGML files. + +# Software features + +`r Biocpkg("KEGGgraph")` offers the following functionalities: + +*Parsing*: It should be noted that one 'node' in KEGG pathway does not necessarily +map to merely one gene product, for example the node 'ERK' in the human TGF-Beta +signaling pathway contains two homologues, MAPK1 and MAPK3. Therefore, among +several parsing options, user can set whether to expand these nodes +topologically. Beyond facilitating the interpretation of pathways in a +gene-oriented manner, the approach also entitles unique identifiers to nodes, +enabling merging graphs from different pathways. + +*Graph operations*: Two common operations on graphs are subset and merge. A +sub-graph of selected nodes and the edges in between are returned when subsetting, +while merging produces a new graph that contains nodes and edges of individual +ones. Both are implemented in `r Biocpkg("KEGGgraph")`. + +*Visualization*: `r Biocpkg("KEGGgraph")` provides functions to visualize KEGG +graphs with custom style. Nevertheless users are not restricted by them, +alternatively they are free to render the graph with other tools like the ones in +`r Biocpkg("Rgraphviz")`. + +Besides the functionalities described above, `r Biocpkg("KEGGgraph")` also has +tools for remote KGML file retrieval, graph feature study and other related +tasks. We will demonstrate them later in this vignette. + +# Case studies + +We load the `r Biocpkg("KEGGgraph")` by typing or pasting the following codes in +R command line: + +```{r lib, message=FALSE} +library(KEGGgraph) +``` + +## Get KGML files + +There are at least two possibilities to get KGML (KEGG XML) files: + + * Manual download from KEGG REST API, documented at at (https://www.kegg.jp/kegg/rest/keggapi.html). + * Automatic retrieval from KEGG REST API, available at (https://rest.kegg.jp), + with the function `retrieveKGML`. + +To retrieve KGML file automatically, one has to know the pathway identifier. +KEGG pathway identifiers have the form of [a-z]3-4[0-9]5, where the +three- or four-alphabet code represent the organism and the five digits +represent pathways. The same pathway in different species share the five-digit +code, which is the first parameter accepted by *retrieveKGML*. The second +parameter required is the alphabet-code of the organism. + +For example, the following codes retrieve p53 signaling pathway, which has the +digit-code *04115*, of the species *Macaca fascicularis* , or crab-eating +macaque, which has the three-alphabet code mcf: + +```{r remoteRetrieval, echo=TRUE, eval=TRUE} +tmp <- tempfile() +retrieveKGML("04115", organism="mcf", destfile=tmp, method="auto", quiet=TRUE) +``` + +Pathways and their identifiers can be browsed in the KEGG website +(https://genome.jp/kegg), or found with any search engine. + +## Parsing and graph feature query + +First we read in KGML file for human MAPK signaling pathway (with KEGG ID +hsa04010): + +```{r library, echo=TRUE} +mapkKGML <- system.file("extdata/hsa04010.xml", + package="KEGGgraph") +``` + +Once the file is ready, we can either parse them into an object of `KEGGPathway` +or an object of `graph`. `KEGGPathway` object maintains the information of the +pathway (title, link, organism, etc), while `graph` objects are more natural +approach and can be directly plugged in many other tools. To convert the pathway +into graph, we use + +```{r parsemapk, echo=TRUE, results='verbatim'} +mapkG <- parseKGML2Graph(mapkKGML,expandGenes=TRUE) +mapkG +``` + +Alternatively we can parse the KGML file first into an object of `KEGGpathway`, which can be later converted into the graph object, as the following lines show: + +```{r parsemapk2, echo=TRUE, results='verbatim'} +mapkpathway <- parseKGML(mapkKGML) +mapkpathway +mapkG2 <- KEGGpathway2Graph(mapkpathway, expandGenes=TRUE) +mapkG2 +``` + +There is no difference between graph objects derived from two approaches. + +The option 'expandGenes' in parsing controls whether the nodes of paralogues in +pathways should be expanded or not. Since one 'node' in KEGG pathway does not +necessarily map to only one gene/gene product (e.g. 'ERK' maps to MAPK1 and +MAPK3), the option allows expanding these nodes and takes care of copying +existing edges. + +Another option users may find useful is 'genesOnly', when set to TRUE, the nodes +of other types than 'gene' (compounds, for example) are neglected and the result +graph consists only gene products. This is especially desired when we want to +query network characteristics of gene products. Its value is set to 'TRUE' by +default. + +The following commands extract node and edge information: + +```{r nodeandedge, echo=TRUE, results='verbatim'} +mapkNodes <- nodes(mapkG) +nodes(mapkG)[1:3] +mapkEdges <- edges(mapkG) +edges(mapkG)[1] +``` + +Edges in KEGG pathways are directional, that is, an edge starting at node A +pointing to node B does not guarantee a reverse relation, although reciprocal +edges are also allowed. When listing edges, a list indexed with node names is +returned. Each item in the list records the nodes pointed to. + +We can also extract the node attributes specified by KEGG with `getKEGGnodeData`: + +```{r keggnodedata, echo=TRUE, results='verbatim'} +mapkGnodedata <- getKEGGnodeData(mapkG) +mapkGnodedata[[2]] +``` + +An alternative to use `gettKEGGnodeData` is + +```{r keggnodedataalt, echo=TRUE, eval=TRUE} +getKEGGnodeData(mapkG, 'hsa:5924') +``` + +, returning identical results. + +Similarly the `getKEGGedgeData` is able to extract edge information: + +```{r keggedgedata, echo=TRUE, results='verbatim'} +mapkGedgedata <- getKEGGedgeData(mapkG) +mapkGedgedata[[4]] +``` + +Alternatively the query above can be written as: + +```{r keggedgedataalt, echo=TRUE, eval=TRUE} +getKEGGedgeData(mapkG,'hsa:627~hsa:4915') +``` + +For `KEGGNode` and `KEGGEdge` objects, methods are implemented to fetch their +attributes, for example `getName`, `getType` and `getDisplayName`. Guides to use +these methods as well as examples can be found in help pages. + +This case study finishes with querying the degree attributes of the nodes in the +graph. We ask the question which nodes have the highest out- or in-degrees. +Roughly speaking the out-degree (number of out-going edges) reflects the +regulatory role, while the in-degree (number of in-going edges) suggests the +subjectability of the protein to intermolecular regulations. + +```{r inout, echo=TRUE, results='verbatim'} +mapkGoutdegrees <- sapply(edges(mapkG), length) +mapkGindegrees <- sapply(inEdges(mapkG), length) +topouts <- sort(mapkGoutdegrees, decreasing=T) +topins <- sort(mapkGindegrees, decreasing=T) +topouts[1:3] +topins[1:3] +``` + +## Graph subset and merge + +{#txt:subset}We demonstrate the subsetting of the graph with 25 randomly chosen +nodes of MAPK pathway graph: + +```{r subsetprepare, message=FALSE, results='verbatim'} +library(Rgraphviz) +set.seed(123) +randomNodes <- sample(nodes(mapkG), 25) +mapkGsub <- subGraph(randomNodes, mapkG) +mapkGsub +``` + +The subgraph is visualized in figure [1](fig:01), where nodes with in-degree or +out-degree in red and others in grey [^1]. And in the example we also +demonstrate how to convert KEGG ID into other other identifiers via the Entrez +GeneID. More details on the conversion of IDs can be found on page +\pageref{convertId}. + +```{r makeattr, echo=FALSE} +makeAttr <- function(graph, default, valNodeList) { + tmp <- nodes(graph) + x <- rep(default, length(tmp)); names(x) <- tmp + + if(!missing(valNodeList)) { + stopifnot(is.list(valNodeList)) + allnodes <- unlist(valNodeList) + stopifnot(all(allnodes %in% tmp)) + for(i in seq(valNodeList)) { + x[valNodeList[[i]]] <- names(valNodeList)[i] + } + } + return(x) +} +``` + + +```{r subsetplot, echo=TRUE, fig.cap="A random subgraph of MAPK signaling pathway", message=FALSE, fig=TRUE} +outs <- sapply(edges(mapkGsub), length) > 0 +ins <- sapply(inEdges(mapkGsub), length) > 0 +ios <- outs | ins + +## translate the KEGG IDs into Gene Symbol +if(require(org.Hs.eg.db)) { + ioGeneID <- translateKEGGID2GeneID(names(ios)) + nodesNames <- sapply(mget(ioGeneID, org.Hs.egSYMBOL, ifnotfound=NA), "[[",1) +} else { + nodesNames <- names(ios) +} +names(nodesNames) <- names(ios) + +nAttrs <- list(); +nAttrs$fillcolor <- makeAttr(mapkGsub, "lightgrey", list(orange=names(ios)[ios])) +nAttrs$label <- nodesNames +plot(mapkGsub, "neato", nodeAttrs=nAttrs, + attrs=list(node=list(fillcolor="lightgreen", + width="0.75", shape="ellipse"), + edge=list(arrowsize="0.7"))) +``` + +Another common operation on graphs is merging, that is, combining different +graphs together. It is inspired by the fact that many KEGG pathways embed other +pathway, for example MAPK signaling pathway embeds 6 pathways including Wnt +signaling pathway. `mergeGraphs` provides the possibility to merge them into one +graph for further analysis. Next we merge MAPK and Wnt signaling pathway into +one graph. The graphs to be merged should be organized into a list, and it is +commandary to use 'expandGenes=TRUE' option when parsing to make sure the nodes +are unique and indexed by KEGGID. + +```{r mergedemo, echo=TRUE, results='verbatim'} +wntKGML <- system.file("extdata/hsa04310.xml",package="KEGGgraph") +wntG <- parseKGML2Graph(wntKGML) +graphs <- list(mapk=mapkG, wnt=wntG) +merged <- mergeGraphs(graphs) +merged +``` + +We observe that the node number in the merged graph (`r numNodes(merged)`) is +less than the sum of two graphs (`r numNodes(mapkG)` and `r numNodes(wntG)` for +MAPK and Wnt pathway respectively), reflecting the crosstalk between the +pathways by sharing nodes. + +## Using other graph tools + +In **R** and **Bioconductor** there'are powerful tools for graph algorithms and +operations, including `r Biocpkg("graph")`, `r Biocpkg("Rgraphviz")` and +`r Biocpkg("RBGL")`. The KEGG graphs can be analyzed with their functionalities to +describe characterisitcs of the pathway and to answer biological relevant +questions. + +Here we demonstrate the use of other graph tools with asking the question which +nodes are of the highest importance in MAPK signalling pathway. To this end we +turn to relative betweenness centrality [@Aittokallio06], [@Carey05]. Betweenness +is a centrality measure of a node within a graph. Nodes that occur on many +shortest paths between other vertices have higher betweenness than those that do +not. It is scaled by the factor of (n-1)(n-2)/2 to get relative betweenness +centrality, where n is the number of nodes in the graph. Both measurements +estimate the importance or the role of the node in the graph. + +With the function implemented in `r Biocpkg("RBGL")`, our aim is to identify +most important nodes in MAPK signalling pathway. + +*Note*: the current version of `RBGL` (version 1.59.5) reports the error that +`BGL_brandes_betweeness_centrality` not available for `.Call()` for package +`r Biocpkg("RBGL")`. Therefore the execution has been suppressed for now. + +```{r bcc, echo=TRUE, eval=TRUE, results='verbatim'} +library(RBGL) +bcc <- brandes.betweenness.centrality(mapkG) +rbccs <- bcc$relative.betweenness.centrality.vertices[1L,] +toprbccs <- sort(rbccs,decreasing=TRUE)[1:4] +toprbccs +``` + +We identify the top 4 important nodes judged by betweenness centrality as MAP3K1 +(hsa:4214), GRB2 (hsa:2885), MAP2K2 (hsa:5605) and MAP2K1 (hsa:5604) (the +mapping between KEGG ID and gene symbol is done using `r Biocpkg("biomaRt")`, +see page \pageref{txt:biomaRt}). In figure [2](fig:02) we illustrate them as +well as their interacting partners in MAPK pathway. + + +```{r bccplot, eval=TRUE, echo=TRUE, fig.cap="Nodes with the highest relative betweenness centrality in MAPK pathway (in orange) and their interacting partners (in blue).", warning=FALSE, fig=TRUE} +toprbccName <- names(toprbccs) +toprin <- sapply(toprbccName, function(x) inEdges(mapkG)[x]) +toprout <- sapply(toprbccName, function(x) edges(mapkG)[x]) +toprSubnodes <- unique(unname(c(unlist(toprin), unlist(toprout), toprbccName))) +toprSub <- subGraph(toprSubnodes, mapkG) + +nAttrs <- list() +tops <- c("MAPK3K1","GRB2","MAP2K2","MAP2K1") +topLabels <- lapply(toprbccName, function(x) x); names(topLabels) <- tops +nAttrs$label <- makeAttr(toprSub, "", topLabels) +nAttrs$fillcolor <- makeAttr(toprSub, "lightblue", list(orange=toprbccName)) +nAttrs$width <- makeAttr(toprSub,"",list("0.8"=toprbccName)) + +plot(toprSub, "twopi", nodeAttrs=nAttrs, attrs=list(graph=list(start=2))) +``` + +# Other funtionalities + +Besides the ability to parse and operate on KEGG PATHWAY graph objects, the +`r Biocpkg("KEGGgraph")` package also provides functionalities to complement tasks +related to deal with KEGG pathways. We introduce some of them here, for a full +list of functions please see the package help file: + +```{r help, echo=TRUE, eval=FALSE} +help(package=KEGGgraph) +``` + +## Parsing chemical compound reaction network + +KEGG PATHWAY captures two kinds of network:the protein network and the chemical +network. The protein network consists *relations* (*edges*) between gene +products, while the chemical network illustrate the *reactions* between +chemical compounds. Since the metabolic pathway can be viewed both as a network +of proteins (enzymes) and as a network of chemical compounds, metabolic pathways +can be viewed as both protein networks and chemical networks, whereas regulatory +pathways are always viewed as protein networks only.`KEGGPathway` provides +methods to access this network. + +We show the example of Glycine, serine and threonine metabolism pathway. + + +```{r reactionexample, fig.cap="Reaction network built of chemical compounds: the orange nodes are the three compounds with maximum out-degree in this network.", echo=TRUE} +mapfile <- system.file("extdata/map00260.xml",package="KEGGgraph") +map <- parseKGML(mapfile) +map +reactions <- getReactions(map) +reactions[[1]] +``` + +Figure [3](fig:03) shows how to extract reactions from the pathway and to +build a directed graph with them. + +## Expand embedded pathways + +Function `parseKGMLexpandMaps` is a function to handle with pathways embedding +other pathways. For example, pancreatic cancer pathway embeds 9 other pathways +including MAPK and ErbB signaling pathway, cell cycle and apoptosis pathway, +etc. To parse them into one graph, the users only have to download the KGML file +and feed the file name to `parseKGMLexpandMaps`, the function parses the file, +analyze the embedded pathways, download their files from KEGG REST API service +automatically (alternatively a local repository can be specified for KGML files) +and merge the individual pathways into a single graph. For example, the +following single line parses MAPK signaling pathway with all its embedded +pathways + +```{r mapk14expand, echo=TRUE, eval=TRUE} +mapkGembed <- parseKGMLexpandMaps(mapkKGML) +``` + +As its name suggests, function `subGraphByNodeType` subsets the graph by node +type, the nodes to subset are those of the type given by the user. It is useful +when the KGML file was parsed with 'genesOnly=FALSE' option and later on the +user wants only certain kind of node, 'gene' for example, remained. The +following example shows how to use it. + +```{r subgraphbynode, echo=TRUE, results='verbatim'} +mapkGall <- parseKGML2Graph(mapkKGML,genesOnly=FALSE) +mapkGall +mapkGsub <- subGraphByNodeType(mapkGall, "gene") +mapkGsub +``` + +## Annotation + +`translateKEGGID2GeneID` translates KEGG identifiers (KEGGID) into Entrez +GeneID. For example, if we want to find the Entrez GeneID of the nodes in MAPK +pathway having the highest relative betweenness centrality, the following codes +do the job. + +```{r biomart, echo=TRUE, eval=FALSE, results='verbatim'} +toprbccKEGGID <- names(toprbccs) +toprbccKEGGID +toprbccGeneID <- translateKEGGID2GeneID(toprbccKEGGID) +toprbccGeneID +``` + +To convert GeneID to other identifiers, we recommend genome wide annotation +packages, for human it is `r Biocpkg("org.Hs.eg.db")` and the packages for other +organisms can be fount at +(http://www.bioconductor.org/packages/release/data/annotation/). To demonstrate +its use, we draw the sub-network in the figure [2](fig:02) again, whereas nodes +are now labeled with gene symbols. {#convertId} + +```{r orgHuman, message=FALSE, fig=TRUE} +if(require(org.Hs.eg.db)) { + tnodes <- nodes(toprSub) + tgeneids <- translateKEGGID2GeneID(tnodes) + tgenesymbols <- sapply(mget(tgeneids, org.Hs.egSYMBOL, ifnotfound=NA), "[[",1) + toprSubSymbol <- toprSub + nodes(toprSubSymbol) <- tgenesymbols + plot(toprSubSymbol, "neato",attrs=list(node=list(font=5, fillcolor="lightblue"))) +} +``` + +Alternatively, users could use R package `r Biocpkg("biomaRt")` [@Durinck05], +[@Huber08] for ID conversion, whereas it assumes that the user has an internet +connection. The following example shows how to translate the node hits we +acquired in the example above into HGNC symbols: + +```{r biomart2, eval=FALSE} +library(biomaRt) +hsapiens <- useMart("ensembl","hsapiens_gene_ensembl" ) +filters <- listFilters(hsapiens) +getBM(attributes=c("entrezgene","hgnc_symbol"), + filters="entrezgene", + values=toprbccGeneID, mart=hsapiens) +``` + +# Acknowledgement + +We thank Vincent Carey, Holger Frohlich and Wolfgang Huber for comments and +suggestions on the package, and the reviewers from the Bioconductor community. + +Many users have provided very helpful feedback. Here is an incomplete list of +them: Martin Morgan, Paul Shannon, Juliane Manitz and Alexander Gulliver +Bjørnholt Grønning. + +# Conclusion + +Before the release of `r Biocpkg("KEGGgraph")`, several **R/Bioconductor** +packages have been introduced and proven their usefulness in understanding +biological pathways with KEGG. However, `r Biocpkg("KEGGgraph")` is the first +package able to parse any KEGG pathways from KGML files into graphs. In +comparison, existing tools can not achieve the results we present here. They +either neglects the graph topology (*KEGG.db*), do not parse pathway networks +(*keggorth*), or are specialized for certain pathways (`r Biocpkg("cMAP")` and +`r Biocpkg("pathRender")`). + +With `r Biocpkg("KEGGgraph")`, we contribute a direct and natural approach to +KEGG pathways, and the possibilities to study them in **R** and +**Bioconductor**. + +# Session Info + +The script runs within the following session: + +```{r sessionInfo, echo=FALSE, results='verbatim'} +sessionInfo() +``` + +[^1]: The `makeAttr` function is used to assign nodes with rendering attributes, whose code can be found in the Rnw file. \ No newline at end of file diff --git a/vignettes/KEGGgraph.bib b/vignettes/KEGGgraph.bib new file mode 100644 index 0000000..47e4d3f --- /dev/null +++ b/vignettes/KEGGgraph.bib @@ -0,0 +1,59 @@ +@article{Gentleman04, + author = {Gentleman, Robert and others}, + title = {Bioconductor: open software development for computational biology and bioinformatics}, + journal = {Genome Biology}, + year = {2004}, + volume = {5}, + pages = {R80} +} + +@article{Carey05, + author = {Carey, Vincent and others}, + title = {Network structures and algorithms in Bioconductor}, + journal = {Bioinformatics}, + year = {2005}, + volume = {21}, + pages = {135-136} +} + +@article{Kanehisa08, + author = {Kanehisa, Minoru and others}, + title = {KEGG for linking genomes to life and the environment}, + journal = {Nucleic Acids Research, Database issue}, + year = {2008}, + volume = {36}, + pages = {480-484} +} + +@article{Klukas07, + author = {Klukas and Schreiber}, + title = {Dynamic exploration and editing of KEGG pathway diagrams}, + journal = {Bioinformatics}, + year = {2007}, + volume = {23}, + pages = {344-350} +} + +@article{Aittokallio06, + author = {Aittokallio and Schwikowski}, + title = {Graph-based methods for analysing networks in cell biology}, + journal = {Briefings in Bioinformatics}, + year = {2006}, + volume = {7}, + pages = {243-255} +} + +@article{Durinck05, + author = {Durinck and others}, + title = {BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis}, + journal = {Bioinformatics}, + year = {2005}, + volume = {21}, + pages = {3439-3440} +} + +@article{Huber08, + author = {Durinck and Huber}, + title = {R/Bioconductor package {\it biomaRt}}, + year = {2008} +} From a1fc2f3147174e1e1cf1d5387578204c73eb6eb2 Mon Sep 17 00:00:00 2001 From: sonali Date: Mon, 14 Aug 2023 16:53:41 +0530 Subject: [PATCH 2/7] add bioinformatics.csl --- vignettes/bioinformatics.csl | 134 +++++++++++++++++++++++++++++++++++ 1 file changed, 134 insertions(+) create mode 100644 vignettes/bioinformatics.csl diff --git a/vignettes/bioinformatics.csl b/vignettes/bioinformatics.csl new file mode 100644 index 0000000..7d2c861 --- /dev/null +++ b/vignettes/bioinformatics.csl @@ -0,0 +1,134 @@ + + From 0a9276485449fd5a8fbb9a6bd294909a8a637d4a Mon Sep 17 00:00:00 2001 From: sonali Date: Mon, 14 Aug 2023 16:59:25 +0530 Subject: [PATCH 3/7] added csl in .yaml --- vignettes/KEGGgraph.Rmd | 1 + 1 file changed, 1 insertion(+) diff --git a/vignettes/KEGGgraph.Rmd b/vignettes/KEGGgraph.Rmd index 8a47da3..8317f60 100644 --- a/vignettes/KEGGgraph.Rmd +++ b/vignettes/KEGGgraph.Rmd @@ -7,6 +7,7 @@ date: "`r format(Sys.time(), '%B %d, %Y')`" output: BiocStyle::html_document bibliography: KEGGgraph.bib +csl: bioinformatics.csl reference-section-title: References link-citations: yes abstract: > From 239993b60241f822ad4cb198d9f9ebbcd9f92455 Mon Sep 17 00:00:00 2001 From: sonali Date: Fri, 18 Aug 2023 19:22:12 +0530 Subject: [PATCH 4/7] removed .Rmd file, fix plot --- vignettes/KEGGgraph.Rmd | 66 ++++--- vignettes/KEGGgraph.Rnw | 426 ---------------------------------------- 2 files changed, 43 insertions(+), 449 deletions(-) delete mode 100644 vignettes/KEGGgraph.Rnw diff --git a/vignettes/KEGGgraph.Rmd b/vignettes/KEGGgraph.Rmd index 8317f60..00d86f9 100644 --- a/vignettes/KEGGgraph.Rmd +++ b/vignettes/KEGGgraph.Rmd @@ -11,7 +11,12 @@ csl: bioinformatics.csl reference-section-title: References link-citations: yes abstract: > - We demonstrate the capabilities of the `r Biocpkg("KEGGgraph")` package, an interface between KEGG pathways and graph model in R as well as a collection of tools for these graphs. Superior to preceding approaches, `r Biocpkg("KEGGgraph")` maintains the pathway topology and allows further analysis or dissection of pathway graphs. It parses the regularly updated KGML (KEGG XML) files into graph models maintaining all essential pathway attributes. + We demonstrate the capabilities of the `r Biocpkg("KEGGgraph")` package, + an interface between KEGG pathways and graph model in R as well as a collection + of tools for these graphs. Superior to preceding approaches, `r Biocpkg("KEGGgraph")` + maintains the pathway topology and allows further analysis or dissection of pathway + graphs. It parses the regularly updated KGML (KEGG XML) files into graph models + maintaining all essential pathway attributes. vignette: > %\VignetteIndexEntry{ KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor } %\VignetteEngine{knitr::rmarkdown} @@ -116,9 +121,9 @@ mapkKGML <- system.file("extdata/hsa04010.xml", package="KEGGgraph") ``` -Once the file is ready, we can either parse them into an object of `KEGGPathway` -or an object of `graph`. `KEGGPathway` object maintains the information of the -pathway (title, link, organism, etc), while `graph` objects are more natural +Once the file is ready, we can either parse them into an object of *KEGGPathway* +or an object of *graph*. *KEGGPathway* object maintains the information of the +pathway (title, link, organism, etc), while *graph* objects are more natural approach and can be directly plugged in many other tools. To convert the pathway into graph, we use @@ -127,7 +132,9 @@ mapkG <- parseKGML2Graph(mapkKGML,expandGenes=TRUE) mapkG ``` -Alternatively we can parse the KGML file first into an object of `KEGGpathway`, which can be later converted into the graph object, as the following lines show: +Alternatively we can parse the KGML file first into an object of +`KEGGpathway`, which can be later converted into the graph object, +as the following lines show: ```{r parsemapk2, echo=TRUE, results='verbatim'} mapkpathway <- parseKGML(mapkKGML) @@ -213,8 +220,8 @@ topins[1:3] ## Graph subset and merge -{#txt:subset}We demonstrate the subsetting of the graph with 25 randomly chosen -nodes of MAPK pathway graph: +We demonstrate the subsetting of the graph with 25 randomly chosen nodes of MAPK +pathway graph: ```{r subsetprepare, message=FALSE, results='verbatim'} library(Rgraphviz) @@ -224,11 +231,10 @@ mapkGsub <- subGraph(randomNodes, mapkG) mapkGsub ``` -The subgraph is visualized in figure [1](fig:01), where nodes with in-degree or +The subgraph is visualized in figure [1](#fig:01), where nodes with in-degree or out-degree in red and others in grey [^1]. And in the example we also demonstrate how to convert KEGG ID into other other identifiers via the Entrez -GeneID. More details on the conversion of IDs can be found on page -\pageref{convertId}. +GeneID. More details on the conversion of IDs can be found on section [Annotation](#convertId) ```{r makeattr, echo=FALSE} makeAttr <- function(graph, default, valNodeList) { @@ -248,7 +254,7 @@ makeAttr <- function(graph, default, valNodeList) { ``` -```{r subsetplot, echo=TRUE, fig.cap="A random subgraph of MAPK signaling pathway", message=FALSE, fig=TRUE} +```{r subsetplot, echo=TRUE, fig.cap="A random subgraph of MAPK signaling pathway", message=FALSE, warning=FALSE} outs <- sapply(edges(mapkGsub), length) > 0 ins <- sapply(inEdges(mapkGsub), length) > 0 ios <- outs | ins @@ -280,7 +286,7 @@ one graph. The graphs to be merged should be organized into a list, and it is commandary to use 'expandGenes=TRUE' option when parsing to make sure the nodes are unique and indexed by KEGGID. -```{r mergedemo, echo=TRUE, results='verbatim'} +```{r mergedemo, results='verbatim'} wntKGML <- system.file("extdata/hsa04310.xml",package="KEGGgraph") wntG <- parseKGML2Graph(wntKGML) graphs <- list(mapk=mapkG, wnt=wntG) @@ -317,7 +323,7 @@ most important nodes in MAPK signalling pathway. `BGL_brandes_betweeness_centrality` not available for `.Call()` for package `r Biocpkg("RBGL")`. Therefore the execution has been suppressed for now. -```{r bcc, echo=TRUE, eval=TRUE, results='verbatim'} +```{r bcc, eval=TRUE, results='verbatim'} library(RBGL) bcc <- brandes.betweenness.centrality(mapkG) rbccs <- bcc$relative.betweenness.centrality.vertices[1L,] @@ -327,9 +333,9 @@ toprbccs We identify the top 4 important nodes judged by betweenness centrality as MAP3K1 (hsa:4214), GRB2 (hsa:2885), MAP2K2 (hsa:5605) and MAP2K1 (hsa:5604) (the -mapping between KEGG ID and gene symbol is done using `r Biocpkg("biomaRt")`, -see page \pageref{txt:biomaRt}). In figure [2](fig:02) we illustrate them as -well as their interacting partners in MAPK pathway. +mapping between KEGG ID and gene symbol is done using `r Biocpkg("biomaRt")`). +In figure [2](#fig:02) we illustrate them as well as their interacting partners +in MAPK pathway. ```{r bccplot, eval=TRUE, echo=TRUE, fig.cap="Nodes with the highest relative betweenness centrality in MAPK pathway (in orange) and their interacting partners (in blue).", warning=FALSE, fig=TRUE} @@ -356,7 +362,7 @@ Besides the ability to parse and operate on KEGG PATHWAY graph objects, the related to deal with KEGG pathways. We introduce some of them here, for a full list of functions please see the package help file: -```{r help, echo=TRUE, eval=FALSE} +```{r help, eval=FALSE} help(package=KEGGgraph) ``` @@ -373,8 +379,7 @@ methods to access this network. We show the example of Glycine, serine and threonine metabolism pathway. - -```{r reactionexample, fig.cap="Reaction network built of chemical compounds: the orange nodes are the three compounds with maximum out-degree in this network.", echo=TRUE} +```{r reactionexample} mapfile <- system.file("extdata/map00260.xml",package="KEGGgraph") map <- parseKGML(mapfile) map @@ -382,9 +387,24 @@ reactions <- getReactions(map) reactions[[1]] ``` -Figure [3](fig:03) shows how to extract reactions from the pathway and to +Figure [3](#fig:03) shows how to extract reactions from the pathway and to build a directed graph with them. + +```{r cnexample, fig.cap="Reaction network built of chemical compounds: the orange nodes are the three compounds with maximum out-degree in this network."} +chemicalGraph <- KEGGpathway2reactionGraph(map) + +outDegrees <- sapply(edges(chemicalGraph), length) +maxout <- names(sort(outDegrees,decreasing=TRUE))[1:3] + +nAttrs <- list() +maxoutlabel <- as.list(maxout); names(maxoutlabel) <- maxout +nAttrs$label <- makeAttr(chemicalGraph, "", maxoutlabel) +nAttrs$fillcolor <- makeAttr(chemicalGraph, "lightblue", list(orange=maxout)) +nAttrs$width <- makeAttr(chemicalGraph,"0.8", list("1.8"=maxout)) +plot(chemicalGraph, nodeAttrs=nAttrs) +``` + ## Expand embedded pathways Function `parseKGMLexpandMaps` is a function to handle with pathways embedding @@ -415,7 +435,7 @@ mapkGsub <- subGraphByNodeType(mapkGall, "gene") mapkGsub ``` -## Annotation +## Annotation {#convertId} `translateKEGGID2GeneID` translates KEGG identifiers (KEGGID) into Entrez GeneID. For example, if we want to find the Entrez GeneID of the nodes in MAPK @@ -434,9 +454,9 @@ packages, for human it is `r Biocpkg("org.Hs.eg.db")` and the packages for other organisms can be fount at (http://www.bioconductor.org/packages/release/data/annotation/). To demonstrate its use, we draw the sub-network in the figure [2](fig:02) again, whereas nodes -are now labeled with gene symbols. {#convertId} +are now labeled with gene symbols. -```{r orgHuman, message=FALSE, fig=TRUE} +```{r orgHuman, message=FALSE, warning=FALSE} if(require(org.Hs.eg.db)) { tnodes <- nodes(toprSub) tgeneids <- translateKEGGID2GeneID(tnodes) diff --git a/vignettes/KEGGgraph.Rnw b/vignettes/KEGGgraph.Rnw deleted file mode 100644 index 84243de..0000000 --- a/vignettes/KEGGgraph.Rnw +++ /dev/null @@ -1,426 +0,0 @@ -%\VignetteIndexEntry{KEGGgraph: graph approach to KEGG PATHWAY} -%\VignetteDepends{graph, XML, Rgraphviz, RBGL, org.Hs.eg.db} -%\VignettePackage{KEGGgraph} - -\documentclass[11pt]{article} - -\usepackage{times} -\usepackage{hyperref} -\usepackage{geometry} -\usepackage{longtable} -\usepackage[pdftex]{graphicx} -\usepackage[utf8]{inputenc} -\SweaveOpts{keep.source=TRUE,eps=FALSE,pdf=TRUE,prefix=TRUE} - -% R part -\newcommand{\R}[1]{{\textsf{#1}}} -\newcommand{\Rfunction}[1]{{\texttt{#1}}} -\newcommand{\Robject}[1]{{\texttt{#1}}} -\newcommand{\Rpackage}[1]{{\textit{#1}}} -\newcommand{\Rclass}[1]{{\textit{#1}}} -\newcommand{\Metas}[1]{{\texttt{#1}}} - -\begin{document} -\SweaveOpts{concordance=TRUE} -\setkeys{Gin}{width=0.8\textwidth} -\title{KEGGgraph: a graph approach to KEGG PATHWAY in R and Bioconductor} -\author{Jitao David Zhang and Stefan Wiemann} - -\maketitle - -\begin{abstract} -We demonstrate the capabilities of the \Rpackage{KEGGgraph} package, an interface between KEGG pathways and graph model in R as well as a collection of tools for these graphs. Superior to preceding approaches, \Rpackage{KEGGgraph} maintains the pathway topology and allows further analysis or dissection of pathway graphs. It parses the regularly updated KGML (KEGG XML) files into graph models maintaining all essential pathway attributes. -\end{abstract} - -\section{Introduction} -Since its first introduction in 1995, KEGG PATHWAY has been widely used as a reference knowledge base for understanding biological pathways and functions of cellular processes. The knowledge from KEGG has proven of great value by numerous work in a wide range of fields \cite{Kanehisa08}. - -Pathways are stored and presented as graphs on the KEGG server side, where nodes are molecules (protein, compound, etc) and edges represent relation types between the nodes, e.g. activation or phosphorylation. The graph nature raised our interest to investigate them with powerful graph tools implemented in \R{R} and \R{Bioconductor} \cite{Gentleman04}, including \Rpackage{graph}, \Rpackage{RBGL} and \Rpackage{Rgraphviz} \cite{Carey05}. While it is barely possible to query the graph characteristics by manual parsing, a native and straightforward client-side tool is currently missing to parse pathways and analyze them consequently with tools for graph in \R{R}. - -To address this problem, we developed the open-source software package \Rpackage{KEGGgraph}, an interface between KEGG pathway and graph object as well as a collection of tools to analyze, dissect and visualize these graphs. - -The package requires KGML (KEGG XML) files, which can be downloaded from KEGG REST web service (\url{https://rest.kegg.jp/}) without license permission for academic purposes. To demonstrate the functionality, in 'extdata/' sub-directory of \Rpackage{KEGGgraph} we have pre-installed several KGML files. - -\section{Software features} - -\Rpackage{KEGGgraph} offers the following functionalities: - -\textit{Parsing}: It should be noted that one 'node' in KEGG pathway does not necessarily map to merely one gene product, for example the node 'ERK' in the human TGF-Beta signaling pathway contains two homologues, MAPK1 and MAPK3. Therefore, among several parsing options, user can set whether to expand these nodes topologically. Beyond facilitating the interpretation of pathways in a gene-oriented manner, the approach also entitles unique identifiers to nodes, enabling merging graphs from different pathways. - -\textit{Graph operations}: Two common operations on graphs are subset and merge. A sub-graph of selected nodes and the edges in between are returned when subsetting, while merging produces a new graph that contains nodes and edges of individual ones. Both are implemented in \Rpackage{KEGGgraph}. - -\textit{Visualization}: \Rpackage{KEGGgraph} provides functions to visualize KEGG graphs with custom style. Nevertheless users are not restricted by them, alternatively they are free to render the graph with other tools like the ones in \Rpackage{Rgraphviz}. - -Besides the functionalities described above, \Rpackage{KEGGgraph} also has tools for remote KGML file retrieval, graph feature study and other related tasks. We will demonstrate them later in this vignette. - -\section{Case studies} -We load the \Rpackage{KEGGgraph} by typing or pasting the following codes in R -command line: -<>= -library(KEGGgraph) -@ - -\subsection{Get KGML files} -There are at least two possibilities to get KGML (KEGG XML) files: -\begin{itemize} - \item Manual download from KEGG REST API, documented at at \url{https://www.kegg.jp/kegg/rest/keggapi.html}. - \item Automatic retrieval from KEGG REST API, available at \url{https://rest.kegg.jp}, with the function \Rfunction{retrieveKGML}. -\end{itemize} - -To retrieve KGML file automatically, one has to know the pathway identifier. -KEGG pathway identifiers have the form of \textrm{[a-z]{3-4}[0-9]{5}}, where the -three- or four-alphabet code represent the organism and the five digits -represent pathways. The same pathway in different species share the five-digit -code, which is the first parameter accepted by \textit{retrieveKGML}. The second -parameter required is the alphabet-code of the organism. - -For example, the following codes retrieve p53 signaling pathway, which has the -digit-code \textit{04115}, of the species \textit{Macaca fascicularis} -, or crab-eating macaque, which has the three-alphabet code \textrm{mcf}: - -<>= -tmp <- tempfile() -retrieveKGML("04115", organism="mcf", destfile=tmp, method="auto", quiet=TRUE) -@ - -Pathways and their identifiers can be browsed in the KEGG website -\url{https://genome.jp/kegg}, or found with any search engine. - -\subsection{Parsing and graph feature query} - -First we read in KGML file for human MAPK signaling pathway (with KEGG ID hsa04010): -<>= -mapkKGML <- system.file("extdata/hsa04010.xml", - package="KEGGgraph") -@ -Once the file is ready, we can either parse them into an object of \Rclass{KEGGPathway} or an object of \Rclass{graph}. \Rclass{KEGGPathway} object maintains the information of the pathway (title, link, organism, etc), while \Rclass{graph} objects are more natural approach and can be directly plugged in many other tools. To convert the pathway into graph, we use -<>= -mapkG <- parseKGML2Graph(mapkKGML,expandGenes=TRUE) -mapkG -@ - -Alternatively we can parse the KGML file first into an object of \Rclass{KEGGpathway}, which can be later converted into the graph object, as the following lines show: -<>= -mapkpathway <- parseKGML(mapkKGML) -mapkpathway -mapkG2 <- KEGGpathway2Graph(mapkpathway, expandGenes=TRUE) -mapkG2 -@ -There is no difference between graph objects derived from two approaches. - -The option 'expandGenes' in parsing controls whether the nodes of paralogues in pathways should be expanded or not. Since one 'node' in KEGG pathway does not necessarily map to only one gene/gene product (e.g. 'ERK' maps to MAPK1 and MAPK3), the option allows expanding these nodes and takes care of copying existing edges. - -Another option users may find useful is 'genesOnly', when set to TRUE, -the nodes of other types than 'gene' (compounds, for example) are neglected and -the result graph consists only gene products. This is especially -desired when we want to query network characteristics of gene -products. Its value is set to 'TRUE' by default. - -The following commands extract node and edge information: -<>= -mapkNodes <- nodes(mapkG) -nodes(mapkG)[1:3] -mapkEdges <- edges(mapkG) -edges(mapkG)[1] -@ - -Edges in KEGG pathways are directional, that is, an edge starting at node A pointing to node B does not guarantee a reverse relation, although reciprocal edges are also allowed. When listing edges, a list indexed with node names is returned. Each item in the list records the nodes pointed to. - -We can also extract the node attributes specified by KEGG with \Rfunction{getKEGGnodeData}: -<>= -mapkGnodedata <- getKEGGnodeData(mapkG) -mapkGnodedata[[2]] -@ -An alternative to use \Rfunction{gettKEGGnodeData} is - -<>= -getKEGGnodeData(mapkG, 'hsa:5924') -@ - -, returning identical results. - -Similarly the \Rfunction{getKEGGedgeData} is able to extract edge information: -<>= -mapkGedgedata <- getKEGGedgeData(mapkG) -mapkGedgedata[[4]] -@ -Alternatively the query above can be written as: -<>= -getKEGGedgeData(mapkG,'hsa:627~hsa:4915') -@ - -For \Robject{KEGGNode} and \Robject{KEGGEdge} objects, methods are -implemented to fetch their attributes, for example -\Rfunction{getName}, \Rfunction{getType} and -\Rfunction{getDisplayName}. Guides to use these methods as well as -examples can be found in help pages. - -This case study finishes with querying the degree attributes of the nodes in the graph. We ask the question which nodes have the highest out- or in-degrees. Roughly speaking the out-degree (number of out-going edges) reflects the regulatory role, while the in-degree (number of in-going edges) suggests the subjectability of the protein to intermolecular regulations. -<>= -mapkGoutdegrees <- sapply(edges(mapkG), length) -mapkGindegrees <- sapply(inEdges(mapkG), length) -topouts <- sort(mapkGoutdegrees, decreasing=T) -topins <- sort(mapkGindegrees, decreasing=T) -topouts[1:3] -topins[1:3] -@ - - -\subsection{Graph subset and merge} -\label{txt:subset}We demonstrate the subsetting of the graph with 25 randomly chosen nodes of MAPK pathway graph: - -<>= -library(Rgraphviz) -set.seed(123) -randomNodes <- sample(nodes(mapkG), 25) -mapkGsub <- subGraph(randomNodes, mapkG) -mapkGsub -@ - -The subgraph is visualized in figure \ref{fig:01}, where nodes with -in-degree or out-degree in red and others in grey.\footnote{The \Rfunction{makeAttr} -function is used to assign nodes with rendering attributes, whose code -can be found in the Rnw file.}. And in the example we also demonstrate how to convert KEGG ID into other other identifiers via the Entrez GeneID. More details on the conversion of IDs can be found on page \pageref{convertId}. - -<>= -makeAttr <- function(graph, default, valNodeList) { - tmp <- nodes(graph) - x <- rep(default, length(tmp)); names(x) <- tmp - - if(!missing(valNodeList)) { - stopifnot(is.list(valNodeList)) - allnodes <- unlist(valNodeList) - stopifnot(all(allnodes %in% tmp)) - for(i in seq(valNodeList)) { - x[valNodeList[[i]]] <- names(valNodeList)[i] - } - } - return(x) -} -@ - -\begin{figure}[!htbp] - \begin{center} -<>= -outs <- sapply(edges(mapkGsub), length) > 0 -ins <- sapply(inEdges(mapkGsub), length) > 0 -ios <- outs | ins - -## translate the KEGG IDs into Gene Symbol -if(require(org.Hs.eg.db)) { - ioGeneID <- translateKEGGID2GeneID(names(ios)) - nodesNames <- sapply(mget(ioGeneID, org.Hs.egSYMBOL, ifnotfound=NA), "[[",1) -} else { - nodesNames <- names(ios) -} -names(nodesNames) <- names(ios) - -nAttrs <- list(); -nAttrs$fillcolor <- makeAttr(mapkGsub, "lightgrey", list(orange=names(ios)[ios])) -nAttrs$label <- nodesNames -plot(mapkGsub, "neato", nodeAttrs=nAttrs, - attrs=list(node=list(fillcolor="lightgreen", - width="0.75", shape="ellipse"), - edge=list(arrowsize="0.7"))) -@ -\caption{A random subgraph of MAPK signaling pathway}\label{fig:01} -\end{center} -\end{figure} - -Another common operation on graphs is merging, that is, combining different graphs together. It is inspired by the fact that many KEGG pathways embed other pathway, for example MAPK signaling pathway embeds 6 pathways including Wnt signaling pathway. \Rfunction{mergeGraphs} provides the possibility to merge them into one graph for further analysis. Next we merge MAPK and Wnt signaling pathway into one graph. The graphs to be merged should be organized into a list, and it is commandary to use 'expandGenes=TRUE' option when parsing to make sure the nodes are unique and indexed by KEGGID. -<>= -wntKGML <- system.file("extdata/hsa04310.xml",package="KEGGgraph") -wntG <- parseKGML2Graph(wntKGML) -graphs <- list(mapk=mapkG, wnt=wntG) -merged <- mergeGraphs(graphs) -merged -@ - -We observe that the node number in the merged graph -(\Sexpr{numNodes(merged)}) is less than the sum of two graphs -(\Sexpr{numNodes(mapkG)} and \Sexpr{numNodes(wntG)} for MAPK and Wnt -pathway respectively), reflecting the crosstalk between the pathways -by sharing nodes. - -\subsection{Using other graph tools} - -In \R{R} and \R{Bioconductor} there'are powerful tools for graph -algorithms and operations, including \Rpackage{graph}, -\Rpackage{Rgraphviz} and \Rpackage{RBGL}. The KEGG graphs can be -analyzed with their functionalities to describe characterisitcs of the -pathway and to answer biological relevant questions. - -Here we demonstrate the use of other graph tools with asking the -question which nodes are of the highest importance in MAPK signalling -pathway. To this end we turn to relative betweenness -centrality \cite{Aittokallio06,Carey05}. Betweenness is a centrality measure of a node within a -graph. Nodes that occur on many shortest paths between other vertices -have higher betweenness than those that do not. It is scaled by the -factor of (n-1)(n-2)/2 to get relative betweenness centrality, where n -is the number of nodes in the graph. Both -measurements estimate the importance or the role of the node in the graph. - -With the function implemented in \Rpackage{RBGL}, our aim is to -identify most important nodes in MAPK signalling pathway. - -\emph{Note}: the current version of \texttt{RBGL} (version 1.59.5) reports the error that \texttt{BGL\_brandes\_betweeness\_centrality} not available for \texttt{.Call()} for package ``RBGL''. Therefore the execution has been suppressed for now. - -<>= -library(RBGL) -bcc <- brandes.betweenness.centrality(mapkG) -rbccs <- bcc$relative.betweenness.centrality.vertices[1L,] -toprbccs <- sort(rbccs,decreasing=TRUE)[1:4] -toprbccs -@ -We identify the top 4 important nodes judged by betweenness centrality -as MAP3K1 (hsa:4214), GRB2 (hsa:2885), MAP2K2 (hsa:5605) and MAP2K1 (hsa:5604) (the mapping between KEGG ID and gene symbol is done using \Rpackage{biomaRt}, see page \pageref{txt:biomaRt}). In figure -\ref{fig:02} we illustrate them as well as their interacting partners -in MAPK pathway. - -\begin{figure}[!hp] -\begin{center} -<>= -toprbccName <- names(toprbccs) -toprin <- sapply(toprbccName, function(x) inEdges(mapkG)[x]) -toprout <- sapply(toprbccName, function(x) edges(mapkG)[x]) -toprSubnodes <- unique(unname(c(unlist(toprin), unlist(toprout), toprbccName))) -toprSub <- subGraph(toprSubnodes, mapkG) - -nAttrs <- list() -tops <- c("MAPK3K1","GRB2","MAP2K2","MAP2K1") -topLabels <- lapply(toprbccName, function(x) x); names(topLabels) <- tops -nAttrs$label <- makeAttr(toprSub, "", topLabels) -nAttrs$fillcolor <- makeAttr(toprSub, "lightblue", list(orange=toprbccName)) -nAttrs$width <- makeAttr(toprSub,"",list("0.8"=toprbccName)) - -plot(toprSub, "twopi", nodeAttrs=nAttrs, attrs=list(graph=list(start=2))) -@ -\end{center} -\caption{Nodes with the highest relative betweenness centrality in - MAPK pathway (in orange) and their interacting partners (in blue).}\label{fig:02} -\end{figure} - -\section{Other funtionalities} -Besides the ability to parse and operate on KEGG PATHWAY graph objects, the \Rpackage{KEGGgraph} package also provides functionalities to complement tasks related to deal with KEGG pathways. We introduce some of them here, for a full list of functions please see the package help file: - -<>= -help(package=KEGGgraph) -@ - -\subsection{Parsing chemical compound reaction network} -KEGG PATHWAY captures two kinds of network:the protein network and the -chemical network. The protein network consists \emph{relations} -(\emph{edges}) between gene products, while the chemical network -illustrate the \emph{reactions} between chemical compounds. Since the -metabolic pathway can be viewed both as a network of proteins -(enzymes) and as a network of chemical compounds, metabolic pathways -can be viewed as both protein networks and chemical networks, whereas -regulatory pathways are always viewed as protein networks -only.\Robject{KEGGPathway} provides methods to access this -network. - -We show the example of Glycine, serine and threonine metabolism -pathway. -<>= -mapfile <- system.file("extdata/map00260.xml",package="KEGGgraph") -map <- parseKGML(mapfile) -map -reactions <- getReactions(map) -reactions[[1]] -@ - -Figure \ref{fig:03} shows how to extract reactions from -the pathway and to build a directed graph with them. -\begin{figure}[!htbp] -\begin{center} -<>= -chemicalGraph <- KEGGpathway2reactionGraph(map) - -outDegrees <- sapply(edges(chemicalGraph), length) -maxout <- names(sort(outDegrees,decreasing=TRUE))[1:3] - -nAttrs <- list() -maxoutlabel <- as.list(maxout); names(maxoutlabel) <- maxout -nAttrs$label <- makeAttr(chemicalGraph, "", maxoutlabel) -nAttrs$fillcolor <- makeAttr(chemicalGraph, "lightblue", list(orange=maxout)) -nAttrs$width <- makeAttr(chemicalGraph,"0.8", list("1.8"=maxout)) -plot(chemicalGraph, nodeAttrs=nAttrs) -@ -\end{center} -\caption{Reaction network built of chemical compounds: the orange nodes are the three compounds with maximum out-degree in this network.}\label{fig:03} -\end{figure} - -\subsection{Expand embedded pathways} -Function \Rfunction{parseKGMLexpandMaps} is a function to handle with pathways embedding other pathways. For example, pancreatic cancer pathway embeds 9 other pathways including MAPK and ErbB signaling pathway, cell cycle and apoptosis pathway, etc. To parse them into one graph, the users only have to download the KGML file and feed the file name to \Rfunction{parseKGMLexpandMaps}, the function parses the file, analyze the embedded pathways, download their files from KEGG REST API service automatically (alternatively a local repository can be specified for KGML files) and merge the individual pathways into a single graph. For example, the following single line parses MAPK signaling pathway with all its embedded pathways -<>= -mapkGembed <- parseKGMLexpandMaps(mapkKGML) -@ - -As its name suggests, function \Rfunction{subGraphByNodeType} subsets the graph by node type, the nodes to subset are those of the type given by the user. It is useful when the KGML file was parsed with 'genesOnly=FALSE' option and later on the user wants only certain kind of node, 'gene' for example, remained. The following example shows how to use it. -<>= -mapkGall <- parseKGML2Graph(mapkKGML,genesOnly=FALSE) -mapkGall -mapkGsub <- subGraphByNodeType(mapkGall, "gene") -mapkGsub -@ - -\subsection{Annotation} -\Rfunction{translateKEGGID2GeneID} translates KEGG identifiers (KEGGID) into Entrez GeneID. For example, if we want to find the Entrez GeneID of the nodes in MAPK pathway having the highest relative betweenness centrality, the following codes do the job. -<>= -toprbccKEGGID <- names(toprbccs) -toprbccKEGGID -toprbccGeneID <- translateKEGGID2GeneID(toprbccKEGGID) -toprbccGeneID -@ - -To convert GeneID to other identifiers, we recommend genome wide annotation packages, for human it is \Rpackage{org.Hs.eg.db} and the packages for other organisms can be fount at \url{http://www.bioconductor.org/packages/release/data/annotation/}. To demonstrate its use, we draw the sub-network in the figure \ref{fig:02} again, whereas nodes are now labeled with gene symbols. -\label{convertId} -<>= -if(require(org.Hs.eg.db)) { - tnodes <- nodes(toprSub) - tgeneids <- translateKEGGID2GeneID(tnodes) - tgenesymbols <- sapply(mget(tgeneids, org.Hs.egSYMBOL, ifnotfound=NA), "[[",1) - toprSubSymbol <- toprSub - nodes(toprSubSymbol) <- tgenesymbols - plot(toprSubSymbol, "neato",attrs=list(node=list(font=5, fillcolor="lightblue"))) -} -@ - -Alternatively, users could use R package \Rpackage{biomaRt} \cite{Durinck05, Huber08}\label{txt:biomaRt} for ID conversion, whereas it assumes that the user has an internet connection. The following example shows how to translate the node hits we acquired in the example above into HGNC symbols: - -<>= -library(biomaRt) -hsapiens <- useMart("ensembl","hsapiens_gene_ensembl" ) -filters <- listFilters(hsapiens) -getBM(attributes=c("entrezgene","hgnc_symbol"), - filters="entrezgene", - values=toprbccGeneID, mart=hsapiens) -@ - -\section{Acknowledgement} -We thank Vincent Carey, Holger Fr\"ohlich and Wolfgang Huber for comments and suggestions on the package, and the reviewers from the Bioconductor community. - -Many users have provided very helpful feedback. Here is an incomplete list of them: Martin Morgan, Paul Shannon, Juliane Manitz and Alexander Gulliver Bjørnholt Grønning. - -\section{Conclusion} -Before the release of \Rpackage{KEGGgraph}, several \R{R/Bioconductor} packages have been introduced and proven their usefulness in understanding biological pathways with KEGG. However, \Rpackage{KEGGgraph} is the first package able to parse any KEGG pathways from KGML files into graphs. In comparison, existing tools can not achieve the results we present here. They either neglects the graph topology (\Rpackage{KEGG.db}), do not parse pathway networks (\Rpackage{keggorth}), or are specialized for certain pathways (\Rpackage{cMAP} and \Rpackage{pathRender}). - -With \Rpackage{KEGGgraph}, we contribute a direct and natural approach to KEGG pathways, and the possibilities to study them in \R{R} and \R{Bioconductor}. - -\section{Session Info} -The script runs within the following session: -<>= -sessionInfo() -@ - -\begin{thebibliography}{} -\bibitem[Gentleman {\it et~al}., 2004]{Gentleman04} Gentleman {\it et~al}. (2004) Bioconductor: open software development for computational biology and bioinformatics, {\it Genome Biology}, {\bf 5}, R80. -\bibitem[Carey {\it et~al}., 2005]{Carey05} Carey {\it et~al}. (2005) Network structures and algorithms in Bioconductor, {\it Bioinformatics}, {\bf 21}, 135-136. -\bibitem[Kanehisa {\it et~al}., 2008]{Kanehisa08} Kanehisa {\it et~al}. (2008) KEGG for linking genomes to life and the environment, {\it Nucleic Acids Research, Database issue}, {\bf 36}, 480-484. -\bibitem[Klukas and Schreiber, 2007]{Klukas07} Klukas and Schreiber. (2007) Dynamic exploration and editing of KEGG pathway diagrams, {\it Bioinformatics}, {\bf 23}, 344-350. -\bibitem[Aittokallio and Schwikowski, 2006]{Aittokallio06} Aittokallio and Schwikowski (2006) Graph-based methods for analysing networks in cell biology, {\it Briefings in Bioinformatics}, {\bf 7}, 243-255. -\bibitem[Durinck {\it et~al}., 2005]{Durinck05} Durinck {\it et~al}. (2005) BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis, {\it Bioinformatcs}, {\bf 21}, 3439-3440. -\bibitem[Durinck and Huber, 2008]{Huber08} Durinck and Huber (2008) R/Bioconductor package {\it biomaRt}, 2008 -\end{thebibliography} - -\end{document} From 5c50c67d962714406cf46cb0a9cee80459f8c745 Mon Sep 17 00:00:00 2001 From: sonali Date: Sat, 19 Aug 2023 12:20:34 +0530 Subject: [PATCH 5/7] update discription, made requested changes --- DESCRIPTION | 3 ++- vignettes/KEGGgraph.Rmd | 14 +++++++------- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 7231df2..75ed31a 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -18,9 +18,10 @@ LazyData: yes Depends: R (>= 3.5.0) Imports: methods, XML (>= 2.3-0), graph, utils, RCurl, Rgraphviz Suggests: RBGL, testthat, RColorBrewer, org.Hs.eg.db, - hgu133plus2.db, SPIA + hgu133plus2.db, SPIA, BiocStyle, knitr Collate: kegg2graph-functions.R parse.R annotation.R graph.R kgmlfile.R misc.R vis.R URL: http://www.nextbiomotif.com biocViews: Pathways, GraphAndNetwork, Visualization, KEGG Encoding: UTF-8 +VignetteBuilder: knitr \ No newline at end of file diff --git a/vignettes/KEGGgraph.Rmd b/vignettes/KEGGgraph.Rmd index 00d86f9..92d6a0a 100644 --- a/vignettes/KEGGgraph.Rmd +++ b/vignettes/KEGGgraph.Rmd @@ -45,7 +45,7 @@ To address this problem, we developed the open-source software package well as a collection of tools to analyze, dissect and visualize these graphs. The package requires KGML (KEGG XML) files, which can be downloaded from KEGG -REST web service ((https://rest.kegg.jp/)) without license permission for +REST web service (https://rest.kegg.jp/) without license permission for academic purposes. To demonstrate the functionality, in 'extdata/' sub-directory of `r Biocpkg("KEGGgraph")` we have pre-installed several KGML files. @@ -88,8 +88,8 @@ library(KEGGgraph) There are at least two possibilities to get KGML (KEGG XML) files: - * Manual download from KEGG REST API, documented at at (https://www.kegg.jp/kegg/rest/keggapi.html). - * Automatic retrieval from KEGG REST API, available at (https://rest.kegg.jp), + * Manual download from KEGG REST API, documented at at https://www.kegg.jp/kegg/rest/keggapi.html. + * Automatic retrieval from KEGG REST API, available at https://rest.kegg.jp, with the function `retrieveKGML`. To retrieve KGML file automatically, one has to know the pathway identifier. @@ -109,7 +109,7 @@ retrieveKGML("04115", organism="mcf", destfile=tmp, method="auto", quiet=TRUE) ``` Pathways and their identifiers can be browsed in the KEGG website -(https://genome.jp/kegg), or found with any search engine. +https://genome.jp/kegg, or found with any search engine. ## Parsing and graph feature query @@ -234,7 +234,7 @@ mapkGsub The subgraph is visualized in figure [1](#fig:01), where nodes with in-degree or out-degree in red and others in grey [^1]. And in the example we also demonstrate how to convert KEGG ID into other other identifiers via the Entrez -GeneID. More details on the conversion of IDs can be found on section [Annotation](#convertId) +GeneID. More details on the conversion of IDs can be found in the [Annotation section](#convertId). ```{r makeattr, echo=FALSE} makeAttr <- function(graph, default, valNodeList) { @@ -452,8 +452,8 @@ toprbccGeneID To convert GeneID to other identifiers, we recommend genome wide annotation packages, for human it is `r Biocpkg("org.Hs.eg.db")` and the packages for other organisms can be fount at -(http://www.bioconductor.org/packages/release/data/annotation/). To demonstrate -its use, we draw the sub-network in the figure [2](fig:02) again, whereas nodes +http://www.bioconductor.org/packages/release/data/annotation/. To demonstrate +its use, we draw the sub-network in the figure [2](#fig:02) again, whereas nodes are now labeled with gene symbols. ```{r orgHuman, message=FALSE, warning=FALSE} From 573a878b37ed5e29c584c5ce562ea72462d4cc99 Mon Sep 17 00:00:00 2001 From: sonali Date: Fri, 25 Aug 2023 05:37:25 +0530 Subject: [PATCH 6/7] added sonali as cnt,updated disc file --- DESCRIPTION | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 75ed31a..aca7a58 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -4,8 +4,13 @@ Title: KEGGgraph: A graph approach to KEGG PATHWAY in R and Bioconductor Version: 1.59.3 Date: 2022-12-18 -Author: Jitao David Zhang, with inputs from Paul Shannon and Hervé Pagès -Maintainer: Jitao David Zhang +Authors@R: c( + person("Paul", "Shannon", role = "ctb"), + person("Hervé", "Pagès", role = "ctb"), + person("Sonali", "Kumari", role = "ctb", + comment = "Converted KEGGgraph.Rnw vignette from Sweave to RMarkdown / HTML."), + person("Jitao David", "Zhang", + email = "jitao_david.zhang@roche.com", role = c("aut", "cre"))) Description: KEGGGraph is an interface between KEGG pathway and graph object as well as a collection of tools to analyze, dissect and visualize these graphs. It parses the regularly updated KGML From 456f6b65617fc0860be07ba4eaedf89200b4d4a2 Mon Sep 17 00:00:00 2001 From: sonali Date: Fri, 25 Aug 2023 05:41:53 +0530 Subject: [PATCH 7/7] removed .Rnw from comment in desc file --- DESCRIPTION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index aca7a58..392371e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -8,7 +8,7 @@ Authors@R: c( person("Paul", "Shannon", role = "ctb"), person("Hervé", "Pagès", role = "ctb"), person("Sonali", "Kumari", role = "ctb", - comment = "Converted KEGGgraph.Rnw vignette from Sweave to RMarkdown / HTML."), + comment = "Converted KEGGgraph vignette from Sweave to RMarkdown / HTML."), person("Jitao David", "Zhang", email = "jitao_david.zhang@roche.com", role = c("aut", "cre"))) Description: KEGGGraph is an interface between KEGG pathway and graph