@@ -0,0 +1,72 @@ | ||
@article{Gu:2022,
  doi = {10.1002/imt2.43},
  url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/imt2.43},
  year = {2022},
  pages = {e43},
  title = {Complex heatmap visualization},
  eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/imt2.43},
  number = {3},
  volume = {1},
  journal = {iMeta},
  abstract = {Heatmap is a widely used statistical visualization method on
    matrix-like data to reveal similar patterns shared by subsets of rows and
    columns. In the R programming language, there are many packages that make
    heatmaps. Among them, the ComplexHeatmap package provides the richest toolset
    for constructing highly customizable heatmaps. ComplexHeatmap can easily
    establish connections between multisource information by automatically
    concatenating and adjusting a list of heatmaps as well as complex
    annotations, which makes it widely applied in data analysis in many fields,
    especially in bioinformatics, to find hidden structures in the data. In this
    article, we give a comprehensive introduction to the current state of
    ComplexHeatmap, including its modular design, its rich functionalities, and
    its broad applications.},
  keywords = {bioconductor, clustering, complex heatmap, R package, visualization},
  author = {Gu, Zuguang}
}
|
||
@article{Mayakonda:2018,
  doi = {10.1101/gr.239244.118},
  url = {https://dx.doi.org/10.1101/gr.239244.118},
  issn = {1088-9051, 1549-5469},
  year = {2018},
  month = oct,
  pages = {1747--1756},
  title = {Maftools: efficient and comprehensive analysis of somatic variants in cancer},
  volume = {28},
  journal = {Genome Research},
  publisher = {Cold Spring Harbor Laboratory},
  author = {Anand Mayakonda and De-Chen Lin and Yassen Assenov and Christoph Plass and H. Phillip Koeffler}
}
|
||
@article{Skidmore:2016,
  doi = {10.1093/bioinformatics/btw325},
  url = {https://dx.doi.org/10.1093/bioinformatics/btw325},
  issn = {1367-4811, 1367-4803},
  year = {2016},
  month = jun,
  pages = {3012--3014},
  title = {GenVisR: Genomic Visualizations in R},
  volume = {32},
  journal = {Bioinformatics},
  publisher = {Oxford University Press (OUP)},
  author = {Zachary L. Skidmore and Alex H. Wagner and Robert Lesurf and Katie M. Campbell and Jason Kunisaki and Obi L. Griffith and Malachi Griffith}
}
|
||
@misc{gohel:2024,
  title = {ggiraph: Make 'ggplot2' Graphics Interactive},
  author = {David Gohel and Panagiotis Skintzos},
  year = {2024},
  note = {R package version 0.8.10},
  url = {https://davidgohel.github.io/ggiraph/},
}
|
||
@misc{pedersen:2024,
  title = {patchwork: The Composer of Plots},
  author = {Thomas Lin Pedersen},
  year = {2024},
  note = {R package version 1.2.0.9000, https://github.com/thomasp85/patchwork},
  url = {https://patchwork.data-imaginist.com},
}
@@ -0,0 +1,86 @@ | ||
---
title: 'ggoncoplot: an R package for visualisation of somatic mutation data from cancer patient cohorts'
tags:
  - R
  - cancer
  - genomics
  - visualisation
  - oncoplot
authors:
  - name: Sam El-Kamand
    orcid: 0000-0003-2270-8088
    affiliation: 1
  - name: Julian M.W. Quinn
    orcid: add_julians_orcid
    affiliation: "1, 2"
  - name: Mark J. Cowley
    affiliation: "1, 3"
    orcid: add_marks_orcid
    corresponding: true
affiliations:
  - name: Children’s Cancer Institute, Australia
    index: 1
  - name: Skeletal Research Program, Garvan Institute of Medical Research, Australia
    index: 2
  - name: School of Clinical Medicine, UNSW Medicine & Health, Australia
    index: 3
date: 17 June 2024
bibliography: paper.bib
---
|
||
# Summary | ||
|
||
The ggoncoplot R package generates interactive oncoplots (also called oncoprints) that visualise mutational patterns across cancer patient cohorts (\autoref{fig:oncoplot}). These plots reveal patterns of mutation co-occurrence in a cohort, and their marginal plots indicate correlations between gene mutations and specific tumour characteristics. These characteristics can include user-supplied annotations such as patient clinical data, cancer subtypes and histological features.
|
||
ggoncoplot offers several features to enhance utility and user experience. These include automatic colour palette selection for common mutation impact dictionaries, customisable tooltips, and automatic rendering of clinical annotations as barplots or heatmaps for quantitative or qualitative data, respectively. ggoncoplot supports visualisation of mutation-level data in tidy, tabular formats, making it easy to run on existing large somatic mutation datasets stored as MAF files or in relational databases. The ggoncoplot package is available from GitHub at https://github.com/selkamand/ggoncoplot.
|
||
![Default ggoncoplot output visualising mutational trends in the TCGA glioblastoma multiforme cohort. Individual patient samples are plotted on the x-axis, in the order chosen by ggoncoplot. Genes are plotted on the y-axis, sorted by mutation frequency: PTEN is the most recurrently mutated gene in the cohort, followed by TP53. Marginal plots indicate the total number of mutations per sample (top) and the number of samples with mutations in each gene, coloured by mutation type (right). A range of clinical features, including histological type and gender, are shown in the marginal plot at the bottom. These default settings can be altered as required.\label{fig:oncoplot}](oncoplot.png)
|
||
# Statement of Need | ||
|
||
Oncoplots effectively visualise cohort-level mutation patterns but are challenging to generate with the major R plotting systems (base, lattice, or ggplot2) due to their algorithmic and graphical complexity. Simplifying the process would make oncoplots more accessible to researchers. Packages like ComplexHeatmap [@Gu:2022], maftools [@Mayakonda:2018], and GenVisR [@Skidmore:2016] make static oncoplots easier to create, but there remains a significant unmet need for an easy way to create oncoplots with:
|
||
### Interactive Features | ||
- Customisable tooltips
- Cross-selection of samples across different plots | ||
- Auto-copying of sample identifiers on click | ||
|
||
### Support for Tidy Datasets | ||
- Compatibility with tidy, tabular mutation-level formats (MAF files or relational databases), typical of cancer cohort datasets | ||
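
To make the tidy-input idea concrete, the sketch below collapses mutation-level rows (one row per mutation) into one label per sample and gene, which is the unit an oncoplot tile displays. It is a language-agnostic illustration written in Python, not ggoncoplot's R implementation. The function name `summarise_tiles`, the example records, and the rule that any tile holding more than one mutation is labelled `Multi_Hit` are all assumptions made for this example; only the column names follow the MAF convention.

```python
from collections import defaultdict

# Hypothetical tidy, mutation-level records: one row per mutation.
# Column names follow the MAF convention; the values are invented for illustration.
mutations = [
    {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "TP53", "Variant_Classification": "Missense_Mutation"},
    {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "TP53", "Variant_Classification": "Nonsense_Mutation"},
    {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "PTEN", "Variant_Classification": "Frame_Shift_Del"},
    {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "PTEN", "Variant_Classification": "Missense_Mutation"},
    {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "PTEN", "Variant_Classification": "Missense_Mutation"},
]


def summarise_tiles(rows, sample_col, gene_col, type_col):
    """Collapse mutation-level rows into one label per (sample, gene) tile."""
    types_by_tile = defaultdict(list)
    for row in rows:
        types_by_tile[(row[sample_col], row[gene_col])].append(row[type_col])
    # Assumed rule: a tile with more than one mutation is flagged as "Multi_Hit";
    # otherwise it keeps the type of its single mutation.
    return {
        tile: types[0] if len(types) == 1 else "Multi_Hit"
        for tile, types in types_by_tile.items()
    }


tiles = summarise_tiles(
    mutations,
    sample_col="Tumor_Sample_Barcode",
    gene_col="Hugo_Symbol",
    type_col="Variant_Classification",
)
```

Passing the column names as arguments means the same function works on any long-form table, whether it was read from a MAF file or queried from a database.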
|
||
### Auto Colouring | ||
- Automatic selection of colour palettes for datasets with consequence annotations aligned with standard variant effect dictionaries (PAVE, SO, or MAF)
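
One way such automatic selection could work is to check whether every observed consequence term belongs to a known vocabulary, then look up a palette keyed by the matching dictionary. The Python sketch below shows only the detection step and is not ggoncoplot's implementation. The name `detect_dictionary` is hypothetical, the vocabularies are small subsets of the MAF and Sequence Ontology terms, and PAVE is omitted.

```python
# Partial vocabularies for two dictionaries (subsets only, for illustration).
KNOWN_DICTIONARIES = {
    "MAF": {
        "Missense_Mutation", "Nonsense_Mutation", "Frame_Shift_Del",
        "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
        "Splice_Site", "Silent",
    },
    "SO": {
        "missense_variant", "stop_gained", "frameshift_variant",
        "inframe_deletion", "inframe_insertion",
        "splice_donor_variant", "splice_acceptor_variant",
        "synonymous_variant",
    },
}


def detect_dictionary(observed_terms):
    """Return the name of the dictionary that contains every observed term.

    Returns None when the input is empty or when no single dictionary
    covers all terms, in which case a caller would fall back to a
    generic or user-supplied palette.
    """
    observed = set(observed_terms)
    if not observed:
        return None
    for name, vocabulary in KNOWN_DICTIONARIES.items():
        if observed <= vocabulary:  # subset test
            return name
    return None
```

Requiring every term to match, rather than a majority, keeps the behaviour predictable: a dataset that mixes dictionaries is treated as custom instead of being coloured inconsistently.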
|
||
### Versatility | ||
- The ability to visualise entities beyond gene mutations, including noncoding features (e.g., enhancers) and non-genomic entities (e.g., microbial presence in microbiome datasets)
|
||
We developed ggoncoplot as the first R package that addresses all these challenges simultaneously (Table 1). Examples of all key features are available in the [ggoncoplot manual](https://selkamand.github.io/ggoncoplot/articles/manual.html).
|
||
# Table 1. Comparison of R packages for creating oncoplots | ||
|
||
| Property | ComplexHeatmap | maftools | GenVisR | ggoncoplot |
|---|---|---|---|---|
| Sample sorting algorithm | memo sort | hierarchical sort | hierarchical sort | hierarchical sort |
| Underlying plotting system | Base R | Base R | ggplot2 | ggplot2 |
| Automatic rendering of clinical annotations as bar or tile plots based on data type | No | No | No | Yes |
| Works on tabular, tidy, long-form input data as would be stored in large databases | No | Yes | Yes | Yes |
| Interactive | Yes<sup>1</sup> | No | No | Yes |
| Customisable tooltips | No | No | No | Yes |
| Allows any mutation dictionary to be used | Yes | No | Yes | Yes |
| Automatic colour palette selection when mutation impact dictionary conforms to known ontologies | No | Yes (MAF only) | No | Yes (MAF, SO, or PAVE) |
| Approach for resolving genes with multiple mutations | Different visualisation on plot<sup>4</sup> | Flags as Multi-Hit | Picks most severe consequence or leaves to user<sup>3</sup> | Flags as Multi-Hit |
| Supports a mutation-level dataset as input | No<sup>2</sup> | Yes | Yes | Yes |
| Native support for faceting by pathway | No | Yes | No | Yes |
| Supports marginal plots describing TMB, gene mutation recurrence, and clinical annotations | Yes | Yes | Yes | Yes |
|

<sup>1</sup> Can be made interactive and displayed in a Shiny app using the interactiveComplexHeatmap package.

<sup>2</sup> Mutations must already be summarised at the gene level. Expects a sample × gene character matrix with different mutations separated by semicolons.

<sup>3</sup> If a MAF is supplied, the most severe consequence is chosen. For non-MAF datasets, the user can define the mutation impact hierarchy.

<sup>4</sup> If multiple mutations are of different types, they can be rendered in different ways on the plot (user-controlled). If they are of the same type, they are treated as a single observation.
|
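
Table 1 lists "hierarchical sort" as a sample sorting algorithm. As a rough illustration, the Python sketch below implements one common reading of that term: genes are ranked by how many samples carry a mutation, and samples are then ordered by mutation status in the top-ranked gene, with ties broken by the next gene and so on. This is an assumption about the general technique, not a description of how ggoncoplot, maftools, or GenVisR implement it. The function names and the example data are hypothetical.

```python
# Hypothetical set of (sample, gene) pairs in which the gene is mutated.
mutated = {
    ("S1", "PTEN"),
    ("S2", "PTEN"), ("S2", "TP53"),
    ("S3", "PTEN"), ("S3", "TP53"),
    ("S4", "EGFR"),
}
samples = ["S1", "S2", "S3", "S4", "S5"]  # S5 has no mutations


def rank_genes(mutated_pairs):
    """Order genes by number of mutated samples, most frequent first."""
    counts = {}
    for _sample, gene in mutated_pairs:
        counts[gene] = counts.get(gene, 0) + 1
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts, key=lambda gene: (-counts[gene], gene))


def hierarchical_sample_order(mutated_pairs, all_samples):
    """Sort samples by mutation status in each gene, top-ranked gene first."""
    genes = rank_genes(mutated_pairs)

    def sort_key(sample):
        # False sorts before True, so "not mutated" is used to put
        # mutated samples first; the sample name is the final tie-breaker.
        status = tuple((sample, gene) not in mutated_pairs for gene in genes)
        return status + (sample,)

    return sorted(all_samples, key=sort_key)


gene_order = rank_genes(mutated)
sample_order = hierarchical_sample_order(mutated, samples)
```

With this data the genes rank as PTEN, TP53, EGFR, so samples mutated in both PTEN and TP53 come first, followed by the PTEN-only sample, the EGFR-only sample, and finally the sample with no mutations.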
||
|
||
# Acknowledgements | ||
|
||
We extend our gratitude to the developers of the packages integral to ggoncoplot. We owe special thanks to David Gohel for his work on ggiraph [@gohel:2024], which enables the interactivity of ggoncoplot, and to Thomas Lin Pedersen for his contributions to patchwork [@pedersen:2024] and the maintenance of ggplot2. We also acknowledge Hadley Wickham and all contributors to the ggplot2 package [@wickam:2016]. Additionally, we thank Dr. Marion Mateos for her insightful feedback during the early stages of ggoncoplot development. | ||
|
||
# References |