Dev: Joss Paper Updates #139

Merged
merged 2 commits into from
Aug 5, 2024
59 changes: 48 additions & 11 deletions paper/jats/paper.jats
@@ -18,8 +18,8 @@
<article-id pub-id-type="publisher-id">0</article-id>
<article-id pub-id-type="doi">N/A</article-id>
<title-group>
<article-title>ggoncoplot: an R package for visualising somatic mutation
data from cancer patient cohorts</article-title>
<article-title>ggoncoplot: an R package for interactive visualisation of
somatic mutation data from cancer patient cohorts</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
@@ -105,9 +105,9 @@ a Creative Commons Attribution 4.0 International License (CC BY
types, including gene expression t-SNE plots or methylation UMAPs. The
simplest and most intuitive approach to examining such relations is to
link plots dynamically such that samples selected in an oncoplot can
be highlighted in other plots. There are, however, no existing
oncoplot-generating R packages that support dynamic data linkage
between different plots. To address this gap and enable rapid
be highlighted in other plots, and vice versa. There are, however, no
existing oncoplot-generating R packages that support dynamic data
linkage between different plots. To address this gap and enable rapid
exploration of a variety of data types we constructed the ggoncoplot
package for the production of oncoplots that are easily integrated
with custom visualisations and that support synchronised
@@ -140,12 +140,15 @@ a Creative Commons Attribution 4.0 International License (CC BY
samples were automatically highlighted on the UMAP and oncoplot.
This reveals that samples which cluster on the left of the t-SNE
plot also cluster in the oncoplot, chiefly containing mutations in
TP53 but wild type PIK3CA. The plots of progesterone, estrogen, HER2
TP53 and wild type PIK3CA. The plots of progesterone, estrogen, HER2
status and triple negative classification show that the samples
selected in the t-SNE are virtually all triple negative breast
selected in the t-SNE are enriched for triple negative breast
cancers. In contrast to the oncoplot, the methylation UMAP shows no
strong clustering, in line with knowledge of methylation patterns in
triple negative breast cancer.
strong clustering, consistent with knowledge of methylation patterns
in triple negative breast cancer. Expression and methylation plots
were produced using the
<ext-link ext-link-type="uri" xlink:href="https://github.com/selkamand/express">express</ext-link>
package.
<styled-content id="figU003Amultimodal_selection"></styled-content></p></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="multimodal_selection_with_lasso.png" />
</fig>
@@ -182,8 +185,8 @@ a Creative Commons Attribution 4.0 International License (CC BY
visualised in an oncoplot.</p>
</list-item>
<list-item>
<p><bold>Auto-colouring</bold>: Automatic selection of colour
palettes for datasets where the consequence annotations are
<p><bold>Auto-colouring</bold>: Automatic selection of accessible
colour palettes for datasets where the consequence annotations are
aligned with standard variant effect dictionaries (PAVE, SO, or
MAF).</p>
</list-item>
@@ -220,6 +223,12 @@ a Creative Commons Attribution 4.0 International License (CC BY
</sec>
<sec id="acknowledgements">
<title>Acknowledgements</title>
<p>The results shown here are in whole or part based upon data
generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Methylation, expression, and genome datasets were obtained from the
Xena TCGA Pan-Cancer Atlas Hub
(<xref alt="Goldman et al., 2020" rid="ref-goldmanU003A2020" ref-type="bibr">Goldman
et al., 2020</xref>).</p>
<p>We thank the developers of the packages integral to ggoncoplot,
especially David Gohel for ggiraph
(<xref alt="Gohel &amp; Skintzos, 2024" rid="ref-gohelU003A2024" ref-type="bibr">Gohel
@@ -329,6 +338,34 @@ a Creative Commons Attribution 4.0 International License (CC BY
<uri>https://ggplot2.tidyverse.org</uri>
</element-citation>
</ref>
<ref id="ref-goldmanU003A2020">
<element-citation publication-type="article-journal">
<person-group person-group-type="author">
<name><surname>Goldman</surname><given-names>Mary J.</given-names></name>
<name><surname>Craft</surname><given-names>Brian</given-names></name>
<name><surname>Hastie</surname><given-names>Mim</given-names></name>
<name><surname>Repečka</surname><given-names>Kristupas</given-names></name>
<name><surname>McDade</surname><given-names>Fran</given-names></name>
<name><surname>Kamath</surname><given-names>Akhil</given-names></name>
<name><surname>Banerjee</surname><given-names>Ayan</given-names></name>
<name><surname>Luo</surname><given-names>Yunhai</given-names></name>
<name><surname>Rogers</surname><given-names>Dave</given-names></name>
<name><surname>Brooks</surname><given-names>Angela N.</given-names></name>
<name><surname>Zhu</surname><given-names>Jingchun</given-names></name>
<name><surname>Haussler</surname><given-names>David</given-names></name>
</person-group>
<article-title>Visualizing and interpreting cancer genomics data via the xena platform</article-title>
<source>Nature Biotechnology</source>
<year iso-8601-date="2020">2020</year>
<volume>38</volume>
<issue>6</issue>
<isbn>1546-1696</isbn>
<uri>https://doi.org/10.1038/s41587-020-0546-8</uri>
<pub-id pub-id-type="doi">10.1038/s41587-020-0546-8</pub-id>
<fpage>675</fpage>
<lpage>678</lpage>
</element-citation>
</ref>
</ref-list>
</back>
</article>
34 changes: 26 additions & 8 deletions paper/paper.bib
@@ -56,11 +56,29 @@ @misc{pedersen:2024
url = {https://patchwork.data-imaginist.com}
}

@Book{wickham:2016,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2016},
isbn = {978-3-319-24277-4},
url = {https://ggplot2.tidyverse.org},
}
@Book{wickham:2016,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2016},
isbn = {978-3-319-24277-4},
url = {https://ggplot2.tidyverse.org},
}


@article{goldman:2020,
author = {Goldman, Mary J. and Craft, Brian and Hastie, Mim and Repe{\v c}ka, Kristupas and McDade, Fran and Kamath, Akhil and Banerjee, Ayan and Luo, Yunhai and Rogers, Dave and Brooks, Angela N. and Zhu, Jingchun and Haussler, David},
date = {2020/06/01},
date-added = {2024-08-05 12:48:11 +1000},
date-modified = {2024-08-05 12:48:11 +1000},
doi = {10.1038/s41587-020-0546-8},
id = {Goldman2020},
isbn = {1546-1696},
journal = {Nature Biotechnology},
number = {6},
pages = {675--678},
title = {Visualizing and interpreting cancer genomics data via the Xena platform},
url = {https://doi.org/10.1038/s41587-020-0546-8},
volume = {38},
year = {2020},
bdsk-url-1 = {https://doi.org/10.1038/s41587-020-0546-8}}
Binary file modified paper/paper.docx
Binary file not shown.
11 changes: 6 additions & 5 deletions paper/paper.md
@@ -1,6 +1,5 @@
---
title: 'ggoncoplot: an R package for visualising somatic mutation data from cancer
patient cohorts'
title: 'ggoncoplot: an R package for interactive visualisation of somatic mutation data from cancer patient cohorts'
tags:
- R
- cancer
@@ -30,12 +29,12 @@ affiliations:

# Summary

The ggoncoplot R package generates interactive oncoplots to visualize mutational patterns across patient cancer cohorts (\autoref{fig:oncoplot}). Oncoplots, also called oncoprints, reveal patterns of gene co-mutation and include marginal plots that indicate co-occurrence of gene mutations with tumour and clinical features. It is useful to relate gene mutation patterns seen in an oncoplot to patterns in other plot types, including gene expression t-SNE plots or methylation UMAPs. The simplest and most intuitive approach to examining such relations is to link plots dynamically such that samples selected in an oncoplot can be highlighted in other plots. There are, however, no existing oncoplot-generating R packages that support dynamic data linkage between different plots. To address this gap and enable rapid exploration of a variety of data types we constructed the ggoncoplot package for the production of oncoplots that are easily integrated with custom visualisations and that support synchronised data-selections across plots (\autoref{fig:multimodal_selection}). ggoncoplot is available on GitHub at <https://github.com/selkamand/ggoncoplot>.
The ggoncoplot R package generates interactive oncoplots to visualize mutational patterns across patient cancer cohorts (\autoref{fig:oncoplot}). Oncoplots, also called oncoprints, reveal patterns of gene co-mutation and include marginal plots that indicate co-occurrence of gene mutations with tumour and clinical features. It is useful to relate gene mutation patterns seen in an oncoplot to patterns in other plot types, including gene expression t-SNE plots or methylation UMAPs. The simplest and most intuitive approach to examining such relations is to link plots dynamically such that samples selected in an oncoplot can be highlighted in other plots, and vice versa. There are, however, no existing oncoplot-generating R packages that support dynamic data linkage between different plots. To address this gap and enable rapid exploration of a variety of data types we constructed the ggoncoplot package for the production of oncoplots that are easily integrated with custom visualisations and that support synchronised data-selections across plots (\autoref{fig:multimodal_selection}). ggoncoplot is available on GitHub at <https://github.com/selkamand/ggoncoplot>.

![ggoncoplot output visualising mutational trends in the TCGA breast carcinoma cohort. Individual patient samples are plotted on the x-axis, hierarchically sorted so that samples with the most frequent gene mutations appear on the leftmost side. The plot indicates that PIK3CA is the most frequently mutated gene, followed by TP53. Marginal plots indicate the total number of mutations per sample (top), and the number of samples showing mutations in each gene, coloured by mutation type (right). A range of clinical features, including progesterone and estrogen receptor status are shown on the marginal plot at the bottom. A detailed description of the ggoncoplot sorting algorithm is available [here](https://selkamand.github.io/ggoncoplot/articles/sorting_algorithm.html). \label{fig:oncoplot}](oncoplot.pdf)


![Example of the ggoncoplot shown in Figure 1, where the oncoplot has been dynamically cross-linked to a gene expression t-SNE plot (top left) and a methylation UMAP (top right). Here, the lasso tool was used to select a cluster of gene expression data points (i.e., individual samples) in the t-SNE plot. Selected samples were automatically highlighted on the UMAP and oncoplot. This reveals that samples which cluster on the left of the t-SNE plot also cluster in the oncoplot, chiefly containing mutations in TP53 but wild type PIK3CA. The plots of progesterone, estrogen, HER2 status and triple negative classification show that the samples selected in the t-SNE are virtually all triple negative breast cancers. In contrast to the oncoplot, the methylation UMAP shows no strong clustering, in line with knowledge of methylation patterns in triple negative breast cancer. \label{fig:multimodal_selection}](multimodal_selection_with_lasso.png)
![Example of the ggoncoplot shown in Figure 1, where the oncoplot has been dynamically cross-linked to a gene expression t-SNE plot (top left) and a methylation UMAP (top right). Here, the lasso tool was used to select a cluster of gene expression data points (i.e., individual samples) in the t-SNE plot. Selected samples were automatically highlighted on the UMAP and oncoplot. This reveals that samples which cluster on the left of the t-SNE plot also cluster in the oncoplot, chiefly containing mutations in TP53 and wild type PIK3CA. The plots of progesterone, estrogen, HER2 status and triple negative classification show that the samples selected in the t-SNE are enriched for triple negative breast cancers. In contrast to the oncoplot, the methylation UMAP shows no strong clustering, consistent with knowledge of methylation patterns in triple negative breast cancer. Expression and methylation plots were produced using the [express](https://github.com/selkamand/express) package. \label{fig:multimodal_selection}](multimodal_selection_with_lasso.png)



@@ -47,7 +46,7 @@ Oncoplots are highly effective for visualising mutation data in cancer cohorts b

- **Support for tidy datasets**: Compatibility with tidy, tabular mutation-level formats (MAF files or relational databases), typical of cancer cohort datasets. This greatly improves the range of datasets that can be quickly and easily visualised in an oncoplot.

- **Auto-colouring**: Automatic selection of colour palettes for datasets where the consequence annotations are aligned with standard variant effect dictionaries (PAVE, SO, or MAF).
- **Auto-colouring**: Automatic selection of accessible colour palettes for datasets where the consequence annotations are aligned with standard variant effect dictionaries (PAVE, SO, or MAF).

- **Versatility**: The ability to visualize entities other than gene mutations, such as noncoding features (e.g., promoter or enhancer mutations) and non-genomic entities (e.g., microbial presence in microbiome datasets).

@@ -59,6 +58,8 @@ We developed ggoncoplot as the first R package to address all these challenges t

# Acknowledgements

The results shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga. Methylation, expression, and genome datasets were obtained from the Xena TCGA Pan-Cancer Atlas Hub [@goldman:2020].

We thank the developers of the packages integral to ggoncoplot, especially David Gohel for ggiraph [@gohel:2024], which enables its interactivity, and Thomas Lin Pedersen for patchwork [@pedersen:2024] and ggplot2 maintenance. We also acknowledge Hadley Wickham and all contributors to ggplot2 [@wickham:2016].
Additionally, we thank Dr. Marion Mateos for her insightful feedback during the development of ggoncoplot.

Binary file modified paper/paper.pdf
Binary file not shown.