Brian Haas edited this page Sep 7, 2023 · 68 revisions

 

The Trinity Cancer Transcriptome Analysis Toolkit (CTAT) aims to provide tools for leveraging RNA-Seq to gain insights into the biology of cancer transcriptomes. Bioinformatics tool support is provided for mutation detection, fusion transcript identification, de novo transcript assembly of cancer-specific transcripts, lncRNA classification, and foreign transcript detection (viruses, microbes). CTAT is funded by the National Cancer Institute Informatics Technology for Cancer Research (NCI ITCR) program. Software tools and pipelines developed as components of Trinity CTAT are described below with links to the corresponding open source software, documentation, and tutorials.

Mutation Detection via CTAT-Mutations Pipeline

The CTAT-Mutations Pipeline is a variant calling pipeline focused on detecting mutations from RNA sequencing (RNA-seq) data. It integrates GATK Best Practices with downstream steps to annotate, filter, and prioritize cancer mutations. These steps use the RADAR and REDIportal databases to identify likely RNA-editing events, dbSNP to exclude common variants, and COSMIC to highlight known cancer mutations. Finally, CRAVAT annotates and prioritizes variants according to likely biological impact and relevance to cancer.
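The annotate-and-filter logic described above can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the `Variant` class, its boolean fields, the label names, and the order of precedence (a COSMIC match taking priority over an RNA-editing or dbSNP match) are all assumptions made for illustration, and the variants shown are toy values.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical, simplified variant record with database-membership flags."""
    chrom: str
    pos: int
    ref: str
    alt: str
    in_dbsnp: bool = False           # found in dbSNP (common variant)
    in_rna_editing_db: bool = False  # found in RADAR or REDIportal
    in_cosmic: bool = False          # found in COSMIC


def classify_variant(v: Variant) -> str:
    """Assign one label per variant. The precedence order is an assumption."""
    if v.in_cosmic:
        return "known_cancer_mutation"
    if v.in_rna_editing_db:
        return "likely_rna_editing"
    if v.in_dbsnp:
        return "common_variant"
    return "candidate"


def prioritize(variants):
    """Keep known cancer mutations and unexplained candidates; drop the rest."""
    keep = {"known_cancer_mutation", "candidate"}
    return [v for v in variants if classify_variant(v) in keep]


if __name__ == "__main__":
    toy = [
        Variant("chr1", 100, "A", "G", in_rna_editing_db=True),
        Variant("chr1", 200, "C", "T", in_dbsnp=True, in_cosmic=True),
        Variant("chr2", 300, "G", "A", in_dbsnp=True),
        Variant("chr2", 400, "T", "C"),
    ]
    for v in toy:
        print(v.chrom, v.pos, classify_variant(v))
```

In the real pipeline these decisions are made on annotated VCF records and followed by CRAVAT scoring; the sketch only shows how the three databases play different roles (flagging, excluding, highlighting).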

The Trinity CTAT Mutations Pipeline is available at https://github.com/NCIP/ctat-mutations/wiki

Neoantigen prediction from RNA-seq

The CTAT-pVACseq pipeline extends the CTAT-Mutations pipeline to explore neoantigen prediction with pVACseq.

Fusion Transcript Detection

Fusion Transcript Detection with Illumina RNA-seq

Detection of cancer fusion transcripts in CTAT is a multi-pronged process: several alternative methods predict fusions, and the predictions are then validated in silico and annotated. Software tools developed as part of CTAT include STAR-Fusion, a highly efficient approach based on read mapping to the reference genome, and TrinityFusion, which leverages de novo transcriptome assembly for fusion detection.

All predicted fusions can be subject to in silico validation using our CTAT FusionInspector, which re-evaluates the evidence for fusions predicted by any of the above methods, re-scores the predictions, and uses Trinity to de novo reconstruct likely fusion transcript sequences.

FusionInspector ships with STAR-Fusion above as a companion module, but can also be downloaded and installed separately if needed. FusionInspector can be found at https://github.com/FusionInspector/FusionInspector/wiki

STAR-Fusion and TrinityFusion are published in Genome Biology volume 20, Article number: 213 (2019).

An example Terra workspace is available here.

Fusion Detection Using Long Reads (PacBio Iso-seq, MAS-seq, or ONT transcriptomes)

We provide ctat-LR-fusion for detecting fusion transcripts using long read transcriptome sequencing data. This includes PacBio Iso-seq, MAS-seq, or ONT transcriptome sequencing data. As with FusionInspector, igv-reports is used to provide interactive visualizations for navigating the supporting evidence for detected fusion transcripts.

If you additionally have matched Illumina data, you can include those to assist with fusion transcript expression quantification and fusion splicing isoform resolution.

CTAT-Splicing Detection and Annotation of Aberrant Splicing Isoforms in Cancer

Certain introns are more likely to be relevant to cancer biology, representing cancer-specific isoforms that may result from alternative splicing or stem from intra-gene genomic deletions. For example, EGFR-vIII, EGFR-IVa, and EGFR-IVb are known oncogenic isoforms of the EGFR gene that are often found in glioblastomas and result from intra-gene deletion of exons, which are observed as skipped exons in the expressed isoforms. Another well-known example is skipping of exon 14 in the MET gene, frequently found in lung cancers. Other splicing patterns that are relevant to cancer biology are evident from comparing large transcriptome data sets of tumor and normal tissues.
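To make the idea of an exon "observed as skipped" concrete, here is a minimal Python sketch that reports which annotated exons fall entirely inside an observed intron. It is an illustration only, not the CTAT-Splicing implementation; the function name, the 1-based inclusive coordinate convention, and the toy coordinates are assumptions.

```python
def skipped_exons(intron, exons):
    """Return the annotated exons that lie entirely within an observed intron.

    intron: (start, end) of the intron itself, 1-based inclusive.
    exons:  list of (start, end) annotated exons on the same chromosome
            and strand, 1-based inclusive.
    """
    i_start, i_end = intron
    return [(s, e) for (s, e) in exons if s >= i_start and e <= i_end]


if __name__ == "__main__":
    # Toy gene model with three exons (coordinates are made up).
    exons = [(100, 200), (300, 400), (500, 600)]

    # Canonical intron between exon 1 and exon 2: nothing is skipped.
    print(skipped_exons((201, 299), exons))

    # Intron joining exon 1 directly to exon 3: exon 2 is skipped.
    print(skipped_exons((201, 499), exons))
```

An intron that swallows one or more annotated exons is the kind of signal that an intra-gene deletion or exon-skipping event leaves in spliced read alignments.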

Our CTAT-Splicing Module integrates with other components of Trinity CTAT, and you can run it as a post-process to mutation and/or fusion detection to generate cancer splicing reports.

Single cell cancer transcriptome analysis

Analysis of single cell transcriptome data to better understand cancer heterogeneity is a growing focus of Trinity CTAT. We are working to update our existing computational components to better leverage single cell transcriptome data, including identifying mutations and fusion transcripts that contribute to tumor heterogeneity.

Among these efforts, we developed inferCNV, an application to identify large-scale copy number variations (CNV) evident from single cell expression data. Many more contributions are expected to follow shortly.

De novo transcriptome Assembly of Cancer-specific Transcripts

We developed DISCASM to assist in the de novo assembly of cancer-specific transcripts. DISCASM restricts de novo transcriptome assembly to those reads that map discordantly or fail to map to the reference genome sequence. The resulting transcripts are enriched for those derived from regions that are restructured in, or altogether missing from, the human reference genome, such as fusion transcripts or transcripts from foreign sources (viruses, microbes). DISCASM can also be installed through conda or the Galaxy Tool Shed.
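The read-selection idea (keep only reads that map discordantly or fail to map) can be sketched in Python using the standard SAM flag bits. This is a simplified illustration and not DISCASM's actual implementation: treating "paired but not properly paired" as the definition of discordant is an assumption, the function names are hypothetical, and the SAM records are toy data.

```python
# Standard SAM flag bits (from the SAM specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4


def is_candidate_read(flag: int) -> bool:
    """True if the read is unmapped, or paired but not properly paired."""
    if flag & UNMAPPED:
        return True
    if (flag & PAIRED) and not (flag & PROPER_PAIR):
        return True
    return False


def select_read_names(sam_lines):
    """Collect names of reads to pass on to de novo assembly."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):   # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:       # skip malformed records
            continue
        if is_candidate_read(int(fields[1])):
            names.add(fields[0])
    return names


if __name__ == "__main__":
    toy_sam = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
        "r2\t65\tchr1\t100\t60\t10M\tchr5\t900\t0\tACGTACGTAC\tIIIIIIIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    print(sorted(select_read_names(toy_sam)))
```

Here `r1` is a properly paired read and is left out, while `r2` (mate on another chromosome, not properly paired) and `r3` (unmapped) are kept for assembly.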

The Trinity CTAT DISCASM Pipeline is available at https://github.com/DISCASM/DISCASM/wiki

DISCASM is a module used by our TrinityFusion software, and has been shown to be capable of reconstructing tumor viruses present within cancer RNA-seq data sets.

Virus Content and Genome Integration Site Identification

We developed VirusIntegrationFinder to identify sites of viral integration in the human genome. This is particularly relevant to human papillomavirus (HPV) but can be used to explore other types of viruses too.

Installation

For Trinity CTAT applications, we aim to enable installation from a variety of sources:

  • Software releases from GitHub (minimum for all projects)
  • git cloning the 'master' branch from GitHub, which should reflect the most current release.
  • Docker, Singularity
  • the Terra cloud computing framework

See each of the separate project repos for tool-specific installation options.

Questions, comments, etc?

Contact us via our google group: https://groups.google.com/forum/#!forum/trinity_ctat_users

Funding

Trinity CTAT is funded by the National Cancer Institute Informatics Technology for Cancer Research (NCI ITCR) program.

Our efforts to build the Trinity Cancer Transcriptome Analysis Toolkit are described in this YouTube screencast:

Trinity CTAT on YouTube