M. Brown edited this page Sep 20, 2021 · 43 revisions

 

InfercnvApp

 

Intro

infercnvApp

This is an R Shiny app created as a user-friendly way to run the R package infercnv. Infercnv is used to explore tumor single-cell RNA-Seq data to identify evidence for somatic large-scale chromosomal copy number alterations, such as gains or deletions of entire chromosomes or large segments of chromosomes. More information on infercnv can be found on the infercnv GitHub page.

Installing

Installing within R

InfercnvApp can be installed within R using devtools. You can use the following commands from within R to do so:

library("devtools")
devtools::install_github("broadinstitute/infercnvApp")

GitHub source installation

Alternatively, the infercnvApp package repository can be cloned from GitHub and installed like so:

git clone https://github.com/broadinstitute/infercnvApp.git
cd infercnvApp
R
> install.packages("./", repos=NULL, type="source")

Requirement

Infercnv must be installed before infercnvApp can be run. Installation instructions for infercnv are available in the infercnv documentation.
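As a quick way to confirm this requirement before launching the app, the following sketch uses base R's `requireNamespace()` to test whether the infercnv package can be loaded. The variable name `infercnv_available` is only illustrative.

```r
# Check whether the infercnv package is available in this R library.
# requireNamespace() returns TRUE or FALSE without attaching the package.
infercnv_available <- requireNamespace("infercnv", quietly = TRUE)

if (!infercnv_available) {
  message("infercnv is not installed; install it before running infercnvApp.")
}
```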

 

Running

To start the Shiny app, run the following function in R:

infercnvApp::infercnvApp()

Once the function is run, the app opens in a browser and displays the main page.

At the top of the app, there are three tabs: Home, Upload Files And Settings, and Analysis Output.

Home

The Home tab provides information about running infercnv and about each step of the analysis process in several sub-tabs.

The Running infercnv tab includes insight into the many settings and options available in running infercnv, from the basic to the more advanced settings and options.

Infercnv Figures has information on interpreting the figure that is output by the infercnv analysis.

The Example Data tab shows the final output of running infercnv on the provided example data. It also links to additional example data sets, along with their output figures.

Citation contains the citation information for infercnv, along with several other citations that were used for the creation of infercnv.

Session Info shows details of the current R session, including the R version, the operating system, and the loaded or attached R packages.

Upload Files And Settings

Upload Files And Settings is where infercnv's 2-step protocol is performed: the user uploads their data files and runs infercnv.

Step 1: Upload Files

Several files may be needed, depending on the analysis. Here users can upload their Raw Counts Matrix, Sample Annotation File, and Gene Order File. Additional information about these input files is available in the infercnv documentation.

Raw Counts Matrix

InferCNV is compatible with both Smart-seq2 and 10x single-cell transcriptome data, and presumably other methods (not tested). The counts matrix can be generated using any conventional single-cell transcriptome quantification pipeline, yielding a matrix of genes (rows) vs. cells (columns) containing assigned read counts.
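To make the genes-by-cells layout concrete, here is a minimal sketch that builds a toy counts matrix in base R and writes it as a tab-delimited file. The gene names, cell names, and counts are made up, and the exact header layout the app expects is an assumption (here, a header of cell names with gene names in the first column).

```r
# Toy raw counts matrix: genes in rows, cells in columns (all values invented).
counts <- matrix(
  c(0L, 3L, 12L,
    5L, 0L, 7L,
    1L, 1L, 0L,
    9L, 4L, 2L),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),  # hypothetical gene names
    c("cell_1", "cell_2", "cell_3")             # hypothetical cell names
  )
)

# Write as tab-delimited text; the header row holds only the cell names.
counts_file <- tempfile(fileext = ".txt")
write.table(counts, counts_file, sep = "\t", quote = FALSE)

# Read the file back to confirm the genes-by-cells layout survived.
counts_back <- as.matrix(
  read.table(counts_file, header = TRUE, sep = "\t", check.names = FALSE)
)
```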

Sample Annotation File

The sample annotation file is used to define the different cell types and, optionally, to indicate how the cells should be grouped according to sample (i.e., patient). The format is simply two tab-delimited columns with no column header.

Gene Order File
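A minimal sketch of writing such a file from R is shown below. It assumes the first column holds the cell name and the second holds the cell type or sample group; the names used are hypothetical.

```r
# Toy sample annotations: one row per cell (names and groups are invented).
annotations <- data.frame(
  cell  = c("cell_1", "cell_2", "cell_3"),
  group = c("normal", "tumor_patient1", "tumor_patient1"),
  stringsAsFactors = FALSE
)

# Two tab-delimited columns, no header row and no row names.
annotations_file <- tempfile(fileext = ".txt")
write.table(annotations, annotations_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

annotation_lines <- readLines(annotations_file)
```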

The gene ordering file provides the chromosomal location for each gene. The format is tab-delimited and has no column header, simply providing the gene name, chromosome, and gene span.
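As an illustration, the sketch below writes a small gene ordering file from R. Representing the gene span as separate start and end columns is an assumption here, and the gene names and coordinates are invented.

```r
# Toy gene order table: gene name, chromosome, and span (start, end).
gene_order <- data.frame(
  gene       = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),  # hypothetical genes
  chromosome = c("chr1", "chr1", "chr2", "chr2"),
  start      = c(1000L, 5000L, 2000L, 8000L),              # invented coordinates
  end        = c(3000L, 7500L, 4000L, 9500L),
  stringsAsFactors = FALSE
)

# Tab-delimited, no header row and no row names.
gene_order_file <- tempfile(fileext = ".txt")
write.table(gene_order, gene_order_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

gene_order_lines <- readLines(gene_order_file)
```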

Example Data

Example data is provided with infercnv. To use it, click the Example Data tab under Input Files To Initiate InferCNV and check the Upload Example Data box.

Step 2: Settings And Run Analysis

After the user's files are uploaded in the first step, the user can adjust the settings and run infercnv. The basic settings are under the Options tab, while more advanced settings are found under the Advanced Options tab. The Misc tab holds additional miscellaneous settings.

Analysis Output

The Analysis Output tab is where the outputs for infercnv can be viewed.

Main Analysis Output

The Main Analysis Output sub-tab is where the final output, along with the preliminary output figures, can be viewed.

The Median Filter Output sub-tab offers optional add-on median filtering that can be applied to smooth the visual output of inferCNV. The filtering treats chromosomes and the defined clusters or subclusters as boundaries. It also keeps the previously defined hierarchical clustering intact, so that the clustering still reflects how it was obtained.
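The idea of median filtering within boundaries can be sketched in one dimension with base R. This is not infercnv's implementation: it simply applies `stats::runmed()` separately to each chromosome so that smoothing never crosses a chromosome boundary. The helper name `smooth_within`, the window size, and the values are all hypothetical.

```r
# Toy expression signal for one cell across ten genes on two chromosomes.
values     <- c(1.0, 1.0, 1.4, 1.0, 1.0, 0.8, 0.8, 1.3, 0.8, 0.8)
chromosome <- rep(c("chr1", "chr2"), each = 5)

# Running median applied within each group, so the window never spans
# two chromosomes. endrule = "keep" leaves the end points unchanged.
smooth_within <- function(x, groups, window = 3) {
  ave(x, groups, FUN = function(v) {
    as.numeric(stats::runmed(v, k = window, endrule = "keep"))
  })
}

smoothed <- smooth_within(values, chromosome)
```

In this toy input, the isolated spikes (1.4 on chr1 and 1.3 on chr2) are replaced by the median of their neighbours, while the step between the two chromosomes is preserved.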

HMM Analysis Output

If infercnv is run with the HMM option, the HMM Analysis Output tab shows a figure revealing CNV states as predicted by the Hidden Markov Model (HMM). Infercnv currently supports two models for HMM-based CNA prediction, which we refer to as the i3 and i6 models.

Bayesian Analysis Output

If infercnv was run with the Hidden Markov Model option, a subsequent Bayesian analysis is performed, and the Bayesian Analysis Output tab shows a figure with the probability of each CNA not being normal.

Additionally, posterior probability plots are generated and are viewable in the Probability Plots tab. For each predicted CNA region, the posterior probability of the entire CNA region belonging to each of the 6 or 3 states is plotted in cnvProbs.pdf, along with the posterior probability of each cell line belonging to each state in cellProbs.pdf. More information is available in the infercnv documentation.

Dynamic Plots

CNA regions identified by the HMM are filtered out if the region's posterior probability of being normal exceeds a specified threshold. This guards against misidentified CNAs by removing regions that are most likely normal rather than true CNA events. The threshold can be adjusted by setting the Bayes Max Probability of Normal State argument to a value between 0 and 1 in InferCNV's analysis options. The Dynamic Plots tab allows the user to adjust this threshold and visualize how the new value changes which CNAs are kept and which are removed.
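The thresholding rule described above can be sketched in a few lines of base R. This is an illustration of the rule only, not infercnv's code; the function name `filter_cnas`, the column names, and the probabilities are hypothetical.

```r
# Toy CNA regions with invented posterior probabilities of being normal.
cna_regions <- data.frame(
  region      = c("cna_1", "cna_2", "cna_3", "cna_4"),
  prob_normal = c(0.05, 0.45, 0.62, 0.90),
  stringsAsFactors = FALSE
)

# Keep a region only if its probability of being normal does not exceed
# the threshold; regions above the threshold are filtered out.
filter_cnas <- function(regions, max_prob_normal) {
  stopifnot(max_prob_normal >= 0, max_prob_normal <= 1)
  regions[regions$prob_normal <= max_prob_normal, , drop = FALSE]
}

kept_default <- filter_cnas(cna_regions, max_prob_normal = 0.5)
kept_strict  <- filter_cnas(cna_regions, max_prob_normal = 0.1)
```

Lowering the threshold is stricter: with 0.5 the sketch keeps `cna_1` and `cna_2`, while with 0.1 only `cna_1` remains.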

Supplemental

If infercnv was run with the HMM and Bayesian mixture model option, diagnostic plots are created. The Diagnostic Plots tab shows several kinds of Bayesian diagnostic plots, used to assess the performance and convergence of the Bayesian mixture model on the identified CNAs and to provide credible intervals. These plots tend to be more complex and difficult to interpret.