Home
This is an R Shiny app created as a user-friendly way to run the R package infercnv. Infercnv is used to explore tumor single-cell RNA-Seq data to identify evidence for somatic large-scale chromosomal copy number alterations, such as gains or deletions of entire chromosomes or large segments of chromosomes. More information on infercnv can be found on the infercnv GitHub page.
InfercnvApp can be installed within R using devtools. You can use the following commands from within R to do so:
library("devtools")
devtools::install_github("broadinstitute/infercnvApp")
Alternatively, the infercnvApp package repository can be cloned from GitHub and installed like so:
git clone https://github.com/broadinstitute/infercnvApp.git
cd infercnvApp
R
> install.packages("./", repos=NULL, type="source")
Infercnv must be installed in order to run infercnvApp. Installation instructions for infercnv are available in the infercnv documentation.
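As a quick check before launching the app, the sketch below reports whether both packages can be loaded in the current R installation. It uses only base R; the helper name `packages_available` is hypothetical and is not part of infercnv or infercnvApp.

```r
# Hypothetical helper: TRUE/FALSE for each package name, depending on
# whether its namespace can be loaded in this R installation.
packages_available <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- packages_available(c("infercnv", "infercnvApp"))
not_installed <- names(status)[!status]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
} else {
  message("infercnv and infercnvApp are both installed.")
}
```

If a package is reported as not installed, install it as described above before starting the app.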
To start the Shiny app, run the following function in R:
infercnvApp::infercnvApp()
Once the function is run, the app opens in a browser and the main page appears.
At the top of the app there are three tabs: Home, Upload Files And Settings, and Analysis Output.
The Home tab provides information about running infercnv and about each step of the analysis process in several sub-tabs.
The Running infercnv tab includes insight into the many settings and options available in running infercnv, from the basic to the more advanced settings and options.
The Infercnv Figures tab has information on interpreting the figure that is output by the infercnv analysis.
The Example Data tab shows the final output of running infercnv on the provided example data. It also provides links to additional example data sets, along with their output figures.
The Citation tab contains the citation information for infercnv, along with several other citations that were used for the creation of infercnv.
Session Info shows the information about the current R session, providing information about R, the Operating System, and loaded or attached R packages.
Upload Files And Settings is where infercnv's 2-step protocol is performed: the user uploads their data files and then runs infercnv.
Several files may be needed, depending on the analysis. Here users can upload their Raw Counts Matrix, Sample Annotation File, and Gene Order File. Additional information about these input files can be found in the infercnv documentation.
InferCNV is compatible with both Smart-seq2 and 10x single-cell transcriptome data, and presumably with other methods (not tested). The counts matrix can be generated using any conventional single-cell transcriptome quantification pipeline that yields a matrix of genes (rows) vs. cells (columns) containing assigned read counts.
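To make the expected orientation concrete, here is a minimal sketch that builds a toy counts matrix with genes as rows and cells as columns, writes it to a text file, and reads it back. The gene and cell names are made up, and the tab-delimited layout is an assumption for illustration; check the infercnv input file documentation for the exact format it expects.

```r
# Toy raw counts: 2 genes (rows) x 3 cells (columns). All names are made up.
counts <- matrix(
  c(0L, 3L, 12L,
    5L, 0L, 7L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B"), c("cell_1", "cell_2", "cell_3"))
)

# Write as tab-delimited text (assumed layout); col.names = NA leaves the
# top-left header cell blank so the row names get their own column.
counts_file <- tempfile(fileext = ".txt")
write.table(counts, counts_file, sep = "\t", quote = FALSE, col.names = NA)

# Read it back, using the first column as gene names.
reloaded <- as.matrix(read.table(counts_file, sep = "\t", header = TRUE,
                                 row.names = 1, check.names = FALSE))

dim(reloaded)       # genes x cells
colnames(reloaded)  # cell names; these should match the annotation file
```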
The sample annotation file is used to define the different cell types and, optionally, to indicate how the cells should be grouped according to sample (i.e., patient). The format is simply two tab-delimited columns with no column header.
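A minimal sketch of producing and checking such a file in R follows. It assumes the first column holds the cell name and the second the cell type or group label; the cell names and group labels are made up for illustration.

```r
# Toy annotations: column 1 = cell name, column 2 = group (assumed order).
annotations <- data.frame(
  cell  = c("cell_1", "cell_2", "cell_3"),
  group = c("normal", "tumor_patient1", "tumor_patient1"),
  stringsAsFactors = FALSE
)

# Two tab-delimited columns, no header row, no row names.
annotation_file <- tempfile(fileext = ".txt")
write.table(annotations, annotation_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back and run basic sanity checks before uploading it.
parsed <- read.table(annotation_file, sep = "\t", header = FALSE,
                     col.names = c("cell", "group"),
                     stringsAsFactors = FALSE)
stopifnot(ncol(parsed) == 2, !anyDuplicated(parsed$cell))

# Number of cells assigned to each group.
cells_per_group <- table(parsed$group)
cells_per_group
```

Every cell named in the counts matrix should appear once in this file, so comparing `parsed$cell` with the column names of the counts matrix is a useful extra check.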
The gene ordering file provides the chromosomal location of each gene. The format is tab-delimited with no column header, and simply lists the gene name, chromosome, and gene span (start and end positions).
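The sketch below writes and validates a toy gene ordering file. It assumes the gene span is given as two columns, start and stop, which makes four columns in total; the gene names and coordinates are made up.

```r
# Toy gene positions: gene name, chromosome, start, stop (assumed columns).
gene_positions <- data.frame(
  gene  = c("GENE_B", "GENE_A", "GENE_C"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(50000L, 1000L, 2000L),
  stop  = c(60000L, 9000L, 8000L),
  stringsAsFactors = FALSE
)

# Tab-delimited, no header row, no row names.
gene_order_file <- tempfile(fileext = ".txt")
write.table(gene_positions, gene_order_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read back and check: unique gene names and start <= stop for every gene.
genes <- read.table(gene_order_file, sep = "\t", header = FALSE,
                    col.names = c("gene", "chr", "start", "stop"),
                    stringsAsFactors = FALSE)
stopifnot(!anyDuplicated(genes$gene), all(genes$start <= genes$stop))

# Order genes by chromosome, then by start position.
sorted_genes <- genes[order(genes$chr, genes$start), ]
sorted_genes
```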
Example data is provided with infercnv. To use it, click the Example Data tab in the Input Files To Initiate InferCNV section and check the Upload Example Data box.
After the user's files are uploaded in the first step, the user can adjust the settings and run infercnv. The basic settings are under the Options tab, while more advanced settings are found under the Advanced Options tab. The Misc tab holds additional miscellaneous settings.
The Analysis Output tab is where the outputs for infercnv can be viewed.
The Main Analysis Output sub-tab is where the final output, along with the preliminary output figures, can be viewed. The Median Filter Output sub-tab shows the result of an optional add-on median filter that can be applied to smooth the visual output of inferCNV. The filter treats chromosomes and the defined clusters or subclusters as boundaries, so values are not smoothed across them. It also keeps the previously defined hierarchical clustering intact, so that the clustering remains representative of how it was obtained.
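To illustrate the idea of median filtering within boundaries, the sketch below smooths one cell's expression values with a sliding median that is applied separately within each chromosome. This is not infercnv's implementation: the function name, window size, and data are hypothetical, and cluster boundaries are left out for brevity.

```r
# Hypothetical sliding-median smoother; the window shrinks at the edges.
median_smooth <- function(x, window = 3) {
  half <- window %/% 2
  n <- length(x)
  vapply(seq_len(n), function(i) {
    median(x[max(1, i - half):min(n, i + half)])
  }, numeric(1))
}

# Toy values for one cell: five genes on chr1, then five genes on chr2,
# each with a single outlier in the middle.
values <- c(1, 1, 5, 1, 1, 2, 2, 9, 2, 2)
chr    <- rep(c("chr1", "chr2"), each = 5)

# ave() applies the smoother within each chromosome, so a window never
# spans a chromosome boundary.
smoothed <- ave(values, chr, FUN = function(v) median_smooth(v, window = 3))
smoothed
```

In this toy input the isolated spikes (5 and 9) are replaced by the surrounding level of their own chromosome, and the two chromosomes stay distinct.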
If infercnv is run with the HMM option, the HMM Analysis Output tab shows a figure revealing CNV states as predicted by the Hidden Markov Model (HMM). Infercnv currently supports two models for HMM-based CNA prediction, which we refer to as the i3 and i6 models.
If infercnv was run with the Hidden Markov Model option, a subsequent Bayesian analysis is performed, and the Bayesian Analysis Output tab shows a figure with the probability of each CNA not being normal.
Additionally, posterior probability plots are generated and are viewable in the Probability Plots tab. For each predicted CNA region, the posterior probability of the entire CNA region belonging to each of the 6 or 3 states is plotted in cnvProbs.pdf, along with the posterior probability of each cell line belonging to each state in cellProbs.pdf. More information can be found in the infercnv documentation.
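As a worked illustration of how per-state posterior probabilities relate to the probability of a CNA not being normal, the sketch below uses made-up numbers for a 6-state model. It assumes that state 3 is the normal (neutral) state; the region names and probabilities are hypothetical and are not infercnv output.

```r
# Made-up posterior probabilities for two CNA regions over 6 states.
state_probs <- rbind(
  region_1 = c(0.01, 0.02, 0.05, 0.80, 0.10, 0.02),
  region_2 = c(0.00, 0.10, 0.85, 0.05, 0.00, 0.00)
)
colnames(state_probs) <- paste0("state_", 1:6)

# Each region's probabilities should sum to 1.
stopifnot(all(abs(rowSums(state_probs) - 1) < 1e-8))

# Assumption: state 3 is the normal (neutral) state in the 6-state model.
normal_state <- "state_3"

# Probability of not being normal = 1 - P(normal state).
prob_not_normal <- 1 - state_probs[, normal_state]

# Most likely state for each region.
most_likely_state <- colnames(state_probs)[max.col(state_probs,
                                                   ties.method = "first")]

prob_not_normal
most_likely_state
```

Here `region_1` is most likely in state 4 and has a high probability of not being normal, while `region_2` is most likely normal.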
CNA regions identified by the HMM are filtered out if the region's posterior probability of being normal exceeds a specified threshold. This reduces the chance of misidentified CNAs by removing regions that are most likely normal rather than true CNA events. The threshold can be adjusted by setting the Bayes Max Probability of Normal State argument to a value between 0 and 1 in InferCNV's analysis options. The Dynamic Plots tab allows the user to adjust this threshold and visualize how the new value changes which CNAs are kept and which are removed.
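The filtering rule can be sketched in a few lines of R. The data frame, its column names, and the function `filter_cna_regions` are hypothetical and only illustrate the thresholding described above; the default of 0.5 in the sketch is an illustrative value.

```r
# Made-up CNA regions with their posterior probability of being normal.
cna_regions <- data.frame(
  region      = c("chr1-region_1", "chr7-region_2", "chr10-region_3"),
  prob_normal = c(0.05, 0.62, 0.50),
  stringsAsFactors = FALSE
)

# Hypothetical filter: drop regions whose probability of being normal
# exceeds the threshold; keep those at or below it.
filter_cna_regions <- function(regions, max_prob_normal = 0.5) {
  stopifnot(max_prob_normal >= 0, max_prob_normal <= 1)
  regions[regions$prob_normal <= max_prob_normal, , drop = FALSE]
}

kept_default <- filter_cna_regions(cna_regions, max_prob_normal = 0.5)
kept_strict  <- filter_cna_regions(cna_regions, max_prob_normal = 0.1)

kept_default$region  # regions surviving the 0.5 threshold
kept_strict$region   # regions surviving the stricter 0.1 threshold
```

Lowering the threshold removes more regions, which mirrors what the Dynamic Plots tab lets the user explore interactively.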
If infercnv was run with the HMM and Bayesian mixture model options, diagnostic plots are created. The Diagnostic Plots tab shows several kinds of Bayesian diagnostic plots that are used to assess the performance and convergence of the Bayesian mixture model on the identified CNAs and to provide credible intervals. These plots tend to be more complex and difficult to interpret.