
Bayesian Network Latent Mixture Model

M. Brown edited this page Mar 14, 2019 · 24 revisions

Intro

Posterior Probabilities for CNV States as per Bayesian Network

Given the copy number variation (CNV) regions identified by the Hidden Markov Model (HMM), a Bayesian latent mixture model is used to estimate the posterior probability of each alteration state, both for each cell and for each whole CNV region. This guards against misidentification by the HMM, that is, predicted CNVs or cells that are not true CNVs (false positives).

Outline

Methods

CNV regions are defined as a subset of adjacent cells and genes that the HMM labels as a single, non-normal CNV state. The HMM provides CNV state predictions on an individual gene basis; the multi-state Bayesian mixture model complements these predictions by providing predictions on an individual cell basis. A single CNV event is presumed to belong to a single state, since amplifications or deletions of different magnitudes cannot occur simultaneously in a single event. Given this assumption, the multi-state Bayesian mixture model makes it possible to compare the probabilities of a single CNV region belonging to each individual CNV state. By default, CNVs whose probability of being normal (represented in the model by the normal state) is above a user-specified threshold are relabeled as normal.

The Bayesian latent mixture model is written in the BUGS (Bayesian inference Using Gibbs Sampling) language and run with the R package rjags for Markov chain Monte Carlo (MCMC) simulation (Plummer, M. 2013). rjags performs 1000 iterations with a burn-in of 500.
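To illustrate how a burn-in is used, the following Python sketch estimates state probabilities from a trace of sampled states by discarding the first draws and tallying the rest. It is not InferCNV's code: the function name `posterior_from_trace` and the toy trace are hypothetical, and this page does not state whether rjags counts the 500 burn-in iterations inside or outside the 1000, so the sketch simply takes the burn-in length as an argument.

```python
from collections import Counter


def posterior_from_trace(trace, burn_in, n_states=6):
    """Estimate P(state = k) from MCMC draws of a discrete state (1..n_states).

    The first `burn_in` draws are discarded; the estimate is the relative
    frequency of each state among the remaining draws.
    """
    kept = trace[burn_in:]
    if not kept:
        raise ValueError("no draws left after burn-in")
    counts = Counter(kept)
    return [counts.get(k, 0) / len(kept) for k in range(1, n_states + 1)]


# Hypothetical trace: the chain starts in state 3, then settles mostly on state 2.
toy_trace = [3] * 500 + [2] * 800 + [3] * 200
probs = posterior_from_trace(toy_trace, burn_in=500)
print(probs)
```

Discarding the early draws keeps the chain's starting point from biasing the estimate; in the toy trace the initial run of state 3 is dropped entirely.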

Graphical representation of the Bayesian Network mixture model used to estimate the posterior probability of each cell containing a given CNV. The node GeneExp is the likelihood of the observed gene expression data given the parameters μk (mean expression for CNV state k) and τk (precision = 1/variance for CNV state k). C and G represent the number of cells and genes, respectively. εj is the latent variable representing the cell-specific CNV level (state membership), and θ is a hyperparameter corresponding to the probability of each CNV level. Squares correspond to observed variables; circles correspond to random variables. εj (the cell-specific state prediction for the whole CNV region) is the main target of our prediction efforts.
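The role of μk, τk, and θ can be made concrete with a small Python sketch that computes the posterior over states for one cell's expression values in a region. It is a simplification, not the model as run in JAGS: μk, τk, and θ are treated as fixed known values rather than sampled, the likelihood is assumed Gaussian, and all numbers and the function name `state_posteriors` are hypothetical.

```python
import math


def state_posteriors(expr, mu, tau, theta):
    """Posterior P(state k | expr) for one cell, with mu, tau, theta held fixed.

    expr  : expression values of the genes in the region for this cell
    mu    : per-state mean expression
    tau   : per-state precision (1 / variance)
    theta : per-state prior probability
    """
    log_post = []
    for m, t, p in zip(mu, tau, theta):
        # Gaussian log-likelihood summed over genes, written in terms of precision.
        log_lik = sum(
            0.5 * math.log(t / (2 * math.pi)) - 0.5 * t * (x - m) ** 2
            for x in expr
        )
        log_post.append(math.log(p) + log_lik)
    # Normalise in log space (log-sum-exp) to avoid underflow.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters for the six states (0x, 0.5x, 1x, 1.5x, 2x, 3x).
mu = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
tau = [25.0] * 6          # standard deviation 0.2 for every state
theta = [1.0 / 6] * 6     # uniform prior over states
cell_expr = [0.52, 0.47, 0.55, 0.49]

post = state_posteriors(cell_expr, mu, tau, theta)
best_state = post.index(max(post)) + 1  # states are numbered from 1
print(best_state, post)
```

With these made-up values the expression sits close to 0.5, so state 2 (loss of one copy) receives nearly all of the posterior mass.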




Probability Plots:

By default InferCNV generates several posterior probability plots. For each predicted CNV region, the posterior probability of the entire CNV region belonging to each of the 6 states is plotted in cnvProbs.pdf, along with posterior probability of each cell line belonging to each state in cellProbs.pdf. The plot NormalProbabilities.png visualizes the predicted CNVs on the heat map with color intensities corresponding to the posterior probability of that CNV region not being normal (1-P(cnv=normal)).

The 6 states correspond to the following CNV events:

  • State 1: 0x, complete loss.
  • State 2: 0.5x, loss of one copy.
  • State 3: 1x, neutral.
  • State 4: 1.5x, addition of one copy.
  • State 5: 2x, addition of two copies.
  • State 6: 3x, essentially a placeholder for >2x copies but modeled as 3x.
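For scripts that post-process the state labels, the list above can be captured as a lookup table. This is only an illustrative Python sketch; the names `CNV_STATES` and `copy_ratio` are hypothetical and are not part of InferCNV.

```python
# State number -> (modeled copy ratio, description), transcribed from the list above.
CNV_STATES = {
    1: (0.0, "complete loss"),
    2: (0.5, "loss of one copy"),
    3: (1.0, "neutral"),
    4: (1.5, "addition of one copy"),
    5: (2.0, "addition of two copies"),
    6: (3.0, "placeholder for >2x copies, modeled as 3x"),
}

NORMAL_STATE = 3


def copy_ratio(state):
    """Return the modeled copy ratio for a state number (1-6)."""
    return CNV_STATES[state][0]


for state, (ratio, label) in CNV_STATES.items():
    print(f"State {state}: {ratio}x, {label}")
```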

Box plots showing the posterior probability of each CNV belonging to each possible state. The arrows indicate the CNV region on the heat map to which each box plot corresponds.




Heat map showing the probability of each CNV not being normal. Color intensities correspond to the posterior probability, between 0 and 1, of the CNV region not being normal (1 - P(CNV=normal)). The darker the red, the greater the probability that the CNV is not normal.







Filtering out low-probability CNVs

CNV regions identified by the HMM are filtered out if the region's posterior probability of being normal exceeds a specified threshold. This guards against misidentified CNVs by removing those that are most likely normal and not true CNV events. By default the threshold is set to 0.5, so any CNV region whose posterior probability of being in the normal state is greater than 0.5 is relabeled as "normal" and is no longer considered an identified CNV region. A threshold of 0.5 was chosen as the default because it tends to be a more lenient threshold. The threshold can be adjusted by setting the BayesMaxPNormal argument to a value between 0 and 1 in InferCNV's run() function. The Bayesian network latent mixture model can be skipped entirely by setting BayesMaxPNormal to 0.
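The relabeling rule can be sketched in a few lines of Python. This is an illustration of the thresholding logic only, not InferCNV's implementation: the region records, their field names, and the function `relabel_low_confidence` are hypothetical. The sketch also does not model the special case in which setting BayesMaxPNormal to 0 skips the Bayesian model altogether.

```python
def relabel_low_confidence(regions, max_p_normal=0.5, normal_state=3):
    """Relabel regions whose posterior probability of being normal exceeds the threshold.

    regions : list of dicts with keys "id", "state" (1-6) and "p_normal"
    Returns a new list; the input records are not modified.
    """
    relabeled = []
    for region in regions:
        updated = dict(region)
        # Strictly greater than the threshold, matching the rule described above.
        if updated["p_normal"] > max_p_normal:
            updated["state"] = normal_state
        relabeled.append(updated)
    return relabeled


# Hypothetical HMM calls with their posterior probability of being normal.
hmm_calls = [
    {"id": "region_a", "state": 2, "p_normal": 0.70},
    {"id": "region_b", "state": 5, "p_normal": 0.10},
    {"id": "region_c", "state": 4, "p_normal": 0.50},
]

for region in relabel_low_confidence(hmm_calls, max_p_normal=0.5):
    print(region["id"], region["state"])
```

Here only region_a is relabeled as normal; region_c sits exactly at the threshold and is kept, because the rule requires the probability to be greater than the threshold.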

The following figure shows predicted CNVs and how CNV identification changes as the normal posterior probability threshold changes. The arrows indicate which threshold value is applied in each visualization. As the threshold decreases, more CNVs are filtered out, because more CNVs have a probability of being normal above the lowered threshold. The colors correspond to the level of the deletion or amplification event (state): dark blue: 0x, light blue: 0.5x, white: 1.0x (normal), light red: 1.5x, red: 2.0x, dark red: 3x. These CNVs come from the InferCNV example dataset.