-
Notifications
You must be signed in to change notification settings - Fork 17
5. Posterior summary
The following code generates a synthetic example to illustrate BASiCS:
Data <- makeExampleBASiCS_Data()
Chain <- BASiCS_MCMC(Data = Data, N = 20000, Thin = 20, Burn = 10000,
PrintProgress = FALSE)
To access the MCMC chains associated to individual parameter use the function displayChainBASiCS
. For example,
displayChainBASiCS(Chain, Param = "mu")[1:5,1:5]
As a summary of the posterior distribution, the Summary
methods calculates
posterior medians and the High Posterior Density (HPD) intervals for each model
parameter. As a default option, HPD intervals contain 0.95 probability.
ChainSummary <- Summary(Chain)
Subsequently, the displaySummaryBASiCS
method can extract posterior summaries for individual
parameters. For example
head(displaySummaryBASiCS(ChainSummary, Param = "mu"))
Above,
- First column: posterior medians
- Second column: lower limit of the 95% HPD interval
- Third column: upper limit of the 95% HPD interval
The following figures display posterior medians and the corresponding HPD 95%
intervals for gene-specific parameters
par(mfrow = c(2,2))
plot(ChainSummary, Param = "mu", main = "All genes", log = "y")
plot(ChainSummary, Param = "mu", Genes = 1:10, main = "First 10 genes")
plot(ChainSummary, Param = "delta", main = "All genes")
plot(ChainSummary, Param = "delta", Genes = c(2,5,10,50), main = "5 custom genes")
It is also possible to obtain similar summaries for the normalising constants
par(mfrow = c(1,2))
plot(ChainSummary, Param = "phi")
plot(ChainSummary, Param = "s", Cells = 1:5)
Finally, it is also possible to create a scatterplot of posterior estimates for gene-specific parameters. Typically, this plot will exhibit the confounding effect that is observed between mean and over-dispersion.
par(mfrow = c(1,2))
plot(ChainSummary, Param = "mu", Param2 = "delta", log = "x", SmoothPlot = FALSE)
plot(ChainSummary, Param = "mu", Param2 = "delta", log = "x", SmoothPlot = TRUE)
The option SmoothPlot = TRUE
is generally recommended as this plot will
contain thousands of genes when analysing real datasets.