add post on when low ESS is ok

CompEvol · Sep 26, 2024 · 473b9fd
1 parent ffe08d9
commit 473b9fd

Showing 2 changed files with 53 additions and 0 deletions.
diff --git a/_posts/2024-06-01-simulators-in-BEAST.md b/_posts/2024-06-01-simulators-in-BEAST.md
@@ -16,9 +16,13 @@ The `SimulatedAlignment` can be used as a replacement of `Alignment`.
 This can be useful in [well calibrated simulation studies](https://github.com/rbouckaert/DeveloperManual) (Mendes et al, 2024).
 
 The `SimulatedAlignment` takes in a standard site model and allows for discretised gamma rate heterogeneity.
+
+For simulating *codon* sequences, the [CodonSubstModel](https://github.com/BEAST2-Dev/codonsubstmodels) package has a [`SimulatedCodonAlignment`](https://github.com/BEAST2-Dev/codonsubstmodels/blob/master/src/codonmodels/evolution/alignment/SimulatedCodonAlignment) class that can replace a `CodonAlignment` and there is an [example XML](https://raw.githubusercontent.com/BEAST2-Dev/codonsubstmodels/refs/heads/master/examples/testSimulatedCodonAlignment.xml).
+
 If you want to simulate under continuous gamma rate heterogeneity, the `rbbeast.evolution.util.ContinuousGammaSimulatedAlignment` in the RBS package can be used.
 ([example XML](https://raw.githubusercontent.com/BEAST2-Dev/rb-beast/refs/heads/master/examples/testContinuousSimulatedAlignment.xml)).
 
+
 ## `DirectSimulator`
 
 The `beast.base.inference.DirectSimulator` provides a simulator that is more efficient than sampling from MCMC, and uses independent implementations for directly simulating parameter values from parametric distributions.

diff --git a/_posts/2024-10-01-when-low-ess-is-ok.md b/_posts/2024-10-01-when-low-ess-is-ok.md
@@ -0,0 +1,49 @@
+---
+layout: post
+title: When low ESSs in Tracer can be OK
+tags: []
+---
+<p style="color:gray">1 October 2024 by <a href='mailto:[email protected]'>Remco Bouckaert</a></p>
+
+Usually, it is recommended that effective sample sizes ([ESSs](https://www.beast2.org/what-is-ess/)) be at least 200, especially for the posterior, prior and likelihood, and there are several [tips](https://www.beast2.org/increasing-esss/) and [tricks](https://www.beast2.org/2019/08/01/increasing-ess.html) to increase the ESS.
+However, there are some instances where an ESS of less than 200 is acceptable, though care is needed in judging when these situations apply.
+Here are a few examples where low ESSs are OK:
+
+## Skyline plot population and group sizes
+
+One such case concerns the parameters of the Bayesian skyline plot.
+The population sizes and group sizes together with the tree determine the population function.
+It is not uncommon for the ESSs of population and group sizes to be quite low, due to multi-modality of these parameters.
+This can happen when two consecutive population sizes are quite close in one mode, while in the other mode one of these population size parameters is grouped with the next group.
+Since both group sizes and population sizes have to move at the same time to switch between modes, such switches happen rather infrequently, causing low ESSs.
+
+The population functions defined by these different modes are usually not that different.
+Since it is the population function that is of interest, low ESSs for the individual dimensions of the population size and group size parameters can be ignored.
+
+The group and population sizes still need to be logged for doing a demographic reconstruction (e.g. in Tracer), so they still end up in the trace log and will be visible in Tracer.
+
+
+## Indicator parameters that are stuck
+
+In stochastic variable selection, an indicator variable is used to determine whether part of the model should be used or not.
+For example, in the [bModelTest](https://github.com/BEAST2-Dev/bModelTest/wiki) site model there are indicator variables to decide whether the gamma rate heterogeneity model should be used or not.
+Likewise, the discrete trait model uses indicator variables to decide which rates to set to non-zero in the rate matrix.
+
+When the data push very hard towards one model over the other, the indicator variable will be stuck at either true or false, and the ESS cannot be determined because the trace never changes value.
+In this case the ESS can be ignored.
+Note that when the indicator variable is only very occasionally (perhaps even for a single sample) switched to the other value, the ESS will be very high, possibly close to the number of samples in the trace log.
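
To make the two situations above concrete, here is a minimal sketch of an autocorrelation-based ESS estimate. It is a simplified estimator written for illustration and is an assumption, not the algorithm Tracer uses; the function name `ess` and the toy traces are hypothetical. A constant indicator trace has zero variance, so the estimate is undefined, while a trace with a single switch shows no positive autocorrelation and gets an estimate close to the number of samples.

```python
import math


def ess(trace):
    """Simplified autocorrelation-based ESS (illustrative, not Tracer's algorithm)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        # Constant trace: the autocorrelation is 0/0, so the ESS is undefined.
        return math.nan
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n // 2):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            # Stop at the first non-positive autocorrelation.
            break
        tau += 2.0 * rho
    return n / tau


# Indicator stuck at 1 for the whole run.
stuck = [1] * 1000

# Indicator that switches to 0 for a single sample.
rare = [1] * 1000
rare[500] = 0

ess_stuck = ess(stuck)
ess_rare = ess(rare)
print("stuck indicator ESS:", ess_stuck)
print("rarely switching indicator ESS:", ess_rare)
```

Neither number says anything useful about mixing here: in both traces the indicator is effectively fixed by the data.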
+
+## Posteriors from nested sampling
+
+Nested sampling implemented in the [NS](https://github.com/BEAST2-Dev/nested-sampling/) package provides an alternative way of creating a sample from the posterior.
+However, nested sampling samples likelihoods in increasing order, and the generated posterior therefore has a trace for the likelihood that is monotonically increasing.
+This results in ESS estimates in Tracer that are very low, since they are based on autocorrelation, even though all samples in a posterior obtained from nested sampling are independent.
+The actual ESS is therefore the total number of samples in the trace log, and ESS estimates in Tracer should be ignored.
+
+NB Burn-in in Tracer should be set to zero for nested sampling posteriors.
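
The effect of ordering on an autocorrelation-based estimate can be illustrated with a toy sketch. It assumes independent draws from a standard normal as a stand-in for log-likelihood values, and uses a crude AR(1)-style approximation, ESS ≈ N(1 − ρ)/(1 + ρ) with ρ the lag-1 autocorrelation; this is not Tracer's estimator, and the function names are hypothetical. The same set of independent values is scored once in random order and once sorted in increasing order.

```python
import random


def lag1_autocorrelation(trace):
    """Lag-1 autocorrelation of a trace."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered)
    return sum(centered[i] * centered[i + 1] for i in range(n - 1)) / var


def ar1_ess(trace):
    """Crude ESS approximation assuming AR(1)-like autocorrelation."""
    rho = lag1_autocorrelation(trace)
    return len(trace) * (1.0 - rho) / (1.0 + rho)


rng = random.Random(1)

# Independent draws, standing in for log-likelihoods of independent samples.
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# The same values in increasing order, mimicking a nested sampling trace.
monotone = sorted(independent)

ess_independent = ar1_ess(independent)
ess_monotone = ar1_ess(monotone)
print("ESS estimate, random order:", round(ess_independent))
print("ESS estimate, increasing order:", round(ess_monotone))
```

Both traces contain exactly the same independent values; only the order differs, which is why the low estimate for the sorted trace is not informative.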
+
+## Nuisance parameters
+
+Sometimes ESSs can be slightly low for parameters that are not of interest, like the kappa parameter for the HKY model when the research question revolves around the age of the tree.
+Care must be taken when letting these somewhat reduced ESSs slip through: make sure the distributions of individual MCMC runs match.
+It is much preferred though to fix the problem by increasing the weight of operators that do proposals for these nuisance parameters, running the MCMC a bit longer, or applying any of the other techniques [to increase ESS](http://www.beast2.org/2019/08/01/increasing-ess.html).
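
One possible way to check whether the distributions of individual runs match is to compare the empirical distributions of a parameter between two runs, for example with the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distribution functions). The sketch below is an illustration under assumptions: the traces are simulated rather than read from BEAST log files, and `ks_statistic` is a hypothetical helper, not part of BEAST or Tracer.

```python
import bisect
import random


def ks_statistic(run_a, run_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(run_a), sorted(run_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


rng = random.Random(42)

# Simulated kappa traces (after burn-in) from three hypothetical runs.
run1 = [rng.gauss(4.0, 0.5) for _ in range(2000)]
run2 = [rng.gauss(4.0, 0.5) for _ in range(2000)]  # same distribution as run1
run3 = [rng.gauss(4.5, 0.5) for _ in range(2000)]  # shifted distribution

d_matching = ks_statistic(run1, run2)
d_shifted = ks_statistic(run1, run3)
print("run1 vs run2:", round(d_matching, 3))
print("run1 vs run3:", round(d_shifted, 3))
```

A small statistic suggests the runs agree on this parameter, while a large one suggests they have not converged to the same distribution. Note that real MCMC samples are autocorrelated, so the statistic is best read as a rough diagnostic rather than a formal test.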