From 4d84ee004c00c0a64526883fe4559072a2a61f89 Mon Sep 17 00:00:00 2001 From: Adam Howes Date: Mon, 21 Oct 2024 11:19:26 +0100 Subject: [PATCH] Issue #394: Add information about predictions to FAQ (#395) * Add package links * Add documentation on how to do prediction --- vignettes/faq.Rmd | 42 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 6 deletions(-) diff --git a/vignettes/faq.Rmd b/vignettes/faq.Rmd index e0df64174..78a3c0c58 100644 --- a/vignettes/faq.Rmd +++ b/vignettes/faq.Rmd @@ -29,6 +29,7 @@ library(dplyr) library(ggplot2) library(scales) library(tidyr) +library(tidybayes) set.seed(1) @@ -57,7 +58,7 @@ fit <- epidist( ## I would like to work with the samples output The output of a call to `epidist` is compatible with typical Stan workflows. -We recommend use of the `posterior` package for working with samples from MCMC or other sampling algorithms. +We recommend use of the [`posterior`](https://mc-stan.org/posterior/) package for working with samples from MCMC or other sampling algorithms. For example, the function `posterior::as_draws_df()` may be used to obtain a dataframe of MCMC draws for specified parameters. ```{r message = FALSE} @@ -69,7 +70,7 @@ head(draws) ## How can I assess if sampling has converged? The output of a call to `epidist` is compatible with typical Stan workflows. -We recommend use of the `bayesplot` package for sampling diagnostic plots. +We recommend use of the [`bayesplot`](http://mc-stan.org/bayesplot/) package for sampling diagnostic plots. For example, the function `bayesplot::mcmc_trace()` can be used to produce traceplots for specified parameters. ```{r message = FALSE} @@ -85,7 +86,7 @@ epidist_diagnostics(fit) ## I'd like to run a simulation study -We recommend use of the `purrr` package for running many `epidist` models, for example as a part of a simulation study. +We recommend use of the [`purrr`](https://purrr.tidyverse.org/) package for running many `epidist` models, for example as a part of a simulation study. We particularly highlight two functions which might be useful: 1. `purrr::map()` (and other similar functions) for iterating over a list of inputs. @@ -96,7 +97,7 @@ For an example use of these functions, have a look at the [`epidist-paper`](http ## How did you choose the default priors for `epidist`? -`brms` provides default priors for all parameters. +[`brms`](http://paulbuerkner.com/brms/) provides default priors for all parameters. However, some of those priors do not make sense in the context of our application. Instead, we used [prior predictive checking](https://mc-stan.org/docs/stan-users-guide/posterior-predictive-checks.html) to set `epidist`-specific default priors which produce epidemiological delay distribution mean and standard deviation parameters in a reasonable range. @@ -160,7 +161,7 @@ quantile(pred$sd, c(0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99)) # How can I assess how sensitive the fitted posterior distribution is to the prior distribution used? -We recommend use of the [`priorsense`](https://github.com/n-kall/priorsense) R package [@kallioinen2024detecting] to check how sensitive the posterior distribution is to perturbations of the prior distribution and likelihood using power-scaling analysis: +We recommend use of the [`priorsense`](https://github.com/n-kall/priorsense) package [@kallioinen2024detecting] to check how sensitive the posterior distribution is to perturbations of the prior distribution and likelihood using power-scaling analysis: ```{r} library(priorsense) @@ -170,7 +171,7 @@ powerscale_plot_dens(fit, variable = c("Intercept", "Intercept_sigma")) + # What do the parameters in my model output correspond to? -The `epidist` package uses `brms` to fit models. +The `epidist` package uses [`brms`](http://paulbuerkner.com/brms/) to fit models. This means that the model output will include `brms`-style names for parameters. Here, we provide a table giving the correspondence between the distributional parameter names used in `brms` and those used in standard R functions for some common likelihood families. @@ -184,4 +185,33 @@ Here, we provide a table giving the correspondence between the distributional pa Note that all families in `brms` are parameterised with some measure of centrality `mu` as their first parameter. This parameter does not necessarily correspond to the mean: hence the provision of a function `add_mean_sd()` within `epidist` to add columns containing the natural scale mean and standard deviation to a `data.frame` of draws. +# How can I generate predictions with my fitted `epidist` model? + +It is possible to generate predictions manually by working with [samples from the model output](https://epidist.epinowcast.org/articles/faq.html#i-would-like-to-work-with-the-samples-output). +However this is tricky to do, and so where possible we recommend using the [`tidybayes`](http://mjskay.github.io/tidybayes/) package. +In particular, following functions may be useful: + +1. `tidybayes::add_epred_draws()` for predictions of the expected value of a delay. +2. `tidybayes::add_linpred_draws()` for predictions of the delay distributional parameter linear predictors. +3. `tidybayes::add_predicted_draws()` for predictions of the observed delay. + +To see these functions demonstrated in a vignette, see ["Advanced features with Ebola data"](https://epidist.epinowcast.org/articles/ebola.html). +As a short example, to generate 4000 predictions (equal to the number of draws) of the delay that would be observed with a double censored observation process (in which the primary and secondary censoring windows are both one) then: + +```{r} +draws_pmf <- data.frame(relative_obs_time = 1000, pwindow = 1, swindow = 1) |> + add_predicted_draws(fit, ndraws = 4000) + +ggplot(draws_pmf, aes(x = .prediction)) + + geom_bar(aes(y = after_stat(count / sum(count)))) + + labs(x = "Delay", y = "PMF") + + scale_x_continuous(limits = c(0, 30)) + + theme_minimal() +``` + +Importantly, this functionality is only available for `epidist` models using custom `brms` families that have `posterior_predict` and `posterior_epred` methods implemented. +For example, for the `latent_individual` model, currently methods are implemented for the [lognormal](https://github.com/epinowcast/epidist/blob/main/R/latent_lognormal.R) and [gamma](https://github.com/epinowcast/epidist/blob/main/R/latent_gamma.R) families. +If you are using another family, consider [submitting a pull request](https://github.com/epinowcast/epidist/pulls) to implement these methods! +In doing so, you may find it useful to use the [`primarycensored`](https://primarycensored.epinowcast.org/) package. + ## Bibliography