Skip to content

Commit

Permalink
Issue #394: Add information about predictions to FAQ (#395)
Browse files Browse the repository at this point in the history
* Add package links

* Add documentation on how to do prediction
  • Loading branch information
athowes authored Oct 21, 2024
1 parent 8b53356 commit 4d84ee0
Showing 1 changed file with 36 additions and 6 deletions.
42 changes: 36 additions & 6 deletions vignettes/faq.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ library(dplyr)
library(ggplot2)
library(scales)
library(tidyr)
library(tidybayes)
set.seed(1)
Expand Down Expand Up @@ -57,7 +58,7 @@ fit <- epidist(
## I would like to work with the samples output

The output of a call to `epidist` is compatible with typical Stan workflows.
We recommend use of the `posterior` package for working with samples from MCMC or other sampling algorithms.
We recommend use of the [`posterior`](https://mc-stan.org/posterior/) package for working with samples from MCMC or other sampling algorithms.
For example, the function `posterior::as_draws_df()` may be used to obtain a dataframe of MCMC draws for specified parameters.

```{r message = FALSE}
Expand All @@ -69,7 +70,7 @@ head(draws)
## How can I assess if sampling has converged?

The output of a call to `epidist` is compatible with typical Stan workflows.
We recommend use of the `bayesplot` package for sampling diagnostic plots.
We recommend use of the [`bayesplot`](http://mc-stan.org/bayesplot/) package for sampling diagnostic plots.
For example, the function `bayesplot::mcmc_trace()` can be used to produce traceplots for specified parameters.

```{r message = FALSE}
Expand All @@ -85,7 +86,7 @@ epidist_diagnostics(fit)

## I'd like to run a simulation study

We recommend use of the `purrr` package for running many `epidist` models, for example as a part of a simulation study.
We recommend use of the [`purrr`](https://purrr.tidyverse.org/) package for running many `epidist` models, for example as a part of a simulation study.
We particularly highlight two functions which might be useful:

1. `purrr::map()` (and other similar functions) for iterating over a list of inputs.
Expand All @@ -96,7 +97,7 @@ For an example use of these functions, have a look at the [`epidist-paper`](http

## How did you choose the default priors for `epidist`?

`brms` provides default priors for all parameters.
[`brms`](http://paulbuerkner.com/brms/) provides default priors for all parameters.
However, some of those priors do not make sense in the context of our application.
Instead, we used [prior predictive checking](https://mc-stan.org/docs/stan-users-guide/posterior-predictive-checks.html) to set `epidist`-specific default priors which produce epidemiological delay distribution mean and standard deviation parameters in a reasonable range.

Expand Down Expand Up @@ -160,7 +161,7 @@ quantile(pred$sd, c(0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99))

# How can I assess how sensitive the fitted posterior distribution is to the prior distribution used?

We recommend use of the [`priorsense`](https://github.com/n-kall/priorsense) R package [@kallioinen2024detecting] to check how sensitive the posterior distribution is to perturbations of the prior distribution and likelihood using power-scaling analysis:
We recommend use of the [`priorsense`](https://github.com/n-kall/priorsense) package [@kallioinen2024detecting] to check how sensitive the posterior distribution is to perturbations of the prior distribution and likelihood using power-scaling analysis:

```{r}
library(priorsense)
Expand All @@ -170,7 +171,7 @@ powerscale_plot_dens(fit, variable = c("Intercept", "Intercept_sigma")) +

# What do the parameters in my model output correspond to?

The `epidist` package uses `brms` to fit models.
The `epidist` package uses [`brms`](http://paulbuerkner.com/brms/) to fit models.
This means that the model output will include `brms`-style names for parameters.
Here, we provide a table giving the correspondence between the distributional parameter names used in `brms` and those used in standard R functions for some common likelihood families.

Expand All @@ -184,4 +185,33 @@ Here, we provide a table giving the correspondence between the distributional pa
Note that all families in `brms` are parameterised with some measure of centrality `mu` as their first parameter.
This parameter does not necessarily correspond to the mean: hence the provision of a function `add_mean_sd()` within `epidist` to add columns containing the natural scale mean and standard deviation to a `data.frame` of draws.

# How can I generate predictions with my fitted `epidist` model?

It is possible to generate predictions manually by working with [samples from the model output](https://epidist.epinowcast.org/articles/faq.html#i-would-like-to-work-with-the-samples-output).
However this is tricky to do, and so where possible we recommend using the [`tidybayes`](http://mjskay.github.io/tidybayes/) package.
In particular, following functions may be useful:

1. `tidybayes::add_epred_draws()` for predictions of the expected value of a delay.
2. `tidybayes::add_linpred_draws()` for predictions of the delay distributional parameter linear predictors.
3. `tidybayes::add_predicted_draws()` for predictions of the observed delay.

To see these functions demonstrated in a vignette, see ["Advanced features with Ebola data"](https://epidist.epinowcast.org/articles/ebola.html).
As a short example, to generate 4000 predictions (equal to the number of draws) of the delay that would be observed with a double censored observation process (in which the primary and secondary censoring windows are both one) then:

```{r}
draws_pmf <- data.frame(relative_obs_time = 1000, pwindow = 1, swindow = 1) |>
add_predicted_draws(fit, ndraws = 4000)
ggplot(draws_pmf, aes(x = .prediction)) +
geom_bar(aes(y = after_stat(count / sum(count)))) +
labs(x = "Delay", y = "PMF") +
scale_x_continuous(limits = c(0, 30)) +
theme_minimal()
```

Importantly, this functionality is only available for `epidist` models using custom `brms` families that have `posterior_predict` and `posterior_epred` methods implemented.
For example, for the `latent_individual` model, currently methods are implemented for the [lognormal](https://github.com/epinowcast/epidist/blob/main/R/latent_lognormal.R) and [gamma](https://github.com/epinowcast/epidist/blob/main/R/latent_gamma.R) families.
If you are using another family, consider [submitting a pull request](https://github.com/epinowcast/epidist/pulls) to implement these methods!
In doing so, you may find it useful to use the [`primarycensored`](https://primarycensored.epinowcast.org/) package.

## Bibliography

0 comments on commit 4d84ee0

Please sign in to comment.