Issue 44: Approximate inference vignette #69

Merged: 31 commits, Jul 15, 2024

Commits
edd122a
Laplace and pathfinder via brms
athowes May 30, 2024
7d824b6
Update vignette outline and use S3 versions
athowes Jun 10, 2024
e67f068
Fix typo
athowes Jun 10, 2024
7171327
Rename ltcad to latent_individual
athowes Jun 11, 2024
bc4cb4e
Background and Laplace
athowes Jun 11, 2024
dde0074
Decent way through first draft now
athowes Jun 11, 2024
10aaaef
Fix lint
athowes Jun 12, 2024
7a1a2d2
Add required package deps
athowes Jun 12, 2024
1dc2449
Improve text, figure, and note pathfinder error
athowes Jun 12, 2024
6d82cc6
Text and code improvements for approximate inference vignette
athowes Jun 14, 2024
c8af32e
Sam comment
athowes Jun 18, 2024
4ea1009
Better describe ADVI
athowes Jun 25, 2024
3051606
Some writing, layout, and use of purrr
athowes Jun 25, 2024
ef531cc
Add plot of delay distributions
athowes Jun 25, 2024
12bea73
Add time taken manually (looks like no inbuilt Stan functions for this)
athowes Jun 25, 2024
bd124f6
What is Pathfinder (basic)
athowes Jun 27, 2024
8935e94
Standardise colours across figures
athowes Jun 28, 2024
b308d95
Clarify the stochastic behaviour of Pathfinder, set a seed that works…
athowes Jun 28, 2024
ea872b3
Use imap as suggested by Damon
athowes Jul 8, 2024
9ccad89
Finish first draft of approximate inference vignette with conclusions…
athowes Jul 8, 2024
3b2f78b
Try adding back rstan to see if it fixes CI
athowes Jul 9, 2024
12c1465
Add description
athowes Jul 9, 2024
cfbe00b
Add GH remote for brms to get latest bug fix
athowes Jul 10, 2024
1317db1
Merge branch 'main' into approximate-inference-vignette
seabbs Jul 11, 2024
61d4bc5
Merge branch 'main' into approximate-inference-vignette
seabbs Jul 11, 2024
fe525c0
Fix typos
athowes Jul 12, 2024
df0178a
As of recent PR then prepare has been replaced
athowes Jul 12, 2024
c9a1de5
Add units to times
athowes Jul 14, 2024
875772d
Fix typo spotted by Katie
athowes Jul 14, 2024
c28e6da
Fix typo spotted by Daniel
athowes Jul 15, 2024
d0effbf
Merge branch 'main' into approximate-inference-vignette
seabbs Jul 15, 2024
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -46,7 +46,8 @@ Suggests:
    roxyglobals
Remotes:
    stan-dev/cmdstanr,
    Rdatatable/data.table,
    paul-buerkner/brms
Config/Needs/website:
    r-lib/pkgdown,
    epinowcast/enwtheme
267 changes: 267 additions & 0 deletions vignettes/approx-inference.Rmd
@@ -0,0 +1,267 @@
---
title: "Approximate Bayesian inference in epidist"
description: |
  A demonstration of how to fit models in epidist using approximate Bayesian
  inference methods such as the Laplace approximation, ADVI, and Pathfinder.
output:
  bookdown::html_document2:
    fig_caption: yes
    code_folding: show
    number_sections: true
pkgdown:
  as_is: true
# csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl
link-citations: true
vignette: >
  %\VignetteIndexEntry{Approximate Bayesian inference in epidist}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r setup, include=FALSE}
set.seed(2)

knitr::opts_chunk$set(
  fig.path = "figures/epidist-",
  cache = TRUE,
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  error = FALSE
)
```

# Background

The `epidist` package uses Bayesian inference to estimate delay distributions and other quantities.
Doing Bayesian inference amounts to approximating the posterior distribution of each parameter in the statistical model^[We are currently developing a vignette explaining the statistical model in more detail!].
A range of methods exist to perform this approximation.

By default, `epidist` uses the No-U-Turn Sampler [NUTS; @hoffman2014no] Hamiltonian Monte Carlo (HMC) algorithm.
NUTS is an example of a broader class of Markov chain Monte Carlo (MCMC) methods.
These methods work by simulating from a Markov chain with the target posterior distribution as its stationary distribution.
When MCMC algorithms are run for sufficiently many iterations, and have reached convergence, the samples can be treated as being drawn from the posterior distribution.
Relevant posterior quantities such as expectations may then be computed using these samples.
A drawback of MCMC methods, like NUTS, is that simulation can be quite computationally intensive, especially for complex models or large datasets.
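
As a toy illustration of this last point (our own sketch, not `epidist` output), once a set of draws is treated as samples from the posterior, posterior quantities reduce to simple summaries of those draws.
Here we simulate stand-in "draws" from a normal distribution:

```{r}
# Toy sketch: posterior quantities computed from samples. These draws are
# simulated as a stand-in for MCMC output; with a real fit they would come
# from the sampler itself.
toy_draws <- rnorm(4000, mean = 1.8, sd = 0.1)
mean(toy_draws)                      # posterior expectation
quantile(toy_draws, c(0.025, 0.975)) # equal-tailed 95% credible interval
```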

The `epidist` package is built using `brms` [@brms], which stands for "Bayesian Regression Models using Stan", where Stan [@carpenter2017stan] is a probabilistic programming language.
Although NUTS is the primary inference algorithm used in Stan, additional options are available.
These additional inference algorithms are also available in `epidist` due to its dependence on `brms`.

In this vignette, we first briefly describe the alternative algorithms available (Section \@ref(other)) and point you towards more detailed resources.
We then demonstrate their application to simulated data (Section \@ref(demo)), before extracting and comparing the resulting posterior distributions.
By comparing the resulting inferences to those from NUTS, we hope to help you make informed decisions about which algorithm to use in your applied problem.

# Alternative approximate inference algorithms {#other}

Here we describe three alternative approximate Bayesian inference algorithms that are available to use in `brms`, and therefore also available in `epidist`.
It's worth noting that further inference algorithms may have become available since this vignette was last updated.
Please check `brms` package updates if interested!

## Laplace method

The Laplace method approximates a posterior distribution by a Gaussian distribution centered (by default) at the posterior mode.
In Stan, the Gaussian approximation is constructed on the unconstrained parameter space (as the domain of a Gaussian distribution is the real line).
Samples from the Gaussian approximation may then be transformed to the constrained parameter space.
To access the Laplace method, specify `algorithm = "laplace"` within `brms::brm`.
See the section [Laplace Sampling](https://mc-stan.org/docs/cmdstan-guide/laplace_sample_config.html) of the `CmdStan` User's Guide for more information.
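
To build intuition, below is a toy one-dimensional sketch of the method (our own illustration, entirely separate from Stan's implementation).
The hypothetical log-posterior `log_p` corresponds to a Gamma(2, 1) distribution for a positive parameter, transformed to the unconstrained scale via `x = log(theta)`:

```{r}
# Toy sketch of the Laplace method in 1-D (ours, not Stan's implementation).
# log_p is a hypothetical unconstrained log-posterior: a Gamma(2, 1)
# distribution for a positive parameter theta, after x = log(theta).
log_p <- function(x) dgamma(exp(x), shape = 2, rate = 1, log = TRUE) + x

# Centre the Gaussian at the mode, with variance given by the negative
# inverse of the (numerical) second derivative of log_p at the mode.
mode_hat <- optimize(log_p, interval = c(-10, 10), maximum = TRUE)$maximum
h <- 1e-4
hess <- (log_p(mode_hat + h) - 2 * log_p(mode_hat) + log_p(mode_hat - h)) / h^2

laplace_draws <- rnorm(4000, mean = mode_hat, sd = sqrt(-1 / hess))
theta_draws <- exp(laplace_draws) # transform back to the constrained scale
```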

## Variational inference using ADVI

Automatic differentiation variational inference [ADVI; @kucukelbir2017advi] is a type of variational inference [VI; @blei2017variational] algorithm.
VI works by restricting attention to a family of approximating distributions, and then selecting the member of that family which is most similar to the posterior distribution.
Most commonly, and in Stan, (dis-)similarity is measured using the Kullback–Leibler (KL) divergence.
There are two options for the family of distributions, either a fully factorised Gaussian with `algorithm = "meanfield"` or a Gaussian with a full-rank covariance matrix with `algorithm = "fullrank"`.
See the section [Variational Inference using ADVI](https://mc-stan.org/docs/cmdstan-guide/variational_config.html) of the `CmdStan` User's Guide for more information.
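
As another toy sketch (again ours, not Stan's ADVI implementation), the same hypothetical target `log_p` from the Laplace sketch above can be approximated variationally.
We parameterise a Gaussian by its mean and log standard deviation, and minimise a Monte Carlo estimate of the KL divergence, using fixed standard normal draws (the reparameterisation trick) so that the objective is deterministic:

```{r}
# Toy sketch of mean-field VI in 1-D (ours). Minimising KL(q || p) over the
# Gaussian family is equivalent to maximising the ELBO. Reuses log_p from
# the Laplace sketch above.
set.seed(1)
eps <- rnorm(1e4) # fixed standard normal draws (reparameterisation trick)

kl_hat <- function(par) {
  mu <- par[1]
  sigma <- exp(par[2])
  z <- mu + sigma * eps
  mean(dnorm(z, mu, sigma, log = TRUE) - log_p(z))
}

vi_fit <- optim(c(0, 0), kl_hat)
c(mu = vi_fit$par[1], sigma = exp(vi_fit$par[2]))
```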

## Pathfinder

Pathfinder [@zhang2022pathfinder] is a more recently developed method which is closely related to variational inference.
It works by generating a Gaussian approximation at each step of an iterative optimisation algorithm (such as L-BFGS).
The KL divergence from each approximation to the posterior is estimated, and the best approximation is chosen.
Pareto smoothed importance sampling [PSIS; @vehtari2015pareto] is optionally used to resample draws from the chosen Gaussian distribution.
When multiple paths are specified (using `num_paths`), the Pathfinder algorithm is run multiple times, with the optimisation initialised at a different point each time.
The resulting approximation is a mixture of Gaussian distributions, rather than a single Gaussian distribution.
See the section [Pathfinder Method for Approximate Bayesian Inference](https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html) of the `CmdStan` User's Guide for more information.
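
To give a flavour of this idea, here is our own much-simplified toy sketch (the real algorithm derives its Gaussians from L-BFGS curvature estimates).
We place a Gaussian at each point along a stand-in optimisation path towards the mode of `log_p`, estimate the ELBO of each, and keep the best:

```{r}
# Toy sketch of Pathfinder's core idea in 1-D (ours, much simplified). We fix
# the scale to the Laplace sd from above and vary only the centre along a
# stand-in optimisation path.
path <- seq(-2, mode_hat, length.out = 8)
sd_fixed <- sqrt(-1 / hess)

elbo <- vapply(path, function(m) {
  z <- rnorm(2000, mean = m, sd = sd_fixed)
  mean(log_p(z) - dnorm(z, m, sd_fixed, log = TRUE))
}, numeric(1))

path[which.max(elbo)] # centre of the approximation closest in KL
```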

# Demonstration {#demo}

In this demonstration, we use `epidist` alongside `ggplot2` and `dplyr`:

```{r load-requirements}
library(epidist)
library(ggplot2)
library(dplyr)
```

We begin by simulating data.
The simulation process follows that used in the [Getting started with epidist](https://epidist.epinowcast.org/articles/epidist.html#data) vignette, so we do not explain it in detail here; please consult that vignette if interested:

```{r}
meanlog <- 1.8
sdlog <- 0.5
obs_time <- 25
sample_size <- 200

obs_cens_trunc <- simulate_gillespie(seed = 101) |>
  simulate_secondary(
    meanlog = meanlog,
    sdlog = sdlog
  ) |>
  observe_process() |>
  filter_obs_by_obs_time(obs_time = obs_time)

obs_cens_trunc_samp <-
  obs_cens_trunc[sample(seq_len(.N), sample_size, replace = FALSE)]
```

We now prepare the data for fitting with the latent individual model, and perform inference with HMC:

```{r results='hide'}
data <- as_latent_individual(obs_cens_trunc_samp)

t <- proc.time()
fit_hmc <- epidist(data = data, algorithm = "sampling")
time_hmc <- proc.time() - t
```

Note that for clarity above we specify `algorithm = "sampling"`, but if you were to call `epidist(data = data)` the result would be the same since `"sampling"` (i.e. HMC) is the default value for the `algorithm` argument.

Now, we fit^[Note that in this section, and above for the MCMC, the output of the call is hidden, but if you were to call these functions yourself they would display information about the fitting procedure as it occurs] the same latent individual model using each method in Section \@ref(other).
To match the four Markov chains of length 1000 in HMC above, we then draw 4000 samples from each approximate posterior.

```{r results='hide'}
t <- proc.time()
fit_laplace <- epidist(data = data, algorithm = "laplace", draws = 4000)
time_laplace <- proc.time() - t

t <- proc.time()
fit_advi <- epidist(data = data, algorithm = "meanfield", draws = 4000)
time_advi <- proc.time() - t
```

For the Pathfinder algorithm we will set `num_paths = 1`.
In our testing, the Laplace and ADVI methods ran without problems in all cases, but Pathfinder often produced the error "Error evaluating model log probability: Non-finite gradient.".
Although a `save_single_paths` option is available, which might have allowed individual Pathfinder paths to be recovered (and faulty paths discarded), it does not appear to be working currently^[See https://github.com/stan-dev/cmdstanr/issues/878].

```{r}
t <- proc.time()
fit_pathfinder <- epidist(
  data = data, algorithm = "pathfinder", draws = 4000, num_paths = 1
)
time_pathfinder <- proc.time() - t
```

We now extract the posterior distributions of the delay parameters from each fitted model.
Conveniently, each algorithm is implemented to return draws from the (approximate) posterior distribution, so post-processing is identical across methods.

```{r}
fits <- list(
  "HMC" = fit_hmc,
  "Laplace" = fit_laplace,
  "ADVI" = fit_advi,
  "Pathfinder" = fit_pathfinder
)

draws <- purrr::imap(fits, function(fit, name) {
  extract_lognormal_draws(fit) |>
    draws_to_long() |>
    filter(parameter %in% c("meanlog", "sdlog")) |>
    mutate(method = as.factor(name))
})

draws <- bind_rows(draws)
```

## Comparison of parameter posterior distributions

The mean estimated value of each parameter, from each method, is as follows.

```{r}
pars <- draws |>
  group_by(method, parameter) |>
  summarise(value = mean(value)) |>
  tidyr::pivot_wider(names_from = parameter, values_from = value)

pars |>
  ungroup() |>
  gt::gt()
```

More comprehensively, the estimated posterior distributions are as follows.

```{r}
draws |>
  filter(parameter == "meanlog") |>
  ggplot(aes(x = value, fill = method)) +
  geom_histogram(aes(y = after_stat(density))) +
  scale_fill_manual(values = c("#56B4E9", "#009E73", "#E69F00", "#CC79A7")) +
  facet_grid(method ~ parameter) +
  theme_minimal() +
  guides(fill = "none") +
  labs(x = "", y = "")
```

```{r}
draws |>
  filter(parameter == "sdlog") |>
  ggplot(aes(x = value, fill = method)) +
  geom_histogram(aes(y = after_stat(density))) +
  scale_fill_manual(values = c("#56B4E9", "#009E73", "#E69F00", "#CC79A7")) +
  facet_grid(method ~ parameter) +
  theme_minimal() +
  guides(fill = "none") +
  labs(x = "", y = "")
```

## Comparison of resulting delay distributions

How do these different posterior distributions for the `meanlog` and `sdlog` parameters affect the estimated delay distribution?

```{r}
purrr::pmap_df(
  pars, ~ tibble(
    x = seq(0, 25, by = 0.1),
    method = ..1, density = dlnorm(x, ..2, ..3)
  )
) |>
  ggplot(aes(x = x, y = density, col = method)) +
  geom_line() +
  scale_color_manual(values = c("#56B4E9", "#009E73", "#E69F00", "#CC79A7")) +
  theme_minimal() +
  labs(x = "", y = "", col = "Method")
```

## Comparison of time taken

In this example, HMC took much longer than the other methods, with Pathfinder being the fastest method.
That said, even for HMC the computation time in this case is unlikely to be prohibitive.

```{r}
times <- list(
  "HMC" = time_hmc,
  "Laplace" = time_laplace,
  "ADVI" = time_advi,
  "Pathfinder" = time_pathfinder
)

purrr::imap(times, function(time, name) {
  data.frame(
    "method" = name,
    "time (s)" = time[["elapsed"]],
    check.names = FALSE
  )
}) |>
  bind_rows() |>
  gt::gt()
```

# Conclusion

The range of alternative approximation algorithms available, and their ease of use, is an attractive feature of `brms`.
We found that these algorithms do produce reasonable approximations in far less time than HMC.
Of course, this vignette only includes one example, and a more thorough investigation would be required to make specific recommendations.
That said, we do not currently recommend the Pathfinder algorithm, due to its unstable performance in our testing and its early-stage software implementation.

## Bibliography {-}
62 changes: 61 additions & 1 deletion vignettes/references.bib
@@ -30,4 +30,64 @@ @Article{brms
  pages = {1--28},
  doi = {10.18637/jss.v080.i01},
  encoding = {UTF-8},
}

@article{kucukelbir2017advi,
  author = {Alp Kucukelbir and Dustin Tran and Rajesh Ranganath and Andrew Gelman and David M. Blei},
  title = {Automatic Differentiation Variational Inference},
  journal = {Journal of Machine Learning Research},
  year = {2017},
  volume = {18},
  number = {14},
  pages = {1--45},
  url = {http://jmlr.org/papers/v18/16-107.html}
}

@article{zhang2022pathfinder,
  author = {Lu Zhang and Bob Carpenter and Andrew Gelman and Aki Vehtari},
  title = {Pathfinder: Parallel quasi-Newton variational inference},
  journal = {Journal of Machine Learning Research},
  year = {2022},
  volume = {23},
  number = {306},
  pages = {1--49},
  url = {http://jmlr.org/papers/v23/21-0889.html}
}

@article{hoffman2014no,
  author = {Hoffman, Matthew D and Gelman, Andrew},
  title = {The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo},
  journal = {Journal of Machine Learning Research},
  year = {2014},
  volume = {15},
  number = {1},
  pages = {1593--1623}
}

@article{carpenter2017stan,
  author = {Carpenter, Bob and Gelman, Andrew and Hoffman, Matthew D and Lee, Daniel and Goodrich, Ben and Betancourt, Michael and Brubaker, Marcus A and Guo, Jiqiang and Li, Peter and Riddell, Allen},
  title = {Stan: A probabilistic programming language},
  journal = {Journal of Statistical Software},
  year = {2017},
  volume = {76},
  number = {1}
}

@article{blei2017variational,
  author = {Blei, David M and Kucukelbir, Alp and McAuliffe, Jon D},
  title = {Variational inference: A review for statisticians},
  journal = {Journal of the American Statistical Association},
  year = {2017},
  volume = {112},
  number = {518},
  pages = {859--877},
  publisher = {Taylor \& Francis}
}

@article{vehtari2015pareto,
  author = {Vehtari, Aki and Simpson, Daniel and Gelman, Andrew and Yao, Yuling and Gabry, Jonah},
  title = {Pareto smoothed importance sampling},
  journal = {arXiv preprint arXiv:1507.02646},
  year = {2015}
}
