
Reproducible Bayesian data analysis pipelines with targets and cmdstanr

License: MIT (LICENSE.md); see also LICENSE.

stantargets

Bayesian data analysis usually incurs long runtimes and cumbersome custom code, and prototyping and deploying custom Stan models can become a daunting software engineering challenge. To ease this burden, the stantargets R package creates Stan pipelines that are concise, efficient, scalable, and tailored to the needs of Bayesian statisticians. Because it builds on targets, stantargets automatically parallelizes computation and skips expensive steps whose results are already up to date. Pipelines need minimal custom code and no manual configuration of branching, so stantargets is easier to use than targets and cmdstanr directly. stantargets can access all of cmdstanr’s major algorithms (MCMC, variational Bayes, and optimization), and it supports both single-fit workflows and multi-rep simulation studies.

Prerequisites

  1. The prerequisites of the targets R package.
  2. Basic familiarity with targets: watch minutes 7 through 40 of this video, then read this chapter of the user manual.
  3. Familiarity with Bayesian statistics and Stan. Prior knowledge of cmdstanr helps.

How to get started

Read the stantargets introduction and simulation vignettes, and use https://docs.ropensci.org/stantargets/ as a reference while constructing your own workflows. Visit https://github.com/wlandau/stantargets-example-validation for an example project based on the simulation vignette. The example has an RStudio Cloud workspace that lets you run the project in a web browser.

Example projects

Description | Link
Validating a minimal Stan model | https://github.com/wlandau/targets-stan
Using Target Markdown and stantargets to validate a Bayesian longitudinal model for clinical trial data analysis | https://github.com/wlandau/rmedicine2021-pipeline

Installation

Install the GitHub development version to access the latest features and patches.

remotes::install_github("ropensci/stantargets")

The CmdStan command line interface is also required.

cmdstanr::install_cmdstan()

If you have problems installing CmdStan, please consult the installation guide of cmdstanr and the installation guide of CmdStan. Alternatively, the Stan discourse is a friendly place to ask Stan experts for help.

Usage

First, write a _targets.R file that loads your packages, defines a function to generate Stan data, and lists a pipeline of targets. The target list can call target factories like tar_stan_mcmc() as well as ordinary targets created with tar_target(). The following minimal example is small enough to fit entirely within _targets.R, but for larger projects, you may wish to store functions in separate files, as in the targets-stan example.

# _targets.R
library(targets)
library(stantargets)

# Simulate a simple linear regression dataset with n observations.
generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

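# Pipeline: one target factory that fits x.stan with MCMC.
list(
  tar_stan_mcmc(
    name = example,         # prefix for the names of the generated targets
    stan_files = "x.stan",  # Stan model file; must exist before tar_make()
    data = generate_data()  # R command that returns the Stan data list
  )
)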
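The pipeline above expects a Stan model file named x.stan in the project directory, but this README does not show its contents. A minimal sketch of one way to create it from R follows. The model itself is an assumption: a linear regression with unit noise and a standard normal prior on beta, chosen so that its data block declares every element returned by generate_data().

```r
# Write an assumed minimal Stan model to x.stan. The data block mirrors the
# list returned by generate_data(): n, x, y, and true_beta.
lines <- c(
  "data {",
  "  int<lower = 1> n;",
  "  vector[n] x;",
  "  vector[n] y;",
  "  real true_beta;  // declared to match the data list; unused by the model",
  "}",
  "parameters {",
  "  real beta;",
  "}",
  "model {",
  "  y ~ normal(x * beta, 1);",
  "  beta ~ normal(0, 1);",
  "}"
)
writeLines(lines, "x.stan")
```

Run this once before tar_visnetwork() or tar_make(). Because tar_stan_mcmc() treats the Stan file as an input to the pipeline, editing x.stan later should cause the affected fits to rerun on the next tar_make().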

Run tar_visnetwork() to check _targets.R for correctness, then call tar_make() to run the pipeline. Access the results with tar_read(), e.g. tar_read(example_summary_x), where the _x suffix comes from the name of the Stan file x.stan. Visit the introductory vignette to read more about this example.
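Besides tar_visnetwork(), it can help to call the data-generating function directly before spending time on MCMC. The sketch below copies generate_data() with an explicit n argument (the default of 10 is an assumption) and checks that the simulated vectors have the lengths the Stan data block expects. It uses only base R, so it runs without targets, stantargets, or CmdStan.

```r
# Copy of generate_data() from _targets.R, with an explicit n argument.
generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

set.seed(1)  # make the simulated data reproducible in this session
data <- generate_data(n = 20)
str(data)

# Every vector must have length n to match vector[n] x and vector[n] y in Stan.
stopifnot(
  length(data$x) == data$n,
  length(data$y) == data$n,
  length(data$true_beta) == 1
)
```

In a real project, targets::tar_load_globals() can load the packages and functions defined in _targets.R into your session, which avoids copying generate_data() by hand.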

How it works behind the scenes

stantargets provides specialized target factories that create ensembles of target objects for cmdstanr workflows. These factories abstract away the details of targets and cmdstanr and make both packages easier to use. For details, please read the introductory vignette.

Help

Please first read the help guide to learn how best to ask for help.

If you have trouble using stantargets, you can ask for help in the GitHub discussions forum. Because stantargets combines targets and cmdstanr, your issue may originate in either of those packages, in a dependency of targets, or in Stan itself. When you troubleshoot, peel back as many layers as possible to isolate the problem. For example, if the issue comes from cmdstanr, create a reproducible example that invokes cmdstanr directly without stantargets. The GitHub discussion and issue forums of those packages, as well as the Stan Discourse, are great resources.

Participation

Development is a community effort, and we welcome discussion and contribution. Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Citation

citation("stantargets")
#> 
#> To cite stantargets in publications use:
#> 
#>   Landau, W. M., (2021). The stantargets R package: a workflow
#>   framework for efficient reproducible Stan-powered Bayesian data
#>   analysis pipelines. Journal of Open Source Software, 6(60), 3193,
#>   https://doi.org/10.21105/joss.03193
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {The stantargets {R} package: a workflow framework for efficient reproducible {S}tan-powered {B}ayesian data analysis pipelines},
#>     author = {William Michael Landau},
#>     journal = {Journal of Open Source Software},
#>     year = {2021},
#>     volume = {6},
#>     number = {60},
#>     pages = {3193},
#>     url = {https://doi.org/10.21105/joss.03193},
#>   }