Simulation-based calibration #9

Open
sethaxen opened this issue Jun 16, 2021 · 6 comments

Comments

@sethaxen
Member

sethaxen commented Jun 16, 2021

Under the name InferenceDiagnostics.jl, it might make sense to include functionality for simulation-based calibration here. https://arxiv.org/abs/1804.06788

@yebai
Member

yebai commented Jun 16, 2021

Looks interesting! Under the philosophy of "Do one thing and do it well", I suggest that we keep this package focused on MCMCDiagnostics. Adding too many algorithms under the generic name InferenceDiagnostics might actually go against some motivations for separating these functionalities from MCMCChains.

@sethaxen
Member Author

Yeah, SBC also differs from the diagnostics here in that a PPL-agnostic implementation does almost nothing, while a PPL-specific implementation would be really convenient but would essentially require a function that takes over all the bookkeeping of drawing prior and posterior samples. This should probably either be its own package, be included in ArviZ.jl proper (e.g. if we add functions like that in arviz-devs/ArviZ.jl#133), or be provided by the PPL.

@devmotion
Member

I don't think this has to or should be coupled to a PPL. For instance, for algorithm 2 it seems one just needs to provide functions that sample from the prior, sample a dataset from the conditional distribution, and generate a Markov chain of correlated posterior samples given the data. One could either let users provide them as arguments to the diagnostic function (e.g. as done in MCMCChains for dic) or use a (custom?) interface with e.g. rand_prior(rng, ::Model), rand_data(rng, ::Model, theta) and mcmc(rng, ::Model, data) (similar to how loglikelihood etc. are used in the implementation of aic etc. in StatsBase: https://github.com/JuliaStats/StatsBase.jl/blob/master/src/statmodels.jl#L202).
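As a rough, language-neutral illustration of the "functions as arguments" variant, here is a minimal Python sketch rather than a proposed API. The callables rand_prior, rand_data and mcmc mirror the names above but drop the Model argument. The names sbc_ranks, thin and exact_posterior are hypothetical. A fixed thinning factor stands in for whatever thinning rule the paper's algorithm prescribes, and exact conjugate posterior draws stand in for a real sampler.

```python
import random

def sbc_ranks(rand_prior, rand_data, mcmc, n_sims, thin=1, seed=0):
    """Collect one rank statistic per simulated dataset (hypothetical helper)."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        theta = rand_prior(rng)        # draw a "true" parameter from the prior
        data = rand_data(rng, theta)   # simulate a dataset given that parameter
        draws = mcmc(rng, data)[::thin]  # posterior draws, thinned by a fixed factor
        # Rank of the true parameter among the retained posterior draws.
        ranks.append(sum(d < theta for d in draws))
    return ranks

# Toy model (an assumption for illustration only):
# theta ~ Normal(0, 1), y_i | theta ~ Normal(theta, 1) for i = 1..N_OBS.
N_OBS = 5
N_DRAWS = 99

def rand_prior(rng):
    return rng.gauss(0.0, 1.0)

def rand_data(rng, theta):
    return [rng.gauss(theta, 1.0) for _ in range(N_OBS)]

def exact_posterior(rng, data):
    # Stand-in for an MCMC sampler: the conjugate posterior is
    # Normal(sum(y) / (n + 1), 1 / (n + 1)), so we can sample it directly.
    post_var = 1.0 / (1.0 + len(data))
    post_mean = post_var * sum(data)
    post_sd = post_var ** 0.5
    return [rng.gauss(post_mean, post_sd) for _ in range(N_DRAWS)]

ranks = sbc_ranks(rand_prior, rand_data, exact_posterior, n_sims=200)
thinned_ranks = sbc_ranks(rand_prior, rand_data, exact_posterior, n_sims=50, thin=3)
```

With a correct sampler the ranks should look roughly uniform over 0..N_DRAWS. A histogram of ranks that departs clearly from uniform is the signal the diagnostic looks for.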

I'm not sure if it should be added to this package but since algorithm 2 is a diagnostic tool for MCMC it could fit even if this package focuses only on MCMC diagnostics (which I am not sure if it should - my main motivation was to separate the algorithms and diagnostics from the Chains backend but I don't think the tools have to be limited to MCMC necessarily).

@yebai
Member

yebai commented Jun 16, 2021

> I don't think the tools have to be limited to MCMC necessarily

I think it's helpful to focus on MCMC in principle, but I agree we don't have to take it too literally where beneficial exceptions are encountered.

@sethaxen
Member Author

> I don't think this has to or should be coupled to a PPL. For instance, for algorithm 2 it seems one just needs to provide functions that sample from the prior, sample a dataset from the conditional distribution, and generate a Markov chain of correlated posterior samples given the data. One could either let users provide them as arguments to the diagnostic function (e.g. as done in MCMCChains for dic) or use a (custom?) interface with e.g. rand_prior(rng, ::Model), rand_data(rng, ::Model, theta) and mcmc(rng, ::Model, data) (similar to how loglikelihood etc. are used in the implementation of aic etc. in StatsBase: https://github.com/JuliaStats/StatsBase.jl/blob/master/src/statmodels.jl#L202).

Yeah, I think it might be too complex compared to our other diagnostics in what it requires of the user, but it might be worth having. I'll open a draft PR in the next few weeks, and we can discuss whether it belongs here or perhaps in another package.

@ParadaCarleton
Member

> > I don't think this has to or should be coupled to a PPL. For instance, for algorithm 2 it seems one just needs to provide functions that sample from the prior, sample a dataset from the conditional distribution, and generate a Markov chain of correlated posterior samples given the data. One could either let users provide them as arguments to the diagnostic function (e.g. as done in MCMCChains for dic) or use a (custom?) interface with e.g. rand_prior(rng, ::Model), rand_data(rng, ::Model, theta) and mcmc(rng, ::Model, data) (similar to how loglikelihood etc. are used in the implementation of aic etc. in StatsBase: https://github.com/JuliaStats/StatsBase.jl/blob/master/src/statmodels.jl#L202).
>
> Yeah, I think it might be too complex compared to our other diagnostics in what it requires of the user, but it might be worth having. I'll open a draft PR in the next few weeks, and we can discuss whether it belongs here or perhaps in another package.

I happen to be building a completely different package for Bayesian model checking and comparison diagnostics -- it might belong there!
