Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Bayes Factor plot via Savage-Dickey density ratio #2037

Merged
merged 16 commits into from
Nov 2, 2022
67 changes: 67 additions & 0 deletions arviz/plots/bf_plot.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
# Plotting and reporting Bayes Factor given idata, var name, prior distribution and reference value
from scipy import stats
import matplotlib.pyplot as plt
import numpy as np
import logging
orduek marked this conversation as resolved.
Show resolved Hide resolved
#
orduek marked this conversation as resolved.
Show resolved Hide resolved
_log = logging.getLogger(__name__)

def plot_bf(idata, var_name, prior, family = 'normal', ref_val=0, xlim=None, ax=None):
orduek marked this conversation as resolved.
Show resolved Hide resolved
"""
orduek marked this conversation as resolved.
Show resolved Hide resolved
Bayes Factor approximated as the Savage-Dickey density ratio.
The Bayes factor is estimated by comparing a model
against a model in which the parameter of interest has been restricted to a point-null.

:idata: The "trace" of model, after sampling the
:var_name: [str] Name of variable we want to test.
:prior: In case we want to use diffent prior (for sensitivity analysis of BF), we can define one and sent it to the function.
:family: for now, supports only the normal distribution.
:ref_val: reference value for BF testing
:xlim: limit the x axis (might be used for visualization porpuses sometimes)

# grab trace, a variable name to compute difference and prior.
# ref_val is the parameter we want to compare
# test some elemtns
# varName should be string
if not isinstance(var_name, str):
print('varName is not a string')
# BFs based on density estimation (using kernel smoothing instead of spline)
orduek marked this conversation as resolved.
Show resolved Hide resolved
"""
post = extract(idata, var_names=var_name)
if prior is None:
# grab prior from the data in case it wasn't defined by the user
prior = extract(idata, var_names=var_name, group="prior")
if post.ndim > 1:
print("Posterior distribution has {post.ndim} dimensions")
# generate vector
if xlim is None:
x = np.linspace(np.min(prior), np.max(prior),prior.shape[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better to just have a "high" number of points, like 5000, 10000?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will probably be very similar, but why not use the prior dimensions? to save resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to fill the plot evenly with points and this method either needs many points or some density based sampling.

Common images are happy with 1000+ points

There could be problems with 100k points or if prior point count is low and concentrated in one location with huge tails

else:
x = np.linspace(xlim[0], xlim[1],prior.shape[0])
#x = np.linspace(np.min(post), np.max(post),prior.shape[0])
orduek marked this conversation as resolved.
Show resolved Hide resolved
my_pdf = stats.gaussian_kde(post)
prior_pdf = stats.gaussian_kde(prior)
if ax is None:
fig, ax = plt.subplots()
ax.plot(
x, my_pdf(x), "--", lw=2.5, alpha=0.6, label="Posterior"
) # distribution function
ax.plot(x, prior_pdf(x), "r-", lw=2.5, alpha=0.6, label="Prior")
orduek marked this conversation as resolved.
Show resolved Hide resolved
if ref_val>np.max(post) | ref_val<np.min(post):
print('Reference value is out of bounds of posterior')
orduek marked this conversation as resolved.
Show resolved Hide resolved
else:
posterior = my_pdf(ref_val) # this gives the pdf at ref_val
prior = prior_pdf(ref_val)
BF10 = posterior / prior
BF01 = prior / posterior
_log.warning("the Bayes Factor 10 is %.3f" % (BF10)"
orduek marked this conversation as resolved.
Show resolved Hide resolved
("the Bayes Factor 01 is %.3f" % (BF01)
)
ax.plot(ref_val, posterior, "ko", lw=1.5, alpha=1)
ax.plot(ref_val, prior, "ko", lw=1.5, alpha=1)
orduek marked this conversation as resolved.
Show resolved Hide resolved
ax.set_xlabel(var_name)
ax.set_ylabel("Density")
plt.legend()

return {'BF10': BF10, 'BF01':BF01}, ax
# end