Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Bayes Factor plot via Savage-Dickey density ratio #2037

Merged
merged 16 commits into from
Nov 2, 2022
Merged

Add Bayes Factor plot via Savage-Dickey density ratio #2037

merged 16 commits into from
Nov 2, 2022

Conversation

orduek
Copy link
Contributor

@orduek orduek commented May 30, 2022

Description

As requested in issue #689 - this is an attempt to add plot_bf function, that will take prior, idata (trace) and reference value, and output a Bayes Factor + plot (please see issue #689 for details and examples)

@orduek orduek changed the title added Bayes Factor plot, issue #689 [WIP] [WIP] added Bayes Factor plot, issue #689 May 30, 2022
@orduek
Copy link
Contributor Author

orduek commented Jul 18, 2022

Is it of interest to the audience?

@OriolAbril
Copy link
Member

Hi, sorry for the very slow response. It looks like it is something that is probably useful to ArviZ users, but I had never heard about this plot before the issues was opened and haven't found the time to review the references (I am also quite slow to review PRs about things I do know about). I can leave some general comments for now and questions. Thanks for your patience.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to focus on conceptual questions, but did want to advertize the extract function. It isn't released yet, but can be safely used here and should make things like this easier.

arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@orduek orduek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to correct your comments. I hope this one is better.
Thank you and sorry for the late response.

arviz/plots/bf_plot.py Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
@orduek
Copy link
Contributor Author

orduek commented Oct 31, 2022

OK, I've fixed the issues you've raised.
Two things I'm not sure about:

  1. The logging system
  2. Should we add more distributions (I would probably consult other about it before doing so)

@orduek
Copy link
Contributor Author

orduek commented Oct 31, 2022

ok, I added logging. Let me know if you think it's ok now.
Thank you for all the guidance!

@orduek
Copy link
Contributor Author

orduek commented Oct 31, 2022

Ok, I went back to the issue conversation with @drbenvincent , the main concern was around bounded distributions (not categorical), but I'm not sure we'll have an easy solution to this one. We need some more expertise here.

@aloctavodia
Copy link
Contributor

I don't think we should have a family argument. Instead, we should be able to handle all kinds of distributions without user intervention.

I see two simple solutions

  1. Ignore boundaries and use the KDE from scipy as you are currently doing
  2. Use our KDE, which takes into account the boundaries. Our KDE implementation does not currently allow for evaluation of the pdf. But you get the densities evaluated at discrete points on a grid, so you can use those values if the grid is small enough.

@orduek
Copy link
Contributor Author

orduek commented Oct 31, 2022

I lean toward starting with the current option. Later on, I think it's worth testing & develop the second option, maybe then we'll be able to allow a "method" input to the function. What do you think?

@aloctavodia
Copy link
Contributor

yes we can leave it like this for this PR, and in the future think of a better method, not sure we will need to support more than one method.

@orduek
Copy link
Contributor Author

orduek commented Nov 1, 2022

ok, great.
Let me know what else is needed to merge it.

arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
print("Posterior distribution has {post.ndim} dimensions")
# generate vector
if xlim is None:
x = np.linspace(np.min(prior), np.max(prior),prior.shape[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better to just have a "high" number of points, like 5000, 10000?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will probably be very similar, but why not use the prior dimensions? to save resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to fill the plot evenly with points and this method either needs many points or some density based sampling.

Common images are happy with 1000+ points

There could be problems with 100k points or if prior point count is low and concentrated in one location with huge tails

arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
@orduek
Copy link
Contributor Author

orduek commented Nov 1, 2022

ok, I've kept two things open:

  1. Left the testing whether var_name is a string or not (otherwise, it'll fail, no?)
  2. Wether we should use 5000/10000 or the prior shape

@aloctavodia
Copy link
Contributor

We already have some logic to handle variables names see for example https://github.com/arviz-devs/arviz/blob/main/arviz/plots/autocorrplot.py#L120.

Something like 5000 points for the kde should be enough.

@orduek
Copy link
Contributor Author

orduek commented Nov 2, 2022

I've changed the size to 5000 and tried to incorporate the var_names method.
Let me know if it looks ok now

arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
arviz/plots/bf_plot.py Outdated Show resolved Hide resolved
@aloctavodia
Copy link
Contributor

After those changes I think this is ready to merge.

Co-authored-by: Osvaldo A Martin <[email protected]>
@orduek
Copy link
Contributor Author

orduek commented Nov 2, 2022

Ok, I've added some more words (what the function returns).
I also committed both of your suggestion (I'm still not sure how to work on the web with that).
Thank you!

@aloctavodia aloctavodia changed the title [WIP] added Bayes Factor plot, issue #689 Add Bayes Factor plot via Savage-Dickey density ratio Nov 2, 2022
@orduek
Copy link
Contributor Author

orduek commented Nov 2, 2022

For now, in order to use the extract function, I am using from arviz import extract, I'm sure there's a better solution

@aloctavodia
Copy link
Contributor

yes, try with a relative import.

@orduek
Copy link
Contributor Author

orduek commented Nov 2, 2022

Done. Thanks!

@aloctavodia aloctavodia merged commit 8c819d2 into arviz-devs:main Nov 2, 2022
@aloctavodia
Copy link
Contributor

thanks @orduek!

@orduek orduek deleted the bf_plot branch November 2, 2022 18:16
GStechschulte added a commit to GStechschulte/arviz that referenced this pull request Nov 3, 2022
Add Bayes Factor plot via Savage-Dickey density ratio (arviz-devs#2037)
bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this pull request Mar 22, 2023
https://build.opensuse.org/request/show/1073712
by user dgarcia + dimstar_suse
- skip python 3.11 and python 3.8 because some dependencies doesn't
  support that python versions
- Update to 0.15.1
  - Fix memory usage and improve efficiency in `from_emcee`
    ([2215](arviz-devs/arviz#2215))
  - Lower pandas version needed
    ([2217](arviz-devs/arviz#2217))
- 0.15.0
  - Adds Savage-Dickey density ratio plot for Bayes factor
    approximation.
    ([2037](arviz-devs/arviz#2037),
    [2152](arviz-devs/arviz#2152))
  - Add `CmdStanPySamplingWrapper` and `PyMCSamplingWrapper` classes
    ([2158](arviz-devs/arviz#2158))
  - Changed dependency on netcdf4-python to h5netcdf
    ([2122](arviz-devs/arviz#2122))
  - Fix `reloo` outdate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants