
Make forward sampling functions return InferenceData #5073

Merged — 22 commits merged into main from forward-sampling-idata on Oct 15, 2021

Conversation

@AlexAndorra (Contributor) commented Oct 13, 2021

This PR adds the possibility to return forward samples directly as InferenceData objects, and makes this behavior the default. These objects then just need to be used to extend the original trace (which is already an InferenceData object by default in 4.0), which simplifies the workflow.

I implemented this for sample_prior_predictive, and you can already review -- I would like to get your approval on the logic before moving on to sample_posterior_predictive and sample_posterior_predictive_w, as the implementation will be very similar.

  • Refactor sample_prior_predictive
  • Refactor sample_posterior_predictive
  • Refactor sample_posterior_predictive_w
  • Breaking changes: forward sampling functions now return an InferenceData object by default.
  • Mention in the RELEASE-NOTES.md

@AlexAndorra changed the title from "Make forward sampling function return InferenceData" to "Make forward sampling functions return InferenceData" on Oct 14, 2021
@AlexAndorra (Contributor, Author)

This is all done and ready for (final?) review 🥳

@AlexAndorra (Contributor, Author) commented Oct 14, 2021

This is weird: all the tests for sample_smc are failing... but I didn't change anything in there 🤔
Any idea why @aloctavodia ?

@codecov (bot) commented Oct 14, 2021

Codecov Report

Merging #5073 (aeb7c0d) into main (b54c8af) will decrease coverage by 3.18%.
The diff coverage is 75.00%.


@@            Coverage Diff             @@
##             main    #5073      +/-   ##
==========================================
- Coverage   78.41%   75.22%   -3.19%     
==========================================
  Files         130       87      -43     
  Lines       24461    14134   -10327     
==========================================
- Hits        19182    10633    -8549     
+ Misses       5279     3501    -1778     
Impacted Files                       Coverage Δ
pymc/smc/smc.py                      98.36% <ø> (ø)
pymc/sampling.py                     86.81% <75.00%> (-0.81%) ⬇️
pymc/backends/ndarray.py             62.10% <0.00%> (-25.27%) ⬇️
pymc/model.py                        83.18% <0.00%> (-0.96%) ⬇️
pymc/distributions/multivariate.py   71.42% <0.00%> (-0.60%) ⬇️
pymc/distributions/discrete.py       98.01% <0.00%> (-0.34%) ⬇️
pymc/distributions/distribution.py   94.47% <0.00%> (-0.15%) ⬇️
pymc/distributions/continuous.py     95.83% <0.00%> (-0.10%) ⬇️
pymc/tuning/starting.py              83.19% <0.00%> (ø)
... and 48 more

@AlexAndorra (Contributor, Author)

I managed to fix all the tests, except those related to SMC.

@ricardoV94 (Member)

I managed to fix all the tests, except those related to SMC

SMC uses prior predictive sampling internally. It should be a simple matter of setting return_inferencedata to False there.

@ricardoV94 (Member)

You have a lot of unrelated changes due to your pre-commit. This seems to always happen with your setup :P (and you should revert them...)

@AlexAndorra (Contributor, Author)

SMC uses prior predictive sampling internally.

Aaaaah, that's why!

You have a lot of unrelated changes due to your pre-commit

I didn't see any unrelated changes -- this is just Black doing its thing through pre-commit. And I did change quite a lot of lines (mainly tests; we use forward sampling a lot 😅).
Note that deleting test_normal_scalar_idata is not a typo: I just switched test_normal_scalar to use idata by default.

@ricardoV94 (Member)

Look at test_distributions_random.py; it's full of Black changes on lines that were not modified.

@michaelosthege (Member) left a comment:

Good move to add this capability!
However, I'd prefer if we didn't add an unnecessary dimension to the prior.

With these changes, what's the recommended workflow if I want to combine the MCMC, prior, and posterior traces into the same InferenceData? Is there an idata.concat() or idata.swallow() method or something like that?

Review thread anchored on this Black-reformatted signature in the tests:

    size=100000,
    alpha=0.05,
    fails=20,
    dist, paramdomains, valuedomain=Domain([0]), ref_rand=None, size=100000, alpha=0.05, fails=20
Member:

@AlexAndorra looks like you're rolling with an outdated pre-commit configuration.
Maybe a pre-commit uninstall + pre-commit install is enough to get you updated?

@AlexAndorra (Contributor, Author):

I don't think so: my pymc venv is brand new and I pulled from main. I think the issue is that PyCharm is running Black from a different environment than my pymc venv. Is there an easy way to revert those changes, though (some of them are not in dedicated commits)?

Member:

I don't know of any automated way to revert them.
Next time, try to stage selected ranges/lines consciously; then it's easier to spot this before committing.

@AlexAndorra (Contributor, Author):

This is weird: now I'm sure that the Black run on this file comes from pre-commit, and it still gives me diffs compared to main. Doesn't that mean there was a problem with this even before my PR?

Review thread anchored on these test assertions:

    assert gen["phi"].shape == (draws,)
    assert gen["y"].shape == (draws, n)
    assert "thetas" in gen
    assert gen.prior["phi"].shape == (1, draws)
Member:

What's that additional prior dimension? A "chain" dimension doesn't make much sense for prior predictive.

@AlexAndorra (Contributor, Author):

I think it's necessary for the next step: merging the prior predictive object automagically with the trace and the posterior predictive samples. xarray needs this dimension to match everything, as the two other objects have it (speaking under the control of @aloctavodia and @OriolAbril).
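To illustrate the point (a sketch with plain xarray; the datasets and sizes are made up): giving the prior a singleton "chain" dimension means it has the same (chain, draw) layout as the posterior, so the same downstream code works on both groups.

```python
import numpy as np
import xarray as xr

draws = 500

# Prior group: one "chain" of `draws` samples, matching the updated
# test assertion gen.prior["phi"].shape == (1, draws)
prior = xr.Dataset({"phi": (("chain", "draw"), np.random.rand(1, draws))})

# Posterior group: e.g. 4 MCMC chains of the same length
posterior = xr.Dataset({"phi": (("chain", "draw"), np.random.rand(4, draws))})

# Code written against the (chain, draw) layout applies to both groups
stacked = [ds["phi"].stack(sample=("chain", "draw")) for ds in (prior, posterior)]
```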

@AlexAndorra (Contributor, Author) commented Oct 15, 2021

Look at test_distributions_random.py, it's full of black changes in lines that were not modified

@ricardoV94: Yes, I saw them; it's just Black. But I think I know what's happening: PyCharm is probably running Black from another venv than my pymc env! It seems hard to revert only those changes, though, doesn't it? Maybe it would be easier in another PR?

@AlexAndorra (Contributor, Author) commented Oct 15, 2021

With these changes, what's the recommended workflow if I want to combine the MCMC, prior and posterior traces into the same InferenceData?

@michaelosthege: I use that:

with model:
    idata = pm.sample()
    idata.extend(pm.sample_prior_predictive())
    idata.extend(pm.sample_posterior_predictive(idata))

There may be something even better, but that already works like a (py)charm 🤩

@ricardoV94 (Member) commented Oct 15, 2021

Look at test_distributions_random.py, it's full of black changes in lines that were not modified

@ricardoV94: Yes, I saw them; it's just Black. But I think I know what's happening: PyCharm is probably running Black from another venv than my pymc env! It seems hard to revert only those changes, though, doesn't it? Maybe it would be easier in another PR?

It seems the bad changes are (all) in test_distributions_random.py? So maybe you can revert those and redo your intended changes?

@AlexAndorra (Contributor, Author)

Ooh damn, I thought the problem was coming from

[tool.black]
line-length = 100

in pyproject.toml, but it really doesn't look like it 😅 --> Reverting

@AlexAndorra (Contributor, Author)

It seems the bad changes are (all) in the test_distributions_random.py? So maybe you can revert those and redo your intended changes?

Ok, I just did that, and I think I fixed all the tests @ricardoV94 and @michaelosthege. So, if tests pass, everything should be ready to merge 🤞

Review thread anchored on this change in test_missing:

    )
    def test_missing(data):
        with Model() as model:
            x = Normal("x", 1, 1)
            with pytest.warns(ImputationWarning):
    -           y = Normal("y", x, 1, observed=data)
    +           _ = Normal("y", x, 1, observed=data)
Member:

Why this change?

@AlexAndorra (Contributor, Author):

Ha ha, just my own neurosis when I see a named but unused variable 😅

@ricardoV94 (Member) left a comment:

Looks good, let's see if the CI passes :D

@michaelosthege (Member) left a comment:

The diff looks fine now.

I'm still not a fan of having an unnecessary "chain" dimension in prior predictive. InferenceData can handle constant_data without "chain"/"draw" dims just fine. But if that's a matter of what InferenceData supports, we should discuss that separately.
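For reference, InferenceData does indeed support groups without chain/draw dims; a minimal sketch with arviz (the variable names are made up):

```python
import numpy as np
import arviz as az

idata = az.from_dict(
    # Posterior arrays are interpreted as (chain, draw, *shape)
    posterior={"phi": np.random.rand(4, 500)},
    # constant_data is stored as-is, with no chain/draw dimensions
    constant_data={"x_obs": np.arange(3)},
)
```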

@AlexAndorra (Contributor, Author)

CI passes, CI passes 🍾 (the failing docs are expected)

Agreed @michaelosthege. If this makes you feel any better, this is already the current behavior: a chain dim is added to the prior pred object when calling arviz.from_pymc3 🤷‍♂️

@michaelosthege (Member)

If this makes you feel any better, this is already the current behavior: a chain dim is added to the prior pred object when calling arviz.from_pymc3 🤷‍♂️

😬

@michaelosthege merged commit ce447cc into main on Oct 15, 2021
@michaelosthege deleted the forward-sampling-idata branch on October 15, 2021 at 13:18
@OriolAbril (Member)

To add a bit of extra info to Alex's comment:

with model:
    idata = pm.sample()
    idata.extend(pm.sample_prior_predictive())
    idata.extend(pm.sample_posterior_predictive(idata))

The sampling and extend calls are "commutative". That is, this would also work:

with model:
    idata = pm.sample_prior_predictive()
    idata.extend(pm.sample())
    idata.extend(pm.sample_posterior_predictive(idata))
