Make forward sampling functions return `InferenceData` #5073

AlexAndorra · 2021-10-13T16:19:26Z

This PR adds the possibility to return foward samples directly as an InferenceData objects, and sets this behavior as the default. These objects then just need to be extended to the original trace (which is already an InferenceData object by default in 4.0), which simplifies the workflow.

I implemented this for sample_prior_predictive and you can already review -- I would like to get your approval on the logic before going to sample_posterior_predictive and sample_posterior_predictive_w, as the implementation will be very similar.

Refactor sample_prior_predictive
Refactor sample_posterior_predictive
Refactor sample_posterior_predictive_w
Breaking changes: forward sampling functions now return an InferenceData object by default.
Mention in the RELEASE-NOTES.md

pymc/sampling.py

Co-authored-by: Osvaldo Martin <[email protected]>

AlexAndorra · 2021-10-14T11:47:54Z

This is all done and ready for (final?) review 🥳

AlexAndorra · 2021-10-14T13:38:15Z

This is weird: all the tests for sample_smc are failing... but I didn't change anything in there 🤔
Any idea why @aloctavodia ?

codecov · 2021-10-14T16:37:25Z

Codecov Report

Merging #5073 (aeb7c0d) into main (b54c8af) will decrease coverage by 3.18%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##             main    #5073      +/-   ##
==========================================
- Coverage   78.41%   75.22%   -3.19%     
==========================================
  Files         130       87      -43     
  Lines       24461    14134   -10327     
==========================================
- Hits        19182    10633    -8549     
+ Misses       5279     3501    -1778

Impacted Files	Coverage Δ
pymc/smc/smc.py	`98.36% <ø> (ø)`
pymc/sampling.py	`86.81% <75.00%> (-0.81%)`	⬇️
pymc/backends/ndarray.py	`62.10% <0.00%> (-25.27%)`	⬇️
pymc/model.py	`83.18% <0.00%> (-0.96%)`	⬇️
pymc/distributions/multivariate.py	`71.42% <0.00%> (-0.60%)`	⬇️
pymc/distributions/discrete.py	`98.01% <0.00%> (-0.34%)`	⬇️
pymc/distributions/distribution.py	`94.47% <0.00%> (-0.15%)`	⬇️
pymc/distributions/continuous.py	`95.83% <0.00%> (-0.10%)`	⬇️
pymc/tuning/starting.py	`83.19% <0.00%> (ø)`
... and 48 more

AlexAndorra · 2021-10-14T16:58:37Z

I managed to fix all the tests, except those related to SMC

ricardoV94 · 2021-10-14T17:13:22Z

I managed to fix all the tests, except those related to SMC

SMC uses prior predictive sampling internally. It should be a simple matter of setting return_inferencedata to False there

ricardoV94 · 2021-10-14T17:14:57Z

You have a lot of unrelated changes due to your pre-commit. This seems to always happen with your setup :P (and you should revert them...)

AlexAndorra · 2021-10-14T18:12:52Z

SMC uses prior predictive sampling internally.

Aaaaah, that's why!

You have a lot of unrelated changes due to your pre-commit

I didn't see any unrelated changes -- this is just Black doing its thing through pre-commit. And I did change quite a lot of lines (mainly tests; we use forward sampling a lot 😅 ).
Note that I deleted test_normal_scalar_idata, this is not a typo. It's because I just switched test_normal_scalar to use idata by default

ricardoV94 · 2021-10-14T19:09:47Z

Look at test_distributions_random.py, it's full of black changes in lines that were not modified

michaelosthege

Good move to add this capability!
However, I'd prefer if we didn't add an unnecessary dimension to the prior.

With these changes, what's the recommended workflow if I want to combine the MCMC, prior and posterior traces into the same InferenceData? Is there a idata.concat() or idata.swallow() method or something like that?

michaelosthege · 2021-10-14T19:09:08Z

pymc/tests/test_distributions_random.py

-    size=100000,
-    alpha=0.05,
-    fails=20,
+    dist, paramdomains, valuedomain=Domain([0]), ref_rand=None, size=100000, alpha=0.05, fails=20


@AlexAndorra looks like you're rolling with an outdated pre-commit configuration.
Maybe a pre-commit uninstall + pre-commit install is enough to get you updated?

I don't think so: my pymc venv is brand new and I pulled from main. I think the issue is PyCharm is running a Black from another environment than my pymc env. Is there an easy way to revert those changes though (some of them are not in dedicated commits)?

I don't know any automated way to revert them.
Next time try to stage selected ranges/lines contiously, then it's easier to spot this before committing?

This is weird: now I'm sure that the Black ran on this file is from pre-commitn and it still gives me diffs compared to main. Doesn't that mean that there was a problem with this even before my PR?

michaelosthege · 2021-10-14T19:20:43Z

pymc/tests/test_sampling.py

-        assert gen["phi"].shape == (draws,)
-        assert gen["y"].shape == (draws, n)
-        assert "thetas" in gen
+        assert gen.prior["phi"].shape == (1, draws)


What's that additional prior dimension? "chain" doesn't make much sense for prior predictive..

I think it's necessary for the next step: merging the prior pred object automagically with the trace and the post pred samples: xarray needs this dimension to match everything, as the two other objects have it (speaking under the control of @aloctavodia and @OriolAbril )

AlexAndorra · 2021-10-15T08:14:02Z

Look at test_distributions_random.py, it's full of black changes in lines that were not modified

@ricardoV94: Yes, I saw them, it's just Black. But I think I know what's happening: PyCharm is probably running a Black from another venv than my pymc env! Seems hard to revert only those changes though, isn't it? Maybe easier in another PR?

AlexAndorra · 2021-10-15T08:19:27Z

With these changes, what's the recommended workflow if I want to combine the MCMC, prior and posterior traces into the same InferenceData?

@michaelosthege: I use that:

with model:
    idata = pm.sample()
    idata.extend(pm.sample_prior_predictive())
    idata.extend(pm.sample_posterior_predictive(idata))

There may be something even better, but that already works like a (py)charm 🤩

ricardoV94 · 2021-10-15T09:40:12Z

Look at test_distributions_random.py, it's full of black changes in lines that were not modified

@ricardoV94: Yes, I saw them, it's just Black. But I think I know what's happening: PyCharm is probably running a Black from another venv than my pymc env! Seems hard to revert only those changes though, isn't it? Maybe easier in another PR?

It seems the bad changes are (all) in the test_distributions_random.py? So maybe you can revert those and redo your intended changes?

AlexAndorra · 2021-10-15T11:50:28Z

Ooh damn, I thought the problem was coming from

[tool.black]
line-length = 100

in pyproject.toml, but it really doesn't look like it 😅 --> Reverting

AlexAndorra · 2021-10-15T12:11:06Z

It seems the bad changes are (all) in the test_distributions_random.py? So maybe you can revert those and redo your intended changes?

Ok, I just did that and think I fixed all the tests @ricardoV94 and @michaelosthege. So, if tests pass, everything should be ready to merge 🤞

ricardoV94 · 2021-10-15T12:28:26Z

pymc/tests/test_missing.py

 )
 def test_missing(data):

    with Model() as model:
        x = Normal("x", 1, 1)
        with pytest.warns(ImputationWarning):
-            y = Normal("y", x, 1, observed=data)
+            _ = Normal("y", x, 1, observed=data)


Why this change?

Ha ha, just my own neurosis when I see a named but unused variable 😅

ricardoV94

Looks good, let's see if the CI passes :D

michaelosthege

The diff looks fine now.

I'm still not a fan of having an unnecessary "chain" dimension in prior predictive. InferenceData can handle constant_data without "chain"/"draw" dims just fine. But if that's a matter of what InferenceData supports, we should discuss that separately.

AlexAndorra · 2021-10-15T13:11:45Z

CI passes, CI passes 🍾 (the failing docs are expected)

Agreed @michaelosthege. If this makes you feel any better, this is already the current behavior: a chain dim is added to the prior pred object when calling arviz.from_pymc3 🤷‍♂️

michaelosthege · 2021-10-15T13:15:53Z

If this makes you feel any better, this is already the current behavior: a chain dim is added to the prior pred object when calling arviz.from_pymc3 🤷‍♂️

😬

OriolAbril · 2021-10-15T15:38:40Z

To add a bit of extra info to Alex's comment:

with model:
    idata = pm.sample()
    idata.extend(pm.sample_prior_predictive())
    idata.extend(pm.sample_posterior_predictive(idata))

The sampling and extends are "commutative". That is, this would also work

with model:
    idata = pm.sample_prior_predictive()
    idata.extend(pm.sample())
    idata.extend(pm.sample_posterior_predictive(idata))

Make prior pred return inference data

924412f

AlexAndorra added enhancements v4 labels Oct 13, 2021

AlexAndorra requested a review from aloctavodia October 13, 2021 16:19

AlexAndorra self-assigned this Oct 13, 2021

AlexAndorra requested a review from OriolAbril October 13, 2021 16:19

ricardoV94 reviewed Oct 13, 2021

View reviewed changes

pymc/sampling.py Outdated Show resolved Hide resolved

aloctavodia reviewed Oct 14, 2021

View reviewed changes

pymc/sampling.py Outdated Show resolved Hide resolved

AlexAndorra changed the title ~~Make forward sampling function return InferenceData~~ Make forward sampling functions return InferenceData Oct 14, 2021

AlexAndorra and others added 5 commits October 14, 2021 11:34

Add Osvaldo's suggestion

ca26134

Co-authored-by: Osvaldo Martin <[email protected]>

Set return_inferencedata to True by default instead of None

cf71383

Make posterior pred return inference data

eb7c40c

Make posterior_predictive_w return inference data

991fe04

Add release note

772f454

AlexAndorra added 2 commits October 14, 2021 15:27

black

84ab2e8

Fix tests missing

8942206

AlexAndorra added 2 commits October 14, 2021 15:38

Fix tests shared

7b396ac

Fix tests distributions random

e33a813

AlexAndorra requested a review from aloctavodia October 14, 2021 14:42

AlexAndorra added 3 commits October 14, 2021 17:18

Fix tests distributions

cd4ce13

black

fbda63e

Fix tests idata conversion

826c602

michaelosthege reviewed Oct 14, 2021

View reviewed changes

AlexAndorra added 2 commits October 15, 2021 10:47

black on smc

59d3475

Fix tests SMC

9f8af4d

AlexAndorra added 4 commits October 15, 2021 13:25

Black on test_sampling

ef7a2e6

Reinstate test_normal_scalar_idata

44d3730

Remove Black line length 100 in pyproject.toml

4eea15c

Black on test_distributions_random

7ad4586

AlexAndorra added 3 commits October 15, 2021 13:52

Reinstate Black line length 100 in pyproject

3d17a8a

Revert test_distributions_random to version main

56cea87

Fix test_distributions_random for idata forward sampling

aeb7c0d

ricardoV94 reviewed Oct 15, 2021

View reviewed changes

ricardoV94 approved these changes Oct 15, 2021

View reviewed changes

michaelosthege approved these changes Oct 15, 2021

View reviewed changes

michaelosthege merged commit ce447cc into main Oct 15, 2021

michaelosthege deleted the forward-sampling-idata branch October 15, 2021 13:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make forward sampling functions return `InferenceData` #5073

Make forward sampling functions return `InferenceData` #5073

AlexAndorra commented Oct 13, 2021 •

edited

Loading

AlexAndorra commented Oct 14, 2021

AlexAndorra commented Oct 14, 2021 •

edited

Loading

codecov bot commented Oct 14, 2021 •

edited

Loading

AlexAndorra commented Oct 14, 2021

ricardoV94 commented Oct 14, 2021

ricardoV94 commented Oct 14, 2021

AlexAndorra commented Oct 14, 2021

ricardoV94 commented Oct 14, 2021

michaelosthege left a comment

michaelosthege Oct 14, 2021

AlexAndorra Oct 15, 2021

michaelosthege Oct 15, 2021

AlexAndorra Oct 15, 2021

michaelosthege Oct 14, 2021

AlexAndorra Oct 15, 2021

AlexAndorra commented Oct 15, 2021 •

edited

Loading

AlexAndorra commented Oct 15, 2021 •

edited

Loading

ricardoV94 commented Oct 15, 2021 •

edited

Loading

AlexAndorra commented Oct 15, 2021

AlexAndorra commented Oct 15, 2021

ricardoV94 Oct 15, 2021

AlexAndorra Oct 15, 2021

ricardoV94 left a comment

michaelosthege left a comment

AlexAndorra commented Oct 15, 2021

michaelosthege commented Oct 15, 2021

OriolAbril commented Oct 15, 2021

Make forward sampling functions return InferenceData #5073

Make forward sampling functions return InferenceData #5073

Conversation

AlexAndorra commented Oct 13, 2021 • edited Loading

AlexAndorra commented Oct 14, 2021

AlexAndorra commented Oct 14, 2021 • edited Loading

codecov bot commented Oct 14, 2021 • edited Loading

Codecov Report

AlexAndorra commented Oct 14, 2021

ricardoV94 commented Oct 14, 2021

ricardoV94 commented Oct 14, 2021

AlexAndorra commented Oct 14, 2021

ricardoV94 commented Oct 14, 2021

michaelosthege left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexAndorra commented Oct 15, 2021 • edited Loading

AlexAndorra commented Oct 15, 2021 • edited Loading

ricardoV94 commented Oct 15, 2021 • edited Loading

AlexAndorra commented Oct 15, 2021

AlexAndorra commented Oct 15, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ricardoV94 left a comment

Choose a reason for hiding this comment

michaelosthege left a comment

Choose a reason for hiding this comment

AlexAndorra commented Oct 15, 2021

michaelosthege commented Oct 15, 2021

OriolAbril commented Oct 15, 2021

Make forward sampling functions return `InferenceData` #5073

Make forward sampling functions return `InferenceData` #5073

AlexAndorra commented Oct 13, 2021 •

edited

Loading

AlexAndorra commented Oct 14, 2021 •

edited

Loading

codecov bot commented Oct 14, 2021 •

edited

Loading

AlexAndorra commented Oct 15, 2021 •

edited

Loading

AlexAndorra commented Oct 15, 2021 •

edited

Loading

ricardoV94 commented Oct 15, 2021 •

edited

Loading