Add sample_generative function #2983
Conversation
LGTM - needs an addition to the RELEASE NOTES, though.
Not for this PR, but a future PR could include some benchmarks, I think. Other than that, awesome work @ColCarroll!
pymc3/sampling.py
Outdated
```python
elif is_transformed_name(var_name):
    untransformed = get_untransformed_name(var_name)
    if untransformed in data:
        prior[var_name] = model[untransformed].transformation.forward(data[untransformed]).eval()
```
`forward_val` is the numpy version of the forward function and should be faster.
Should use `.transformation.forward` here.
Oops, I meant `.transformation.forward_val` should be used here... it should be slightly faster:

```
model[untransformed].transformation.forward(data[untransformed]).eval()
--> model[untransformed].transformation.forward_val(data[untransformed])
```
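For context, a hedged plain-numpy sketch of why a `forward_val`-style function can be faster (the logit transform here is illustrative, not pymc3's actual implementation): the symbolic `forward` builds a Theano graph that must be compiled and evaluated via `.eval()`, while a numpy version just applies the function directly.

```python
import numpy as np

# Hypothetical sketch: a "forward" transform for a (0, 1)-bounded variable,
# in the spirit of a logodds transform. A numpy forward_val applies the
# function to the array directly, skipping graph compilation entirely.
def forward_val(x):
    x = np.asarray(x, dtype=float)
    return np.log(x) - np.log1p(-x)  # logit(x)

print(forward_val(0.5))  # 0.0
```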
pymc3/distributions/distribution.py
Outdated
```python
        evaluated[param_idx] = givens[param.name][1]
    else:
        try:  # might evaluate in a bad order,
            evaluated[param_idx] = _draw_value(param, point=point, givens=givens.values(), size=size)
```
The `size` arg is not working yet:

```python
from pymc3.distributions.distribution import draw_values

with pm.Model() as m:
    p = pm.Beta('p', 1., 1.)

draw_values([m['p']], size=10)
# [array(0.72159403)]
```
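For reference, a hedged plain-numpy sketch of the intended `size` semantics (not pymc3 internals): requesting 10 draws of a Beta(1, 1) variable should yield an array of 10 independent values, not the single scalar shown in the bug report.

```python
import numpy as np

# Hypothetical illustration of what size=10 should mean for a Beta(1, 1) draw.
rng = np.random.default_rng(0)
draws = rng.beta(1.0, 1.0, size=10)
print(draws.shape)  # (10,)
```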
Your previous example is still a little funny too:

```python
X = theano.shared(np.arange(3))

with pm.Model() as m:
    ind = pm.Categorical('i', np.ones(3)/3)
    x = pm.Deterministic('X', X[ind])
    prior = pm.sample_generative()
```
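What this edge case should compute, forward-sampled in plain numpy (a hedged sketch, not pymc3 internals): draw a categorical index with equal probabilities, then take a deterministic lookup into the shared array.

```python
import numpy as np

# Hypothetical plain-numpy version of the edge case: categorical index draws
# followed by a deterministic indexing step.
X = np.arange(3)
rng = np.random.default_rng(0)
ind = rng.choice(3, p=np.ones(3) / 3, size=100)  # prior draws of 'i'
x = X[ind]                                       # deterministic 'X'
print(x.min(), x.max())
```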
I am really good at discovering edge cases.
pymc3/sampling.py
Outdated
```
@@ -1208,6 +1209,49 @@ def sample_ppc_w(traces, samples=None, models=None, weights=None,
     return {k: np.asarray(v) for k, v in ppc.items()}


+def sample_generative(samples=500, model=None, vars=None, random_seed=None):
```
Should we rename this `sample_prior_predictive`?
And `sample_ppc` should then probably be `sample_posterior_predictive` (it's not really doing a check).
I think this is now actually the generative function (see the test case). The prior predictive would be nice to also implement.
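The distinction under discussion can be sketched in plain numpy (a hedged illustration for a hypothetical model `mu ~ Normal(0, 1)`, `y ~ Normal(mu, 1)`, not pymc3 code): prior sampling draws only the parameters, while prior predictive sampling pushes those parameter draws through the likelihood to generate fake data.

```python
import numpy as np

# Hypothetical illustration: prior draws vs prior predictive draws.
rng = np.random.default_rng(0)
mu = rng.normal(0.0, 1.0, size=2000)   # prior draws of the parameter
y = rng.normal(mu, 1.0)                # prior predictive draws of the data
print(mu.var(), y.var())
```

The predictive draws are more spread out than the prior draws, since they add the likelihood's noise on top of the parameter uncertainty.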
I looked at the test but I don't understand the difference between generative and prior predictive here.
Ugh... I think I still don't. Will work out a simple model by hand later to try to help my intuition.
I agree with the renaming
```python
else:
    try:  # might evaluate in a bad order,
        evaluated[param_idx] = _draw_value(param, point=point, givens=givens.values(), size=size)
        if isinstance(param, collections.Hashable) and named_nodes_parents.get(param):
```
I think that isinstance check should be replaced with a check for name.
pymc3/tests/test_sampling.py
Outdated
```python
    pm.Normal('x_obs', mu=z, sd=1, observed=observed)
    prior = pm.sample_generative()

assert (prior['mu'] < 90).all()
```
That seems a bit crude; couldn't we just test the mean and sd?
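A hedged sketch of that suggestion in plain numpy (the Normal(0, 1) prior on `mu` here is a placeholder, not the test model's actual prior): rather than a crude bound like `(prior['mu'] < 90).all()`, check the sample moments against the prior's with a tolerance appropriate to the number of draws.

```python
import numpy as np

# Hypothetical moment-based check on 500 prior draws of mu ~ Normal(0, 1).
rng = np.random.default_rng(0)
prior_mu = rng.normal(0.0, 1.0, size=500)
assert abs(prior_mu.mean() - 0.0) < 0.2   # roughly 4.5 standard errors
assert abs(prior_mu.std() - 1.0) < 0.2
print("moment checks passed")
```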
I am coming to this discussion late, but I was talking with Chris Fonnesbeck about this issue and he pointed me to this PR.

This functionality would be very helpful to me. I am working on a new volume of Think Bayes that will use PyMC, and I plan to demonstrate a development process that uses a `pm.Model` to generate a prior predictive distribution before running an inference. I think this is useful for both validation (model makes sense) and verification (model specification is correct).

Especially for verification, it would be ideal if generation and inference are as identical as possible. For example, I would love to be able to run

```python
with pm.Model() as model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    trace = pm.sample(1000)
```

to generate the prior predictives, and then

```python
with pm.Model() as model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu, observed=6)
    trace = pm.sample(1000)
```

If the first one validates, it would be really hard to mess up the second one.

This API would require `pm.sample` to analyze the model, see that there is no data, and run `sample_prior_predictive` rather than any of the MCMC samplers. I assume that would not be hard, but I don't really know.

This notebook contains more examples of what I have in mind: https://github.com/AllenDowney/BayesMadeSimple/blob/master/zigzag.ipynb

In those examples I use `pm.sample` to generate, but (you will not be surprised to hear) it doesn't work very well. @ColCarroll, since you saw the talk where I presented this notebook, you might have thoughts to add.

Thank you all!
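As a side note, the Gamma-Poisson model discussed in this comment can be forward-simulated in plain numpy; a hedged sketch (`alpha` and `beta` are placeholder hyperparameters, and numpy's gamma takes a scale parameter, so rate `beta` becomes scale `1/beta`):

```python
import numpy as np

# Hypothetical forward (prior predictive) simulation of mu ~ Gamma(alpha, beta),
# goals ~ Poisson(mu).
alpha, beta = 2.0, 1.0
rng = np.random.default_rng(0)
mu = rng.gamma(alpha, 1.0 / beta, size=1000)  # prior draws of the rate
goals = rng.poisson(mu)                       # prior predictive draws
print(goals.mean())
```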
Fascinating idea!!!
So I wonder if …
I think there are lots of situations where we want to sample from a joint (unobserved) distribution that we describe using the machinery of pymc; pymc allows one to describe models in terms of random variables and functions of them, which is the right level of abstraction for that kind of work. One example: Monte Carlo based probabilistic sensitivity analysis of an economic model.
Another example: for performance reasons, one may sometimes apply Bayes' rule analytically with a known conjugate prior. E.g., perhaps it's computationally faster to model the posterior RVs directly, rather than include the prior and observations. In such cases, the model won't have observed / likelihood nodes.
Right, but those situations don't require MCMC, do they? Just forward simulation using independent sampling.
Agreed, but my point is that such folks ought to use pymc rather than write their own model description / independent sampling code.
Yeah, I think we are talking about the same thing. Following on Allen's point, we could engineer …
@AllenDowney I actually started using your notebook as a test case on the train home last night! I think the issues brought up here with …

My goal for the API is more like

…

which allows for model reuse. There's no reason your approach could not also be supported, though it makes me a little nervous to overload a function like that: in particular, silently ignoring arguments like …
That makes sense: the parameters of …

To be completely systematic, you might want four functions:

…

But maybe (1) is not necessary; if you're going to run the model forward, you might as well generate both the priors and the prior predictives. If you like shorter names, maybe …
I like this naming scheme. For completeness we should indeed add …
I'm at a talk by Mike Betancourt tonight. Is this implemented yet?
Have been making surprisingly steady progress. I'll push a more complete description of the changes when it is ready (tomorrow?), but will push something partial tonight.

Has turned into a partial rewrite of all shape handling, which runs through `generate_samples`. I think what I have is much cleaner now, and ironically most of what is left is cleaning up all the code that deals with bad behavior in the old/current implementation. See, for example, `Multinomial._random`.

I do have it so that @AllenDowney's notebook runs cleanly without "abusing" PyMC3 😄 !
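The kind of shape handling such a rewrite has to centralize can be sketched in plain numpy (this `generate_samples` signature is a hypothetical illustration, not pymc3's actual helper): broadcast the distribution's parameters against each other, then prepend the requested number of samples to the broadcast shape.

```python
import numpy as np

# Hedged sketch of centralized shape handling for random draws.
def generate_samples(random_fn, *params, size=None):
    broadcast_shape = np.broadcast(*params).shape
    shape = (size,) + broadcast_shape if size is not None else broadcast_shape
    return random_fn(*params, size=shape)

rng = np.random.default_rng(0)
draws = generate_samples(rng.normal, np.zeros(3), np.ones(3), size=5)
print(draws.shape)  # (5, 3)
```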
That's awesome. Thank you, Colin!
Wicked!!!
Oddly, I think this is nearly ready to merge. There were a few unexpected tests failing, and I think a few distributions (notably, …) … Everything is quite fast for …

I do have a modestly embarrassing question about nomenclature, and what the prior vs prior predictive is. In the example below, would I be able to define …
My understanding is that the prior predictive will return:

…

The current output from the function is what I would expect.
pymc3/sampling.py
Outdated
```
@@ -25,7 +26,7 @@
 import sys
 sys.setrecursionlimit(10000)

-__all__ = ['sample', 'iter_sample', 'sample_ppc', 'sample_ppc_w', 'init_nuts']
+__all__ = ['sample', 'iter_sample', 'sample_ppc', 'sample_ppc_w', 'init_nuts', 'sample_generative']
```
`sample_generative` -> `sample_prior_predictive`
Found some small things. I'm very excited about having this functionality. It definitely needs a release note as well as an example, though.
Thanks for those! I'd prefer to add an example in a separate PR, if that is ok. This touches so much code in subtle ways...
Thank you all for your work on this, especially @ColCarroll!
Do you know when this will make it into a release (or has it already)?
…On Mon, Jun 18, 2018, Junpeng Lao wrote:
> Merged #2983.
It will be in the next release. @fonnesbeck @twiecki maybe we should actually prepare a release; we've got quite a few new features.
I've been using this function and the graph function daily-ish and ironing out the bugs. Which are shape related, obviously.
Are there any non-shape bugs? The answer to "should we do a new release" is always yes.
OK, I will get a release ready over the weekend.
This replaces #2876, since @springcoil's recent improvements to `draw_values` (#2979) make this somewhat easier. Includes a test, plus two more I had written for `draw_values`. Both of them actually failed without the changes here.

This method is quite fast now.