Add sample_generative function #2983
Conversation
LGTM - needs an addition to the RELEASE NOTES, though.
Not for this PR, but a future PR could include some benchmarks, I think. Other than that, awesome work @ColCarroll!
pymc3/sampling.py
Outdated
```python
elif is_transformed_name(var_name):
    untransformed = get_untransformed_name(var_name)
    if untransformed in data:
        prior[var_name] = model[untransformed].transformation.forward(data[untransformed]).eval()
```
`forward_val` is the numpy version of the forward function and should be faster.
Should use `.transformation.forward` here.
Oops, I meant `.transformation.forward_val` should be used here... it should be slightly faster:

```
model[untransformed].transformation.forward(data[untransformed]).eval()
--> model[untransformed].transformation.forward_val(data[untransformed])
```
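For context, a hedged plain-numpy sketch of why a `forward_val`-style function can be faster (the logit transform here is illustrative, not pymc3's actual implementation): the symbolic `forward` builds a Theano graph that must be compiled and evaluated via `.eval()`, while a numpy version just applies the function directly.

```python
import numpy as np

# Hypothetical sketch: a "forward" transform for a (0, 1)-bounded variable,
# in the spirit of a logodds transform. A numpy forward_val applies the
# function to the array directly, skipping graph compilation entirely.
def forward_val(x):
    x = np.asarray(x, dtype=float)
    return np.log(x) - np.log1p(-x)  # logit(x)

print(forward_val(0.5))  # 0.0
```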
pymc3/distributions/distribution.py
Outdated
```python
        evaluated[param_idx] = givens[param.name][1]
    else:
        try:  # might evaluate in a bad order,
            evaluated[param_idx] = _draw_value(param, point=point, givens=givens.values(), size=size)
```
The `size` arg is not working yet:

```python
from pymc3.distributions.distribution import draw_values

with pm.Model() as m:
    p = pm.Beta('p', 1., 1.)

draw_values([m['p']], size=10)
# [array(0.72159403)]
```
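For reference, a hedged plain-numpy sketch of the intended `size` semantics (not pymc3 internals): requesting 10 draws of a Beta(1, 1) variable should yield an array of 10 independent values, not the single scalar shown in the bug report.

```python
import numpy as np

# Hypothetical illustration of what size=10 should mean for a Beta(1, 1) draw.
rng = np.random.default_rng(0)
draws = rng.beta(1.0, 1.0, size=10)
print(draws.shape)  # (10,)
```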
Your previous example is still a little funny too:

```python
X = theano.shared(np.arange(3))

with pm.Model() as m:
    ind = pm.Categorical('i', np.ones(3)/3)
    x = pm.Deterministic('X', X[ind])
    prior = pm.sample_generative()
```
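What this edge case should compute, forward-sampled in plain numpy (a hedged sketch, not pymc3 internals): draw a categorical index with equal probabilities, then take a deterministic lookup into the shared array.

```python
import numpy as np

# Hypothetical plain-numpy version of the edge case: categorical index draws
# followed by a deterministic indexing step.
X = np.arange(3)
rng = np.random.default_rng(0)
ind = rng.choice(3, p=np.ones(3) / 3, size=100)  # prior draws of 'i'
x = X[ind]                                       # deterministic 'X'
print(x.min(), x.max())
```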
I am really good at discovering edge cases.
pymc3/sampling.py
Outdated
```
@@ -1208,6 +1209,49 @@ def sample_ppc_w(traces, samples=None, models=None, weights=None,
     return {k: np.asarray(v) for k, v in ppc.items()}


+def sample_generative(samples=500, model=None, vars=None, random_seed=None):
```
Should we rename this `sample_prior_predictive`?
And `sample_ppc` should then probably be `sample_posterior_predictive` (it's not really doing a check).
I think this is now actually the generative function (see the test case). The prior predictive would be nice to also implement.
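The distinction under discussion can be sketched in plain numpy (a hedged illustration for a hypothetical model `mu ~ Normal(0, 1)`, `y ~ Normal(mu, 1)`, not pymc3 code): prior sampling draws only the parameters, while prior predictive sampling pushes those parameter draws through the likelihood to generate fake data.

```python
import numpy as np

# Hypothetical illustration: prior draws vs prior predictive draws.
rng = np.random.default_rng(0)
mu = rng.normal(0.0, 1.0, size=2000)   # prior draws of the parameter
y = rng.normal(mu, 1.0)                # prior predictive draws of the data
print(mu.var(), y.var())
```

The predictive draws are more spread out than the prior draws, since they add the likelihood's noise on top of the parameter uncertainty.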
I looked at the test but I don't understand the difference between generative and prior predictive here.
Ugh... I think I still don't. Will work out a simple model by hand later to try to help my intuition.
I agree with the renaming
```python
else:
    try:  # might evaluate in a bad order,
        evaluated[param_idx] = _draw_value(param, point=point, givens=givens.values(), size=size)
        if isinstance(param, collections.Hashable) and named_nodes_parents.get(param):
```
I think that isinstance check should be replaced with a check for name.
pymc3/tests/test_sampling.py
Outdated
```python
    pm.Normal('x_obs', mu=z, sd=1, observed=observed)
    prior = pm.sample_generative()

assert (prior['mu'] < 90).all()
```
That seems a bit crude; couldn't we just test the mean and sd?
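A hedged sketch of that suggestion in plain numpy (the Normal(0, 1) prior on `mu` here is a placeholder, not the test model's actual prior): rather than a crude bound like `(prior['mu'] < 90).all()`, check the sample moments against the prior's with a tolerance appropriate to the number of draws.

```python
import numpy as np

# Hypothetical moment-based check on 500 prior draws of mu ~ Normal(0, 1).
rng = np.random.default_rng(0)
prior_mu = rng.normal(0.0, 1.0, size=500)
assert abs(prior_mu.mean() - 0.0) < 0.2   # roughly 4.5 standard errors
assert abs(prior_mu.std() - 1.0) < 0.2
print("moment checks passed")
```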
I am coming to this discussion late, but I was talking with Chris Fonnesbeck about this issue and he pointed me to this PR.

This functionality would be very helpful to me. I am working on a new volume of Think Bayes that will use PyMC, and I plan to demonstrate a development process that uses a `pm.Model` to generate a prior predictive distribution before running an inference. I think this is useful for both validation (model makes sense) and verification (model specification is correct).

Especially for verification, it would be ideal if generation and inference are as identical as possible. For example, I would love to be able to run

```python
with pm.Model() as model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    trace = pm.sample(1000)
```

to generate the prior predictives, and then

```python
with pm.Model() as model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu, observed=6)
    trace = pm.sample(1000)
```

If the first one validates, it would be really hard to mess up the second one.

This API would require `pm.sample` to analyze the model, see that there is no data, and run `sample_prior_predictive` rather than any of the MCMC samplers. I assume that would not be hard, but I don't really know.

This notebook contains more examples of what I have in mind: https://github.com/AllenDowney/BayesMadeSimple/blob/master/zigzag.ipynb

In those examples I use `pm.sample` to generate, but (you will not be surprised to hear) it doesn't work very well. @ColCarroll, since you saw the talk where I presented this notebook, you might have thoughts to add.

Thank you all!
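As a side note, the Gamma-Poisson model discussed in this comment can be forward-simulated in plain numpy; a hedged sketch (`alpha` and `beta` are placeholder hyperparameters, and numpy's gamma takes a scale parameter, so rate `beta` becomes scale `1/beta`):

```python
import numpy as np

# Hypothetical forward (prior predictive) simulation of mu ~ Gamma(alpha, beta),
# goals ~ Poisson(mu).
alpha, beta = 2.0, 1.0
rng = np.random.default_rng(0)
mu = rng.gamma(alpha, 1.0 / beta, size=1000)  # prior draws of the rate
goals = rng.poisson(mu)                       # prior predictive draws
print(goals.mean())
```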
Fascinating idea!!!
So I wonder if …
I think there are lots of situations where we want to sample from a joint (unobserved) distribution that we describe using the machinery of pymc; pymc allows one to describe models in terms of random variables and functions of them, which is the right level of abstraction for that kind of work. One example: Monte Carlo based probabilistic sensitivity analysis of an economic model.
Another example: for performance reasons, one may sometimes apply Bayes' rule analytically with a known conjugate prior. E.g., perhaps it's computationally faster to model the posterior RVs directly, rather than include the prior and observations. In such cases, the model won't have observed / likelihood nodes.
Right, but those situations don't require MCMC, do they? Just forward simulation using independent sampling.
Agreed, but my point is that such folks ought to use pymc rather than write their own model description / independent sampling code.
Yeah, I think we are talking about the same thing. Following on Allen's point, we could engineer …
@AllenDowney I actually started using your notebook as a test case on the train home last night! I think the issues brought up here with …

My goal for the API is more like

…

which allows for model reuse. There's no reason your approach could not also be supported, though it makes me a little nervous to overload a function like that: in particular, silently ignoring arguments like …
That makes sense: the parameters of …

To be completely systematic, you might want four functions:

…

But maybe (1) is not necessary; if you're going to run the model forward, you might as well generate both the priors and the prior predictives. If you like shorter names, maybe …
I like this naming scheme. For completeness we should indeed add …
I'm at a talk by Mike Betancourt tonight. Is this implemented yet?
Have been making surprisingly steady progress. I'll push a more complete description of the changes when it is ready (tomorrow?), but will push something partial tonight.

Has turned into a partial rewrite of all shape handling, which runs through `generate_samples`. I think what I have is much cleaner now, and ironically most of what is left is cleaning up all the code that deals with bad behavior in the old/current implementation. See, for example, `Multinomial._random`.

I do have it so that @AllenDowney's notebook runs cleanly without "abusing" PyMC3 😄 !
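The kind of shape handling such a rewrite has to centralize can be sketched in plain numpy (this `generate_samples` signature is a hypothetical illustration, not pymc3's actual helper): broadcast the distribution's parameters against each other, then prepend the requested number of samples to the broadcast shape.

```python
import numpy as np

# Hedged sketch of centralized shape handling for random draws.
def generate_samples(random_fn, *params, size=None):
    broadcast_shape = np.broadcast(*params).shape
    shape = (size,) + broadcast_shape if size is not None else broadcast_shape
    return random_fn(*params, size=shape)

rng = np.random.default_rng(0)
draws = generate_samples(rng.normal, np.zeros(3), np.ones(3), size=5)
print(draws.shape)  # (5, 3)
```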
That's awesome. Thank you, Colin!
Wicked!!!
Oddly, I think this is nearly ready to merge. There were a few unexpected tests failing, and I think a few distributions (notably, …) … Everything is quite fast for …

I do have a modestly embarrassing question about nomenclature, and what the prior vs prior predictive is. In the example below, would I be able to define …
My understanding is that the prior predictive will return:

…

The current output from the function is what I would expect.
pymc3/sampling.py
Outdated
```
@@ -25,7 +26,7 @@
 import sys
 sys.setrecursionlimit(10000)

-__all__ = ['sample', 'iter_sample', 'sample_ppc', 'sample_ppc_w', 'init_nuts']
+__all__ = ['sample', 'iter_sample', 'sample_ppc', 'sample_ppc_w', 'init_nuts', 'sample_generative']
```
`sample_generative` -> `sample_prior_predictive`
Found some small things. I'm very excited about having this functionality. It definitely needs a release note as well as an example, though.
Thanks for those! I'd prefer to add an example in a separate PR, if that is ok. This touches so much code in subtle ways...
Thank you all for your work on this, especially @ColCarroll!
Do you know when this will make it into a release (or has it already)?
…On Mon, Jun 18, 2018, Junpeng Lao wrote:
> Merged #2983.
It will be in the next release. @fonnesbeck @twiecki maybe we should actually prepare a release; we've got quite a few new features.
I've been using this function and the graph function daily-ish and ironing out the bugs. Which are shape related, obviously.
Are there any non-shape bugs? The answer to "should we do a new release" is always yes.
OK, I will get a release ready over the weekend.
This replaces #2876, since @springcoil's recent improvements to `draw_values` (#2979) make this somewhat easier. Includes a test, plus two more I had written for `draw_values`. Both of them actually failed without the changes here.

This method is quite fast now.