Advanced usage: allow the users to deal with manually batching models #193
Conversation
This PR isn't linked to a particular issue because it's mostly about new features. But I think that it should close #91, because it adds all the flexibility of the three kinds of shape (event, batch and sample), with all the added problems.
pymc4/distributions/plate.py (Outdated)

```python
class Plate(distribution_lib.Distribution):
    """Plate distribution via independent draws.

    This distribution is useful for stacking collections of independent,
```
indent looks off
pymc4/distributions/plate.py (Outdated)

```python
@kullback_leibler.RegisterKL(Plate, Plate)
def _kl_sample(a, b, name="kl_sample"):
    """Batched KL divergence `KL(a || b)` for Plate distributions.

    We can leverage the fact that:
```
same
```diff
@@ -546,7 +552,12 @@ def proceed_distribution(
     # might be posterior predictive or programmatically override to exchange observed variable to latent
     if scoped_name not in state.untransformed_values:
         # posterior predictive
-        return_value = state.untransformed_values[scoped_name] = dist.sample()
+        if dist.is_root:
```
This is getting less and less different from tfd coroutines.
Yeah, this PR brings us much closer to tfd coroutines. However, all of the "root" stuff and manual vectorization is optional. We do auto batching by default and the users don't have to think so much about all of the potential shape issues. So I think that we will be providing a much better UX than tfd coroutines
+1, in tfd.JointDistributionCoroutine the user needs to identify the root; if we can do that automatically, it is a win.
Oh wait, we are doing that with the `conditional_independent` kwarg... hmm, it is probably needed, but could we find a better name for this kwarg?
I think that calling them root is really vague. I think that they should be given a name that bears a meaning in probability distributions. I could only think of conditionally independent, but maybe there's a more appropriate alternative.

About identifying the root automatically, we can't do that when we define the model. We at least need to do one forward pass to build something like the probabilistic graphical model that we plot in pymc3 with `model_graph`.
Looks good at first glance, and I think this is very much needed. I am going to do a more in-depth review in the coming days.
pymc4/distributions/distribution.py (Outdated)

```python
    *,
    transform=None,
    observed=None,
    plate=None,
```
One thing we should definitely clarify in the docs is how `plate`/`plate_events` are the high-level user API, while `sample_shape` and `plate_shape` are the internal (and lower-level TFP) API, depending on the resample/reshape distribution.
Should we still call it plate, I wonder, and not side with TFP's naming convention?
I think that we can't call it either `batch_shape` or `event_shape` because, with either name, it would give the impression to the user that it is setting the full batch or event shape of the distribution, instead of just appending some dimensions to either of them. I think that it needs to have a different name, and plate seems nice.
We could maybe call it something like `stack_batch` or `stack_event`, because that gives the impression that you are stacking things onto either the batch or event dimensions.
DAG jargon like `plate` should be avoided, I think. It should be clear to the user what it represents. Something like `dist_sample_shape`? This links it to the distribution, and describes it more unambiguously.
@lucianopaz thoughts?
Sorry that I missed the notifications of the previous comments. @fonnesbeck, so we should change the names of the `plate` stuff. @twiecki, I don't really like `sample_shape` because it will be confusing for the predictive sampling methods. At least, I find the `tfd.Sample` naming very confusing, and I wouldn't like to copy it in pymc4. @junpenglao, I'm at a loss for an appropriate name for these. The correct term for the operation that we are carrying out could be either stacking or repeating. With `plate` and `plate_events` we either stack or repeat events or batches. I would call them `event_stack` and `batch_stack`, and we could then change the `Plate` distribution to `BatchStacker`. What do you guys think?
I like `event_stack` and `batch_stack` and `BatchStacker`. Let's go with that if no one is opposed; we will probably not find the perfect name anyway.
+1, I would be OK with plate as well, actually. As long as we document it clearly, even if we make a mistake users will adapt (hopefully).
Great, I'll change the names and fix the conflicts tonight so we can merge this
I had a bit more thought about this PR. So first of all, I totally agree that currently our way of handling models like

```python
@pm.model
def model():
    mu = yield pm.Normal('mu', tf.zeros(2), 1)
    x = yield pm.Normal('x', mu, 1, plate=3)
    y = yield pm.MvNormal('y', mu, tf.linalg.diag(tf.ones(2)), plate=3)

value, state = pm.flow.evaluate_model(model())
for name, dist in state.distributions.items():
    print(name,
          'batch_shape:', dist._distribution.batch_shape,
          'event_shape:', dist._distribution.event_shape,
          'shape of dist.sample() -->', dist.sample().shape)
# model/mu batch_shape: (2,) event_shape: () shape of dist.sample() --> (2,)
# model/x batch_shape: (2,) event_shape: (3,) shape of dist.sample() --> (2, 3)
# model/y batch_shape: () event_shape: (3, 2) shape of dist.sample() --> (3, 2)
```

outputs confusing random variables, and `pm.Plate` is potentially useful for moving these kinds of repeated shapes to the "right" axis (rightmost). But I am not completely sure whether adding `Plate` here is the right approach to solve this problem. There are a few design choices TBD here:

```python
@pm.model
def model_with_independent():
    mu = yield pm.Normal('mu', tf.zeros(2), 1)
    x = yield pm.Independent(pm.Normal('x', mu, 1), plate=3)
    y = yield pm.MvNormal('y', mu, tf.linalg.diag(tf.ones(2)), plate=3)
```

Just as currently in TFP land you would do:

```python
def model_jd():
    mu = yield root(tfd.Normal(tf.zeros(2), 1))
    x = yield tfd.Sample(tfd.Independent(tfd.Normal(mu, 1), 1), 3)
    y = yield tfd.Sample(tfd.MultivariateNormalFullCovariance(
        mu, tf.linalg.diag(tf.ones(2))), 3)

jd = tfd.JointDistributionCoroutine(model_jd)
dists, values = jd.sample_distributions()
dists
```

```python
value, state = pm.flow.evaluate_model(model())
for name, dist in state.distributions.items():
    print(name,
          'batch_shape:', dist._distribution.batch_shape,
          'event_shape:', dist._distribution.event_shape,
          'shape of dist.sample() -->', dist.sample().shape)

# Junpeng's expected behavior
# model/mu batch_shape: () event_shape: (2,) shape of dist.sample() --> (2,)
# model/x batch_shape: () event_shape: (3, 2) shape of dist.sample() --> (3, 2)
# model/y batch_shape: () event_shape: (3, 2) shape of dist.sample() --> (3, 2)
```

On the implementation side, this would mean we wrap everything with `tfd.Independent` by default if it is not a root. The restriction is that we will need to assume a static batch shape, so that we can rewrite the `reinterpreted_batch_ndims` at runtime (i.e., in `pm.flow`).
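(For reference, a minimal plain-TFP sketch of the `tfd.Independent` wrapping described above; this is an illustration, not code from this PR.)

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# tfd.Independent reinterprets trailing batch axes as event axes,
# moving them out of batch_shape and into event_shape:
base = tfd.Normal(tf.zeros([3, 2]), 1.0)  # batch_shape (3, 2), event_shape ()
ind = tfd.Independent(base, reinterpreted_batch_ndims=2)
print(ind.batch_shape, ind.event_shape)   # () (3, 2)
print(ind.sample().shape)                 # (3, 2)
```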
Thanks @junpenglao! I totally agree with discussing the design. I like your suggestion of using `pm.Independent`.
@junpenglao, this should be ready for review. In the next weeks I'll write a notebook about the advanced usage. Also, in another PR, I'll write a docstring for …
Sorry about the delay, I will take a look this weekend.

@lucianopaz I don't have any further comments, can you resolve the conflicts?

@junpenglao, I'll resolve the conflicts tonight.

Looks great, thanks @lucianopaz! I'm going to merge as I think it's in a good state; we can still follow up if there are more changes we think we need.
This is a rather big and ugly PR that addresses the following issues:

1. A way to control the plate axis of a `Distribution`.
2. A `tfp.distributions.Distribution` subclass called `Plate`, that enables the above point.
3. A `use_auto_batching` flag for `sample_prior_predictive`, `sample` or `sample_posterior_predictive`.
1 and 2. Problems with the `plate` kwarg

At the moment, we were using `tfd.Sample` to add plate axes to a `Distribution` in the following way:
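(Reconstructed sketch of the usage in question, following the example quoted earlier in this thread; the original code block here was lost.)

```python
@pm.model
def model():
    mu = yield pm.Normal('mu', tf.zeros(2), 1)
    x = yield pm.Normal('x', mu, 1, plate=3)
    y = yield pm.MvNormal('y', mu, tf.linalg.diag(tf.ones(2)), plate=3)
```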
This can have some confusing side effects, because `Sample` adds dimensions to the `event_shape` and leaves the `batch_shape`s as they were. It makes samples from `x` have shape `(2, 3)` instead of the expected `(3, 2)`, because `mu` gets interpreted as multiple batches. However, samples from `y` do come out with shape `(3, 2)`, because `mu` gets interpreted as defining the `MvNormal`'s `event_shape`.
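(A minimal illustration of this `tfd.Sample` behavior in plain TFP; a sketch for clarity, not code from this PR.)

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# tfd.Sample appends the extra axes to the *event* shape and
# leaves the batch shape untouched:
d = tfd.Sample(tfd.Normal(tf.zeros(2), 1.0), sample_shape=3)
print(d.batch_shape, d.event_shape)  # (2,) (3,)
print(d.sample().shape)              # (2, 3), not (3, 2)
```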
The solution was to add a `tfd.Distribution` subclass, `Plate`, that is mostly copied off `tfd.Sample`, but adds the new dimensions to the batch dimensions instead of the event dimensions.

I've added a flag, `plate_events`, to control whether to use `Plate` or `Sample`, depending on the user's wishes.
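(A hypothetical usage sketch of this flag; the kwarg names follow this PR's description and may differ in the merged code, e.g. after the `event_stack`/`batch_stack` renaming discussed above.)

```python
import pymc4 as pm
import tensorflow as tf

@pm.model
def model():
    mu = yield pm.Normal('mu', tf.zeros(2), 1)
    # plate_events=False: use Plate, so the new axis goes into batch_shape
    x = yield pm.Normal('x', mu, 1, plate=3, plate_events=False)
    # plate_events=True: use tfd.Sample, so the new axis goes into event_shape
    y = yield pm.Normal('y', mu, 1, plate=3, plate_events=True)
```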
3. The `use_auto_batching` flag

By far, the auto batching part of this PR was the longest, and it involved some API changes:
- A way to mark a `Distribution` as conditionally independent (see the sketch after this list). Why would one need that? To be able to pass `pm.evaluate_model` a `sample_shape` that draws multiple samples in a vectorized way for conditionally independent nodes. The user will then have to be sure that this doesn't screw up something later in the model's flow, and that all the operations work in a vectorized way across batches. This will enable potentially faster forward sampling, at the expense that the user will have to deal with all the shape problems.
- A way to interpret a `Distribution`'s `batch_shape` axes as `event_shape`s. This enables the user to have fine-grained control over how the distribution's `log_prob` deals with batches, and eventually lets it not depend on `vectorized_map` in `inference.sampling.sample`.
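(A hypothetical sketch of the first bullet; the kwarg name follows the `conditional_independent` naming discussed earlier in this thread, and the exact `evaluate_model` signature is an assumption.)

```python
import pymc4 as pm
import tensorflow as tf

@pm.model
def model():
    # Mark the root node as conditionally independent, so a sample_shape
    # can be pushed through it in one vectorized forward pass
    # (hypothetical kwarg, per the discussion above):
    mu = yield pm.Normal('mu', tf.zeros(2), 1., conditional_independent=True)
    x = yield pm.Normal('x', mu, 1.)

# Hypothetical call per the description: draw 100 vectorized prior samples.
_, state = pm.flow.evaluate_model(model(), sample_shape=(100,))
```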
Things left out

I started out writing this with the intention of allowing users to vectorize draws from the prior, but regrettably that problem also affects how to do vectorized draws from the posterior and the posterior predictive, so in the end I had to do all three tasks in a single big PR (sorry).

What I didn't do, and will do in January, is write a notebook guide to using these features.