
ENH: add Flat and HalfFlat Distributions #219

Merged
12 commits merged into pymc-devs:master from the add-dists branch on Mar 31, 2020

Conversation

tirthasheshpatel (Contributor)

Addresses #44.

Summary of my changes:

  • Add Flat distribution
  • Add HalfFlat distribution

Question

Do these need tests?

return tfd.Uniform(low=0.0, high=np.inf)

def log_prob(self, value):
return tf.cond(


You should use tf.where here

Contributor Author

Done. Thanks!

@lucianopaz (Contributor)

@tirthasheshpatel, thanks again. Contrary to what it may look like, these two distributions will need more tests than the rest, for two big reasons:

  1. They are improper distributions, so we can't draw forward samples from them (Flat.sample). In my opinion, if someone tries to call their sample method, a TypeError should be raised. However, we should be able to do inference in models that have flat or half-flat priors (pm.sample), and given their posterior samples, we should be able to call sample posterior predictive on them. So they must behave a bit differently from the rest of our distributions.
  2. There is no tfp equivalent, so we must test that the returned log_prob values are correct. That sounds simple, because the log_prob should always be zero. The difficulty here is the shape handling: if the value passed to log_prob has a rank lower than the distribution's event shape, we should raise an error, and if the value has a shape that doesn't broadcast with the distribution's batch+event shape, we should also raise an error. Actually, I think it's even more stringent than that: the rightmost axis of the passed values should exactly match the distribution's event shape, not just broadcast with it, or an error should be raised. Furthermore, the log prob should sum-reduce over the event shape axis, in order to only get back a batch of zeroes.

Point 1 cannot be done until we fix #167, and point 2 is a pit in shape hell, so for now I think that we should leave this PR open, and when we fix up #167 we can come back to it.

@tirthasheshpatel (Contributor Author)

@lucianopaz Sorry for the late reply and thanks for such a detailed review! I will think more about the "shape hell" and try to raise the relevant errors. Meanwhile, I will also try to catch up with #167 and resolve it!

@lucianopaz (Contributor)

@tirthasheshpatel, for now don't worry about #167. I'm working on it in this branch, but it requires #193 to be finished first. My recommendation is to do one of the following:

  1. Continue to implement regular distributions like what you did with the BetaBinomial, and at a later time we can come back to this PR and finish it up.
  2. Write the tests for the Flat and HalfFlat distributions, make sure that the log_prob outputs the correct shape, and then mark the tests for predictive sampling and pm.sample as expected failures in pytest. Once we finish with #167 (over-reliance on evaluate_model), we'll remove the expected-failure mark and ensure that these tests pass.

@tirthasheshpatel (Contributor Author)

Some distributions in #44 don't have a tfp equivalent. So, do we have to wait for them to be implemented, or is there a way around that? (I was thinking about implementing classes that inherit from tfp's Distribution class.)

# raise ValueError("Rank of input tensor less than distribution's event shape")
# # if the rightmost axis of `value` doesn't match the distribution's `event_shape`, raise an error
# if (
# len(self._distribution.event_shape)
Contributor Author

> 1. There is no tfp equivalent, so we must test that the returned log_prob values are correct. That sounds simple, because the log_prob should always be zero. The difficulty here is the shape handling: if the value passed to log_prob has a rank lower than the distribution's event shape, we should raise an error, and if the value has a shape that doesn't broadcast with the distribution's batch+event shape, we should also raise an error. Actually, I think it's even more stringent than that: the rightmost axis of the passed values should exactly match the distribution's event shape, not just broadcast with it, or an error should be raised. Furthermore, the log prob should sum-reduce over the event shape axis, in order to only get back a batch of zeroes.

As the distribution is univariate, the event_shape and batch_shape will always be (), so I think we don't need to worry about this, right?

Contributor

The defaults will be (), but we can stack multiple independent variables that follow this distribution with something similar to tfd.Sample, and that will lead to an event shape that could be a tuple of any length.
Furthermore, in #167 we are also aiming to provide a mechanism similar to tfd.Sample, but instead of stacking events, we will stack batches. So in the end, any distribution can have any number of event and batch axes. I just realized that I wrote the singular axis in my previous comment, but I meant it to be the plural, axes.

# raise ValueError(
# "Batch shape of input tensor not consistent with the distributions's batch shape"
# )
if tf.rank(value) > 2:
tirthasheshpatel (Contributor Author) commented on Feb 12, 2020

I sum-reduce along the event_shape axis when the shape of value is (n_samples, batch_shape, event_shape). Shouldn't we sum-reduce over the n_samples axis instead?

Contributor

We must only sum-reduce over the event shape axes; that's what truly distinguishes between the batch and event shapes. For example,

>>> import tensorflow as tf
>>> from tensorflow_probability import distributions as tfd
>>> 
>>> d = tfd.Normal(loc=tf.zeros((1, 2)), scale=1)
>>> d = tfd.Sample(d, sample_shape=(3, 4, 5))
>>> d.batch_shape.as_list()
[1, 2]
>>> d.event_shape.as_list()
[3, 4, 5]
>>> 
>>> x = d.sample(sample_shape=(6, 7))
>>> x.numpy().shape
(6, 7, 1, 2, 3, 4, 5)
>>> v = d.log_prob(x)
>>> v.numpy().shape
(6, 7, 1, 2)

Contributor Author

Oh! This is very clear now. Thanks!!

return tfd.Uniform(low=-np.inf, high=np.inf)

def log_prob(self, value):
# convert the value to tensor
Contributor

I think that you're using too many if statements. I recommend that you start out with expected = tf.zeros(self._distribution.batch_shape + self._distribution.event_shape). This would be the expected shape of a call to tfd.Distribution.sample().

You should first check whether value's event shape part matches expected's; if it does, broadcast expected and value against each other and create a zeros tensor with the resulting shape. Finally, you'll have to sum-reduce over all the event shape axes.
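
For concreteness, here is a minimal sketch of that recipe as a standalone function. It is illustrative only, not the PR's implementation: flat_log_prob and its arguments are made-up names, and the error messages are placeholders.

    import tensorflow as tf

    def flat_log_prob(value, batch_shape, event_shape):
        # Sketch of a Flat log_prob: always returns zeros, reduced over the event axes.
        value = tf.convert_to_tensor(value, dtype=tf.float32)
        batch_shape = tf.TensorShape(batch_shape)
        event_shape = tf.TensorShape(event_shape)
        event_ndims = len(event_shape)
        # shape of a single draw from the distribution: batch_shape + event_shape
        expected = tf.zeros(batch_shape + event_shape)
        # the rightmost axes of `value` must match the event shape exactly
        if event_ndims and value.shape[-event_ndims:] != event_shape:
            raise ValueError("value is not consistent with the distribution's event shape")
        try:
            # mutually broadcast `value` against `expected` (raises if incompatible)
            out_shape = tf.broadcast_static_shape(expected.shape, value.shape)
        except ValueError:
            raise ValueError("value can't be broadcast to the expected shape")
        zeros = tf.zeros(out_shape)
        # sum-reduce over the event axes so that only sample/batch axes remain
        if event_ndims:
            zeros = tf.reduce_sum(zeros, axis=list(range(-event_ndims, 0)))
        return zeros

    # batch_shape (1, 2), event_shape (3, 4), two draws -> log_prob shape (2, 1, 2)
    print(flat_log_prob(tf.zeros((2, 1, 2, 3, 4)), (1, 2), (3, 4)).shape)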

@lucianopaz (Contributor)

Pinging @tirthasheshpatel, just to let you know that I've opened #227. Once that gets merged in, you should merge it into your branch and you'll be able to run all of the tests required by the Flat and HalfFlat distributions.

@tirthasheshpatel (Contributor Author)

@lucianopaz sorry for getting back to you late. I will complete this by today.

@lucianopaz (Contributor) left a comment

Great work @tirthasheshpatel! However, I think that some important tests are still missing before we can merge this.

@@ -348,6 +362,8 @@ def test_rvs_test_point_are_valid(tf_seed, distribution_conditions):
dist_class = getattr(pm, distribution_name)
dist = dist_class(name=distribution_name, **conditions)
test_value = dist.test_value
if distribution_name in ["Flat", "HalfFlat"]:
pytest.skip("Flat and HalfFlat distributions don't support sampling.")
Contributor

We shouldn't skip this test. In the case of the Flat and HalfFlat, instead of comparing the test_value.shape against the test_sample.shape, we should compare it against the sample_shape + batch_shape + event_shape that we expect to get.
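
For instance, a toy version of that comparison (the shapes are hypothetical, and it assumes the distribution exposes batch_shape and event_shape, as mentioned later in this review):

    import tensorflow as tf

    # Illustrative only: compare the test value's shape against the expected
    # sample_shape + batch_shape + event_shape instead of drawing a forward sample.
    sample_shape = tf.TensorShape(())       # no extra sample shape in this test
    batch_shape = tf.TensorShape((2,))      # hypothetical batch shape
    event_shape = tf.TensorShape((3,))      # hypothetical event shape
    test_value = tf.zeros((2, 3))           # stand-in for dist.test_value
    assert test_value.shape == sample_shape + batch_shape + event_shape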

and value.shape[-len(self._distribution.event_shape) :]
!= self._distribution.event_shape
):
raise ValueError("values not consistent with the event shape of distribution")
Contributor

You should write a test that raises this exception to make sure it's working
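
For example, here is a hedged sketch of such tests (including one for the broadcast error discussed a few comments below), written against the illustrative flat_log_prob helper sketched earlier rather than the PR's real Flat class, whose constructor may differ:

    import pytest
    import tensorflow as tf

    def test_log_prob_rejects_inconsistent_event_shape():
        # rank of `value` is lower than the (hypothetical) event shape (3, 4)
        with pytest.raises(ValueError, match="event shape"):
            flat_log_prob(tf.zeros((3,)), batch_shape=(), event_shape=(3, 4))

    def test_log_prob_rejects_unbroadcastable_value():
        # the event part matches, but the leading (5,) can't broadcast with batch_shape (4,)
        with pytest.raises(ValueError, match="broadcast"):
            flat_log_prob(tf.zeros((5, 3, 4)), batch_shape=(4,), event_shape=(3, 4))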

try:
expected = tf.broadcast_to(expected, value.shape)
except tf.python.framework.errors_impl.InvalidArgumentError:
raise ValueError("value can't be broadcasted to expected shape")
Contributor

You should write a test that raises this exception to make sure it's working

expected = tf.zeros(self._distribution.batch_shape + self._distribution.event_shape)
# check if the event shape matches
if (
len(self._distribution.event_shape)
Contributor

A minor nitpick: our distributions now have the event_shape and batch_shape properties, so you don't have to access the _distribution attribute to look them up. You can use self.event_shape and self.batch_shape instead.

and value.shape[-len(self._distribution.event_shape) :]
!= self._distribution.event_shape
):
raise ValueError("values not consistent with the event shape of distribution")
Contributor

The same comment as for the Flat distribution: we need a test that raises this exception.

try:
expected = tf.broadcast_to(expected, value.shape) + value
except tf.python.framework.errors_impl.InvalidArgumentError:
raise ValueError("value can't be broadcasted to expected shape")
Contributor

The same comment as for the Flat distribution: we need a test that raises this exception.

raise ValueError("values not consistent with the event shape of distribution")
# broadcast expected to shape of value
try:
expected = tf.broadcast_to(expected, value.shape)
Contributor

Does this work if value.shape=(3,) but the distribution has batch_shape=(4,) and event_shape=(3,), so the expected shape is (4, 3)? Won't it raise an error in that kind of situation?

tirthasheshpatel (Contributor Author) commented on Feb 20, 2020

Yes. I think this can be fixed by checking len(value.shape) < len(self.batch_shape + self.event_shape). If this condition evaluates to True, we only check whether the values are consistent with batch_shape and we don't need to broadcast, while if the condition is False, we let tf.broadcast_to handle it. What do you say? Is there a way around it?

Code
        # broadcast expected to shape of value
        if len(value.shape) < len(self.batch_shape + self.event_shape):
            expected = expected + value
            if value.shape[:-len(self.event_shape)] != list(reversed(self.batch_shape))[:(len(value.shape)-len(self.event_shape))]:
                raise ValueError("batch shape of values is not consistent with distribution's batch shape")
        else:
            try:
                expected = tf.broadcast_to(expected, value.shape) + value
            except tf.errors.InvalidArgumentError:
                raise ValueError("value can't be broadcasted to expected shape")

@lucianopaz (Contributor) left a comment

I think that this is almost done. I left a few comments above

# broadcast expected to shape of value
if len(value.shape) < len(self.batch_shape + self.event_shape):
if (
value.shape[: -len(self.event_shape)]
Contributor

This won't work if the event shape is a scalar; you'll end up with value.shape[:0]. You will have to change this to value.shape[:len(value.shape) - len(event_shape)].
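
A quick standalone illustration of the gotcha (not the PR's code):

    import tensorflow as tf

    value = tf.zeros((5, 2))
    event_ndims = 0                                        # scalar event shape ()
    print(value.shape[:-event_ndims])                      # (), because -0 behaves like [:0]
    print(value.shape[:len(value.shape) - event_ndims])    # (5, 2), the full shape as intended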

if len(value.shape) < len(self.batch_shape + self.event_shape):
if (
value.shape[: -len(self.event_shape)]
!= list(reversed(self.batch_shape))[: (len(value.shape) - len(self.event_shape))]
Contributor

Why do you need to reverse this?

Contributor Author

If the batch_shape is (1, 2) and the event_shape is (3, 4), and we want to sample a value of shape (2, 3, 4), then the target must broadcast to (1, 2, 3, 4). So we need to check the values of batch_shape starting from the rightmost axis.

Contributor Author

Oh, I see the problem! I have to check the batch_shape[len(self.batch_shape) - len(value.shape):] instead of reversing it.



@pytest.mark.parametrize("distribution_name", ["Flat", "HalfFlat"])
@pytest.mark.parametrize("sample", [tf.zeros(1), tf.zeros((1, 3, 4)), tf.zeros((1, 5, 3, 4))])
Contributor

We are using fixtures instead of pytest.mark.parametrize. Could you change it to that?

Contributor Author

Sure!

@tirthasheshpatel (Contributor Author)

Want to finish this up, @lucianopaz?

@lucianopaz (Contributor)

Sorry @tirthasheshpatel for not being able to finish this up sooner. Covid quarantine and all, I can hardly put any time into reviewing pymc stuff. I'll merge this PR because it seems ready to go. Great job!

lucianopaz merged commit ccf2fe4 into pymc-devs:master on Mar 31, 2020
tirthasheshpatel deleted the add-dists branch on April 1, 2020 at 08:31