
Replace incomplete_beta with betainc and speedup logcdf tests #4736

Closed
wants to merge 6 commits

Conversation

@ricardoV94 (Member) commented Jun 4, 2021

Implement Betainc

This PR replaces the incomplete_beta function in dist_math with an Aesara Op, betainc, that wraps scipy.special.betainc. Supersedes #4519; closes #4420.

This provides a large speedup in both run time and graph compilation, even without providing a c_code implementation directly (that can always be added later). I did some comparisons here. On my machine, compiling a simple input/output function for incomplete_beta took around 10 seconds, while for betainc it is essentially instantaneous, as would be expected. Run time is ~2 orders of magnitude faster for the main Op (even though it calls scipy via Python), and ~4-5 orders of magnitude faster for the gradient.
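For readers unfamiliar with the mechanism, a minimal sketch of how a scipy function can be wrapped as an Aesara Op might look like the following (an illustration only, not the PR's actual implementation, which also defines the gradient and handles broadcasting):

```python
import aesara.tensor as at
import numpy as np
import scipy.special
from aesara.graph.basic import Apply
from aesara.graph.op import Op


class BetaInc(Op):
    """Regularized incomplete beta function I_x(a, b), evaluated via SciPy."""

    def make_node(self, a, b, x):
        a, b, x = map(at.as_tensor_variable, (a, b, x))
        # Sketch: output takes the type of the value input; a real Op would
        # handle dtype promotion and broadcasting more carefully.
        return Apply(self, [a, b, x], [x.type()])

    def perform(self, node, inputs, output_storage):
        a, b, x = inputs
        # Defer the actual computation to SciPy at run time.
        output_storage[0][0] = np.asarray(
            scipy.special.betainc(a, b, x), dtype=node.outputs[0].dtype
        )


betainc = BetaInc()
```

Because perform simply calls into SciPy at run time, there is nothing for Aesara to compile inside the Op, which is where the compilation-time win comes from.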

It also enables evaluating the logcdf at multiple values for Beta, StudentT, Binomial, and NegativeBinomial (and their zero-inflated variants). I removed the previous limitation warning shown when trying to evaluate these methods at multiple values. These changes are critical for any practical implementation of truncated versions of these distributions.
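The connection is direct: the CDF of Beta(a, b) is exactly the regularized incomplete beta function, so a vectorized betainc immediately yields a vectorized logcdf. A quick NumPy/SciPy illustration (not code from this PR):

```python
import numpy as np
import scipy.special
import scipy.stats

a, b = 2.0, 3.0
x = np.linspace(0.1, 0.9, 5)  # multiple values evaluated at once

# The Beta(a, b) CDF at x is the regularized incomplete beta I_x(a, b).
logcdf = np.log(scipy.special.betainc(a, b, x))

np.testing.assert_allclose(logcdf, scipy.stats.beta.logcdf(x, a, b))
```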

The derivative algorithm was adapted from this paper and requires two extra "pseudo Aesara Ops" that simply wrap the equivalent Python functions. This does not allow for further graph optimizations, but I doubt Aesara could do much with the previous scan implementation either. I did some extensive comparisons with the Stan implementation, and these derivatives seem to be both faster and more accurate than theirs, although a good numerical reference is not easy to find. Some discussion can be found in stan-dev/math#2417.
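Since a trustworthy numerical reference is hard to come by, one crude way to sanity-check such derivatives is a central-difference approximation around the SciPy implementation (a reference check only, not the method used in this PR):

```python
import scipy.special


def betainc_da(a, b, x, eps=1e-6):
    """Central-difference approximation of d/da betainc(a, b, x)."""
    return (
        scipy.special.betainc(a + eps, b, x)
        - scipy.special.betainc(a - eps, b, x)
    ) / (2 * eps)


# An analytic gradient implementation should agree with this to several
# digits, away from the boundaries of the parameter space.
print(betainc_da(2.0, 3.0, 0.5))
```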

The one downside is the constant warning that the Op has no C implementation whenever the logp/logcdf makes use of it.

Speedup logcdf tests

The logcdf tests are refactored to avoid recreating the logcdf graph for every parameter/value evaluation. This supersedes #4734. The old incomplete_beta was found to be defective while working on #4734, so I decided to merge the two PRs into this one instead.

This makes it possible to remove most of the custom n_samples limitations, and makes the test suite run considerably faster.
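The refactor follows the usual compile-once pattern: build and compile the symbolic graph a single time with placeholder inputs, then reuse the compiled function across all parameter/value combinations. A sketch of the idea, assuming a betainc-backed Beta logcdf exposed as at.betainc (as eventually landed in Aesara):

```python
import aesara
import aesara.tensor as at
import numpy as np
import scipy.stats

# Build and compile the Beta logcdf graph once, with symbolic inputs...
value, alpha, beta = at.dscalars("value", "alpha", "beta")
logcdf_fn = aesara.function(
    [value, alpha, beta], at.log(at.betainc(alpha, beta, value))
)

# ...then reuse the compiled function inside the test loops, instead of
# rebuilding and recompiling the graph for every single evaluation.
for a, b in [(0.5, 0.5), (2.0, 5.0)]:
    for v in np.linspace(0.1, 0.9, 9):
        np.testing.assert_allclose(
            logcdf_fn(v, a, b), scipy.stats.beta.logcdf(v, a, b)
        )
```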

A temporary workaround for the HyperGeometric tests was added, related to an optimization issue on the Aesara side: pymc-devs/pytensor#461.

Finally, closes #4467

@ricardoV94 (Member, Author) commented Jun 4, 2021

The failing test was related to the minimum scipy version, 1.2.0, not including betabinom; one of the tests now calls it, incidentally, to draw the initval. This is somewhat related to #4737.

I raised the requirement to 1.4.1; is this okay? There were some concerns about raising it to 1.6.0 because Colab is still on 1.4.1, but this should be fine in that regard.

@ricardoV94 (Member, Author) commented:

Looking at the pytest workflow, this PR seems to shave about 20-25 minutes off the run time, from ~1h05m to ~40m.

@brandonwillard (Contributor) left a comment:


This is the wrong place for an Aesara implementation of a scipy.special Op. Such Ops should be submitted as a PR to Aesara, so they can go alongside all the other scipy.special Ops; otherwise, we'll have more of the same unnecessary overlap in testing and maintenance that we've been trying to avoid.

Regardless, this is a very exciting update!

@ricardoV94 (Member, Author) commented:

I can open a PR for the betainc in Aesara. When I first opened the PR, nobody mentioned that interest :)

Review thread on the following diff excerpt:

```python
else:
    dK = np.log1p(-x) + scipy.special.digamma(p + q) - scipy.special.digamma(q)

for n in range(1, max_iters + 1):
```
A reviewer (Member) commented:
Is the plain Python for loop going to be slow here, or is the function meant to run in Python/NumPy/SciPy?

@ricardoV94 (Member, Author) replied Jun 5, 2021

Plain Python loop for now. It's not really possible to vectorize it, as it is an iterative loop. It's still about 4-5 orders of magnitude faster than the old C derivatives obtained by autodiff from the scan implementation of incomplete_beta.

It's also very simple code to port to C or to Numba, but I didn't want to do that just yet.

I have some speed comparisons here: https://www.github.com/ricardoV94/derivatives_betainc/tree/master/comparison_aesara.ipynb

@brandonwillard (Contributor) commented:

> I can open a PR for the betainc in Aesara. When I first opened the PR, nobody mentioned that interest :)

Sorry, I've been too busy to keep up on PRs, but if you request a review, I'll usually get to it by the weekend.

@ricardoV94 (Member, Author) commented Jun 5, 2021

> > I can open a PR for the betainc in Aesara. When I first opened the PR, nobody mentioned that interest :)
>
> Sorry, I've been too busy to keep up on PRs, but if you request a review, I'll usually get to it by the weekend.

I didn't mean anyone was at fault; there is no rush on my side. I'll open it there soon. 😅

@ricardoV94 (Member, Author) commented:

Closing this one and opening a new one, since many things have changed.


Successfully merging this pull request may close these issues:

- Remove hacks to avoid previous aesara gammainc(c) limitations
- dist_math.incomplete_beta is very slow