Function to optimize prior under constraints #5231
Conversation
Codecov Report

@@           Coverage Diff           @@
##             main    #5231   +/-   ##
=======================================
  Coverage   79.08%   79.08%
=======================================
  Files          87       88    +1
  Lines       14394    14435   +41
=======================================
+ Hits        11383    11416   +33
- Misses       3011     3019    +8
Looks cool, left some suggestions below
I would do the API differently to make that less verbose and more elegant. Ideally, something like this:

intercept = pm.Normal.fit("intercept", lower=0.1, upper=0.9)
# That could (in future) be extended as
intercept = pm.Normal.fit("intercept", data=input_data)
I don't agree. You still need to provide initial parameter guesses, lower/upper can be distribution parameters, and you might want to specify that some parameters should be fixed, as I mentioned above. All that is more difficult if you mix optimization goals with distribution initialization. Also, are you suggesting this would automatically add a new random variable to the model? I think this suggestion can create a lot of ambiguity, similar to how the
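To make the trade-off concrete, here is a rough sketch of the two styles under discussion. The keyword names used for the standalone helper (init_guess, fixed_params) are assumptions for illustration, not the confirmed signature:

import pymc as pm

# Style A (proposal above): optimization folded into the RV constructor
# intercept = pm.Normal.fit("intercept", lower=0.1, upper=0.9)

# Style B (counter-proposal): a standalone helper that returns optimized
# parameters, keeping the optimizer's starting point and any fixed
# parameters explicit (keyword names are hypothetical)
params = pm.find_optim_prior(
    pm.Normal,
    lower=0.1,
    upper=0.9,
    init_guess={"mu": 0.5, "sigma": 0.3},
    # fixed_params={"sigma": 0.3},  # hypothetically hold sigma fixed
)

with pm.Model():
    intercept = pm.Normal("intercept", **params)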
Should this go into util.py? That's not a very discoverable file name, but this is also a very small helper function.
@ferrine I agree with @ricardoV94 on this, at least for a first version, where we can try and see how people use that feature and what is missing (or not).
@ricardoV94 I actually put that into
You can import it explicitly in
I suggest that instead of minimizing two distances: @aseyboldt do you see any loss in functionality by doing things like this?
Perhaps add a test with a discrete distribution?
All green @ricardoV94 🥳 Let's make that Xmas miracle happen!
pymc/func_utils.py
    jac = pm.aesaraf.compile_pymc([dist_params], aesara_jac, allow_input_downcast=True)
# when PyMC cannot compute the gradient
# TODO: use specific gradient, not implemented exception
except Exception:
I think the exception we want here is the one found in aesara.gradient.NullTypeGradError, as well as NotImplementedError:
- except Exception:
+ except (NotImplementedError, NullTypeGradError):
The exception that's thrown is a TypeError, raised by the Aesara grad method (line 501) in aesara/gradient.py:

if cost is not None and cost.ndim != 0:
    raise TypeError("cost must be a scalar.")
I added that error in the except clause and tests pass locally. aesara.gradient.NullTypeGradError and NotImplementedError don't seem to be raised, but I kept them in case they are raised by other cases we may have forgotten.
We shouldn't catch that TypeError. That means we produced a wrong input to aesara grad.
The other two exceptions are the ones that (should) appear when a grad is not implemented for an Op.
That means we should make these two exceptions appear then, shouldn't we? Because they are not raised right now -- only the TypeError is raised
(here is to my first GH comment of the year 🥂 )
Those two exceptions mean there is no grad implemented for some Op in the cdf, which can very well happen and is a good reason to silently default to the scipy approximation. The TypeError, on the other hand, should not be caught, as that means we did something wrong.
In fact, there was a point during this PR when it was always silently defaulting to the scipy approximation, because we were passing two values to grad and suppressing the TypeError.
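For reference, a minimal sketch of the fallback logic being discussed, reusing the variable names from the snippet above. The concrete target expression and the "2-point" string handed to scipy are assumptions for illustration, not the PR's exact code:

import aesara
import aesara.tensor as at
import pymc as pm
from aesara.gradient import NullTypeGradError

# Hypothetical scalar target: squared difference between the mass the
# candidate Normal puts between the bounds and the requested 95%
dist_params = at.dvector("dist_params")  # [mu, sigma]
rv = pm.Normal.dist(dist_params[0], dist_params[1])
target = (at.exp(pm.logcdf(rv, 0.9)) - at.exp(pm.logcdf(rv, 0.1)) - 0.95) ** 2

try:
    # symbolic jacobian of the scalar target wrt the free parameters
    aesara_jac = aesara.grad(target, wrt=dist_params)
    jac = pm.aesaraf.compile_pymc([dist_params], aesara_jac, allow_input_downcast=True)
# fall back to scipy's finite-difference approximation only when Aesara
# cannot compute a gradient for some Op in the logcdf graph
except (NotImplementedError, NullTypeGradError):
    jac = "2-point"

# `jac` (the compiled function or the "2-point" string) can then be
# passed on to scipy's optimizer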
That seems to be the case. Locally it is passing in both float64 and float32 at 1e-5 precision, so we don't need the separate test just for the Poisson anymore.
Ha ha damn, I just pushed without that change. Tests are indeed passing locally. Gonna refactor the tests and push again
Hot damn, tests are passing locally 🔥 Pushed!
Why does the symbolic gradient help so much with numerical errors?
Because the logcdf uses gammaincc, whose gradient is notoriously tricky. We somewhat recently added a numerically stable(r) implementation to Aesara.
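A toy illustration (not from the PR) of taking that gradient symbolically through Aesara, assuming at.gammaincc and its gradient are available in the installed version:

import aesara
import aesara.tensor as at

# The Poisson logcdf goes through gammaincc; differentiating it symbolically
# avoids the noisy finite-difference estimates the optimizer would otherwise use
k = at.dscalar("k")    # e.g. value + 1 in the Poisson logcdf
mu = at.dscalar("mu")  # the rate parameter being optimized
cdf = at.gammaincc(k, mu)
dcdf_dmu = aesara.grad(cdf, wrt=mu)  # exact symbolic derivative
f = aesara.function([k, mu], dcdf_dmu)
print(f(3.0, 1.5))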
See my reply above. We shouldn't catch the TypeError.
We should move the new tests to a file test_func_utils.py, if func_utils.py is where the functions are going to live.
Good point, just did it.
Returns
-------
The optimized distribution parameters as a dictionary with the parameters'
name as key and the optimized value as value.
Follow-up PR (not to be sadistic with this one): we should add a code example in the docstrings.
I'm not preaching for my choir here, but I actually should add that here. Don't merge in the meantime. Will ping when done
Done @ricardoV94 -- we can merge once tests pass
Ready for review 🍾
This PR introduces the pm.find_optim_prior function, which finds the optimal prior parameters of a distribution under some constraints.

For instance, sometimes, while building a model, we think "I need a Gamma that has 95% probability mass between 0.1 and 0.4". It may take a long time to find an appropriate combination of parameters that gives a decent Gamma. pm.find_optim_prior basically answers the question: "What are the alpha and beta parameters that fit this constraint?".

Notes: pm.find_optim_prior can be used with any PyMC distribution for which the log-cdf function is implemented.
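To close with a concrete picture of the new helper, here is a hedged usage sketch based on the description above; the exact keyword names (init_guess, mass) are assumptions rather than quotes from the PR diff:

import pymc as pm

# Ask for a Gamma that puts ~95% of its probability mass between 0.1 and 0.4
# (keyword names are assumptions based on the PR description)
opt_params = pm.find_optim_prior(
    pm.Gamma,
    lower=0.1,
    upper=0.4,
    mass=0.95,
    init_guess={"alpha": 1.0, "beta": 10.0},
)
print(opt_params)  # a dict mapping parameter names to optimized values

with pm.Model():
    # use the optimized parameters directly as the prior
    effect = pm.Gamma("effect", **opt_params)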