Distribution variable API difficult to use for basic debugging: logp, logcdf, random samples, ppf #5798

hottwaj · 2022-05-24T09:58:02Z

Description of your problem

This is more of a proposal - please let me know if this would be better put elsewhere.

One thing I often want to do while experimenting with or debugging models is to individually explore the RVs I'm using, but I find this difficult with the current API.

Maybe I'm missing something in the docs, but I find the current API awkward when I want to evaluate or plot the pdf/cdf/ppf or draw samples. Being able to do this is especially helpful for distributions where pymc (very usefully) provides parameterisations that scipy does not (e.g. Beta has mu and sigma parameters). If I can't "see" what pymc is doing with my basic priors, then I either have to find the formulae to convert the parameterisation into scipy-compatible parameters, or resort to trial and error (the horror!).
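For illustration, here is a minimal sketch of the kind of manual conversion described above, for the Beta case. It assumes the usual moment-matching relation kappa = mu * (1 - mu) / sigma**2 - 1, alpha = mu * kappa, beta = (1 - mu) * kappa. The function name is hypothetical and not part of pymc or scipy.

```python
def beta_mu_sigma_to_alpha_beta(mu, sigma):
    """Convert a (mean, standard deviation) pair to Beta shape parameters.

    Hypothetical helper for illustration only; not part of pymc or scipy.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    max_var = mu * (1.0 - mu)  # the variance of a Beta must stay below this
    if not 0.0 < sigma ** 2 < max_var:
        raise ValueError("sigma**2 must lie strictly between 0 and mu * (1 - mu)")
    kappa = max_var / sigma ** 2 - 1.0
    return mu * kappa, (1.0 - mu) * kappa


alpha, beta = beta_mu_sigma_to_alpha_beta(mu=0.25, sigma=0.1)

# Round-trip check: recover mean and variance from the shape parameters.
mean = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
print(alpha, beta, mean, var)
```

The resulting pair could then be passed to scipy.stats.beta(a=alpha, b=beta) to inspect the prior with the scipy API.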

Currently you have to do things like this:

import numpy
import pymc

# generate 1000 samples of a Bernoulli RV
pymc.Bernoulli.dist(p=0.04, shape=[1000]).eval()

# evaluate the log-PMF (logp) of a Binomial RV at the integers 0..999
pymc.logp(pymc.Binomial.dist(p=0.04, n=1000), numpy.arange(0, 1000)).eval()

# I'm not aware of any way to evaluate the PPF, but maybe I'm missing it?

Is e.g. pymc.Continuous.dist(...) wrapping basic scipy distributions somewhere (I guess not)? If so, could those be exposed in a more obvious way so that we can access the scipy API, e.g. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html?

Alternatively, could a more straightforward API for rvs/pdf/cdf/ppf etc be provided?

See also: #5032

Thanks for the awesome library :)

hottwaj · 2022-05-24T10:08:31Z

#5308 is also somewhat relevant

ricardoV94 · 2022-05-24T11:25:08Z

Two short notes:

We don't have ppfs implemented (yet) for our distributions. We do have the cdf via pymc.cdf(dist, value), in case you care about that.
You should use pymc.draw(dist, draws) instead of dist.eval(), for proper seeding in upcoming releases.
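Since no ppf is available, one generic workaround is to invert a cdf numerically. The sketch below uses plain bisection on a hand-written normal cdf as a stand-in. The helper names (normal_cdf, ppf_by_bisection) are hypothetical, and wiring in a cdf obtained from pymc (for example by exponentiating a logcdf value) is an assumption that is not shown here.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form normal cdf, used here only as a stand-in for any monotone cdf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ppf_by_bisection(cdf, q, lower, upper, tol=1e-10, max_iter=200):
    """Approximate the q-quantile of a continuous distribution by bisection.

    `cdf` must be non-decreasing and [lower, upper] must bracket the quantile.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if not cdf(lower) <= q <= cdf(upper):
        raise ValueError("[lower, upper] does not bracket the requested quantile")
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        if cdf(mid) < q:
            lower = mid  # quantile lies to the right of mid
        else:
            upper = mid  # quantile lies at or to the left of mid
        if upper - lower < tol:
            break
    return 0.5 * (lower + upper)


median = ppf_by_bisection(normal_cdf, 0.5, -10.0, 10.0)
q975 = ppf_by_bisection(normal_cdf, 0.975, -10.0, 10.0)
print(median, q975)
```

Bisection is slow compared with a closed-form ppf, but it needs only cdf evaluations, which is all that is on offer here.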

As for the main question: what do you imagine a more straightforward API would look like?

Keep in mind that PyMC defines everything in terms of Aesara logp/random graphs. That's why you always get back a symbolic expression that needs to be compiled before evaluation (which is what your .eval() is doing).

hottwaj · 2022-05-24T13:05:19Z

Thanks for your reply! Noted on pymc.draw, thanks

Did you mean pymc.logcdf instead of pymc.cdf? The latter is not available in the version I'm using (4.0.0b6).

As for a more straightforward API, I suppose I am biased from having some experience with scipy, but I find its API fairly intuitive, e.g.

my_beta = scipy.stats.beta(a = 1, b = 2)

# Can then call things like my_beta.rvs, or pdf/cdf/ppf/moment/mean etc

My opinion (please feel free to disagree :) ) is that it is easier for someone to "discover" the rvs/pdf/cdf etc methods in this case because the distribution object itself provides them. If you are experimenting in jupyter lab or some IDE, they will come up as suggestions as soon as you write my_rv.[ completion keyboard shortcut ]

With pymc, discovery is a bit harder because the relevant functions are of the form pymc.logcdf, pymc.logp, pymc.draw. It's not obvious to me (as a user of scipy and earlier versions of pymc) that the pymc module is the place to look for them.

Thanks!
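To make the discoverability point concrete, here is a rough sketch of a thin, scipy-style wrapper that forwards method calls to module-level functions. DistView is a hypothetical class, not part of pymc. It only assumes that the backend object offers draw(dist, draws=...), logp(dist, value) and logcdf(dist, value), with the latter two returning something that has an .eval() method, as in the calls quoted in this thread.

```python
class DistView:
    """Hypothetical wrapper exposing function-style calls as methods.

    `backend` is any object providing draw, logp and logcdf functions
    (the pymc module is the intended candidate, but that is an assumption).
    """

    def __init__(self, dist, backend):
        self._dist = dist
        self._backend = backend

    def rvs(self, size=1):
        # Forward to backend.draw, which is expected to return concrete samples.
        return self._backend.draw(self._dist, draws=size)

    def logpdf(self, value):
        # backend.logp is expected to return a symbolic expression; evaluate it.
        return self._backend.logp(self._dist, value).eval()

    def logcdf(self, value):
        # Same pattern as logpdf, but for the log of the cdf.
        return self._backend.logcdf(self._dist, value).eval()
```

With PyMC installed, the intended use would be something like DistView(pymc.Normal.dist(), backend=pymc), after which rvs, logpdf and logcdf show up under tab completion on the wrapper object.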

ricardoV94 · 2022-05-24T13:53:07Z

Yes, I meant logcdf. I understand what you mean now.

It was a design choice to use functions instead of methods, because we are using vanilla Aesara objects under the hood that do not have such methods.

One longer-term reason is that we are doing some work that will bring logp methods not only to pure Distributions but also to graphs based on those, so that you can do:

import pymc as pm
import aesara.tensor as at

censored_normal = at.clip(pm.Normal.dist(), -1, 1)
value = 0.5  # arbitrary example point at which to evaluate the log-density
pm.logp(censored_normal, value).eval()   # It doesn't work just yet, but hopefully will in a future release

The point is that distributions are no longer the only things that have logp (and perhaps one day logcdf) methods. This is already true for draws:

censored_rvs = pm.draw(censored_normal, draws=5)

As such, any work towards adding methods that work with arbitrary Aesara objects and not just the RandomVariables that are returned directly by vanilla PyMC distributions has larger benefits.

We definitely want to make the basic functions logp, logcdf, draw easy to find. Is there any chance that the uneasiness you feel comes from experience with the older versions of PyMC3?

pymc-devs locked and limited conversation to collaborators May 24, 2022

ricardoV94 converted this issue into discussion #5801 May 24, 2022
