Generalized poisson distribution #4775

ally-lee · 2021-06-16T23:56:15Z

In this PR, I am adding an implementation of the Generalized Poisson distribution.

In the standard Poisson distribution, the mean is always equal to the variance. The Generalized Poisson has a dispersion parameter -1 <= lambda <= 1 that controls the variance. When lambda < 0, the variance is less than the mean. When lambda > 0, the variance is greater than the mean. When lambda = 0, the Generalized Poisson reduces to the standard Poisson. You can read more about the distribution in this blog post.

The Generalized Poisson likelihood was very useful for a recent project where we were modeling count data because we wanted that flexibility in dispersion. If you're interested in checking out our work, it's available here.

I defined the Generalized Poisson as a custom distribution in PyMC3 for my project. I will also submit a PR to the pymc-examples repo with a notebook outlining how I defined a custom distribution.

pymc3/distributions/discrete.py

ricardoV94 · 2021-06-17T04:32:47Z

Hi @ally-lee thank you for opening the PR. It seems that your distribution implementation still mixes a bit of the old PyMC logic.

There is a big change going on that requires distributions to be implemented in a slightly different manner. You can have a look here for a quick guide of how to translate distributions to the new version: #4686 (comment)

And here about how to create these new RandomVariable ops that implement the random method: #4518 (comment)

pymc3/distributions/discrete.py

ally-lee · 2021-06-20T18:31:46Z

Hi @ricardoV94 thank you for this feedback! I will take a stab at making these updates.

ricardoV94 · 2021-06-20T18:35:08Z

There is also a current PR with a developer facing guide on adding new distributions: #4783

It seems your new notebook in pymc-examples is more user-focused, which we also need. I think the two new resources could work very well together.

Feel free to ask any questions about the new changes, if you have any.

pymc3/distributions/discrete.py

OriolAbril · 2021-06-20T21:03:22Z

There is also a current PR with a developer facing guide on adding new distributions: #4783

I wanted to echo #4783 (comment) to make sure the new distribution is shown in the docs when the time comes and we make a release. In this case it should go somewhere in https://github.com/pymc-devs/pymc3/blob/main/docs/source/api/distributions/continuous.rst

pymc3/distributions/discrete.py

ricardoV94 · 2021-06-23T07:48:54Z

We will also need tests. Let me know if you need help

pymc-devs/pymc#4775 (comment)

ally-lee · 2021-06-30T01:30:54Z

Thank you for being patient with me! I probably won't have time to work on this again until 2 weeks from now. If this needs to go out soon and you're able to write the tests without my help, feel free to take it over. If not, I'm happy to work on them myself!

* Create custom_distribution.ipynb * Update custom_distribution.ipynb * Replace 0 with -inf * Set bound on theta to be > 0 * Simplify logp expression pymc-devs/pymc#4775 (comment) * address PR comments * upload csv to data folder * move notebook to examples/pymc3_howtos * add notebook to table of contents * read csv from data folder using try except clause * pre-commit * address PR comments * make predictions on heldout set instead of future * add sampling sanity check

twiecki · 2021-09-16T18:22:40Z

@ally-lee Any progress on this or should we close it?

ally-lee · 2021-09-29T23:40:25Z

Hi, I believe the only thing left is to write tests. If anyone else wants to work on them, that would be awesome! I'm sorry I won't have time to work on this anymore, so if no one can complete this, I'm ok with closing it out.

mjhajharia · 2021-12-13T20:15:46Z

@twiecki i tried adding tests, but an equivalent doesn't exist in scipy or any popular library, there's one in statsmodels but that's a regression model, and there's this one implementation with a test here in julia https://github.com/grero/StatUtils.jl/blob/200994b3143b692499f11028ea4e5c4c749463b0/test/runtests.jl#L79, but i couldn't find any reliable place to test, with v4 there's already lots of work + this is missing moments and logcdf so maybe we can just tag this inactive and let it be till we have a better solution or more time, because I think the rest of the PR is very nicely done

twiecki · 2021-12-15T10:53:20Z

@mjhajharia Can't we write a test against scipy's Poisson and set lambda=0? "When lambda = 0, the Generalized Poisson reduces to the standard Poisson." That way we have at least that part covered.

OriolAbril · 2021-12-15T11:04:25Z

This looks like the perfect candidate for https://github.com/pymc-devs/pymc-experimental where IIUC, extensive testing wouldn't be required (or maybe even no testing could be enough). cc @fonnesbeck

mjhajharia · 2021-12-31T10:18:21Z

@mjhajharia Can't we write a test against scipy's Poisson and set lambda=0? "When lambda = 0, the Generalized Poisson reduces to the standard Poisson." That way we have at least that part covered.

yes we can!

twiecki · 2022-01-03T03:06:35Z

@mjhajharia I think that would be good enough for us here.

codecov · 2022-01-03T13:16:58Z

Codecov Report

Merging #4775 (76a06d8) into main (e63cb5b) will decrease coverage by 0.09%.
The diff coverage is 39.39%.

❗ Current head 76a06d8 differs from pull request most recent head 85678d6. Consider uploading reports for the commit 85678d6 to get more accurate results

@@            Coverage Diff             @@
##             main    #4775      +/-   ##
==========================================
- Coverage   79.01%   78.92%   -0.10%     
==========================================
  Files          87       87              
  Lines       14370    14403      +33     
==========================================
+ Hits        11354    11367      +13     
- Misses       3016     3036      +20

Impacted Files	Coverage Δ
pymc/distributions/discrete.py	`94.10% <39.39%> (-4.62%)`	⬇️

ricardoV94 · 2022-03-18T08:47:16Z

I suggest we move this PR to https://github.com/pymc-devs/pymc-experimental.

If nobody objects I'll close it soon

twiecki · 2022-03-18T08:55:18Z

Actually I think with the test for the poisson this is good for pymc proper.

ricardoV94 · 2022-03-18T09:55:11Z

Actually I think with the test for the poisson this is good for pymc proper.

Then I think we should have some tests with lambda != 0

twiecki · 2022-03-18T10:24:22Z

What test could we do?

ricardoV94 · 2022-03-18T10:30:56Z

What test could we do?

Old school comparison with an independent implementation? I think we do that for the ExGaussian for example. Someone just evaluated a bunch of points from the reference R library and checked that our implementation matches

pymc/distributions/discrete.py

pymc/tests/test_distributions.py

pymc/tests/test_distributions_random.py

…es with large ranges

ricardoV94 · 2022-04-04T15:08:38Z

I pushed several changes, namely:

Manually vectorize random method
This is critical for large variables with large mu, otherwise we would be doing the same slow step by step CDF calculation for every draw in the same variable.
Improve numerical stability of random method by computing CDF on the log scale
The vanilla method would start failing for large mu due to rounding issues
Added more tests

According to the reference paper, we should probably use a mixed algorithm to take draws, depending on whether lambda is positive or negative. This might alleviate the need for computing the CDF on the log-scale.

I would still propose to move this distribution to pymc-experimental, but in the meantime we need #5689 and #5688 merged.

@ally-lee If you have the chance, I would appreciate if you could review/comment on the changes I have made.

Co-authored-by: Ricardo Vieira <[email protected]>

ricardoV94 · 2022-05-17T20:36:19Z

We have a working version of this. Will add it in a new PR, perhaps in pymc-experimental

reddigari · 2022-11-18T17:27:52Z

Any update on the possibility of getting this distribution into pymc or pymc-experimental? Works beautifully.

ricardoV94 · 2022-11-19T06:59:57Z

Yeah I'll try to add it next week to pymc-experimental. We found some issues when trying to improve the random method so never came around doing it. But it's definitely usable already

jaharvey8 · 2023-03-15T03:42:37Z

Did this ever get added to pymc-experimental? I didn't see it but I might be looking in the wrong place.

ricardoV94 · 2023-04-17T15:01:37Z

@jaharvey8 and @reddigari apologies for the delay. There is now a PR open on pymc-experimental and the distribution should be available soon: pymc-devs/pymc-experimental#143

jaharvey8 · 2023-04-17T16:15:13Z

Awesome, thank you! I had just copied the code and created a new class within my script so this will be nice to have added.

ally-lee commented Jun 17, 2021

View reviewed changes

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

ricardoV94 requested changes Jun 17, 2021

View reviewed changes

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

ally-lee mentioned this pull request Jun 20, 2021

Guide for writing custom distributions pymc-devs/pymc-examples#184

Open

ally-lee commented Jun 20, 2021

View reviewed changes

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

ally-lee commented Jun 21, 2021

View reviewed changes

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

ricardoV94 reviewed Jun 21, 2021

View reviewed changes

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

ricardoV94 reviewed Jun 23, 2021

View reviewed changes

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved

ally-lee added a commit to ally-lee/pymc-examples that referenced this pull request Jun 27, 2021

Simplify logp expression

90d89c4

pymc-devs/pymc#4775 (comment)

mjhajharia self-assigned this Nov 8, 2021

mjhajharia removed their assignment Dec 13, 2021

mjhajharia force-pushed the generalized-poisson branch from 4b033e4 to 76a06d8 Compare January 3, 2022 12:58

mjhajharia force-pushed the generalized-poisson branch from ecf8772 to 93c0ea6 Compare January 5, 2022 16:31

mjhajharia requested a review from ricardoV94 January 5, 2022 16:45

ricardoV94 reviewed Mar 31, 2022

View reviewed changes

pymc/distributions/discrete.py Outdated Show resolved Hide resolved

ricardoV94 reviewed Mar 31, 2022

View reviewed changes

pymc/distributions/discrete.py Outdated Show resolved Hide resolved

ricardoV94 reviewed Mar 31, 2022

View reviewed changes

pymc/tests/test_distributions.py Outdated Show resolved Hide resolved

ricardoV94 reviewed Mar 31, 2022

View reviewed changes

pymc/tests/test_distributions_random.py Outdated Show resolved Hide resolved

fonnesbeck and others added 3 commits April 4, 2022 15:42

Refactored pymc3_random_discrete to use histograms instead of raw counts

483dfb0

Fix non-stochastic behavior in pymc_random* test helpers

11f207e

Make pymc_random_discrete test helper more robust to discrete variabl…

53e0e3a

…es with large ranges

ricardoV94 force-pushed the generalized-poisson branch from 86bf8ad to cf17df8 Compare April 4, 2022 13:45

ricardoV94 marked this pull request as draft April 4, 2022 14:40

ricardoV94 force-pushed the generalized-poisson branch from cf17df8 to 050f975 Compare April 4, 2022 14:41

ricardoV94 mentioned this pull request Apr 4, 2022

Misc changes #5689

Merged

ricardoV94 force-pushed the generalized-poisson branch 3 times, most recently from c6475e3 to a4b6e8d Compare April 5, 2022 08:26

Implement generalized poisson distribution

41d5283

Co-authored-by: Ricardo Vieira <[email protected]>

ricardoV94 force-pushed the generalized-poisson branch from a4b6e8d to 41d5283 Compare April 5, 2022 09:50

ricardoV94 closed this May 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalized poisson distribution #4775

Generalized poisson distribution #4775

ally-lee commented Jun 16, 2021

ricardoV94 commented Jun 17, 2021 •

edited

Loading

ally-lee commented Jun 20, 2021

ricardoV94 commented Jun 20, 2021

OriolAbril commented Jun 20, 2021

ricardoV94 commented Jun 23, 2021

ally-lee commented Jun 30, 2021

twiecki commented Sep 16, 2021

ally-lee commented Sep 29, 2021

mjhajharia commented Dec 13, 2021

twiecki commented Dec 15, 2021

OriolAbril commented Dec 15, 2021

mjhajharia commented Dec 31, 2021

twiecki commented Jan 3, 2022

codecov bot commented Jan 3, 2022 •

edited

Loading

ricardoV94 commented Mar 18, 2022

twiecki commented Mar 18, 2022

ricardoV94 commented Mar 18, 2022

twiecki commented Mar 18, 2022

ricardoV94 commented Mar 18, 2022

ricardoV94 commented Apr 4, 2022 •

edited

Loading

ricardoV94 commented May 17, 2022

reddigari commented Nov 18, 2022

ricardoV94 commented Nov 19, 2022

jaharvey8 commented Mar 15, 2023

ricardoV94 commented Apr 17, 2023

jaharvey8 commented Apr 17, 2023

Generalized poisson distribution #4775

Generalized poisson distribution #4775

Conversation

ally-lee commented Jun 16, 2021

ricardoV94 commented Jun 17, 2021 • edited Loading

ally-lee commented Jun 20, 2021

ricardoV94 commented Jun 20, 2021

OriolAbril commented Jun 20, 2021

ricardoV94 commented Jun 23, 2021

ally-lee commented Jun 30, 2021

twiecki commented Sep 16, 2021

ally-lee commented Sep 29, 2021

mjhajharia commented Dec 13, 2021

twiecki commented Dec 15, 2021

OriolAbril commented Dec 15, 2021

mjhajharia commented Dec 31, 2021

twiecki commented Jan 3, 2022

codecov bot commented Jan 3, 2022 • edited Loading

Codecov Report

ricardoV94 commented Mar 18, 2022

twiecki commented Mar 18, 2022

ricardoV94 commented Mar 18, 2022

twiecki commented Mar 18, 2022

ricardoV94 commented Mar 18, 2022

ricardoV94 commented Apr 4, 2022 • edited Loading

ricardoV94 commented May 17, 2022

reddigari commented Nov 18, 2022

ricardoV94 commented Nov 19, 2022

jaharvey8 commented Mar 15, 2023

ricardoV94 commented Apr 17, 2023

jaharvey8 commented Apr 17, 2023

ricardoV94 commented Jun 17, 2021 •

edited

Loading

codecov bot commented Jan 3, 2022 •

edited

Loading

ricardoV94 commented Apr 4, 2022 •

edited

Loading