Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalized poisson distribution #4775

Closed
wants to merge 4 commits into from

Conversation

ally-lee
Copy link

In this PR, I am adding an implementation of the Generalized Poisson distribution.

In the standard Poisson distribution, the mean is always equal to the variance. The Generalized Poisson has a dispersion parameter -1 <= lambda <= 1 that controls the variance. When lambda < 0, the variance is less than the mean. When lambda > 0, the variance is greater than the mean. When lambda = 0, the Generalized Poisson reduces to the standard Poisson. You can read more about the distribution in this blog post.

The Generalized Poisson likelihood was very useful for a recent project where we were modeling count data because we wanted that flexibility in dispersion. If you're interested in checking out our work, it's available here.

I defined the Generalized Poisson as a custom distribution in PyMC3 for my project. I will also submit a PR to the pymc-examples repo with a notebook outlining how I defined a custom distribution.

@ricardoV94
Copy link
Member

ricardoV94 commented Jun 17, 2021

Hi @ally-lee thank you for opening the PR. It seems that your distribution implementation still mixes a bit of the old PyMC logic.

There is a big change going on that requires distributions to be implemented in a slightly different manner. You can have a look here for a quick guide of how to translate distributions to the new version: #4686 (comment)

And here about how to create these new RandomVariable ops that implement the random method: #4518 (comment)

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
@ally-lee
Copy link
Author

Hi @ricardoV94 thank you for this feedback! I will take a stab at making these updates.

@ricardoV94
Copy link
Member

There is also a current PR with a developer facing guide on adding new distributions: #4783

It seems your new notebook in pymc-examples is more user-focused, which we also need. I think the two new resources could work very well together.

Feel free to ask any questions about the new changes, if you have any.

@OriolAbril
Copy link
Member

There is also a current PR with a developer facing guide on adding new distributions: #4783

I wanted to echo #4783 (comment) to make sure the new distribution is shown in the docs when the time comes and we make a release. In this case it should go somewhere in https://github.com/pymc-devs/pymc3/blob/main/docs/source/api/distributions/continuous.rst

pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
pymc3/distributions/discrete.py Outdated Show resolved Hide resolved
@ricardoV94
Copy link
Member

We will also need tests. Let me know if you need help

ally-lee added a commit to ally-lee/pymc-examples that referenced this pull request Jun 27, 2021
@ally-lee
Copy link
Author

Thank you for being patient with me! I probably won't have time to work on this again until 2 weeks from now. If this needs to go out soon and you're able to write the tests without my help, feel free to take it over. If not, I'm happy to work on them myself!

MarcoGorelli pushed a commit to pymc-devs/pymc-examples that referenced this pull request Sep 1, 2021
* Create custom_distribution.ipynb

* Update custom_distribution.ipynb

* Replace 0 with -inf

* Set bound on theta to be > 0

* Simplify logp expression

pymc-devs/pymc#4775 (comment)

* address PR comments

* upload csv to data folder

* move notebook to examples/pymc3_howtos

* add notebook to table of contents

* read csv from data folder using try except clause

* pre-commit

* address PR comments

* make predictions on heldout set instead of future

* add sampling sanity check
@twiecki
Copy link
Member

twiecki commented Sep 16, 2021

@ally-lee Any progress on this or should we close it?

@ally-lee
Copy link
Author

Hi, I believe the only thing left is to write tests. If anyone else wants to work on them, that would be awesome! I'm sorry I won't have time to work on this anymore, so if no one can complete this, I'm ok with closing it out.

@mjhajharia mjhajharia self-assigned this Nov 8, 2021
@mjhajharia mjhajharia removed their assignment Dec 13, 2021
@mjhajharia
Copy link
Member

@twiecki i tried adding tests, but an equivalent doesn't exist in scipy or any popular library, there's one in statsmodels but that's a regression model, and there's this one implementation with a test here in julia https://github.com/grero/StatUtils.jl/blob/200994b3143b692499f11028ea4e5c4c749463b0/test/runtests.jl#L79, but i couldn't find any reliable place to test, with v4 there's already lots of work + this is missing moments and logcdf so maybe we can just tag this inactive and let it be till we have a better solution or more time, because I think the rest of the PR is very nicely done

@twiecki
Copy link
Member

twiecki commented Dec 15, 2021

@mjhajharia Can't we write a test against scipy's Poisson and set lambda=0? "When lambda = 0, the Generalized Poisson reduces to the standard Poisson." That way we have at least that part covered.

@OriolAbril
Copy link
Member

This looks like the perfect candidate for https://github.com/pymc-devs/pymc-experimental where IIUC, extensive testing wouldn't be required (or maybe even no testing could be enough). cc @fonnesbeck

@mjhajharia
Copy link
Member

@mjhajharia Can't we write a test against scipy's Poisson and set lambda=0? "When lambda = 0, the Generalized Poisson reduces to the standard Poisson." That way we have at least that part covered.

yes we can!

@twiecki
Copy link
Member

twiecki commented Jan 3, 2022

@mjhajharia I think that would be good enough for us here.

@codecov
Copy link

codecov bot commented Jan 3, 2022

Codecov Report

Merging #4775 (76a06d8) into main (e63cb5b) will decrease coverage by 0.09%.
The diff coverage is 39.39%.

❗ Current head 76a06d8 differs from pull request most recent head 85678d6. Consider uploading reports for the commit 85678d6 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4775      +/-   ##
==========================================
- Coverage   79.01%   78.92%   -0.10%     
==========================================
  Files          87       87              
  Lines       14370    14403      +33     
==========================================
+ Hits        11354    11367      +13     
- Misses       3016     3036      +20     
Impacted Files Coverage Δ
pymc/distributions/discrete.py 94.10% <39.39%> (-4.62%) ⬇️

@ricardoV94
Copy link
Member

I suggest we move this PR to https://github.com/pymc-devs/pymc-experimental.

If nobody objects I'll close it soon

@twiecki
Copy link
Member

twiecki commented Mar 18, 2022

Actually I think with the test for the poisson this is good for pymc proper.

@ricardoV94
Copy link
Member

Actually I think with the test for the poisson this is good for pymc proper.

Then I think we should have some tests with lambda != 0

@twiecki
Copy link
Member

twiecki commented Mar 18, 2022

What test could we do?

@ricardoV94
Copy link
Member

What test could we do?

Old school comparison with an independent implementation? I think we do that for the ExGaussian for example. Someone just evaluated a bunch of points from the reference R library and checked that our implementation matches

@ricardoV94 ricardoV94 marked this pull request as draft April 4, 2022 14:40
@ricardoV94 ricardoV94 mentioned this pull request Apr 4, 2022
@ricardoV94
Copy link
Member

ricardoV94 commented Apr 4, 2022

I pushed several changes, namely:

  1. Manually vectorize random method
    This is critical for large variables with large mu, otherwise we would be doing the same slow step by step CDF calculation for every draw in the same variable.
  2. Improve numerical stability of random method by computing CDF on the log scale
    The vanilla method would start failing for large mu due to rounding issues
  3. Added more tests

According to the reference paper, we should probably use a mixed algorithm to take draws, depending on whether lambda is positive or negative. This might alleviate the need for computing the CDF on the log-scale.

I would still propose to move this distribution to pymc-experimental, but in the meantime we need #5689 and #5688 merged.

@ally-lee If you have the chance, I would appreciate if you could review/comment on the changes I have made.

@ricardoV94 ricardoV94 force-pushed the generalized-poisson branch 3 times, most recently from c6475e3 to a4b6e8d Compare April 5, 2022 08:26
@ricardoV94
Copy link
Member

We have a working version of this. Will add it in a new PR, perhaps in pymc-experimental

@ricardoV94 ricardoV94 closed this May 17, 2022
@reddigari
Copy link

Any update on the possibility of getting this distribution into pymc or pymc-experimental? Works beautifully.

@ricardoV94
Copy link
Member

Yeah I'll try to add it next week to pymc-experimental. We found some issues when trying to improve the random method so never came around doing it. But it's definitely usable already

@jaharvey8
Copy link
Contributor

Did this ever get added to pymc-experimental? I didn't see it but I might be looking in the wrong place.

@ricardoV94
Copy link
Member

@jaharvey8 and @reddigari apologies for the delay. There is now a PR open on pymc-experimental and the distribution should be available soon: pymc-devs/pymc-experimental#143

@jaharvey8
Copy link
Contributor

Awesome, thank you! I had just copied the code and created a new class within my script so this will be nice to have added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants