-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalized poisson distribution #4775
Conversation
Hi @ally-lee thank you for opening the PR. It seems that your distribution implementation still mixes a bit of the old PyMC logic. There is a big change going on that requires distributions to be implemented in a slightly different manner. You can have a look here for a quick guide of how to translate distributions to the new version: #4686 (comment) And here about how to create these new RandomVariable ops that implement the random method: #4518 (comment) |
Hi @ricardoV94 thank you for this feedback! I will take a stab at making these updates. |
There is also a current PR with a developer facing guide on adding new distributions: #4783 It seems your new notebook in pymc-examples is more user-focused, which we also need. I think the two new resources could work very well together. Feel free to ask any questions about the new changes, if you have any. |
I wanted to echo #4783 (comment) to make sure the new distribution is shown in the docs when the time comes and we make a release. In this case it should go somewhere in https://github.com/pymc-devs/pymc3/blob/main/docs/source/api/distributions/continuous.rst |
We will also need tests. Let me know if you need help |
Thank you for being patient with me! I probably won't have time to work on this again until 2 weeks from now. If this needs to go out soon and you're able to write the tests without my help, feel free to take it over. If not, I'm happy to work on them myself! |
* Create custom_distribution.ipynb * Update custom_distribution.ipynb * Replace 0 with -inf * Set bound on theta to be > 0 * Simplify logp expression pymc-devs/pymc#4775 (comment) * address PR comments * upload csv to data folder * move notebook to examples/pymc3_howtos * add notebook to table of contents * read csv from data folder using try except clause * pre-commit * address PR comments * make predictions on heldout set instead of future * add sampling sanity check
@ally-lee Any progress on this or should we close it? |
Hi, I believe the only thing left is to write tests. If anyone else wants to work on them, that would be awesome! I'm sorry I won't have time to work on this anymore, so if no one can complete this, I'm ok with closing it out. |
@twiecki i tried adding tests, but an equivalent doesn't exist in scipy or any popular library, there's one in statsmodels but that's a regression model, and there's this one implementation with a test here in julia https://github.com/grero/StatUtils.jl/blob/200994b3143b692499f11028ea4e5c4c749463b0/test/runtests.jl#L79, but i couldn't find any reliable place to test, with v4 there's already lots of work + this is missing moments and logcdf so maybe we can just tag this inactive and let it be till we have a better solution or more time, because I think the rest of the PR is very nicely done |
@mjhajharia Can't we write a test against scipy's Poisson and set lambda=0? "When lambda = 0, the Generalized Poisson reduces to the standard Poisson." That way we have at least that part covered. |
This looks like the perfect candidate for https://github.com/pymc-devs/pymc-experimental where IIUC, extensive testing wouldn't be required (or maybe even no testing could be enough). cc @fonnesbeck |
yes we can! |
@mjhajharia I think that would be good enough for us here. |
4b033e4
to
76a06d8
Compare
Codecov Report
@@ Coverage Diff @@
## main #4775 +/- ##
==========================================
- Coverage 79.01% 78.92% -0.10%
==========================================
Files 87 87
Lines 14370 14403 +33
==========================================
+ Hits 11354 11367 +13
- Misses 3016 3036 +20
|
ecf8772
to
93c0ea6
Compare
I suggest we move this PR to https://github.com/pymc-devs/pymc-experimental. If nobody objects I'll close it soon |
Actually I think with the test for the poisson this is good for pymc proper. |
Then I think we should have some tests with |
What test could we do? |
Old school comparison with an independent implementation? I think we do that for the ExGaussian for example. Someone just evaluated a bunch of points from the reference R library and checked that our implementation matches |
86bf8ad
to
cf17df8
Compare
cf17df8
to
050f975
Compare
I pushed several changes, namely:
According to the reference paper, we should probably use a mixed algorithm to take draws, depending on whether lambda is positive or negative. This might alleviate the need for computing the CDF on the log-scale. I would still propose to move this distribution to pymc-experimental, but in the meantime we need #5689 and #5688 merged. @ally-lee If you have the chance, I would appreciate if you could review/comment on the changes I have made. |
c6475e3
to
a4b6e8d
Compare
Co-authored-by: Ricardo Vieira <[email protected]>
a4b6e8d
to
41d5283
Compare
We have a working version of this. Will add it in a new PR, perhaps in pymc-experimental |
Any update on the possibility of getting this distribution into pymc or pymc-experimental? Works beautifully. |
Yeah I'll try to add it next week to pymc-experimental. We found some issues when trying to improve the random method so never came around doing it. But it's definitely usable already |
Did this ever get added to pymc-experimental? I didn't see it but I might be looking in the wrong place. |
@jaharvey8 and @reddigari apologies for the delay. There is now a PR open on pymc-experimental and the distribution should be available soon: pymc-devs/pymc-experimental#143 |
Awesome, thank you! I had just copied the code and created a new class within my script so this will be nice to have added. |
In this PR, I am adding an implementation of the Generalized Poisson distribution.
In the standard Poisson distribution, the mean is always equal to the variance. The Generalized Poisson has a dispersion parameter
-1 <= lambda <= 1
that controls the variance. Whenlambda < 0
, the variance is less than the mean. Whenlambda > 0
, the variance is greater than the mean. Whenlambda = 0
, the Generalized Poisson reduces to the standard Poisson. You can read more about the distribution in this blog post.The Generalized Poisson likelihood was very useful for a recent project where we were modeling count data because we wanted that flexibility in dispersion. If you're interested in checking out our work, it's available here.
I defined the Generalized Poisson as a custom distribution in PyMC3 for my project. I will also submit a PR to the
pymc-examples
repo with a notebook outlining how I defined a custom distribution.