Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace zero inflated distributions with mixtures #5584

Merged

Conversation

ricardoV94
Copy link
Member

Required the following extra changes:

  • Implement logcdf for Mixtures
  • Allow Constant distribution to have a varying dtype

@codecov
Copy link

codecov bot commented Mar 11, 2022

Codecov Report

Merging #5584 (68bb77a) into main (86f8d98) will decrease coverage by 0.04%.
The diff coverage is 96.49%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5584      +/-   ##
==========================================
- Coverage   87.64%   87.59%   -0.05%     
==========================================
  Files          76       76              
  Lines       13775    13732      -43     
==========================================
- Hits        12073    12029      -44     
- Misses       1702     1703       +1     
Impacted Files Coverage Δ
pymc/distributions/mixture.py 93.14% <91.30%> (-0.03%) ⬇️
pymc/distributions/discrete.py 99.72% <100.00%> (-0.04%) ⬇️

@ricardoV94 ricardoV94 force-pushed the replace_zero_inflated_dists_with_mixtures branch from 9771af3 to 6fc28bc Compare March 11, 2022 18:50
@ricardoV94 ricardoV94 marked this pull request as draft March 11, 2022 21:23
@twiecki
Copy link
Member

twiecki commented Mar 12, 2022

Nice refactor.

@ricardoV94 ricardoV94 force-pushed the replace_zero_inflated_dists_with_mixtures branch 3 times, most recently from 3479b29 to ba6b9a0 Compare March 13, 2022 14:51
@ricardoV94 ricardoV94 marked this pull request as ready for review March 13, 2022 16:35
@ricardoV94 ricardoV94 requested a review from twiecki March 13, 2022 16:35
Copy link
Member

@larryshamalama larryshamalama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great! Left some questions so I can better understand myself 😅 (and maybe for others reading this PR!)

An overarching question that I have is what benefits do we get from positing a Mixture-type rv_op for zero-inflated distributions? My understanding was that Mixture distributions inherit from SymbolicDistribution as a way to initialize a different rv_op depending on how many and what comp_dists are provided. However, for zero-inflated distributions, there will always be "two" components: the zero Constant distribution and Poisson/binomial/negative binomial. Perhaps the answer is simple but I was just wondering...

pymc/distributions/discrete.py Show resolved Hide resolved
weights = at.stack([1 - nonzero_p, nonzero_p], axis=-1)
comp_dists = [
Constant.dist(0),
nonzero_dist,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does nonzero_dist here have to be Discrete?

Copy link
Member Author

@ricardoV94 ricardoV94 Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your question might be related to #4511

Still need to make sure we are not combining discrete and continuous distributions in mixtures. In the cases here they are both supposed to be discrete yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have another thing in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I had in mind! I was wondering if it would help to have a check that nonzero_dist is discrete, but, as you mentioned above, it may not be necessary since _zero_inflated_mixture is not a user-facing function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I want to add an explicit check in the Mixture class in a follow up PR

pymc/distributions/discrete.py Show resolved Hide resolved
@ricardoV94
Copy link
Member Author

ricardoV94 commented Mar 14, 2022

An overarching question that I have is what benefits do we get from positing a Mixture-type rv_op for zero-inflated distributions

The biggest advantage is less code for us to maintain. As Thomas hinted above this is just a refactor.

Reusing a building block also means any bug fixes there propagate everywhere quickly. For instance, the old RandomVariable Ops actually had a slight bug in that they did not broadcast psi with the other parameters when size was not specified. So pm.ZIP.dist(0.5, [200, 300]) Could never draw mixed zeros and non zeros, even though it should have.

Instead of having to fix this everywhere, it's now enough to do it (and test it) in one place as I did here: 78f1786

@twiecki
Copy link
Member

twiecki commented Mar 14, 2022

What about the integrated versions? I guess we get those automatically once it's added to aeppl?

@ricardoV94
Copy link
Member Author

What about the integrated versions? I guess we get those automatically once it's added to aeppl?

What do you mean integrated?

@twiecki
Copy link
Member

twiecki commented Mar 14, 2022

What do you mean integrated?

Upon thinking about it I don't think it makes sense here.

@ricardoV94 ricardoV94 force-pushed the replace_zero_inflated_dists_with_mixtures branch from ba6b9a0 to 650a217 Compare March 14, 2022 13:37
@ricardoV94 ricardoV94 force-pushed the replace_zero_inflated_dists_with_mixtures branch 3 times, most recently from c71a919 to a6e6748 Compare March 14, 2022 14:12
@ricardoV94 ricardoV94 requested a review from twiecki March 14, 2022 14:52
@ricardoV94 ricardoV94 force-pushed the replace_zero_inflated_dists_with_mixtures branch 2 times, most recently from 0fbcd1d to 3917d88 Compare March 14, 2022 16:05
RELEASE-NOTES.md Outdated Show resolved Hide resolved
For consistency with the base Poisson distribution
@ricardoV94 ricardoV94 force-pushed the replace_zero_inflated_dists_with_mixtures branch from 3917d88 to 68bb77a Compare March 14, 2022 16:59
@larryshamalama
Copy link
Member

The biggest advantage is less code for us to maintain. As Thomas hinted above this is just a refactor.

Reusing a building block also means any bug fixes there propagate everywhere quickly. For instance, the old RandomVariable Ops actually had a slight bug in that they did not broadcast psi with the other parameters when size was not specified. So pm.ZIP.dist(0.5, [200, 300]) Could never draw mixed zeros and non zeros, even though it should have.

Instead of having to fix this everywhere, it's now enough to do it (and test it) in one place as I did here: 78f1786

Okay I see, thanks for the clarification! Makes a lot of sense

@ricardoV94 ricardoV94 merged commit 7cc570a into pymc-devs:main Mar 14, 2022
@ricardoV94 ricardoV94 deleted the replace_zero_inflated_dists_with_mixtures branch June 6, 2023 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants