-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HyperGeometric Distribution to pymc3.distributions.discrete #4108 #4249
Add HyperGeometric Distribution to pymc3.distributions.discrete #4108 #4249
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4249 +/- ##
=======================================
Coverage 87.59% 87.60%
=======================================
Files 88 88
Lines 14316 14341 +25
=======================================
+ Hits 12540 12563 +23
- Misses 1776 1778 +2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Harivallabha
Pre-commit failures can be fixed with
pre-commit run --files pymc3/distributions/__init__.py pymc3/distributions/discrete.py # list other modified files here
, or by enabling pre-commit locally:
pre-commit install
so the checks run automatically when you commit
@@ -737,6 +742,9 @@ def ref_rand(size, alpha, mu): | |||
def test_geometric(self): | |||
pymc3_random_discrete(pm.Geometric, {"p": Unit}, size=500, fails=50, ref_rand=nr.geometric) | |||
|
|||
def test_hypergeometric(self): | |||
pymc3_random_discrete(pm.HyperGeometric, {"N": Nat, "n": Nat, "k": Nat}, size=500, fails=50, ref_rand=nr.hypergeometric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.random.hypergeometric
takes arguments named differently than N
, n
, and k
- you may need to define a helper ref_rand
function here, as is done above in test_negative_binomial
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
pymc3/distributions/discrete.py
Outdated
- betaln(n - value + 1, bad - n + value + 1) | ||
- betaln(tot + 1, 1) | ||
) | ||
lower = tt.switch(tt.gt(n - N + k, 0), n - N + k, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit of a theano noob, but would
lower = tt.switch(tt.gt(n - N + k, 0), n - N + k, 0) | |
lower = tt.clip(n - N + k, 0, n - N + k) |
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that diverges from the value returned by scipy.stats.hypergeom; However removing the bound entirely, and just returning the result seems to match
fc8351c
to
9882ff6
Compare
@Harivallabha great to see this moving forward; if I understand the code correctly we're almost there, right? Any thoughts on the failing tests? (So far this PR is still listed to go into the 3.10 release, which I hope we'll be able to make soon. If you think it's better to wait, then let me know of course.) |
I think we can leave this as option for 3.10 but not block on it. |
Just for reference: the CI has been flaky recently (seems fixed now though!), but the error here seems related to the changes: ________________ TestScalarParameterSamples.test_hypergeometric ________________
[gw0] linux -- Python 3.6.11 /usr/share/miniconda/envs/testenv/bin/python
self = <pymc3.tests.test_distributions_random.TestScalarParameterSamples object at 0x7f701ddb5cf8>
def test_hypergeometric(self):
def ref_rand(size, N, k, n):
return st.hypergeom.rvs(M=N, n=k, N=n, size=size)
pymc3_random_discrete(
pm.HyperGeometric,
{"N": Nat, "k": Nat, "n": Nat},
size=500,
fails=100,
> ref_rand=ref_rand,
)
pymc3/tests/test_distributions_random.py:754:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dist = <class 'pymc3.distributions.discrete.HyperGeometric'>
paramdomains = {'N': <pymc3.tests.test_distributions.Domain object at 0x7f708b011e10>, 'k': <pymc3.tests.test_distributions.Domain object at 0x7f708b011e10>, 'n': <pymc3.tests.test_distributions.Domain object at 0x7f708b011e10>}
valuedomain = <pymc3.tests.test_distributions.Domain object at 0x7f708acfbe10>
ref_rand = <function TestScalarParameterSamples.test_hypergeometric.<locals>.ref_rand at 0x7f701ddf3e18>
size = 500, alpha = 0.05, fails = 100
def pymc3_random_discrete(
dist, paramdomains, valuedomain=Domain([0]), ref_rand=None, size=100000, alpha=0.05, fails=20
):
model = build_model(dist, valuedomain, paramdomains)
domains = paramdomains.copy()
for pt in product(domains, n_samples=100):
pt = pm.Point(pt, model=model)
p = alpha
# Allow Chisq test to fail (i.e., the samples be different)
# a certain number of times.
f = fails
while p <= alpha and f > 0:
o = model.named_vars["value"].random(size=size, point=pt)
e = ref_rand(size=size, **pt)
o = np.atleast_1d(o).flatten()
e = np.atleast_1d(e).flatten()
observed = dict(zip(*np.unique(o, return_counts=True)))
expected = dict(zip(*np.unique(e, return_counts=True)))
for e in expected.keys():
expected[e] = (observed.get(e, 0), expected[e])
k = np.array([v for v in expected.values()])
if np.all(k[:, 0] == k[:, 1]):
p = 1.0
else:
_, p = st.chisquare(k[:, 0], k[:, 1])
f -= 1
> assert p > alpha, str(pt)
E AssertionError: {'N': array(1), 'k': array(1), 'n': array(1)}
E assert nan > 0.05
pymc3/tests/test_distributions_random.py:111: AssertionError |
…discrete diverges from expected value
…st_distributions_random.py
Yes, please don't block on this. So the The reason for the failing random tests is that When I handle this, I keep getting the AssertionError (nan > 0.05) in |
9882ff6
to
6cd35fd
Compare
pymc3_random_discrete( | ||
pm.HyperGeometric, | ||
{"N": Nat, "k": Nat, "n": Nat}, | ||
size=500, | ||
fails=50, | ||
ref_rand=ref_rand, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pymc3_random_discrete( | |
pm.HyperGeometric, | |
{"N": Nat, "k": Nat, "n": Nat}, | |
size=500, | |
fails=50, | |
ref_rand=ref_rand, | |
) | |
pymc3_random_discrete( | |
pm.HyperGeometric, | |
{"N": Domain([10, 11, 12, 13], "int64"), | |
"k": Domain([4, 5, 6, 7], "int64"), | |
"n": Domain([6, 7, 8, 9], "int64")}, | |
size=500, | |
fails=50, | |
ref_rand=ref_rand, | |
) |
You can create your own domain like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should avoid the error that you are facing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying domain fixed it. Wonderful, thanks!
pymc3/distributions/discrete.py
Outdated
|
||
======== ============================= | ||
|
||
Support :math:`x in [max(0, n - \mathbb{N} + k), min(k, n)]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Support :math:`x in [max(0, n - \mathbb{N} + k), min(k, n)]` | |
Support :math:`x \in \left[\max(0, n - N + k), \min(k, n)\right]` |
Nitpick for proper latex rendering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very small final nitpick. Otherwise, it looks good to me!
Remove stray newline Co-authored-by: Tirth Patel <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Thanks @Harivallabha and @tirthasheshpatel! 🎉 (Edit: and @MarcoGorelli of course! :) ) |
Great job everyone! |
Creating a separate PR and linking it to #4108 because I had a hard time rebasing the old commits on top of master, too many conflicts. So rather than rebase, I just pulled latest master and again added these commits on top of it.
Included suggestions by @tirthasheshpatel. Thank you! :D
Removed
_repr_latex_
implementations.@Spaak