multinomial sampling should rely on binomial #287
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #287 +/- ##
==========================================
- Coverage 93.73% 93.08% -0.66%
==========================================
Files 53 53
Lines 12010 12196 +186
==========================================
+ Hits 11258 11353 +95
- Misses 752 843 +91
Perhaps I need test statistics to assess whether the samples are likely to come from the distribution...
Hm... the static types (i.e. SMatrix, SVector) have more type safety and so I generally agree that the dynamically sized types need more testing. But I'm not sure about just testing the dynamic types; we might miss problems with specific nalgebra code paths or even incorrect trait bounds that would exclude the static types from being used in the
Looking at #165, this would also exclude basically all multivariate tests from
Not 100% sure the existing boiler macro can cover everything... the
I agree we should be testing our API size generics and that tests should work on no_std if we have it. I'll revert that. Do you have any other ideas to verify that samples are correctly generated?
Not really, to be honest. Except for very rudimentary testing (samples are in the correct domain/range of values), you'd probably need to implement proper statistical test methods.
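For reference, the rudimentary domain check mentioned above could look something like the following Python sketch. The helper name `in_multinomial_support` is made up for illustration and is not part of statrs; it only checks that every count is a non-negative integer and that the counts sum to the number of trials.

```python
def in_multinomial_support(sample, n):
    """Return True if `sample` is a valid outcome of n multinomial trials."""
    # Every entry must be a non-negative integer count.
    counts_ok = all(isinstance(c, int) and c >= 0 for c in sample)
    # The counts must account for exactly n trials.
    return counts_ok and sum(sample) == n


print(in_multinomial_support([3, 0, 7], 10))   # valid: counts sum to 10
print(in_multinomial_support([3, -1, 8], 10))  # invalid: negative count
```

A check like this catches gross errors only; it says nothing about whether the counts follow the right distribution.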
I'll rely on other implementations of statistical tests and rv sampling to verify my implementation for now. Once we've got tests and verification for them, then I'll consider bootstrapping features.
a0fd736 to 82501a1
lint: fix lints
82501a1 to c40567c
I'm still unsure whether I'm sampling incorrectly or using a bad verification technique; I should rely on a common goodness-of-fit (GoF) test instead of my homebrew statistics
c40567c to 7b4180c
also adds a comment demonstrating success of unwrap
6b20e22 to 66b8d7e
Tests use both static and dynamic allocators but aren't broken out into separate tests. They also aren't interleaved into the test suite, so it shouldn't be a big lift to conditionally compile them for a
Certainly not a thorough analysis, but it's safe to say the implementation is not likely correct.
>>> stats.cramervonmises_2samp(scipy_mnom_gen(), data) # scipy vs data
CramerVonMisesResult(statistic=[4.46216824 4.365037 4.57871893 4.39759045 4.64458355], pvalue=[4.92114127e-11 7.65537633e-11 3.19479998e-11 6.57013333e-11 2.73471246e-11])
>>> stats.cramervonmises_2samp(scipy_mnom_gen(), scipy_mnom_gen()) # scipy vs scipy
CramerVonMisesResult(statistic=[0.1209659 0.05345861 0.15851277 0.10590302 0.13809556], pvalue=[0.49155904 0.8550833 0.36467217 0.55674849 0.42800996])
tested with
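As a complement to the two-sample comparison above, a Pearson chi-square goodness-of-fit check against the known probabilities can be sketched with the standard library alone. This is a swapped-in technique, not what was run above, and all names here (`chi2_sf_even_df`, `pooled_chi_square`, `reference_multinomial`) are hypothetical. It assumes all probabilities are strictly positive and, to avoid needing SciPy for the p-value, uses a closed form that only holds for an even number of degrees of freedom (five categories give df = 4).

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Chi-square survival function; this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def pooled_chi_square(samples, probs):
    """Pearson statistic on category totals pooled over all samples."""
    totals = [sum(col) for col in zip(*samples)]
    grand_total = sum(totals)
    stat = sum(
        (obs - grand_total * p) ** 2 / (grand_total * p)
        for obs, p in zip(totals, probs)
    )
    return stat, len(probs) - 1


def reference_multinomial(rng, n, probs):
    """Reference draw: tally n categorical draws from random.choices."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts


rng = random.Random(0)
probs = [0.2] * 5  # five categories, so df = 4
skewed = [0.4, 0.15, 0.15, 0.15, 0.15]  # deliberately wrong distribution
good = [reference_multinomial(rng, 50, probs) for _ in range(400)]
bad = [reference_multinomial(rng, 50, skewed) for _ in range(400)]

for label, samples in (("matching", good), ("mismatched", bad)):
    stat, df = pooled_chi_square(samples, probs)
    print(label, round(stat, 2), chi2_sf_even_df(stat, df))
```

Pooling only tests the category totals, so a sampler with the right means but the wrong spread would still pass; it is a first screen rather than a full verification.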
Addresses #183, but does not really fix it, since binomial sampling is still naive.
If multinomial sampling relies on binomial sampling, then we only have to optimize once, in the binomial sampler. This also updates the testing_boiler to work with Multinomial; note that the tests are all written to use dynamically sized vectors and to rely on norms for approximate absolute equality.
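To illustrate the idea of building multinomial sampling on top of binomial sampling, here is a minimal Python sketch of the standard conditional-binomial decomposition. It is not the code in this PR; `sample_binomial` and `sample_multinomial` are hypothetical names, and the binomial draw is deliberately the naive sum of Bernoulli trials, standing in for whatever optimized sampler might replace it later.

```python
import random


def sample_binomial(rng, n, p):
    """Naive binomial draw: count successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sample_multinomial(rng, n, probs):
    """Draw one multinomial sample via successive conditional binomials."""
    counts = []
    remaining_n = n      # trials not yet assigned to a category
    remaining_p = 1.0    # probability mass not yet consumed
    for p in probs[:-1]:
        if remaining_n == 0 or remaining_p <= 0.0:
            counts.append(0)
            continue
        # Given the earlier counts, this category is binomial with
        # success probability p / (mass of the categories still open).
        cond_p = min(1.0, p / remaining_p)
        k = sample_binomial(rng, remaining_n, cond_p)
        counts.append(k)
        remaining_n -= k
        remaining_p -= p
    counts.append(remaining_n)  # the last category takes what is left
    return counts


rng = random.Random(287)
sample = sample_multinomial(rng, 100, [0.1, 0.2, 0.3, 0.4])
print(sample, sum(sample))
```

With this structure, any speedup to the binomial sampler carries over to the multinomial one without touching the loop.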
There is also a stochastic test added, but I doubt that both the test and the sampling are correct, because it doesn't pass often enough. (Perhaps my choice of parameters in the test does not converge closely enough to a normal distribution?) A second pair of eyes on this would be helpful.
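One possible shape for such a stochastic test, sketched in Python under the normal approximation, is a per-category z-test on the sample means. This is an assumption about what the test is trying to do, not the test in this PR, and the names (`category_z_scores`, `two_sided_p_value`, `passes`) are hypothetical. Checking several categories at the same significance level raises the chance that a correct sampler fails at least once, so the sketch applies a Bonferroni correction; that effect may be one reason a test like this passes less often than expected.

```python
import math


def category_z_scores(samples, n, probs):
    """z-score of each category's sample mean against n * p."""
    m = len(samples)
    scores = []
    for i, p in enumerate(probs):
        mean_i = sum(s[i] for s in samples) / m
        # Each count is Binomial(n, p) marginally, so the mean of m
        # samples has standard error sqrt(n * p * (1 - p) / m).
        std_err = math.sqrt(n * p * (1.0 - p) / m)
        scores.append((mean_i - n * p) / std_err)
    return scores


def two_sided_p_value(z):
    """Two-sided tail probability of a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def passes(samples, n, probs, alpha=0.05):
    """Accept only if no category is rejected at the corrected level."""
    threshold = alpha / len(probs)  # Bonferroni correction
    scores = category_z_scores(samples, n, probs)
    return all(two_sided_p_value(z) >= threshold for z in scores)


# Hand-made toy data, not real sampler output.
on_target = [[4, 6], [6, 4]] * 50   # category means are exactly 5
off_target = [[7, 3]] * 100         # category means are far from 5
print(passes(on_target, 10, [0.5, 0.5]))   # True
print(passes(off_target, 10, [0.5, 0.5]))  # False
```

Even with the correction, a correct sampler is still expected to fail a level-alpha test in up to an alpha fraction of runs, so a fixed seed or a very small alpha is usually needed to keep such a test from being flaky.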