
multinomial sampling should rely on binomial #287

Open

YeungOnion wants to merge 5 commits into master from perf/multinomial-sampling

Conversation

YeungOnion (Contributor)

Addresses #183, but does not fully fix it, since the binomial sampler is still naive.

If we rely on binomial sampling, then we only need to optimize in one place. This also updates the testing_boiler macro to work with Multinomial; note that the tests are all written to use dynamically sized vectors and to rely on norms for approximate absolute equality.
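For readers following along, the decomposition being referred to can be sketched like this; it's a minimal NumPy illustration of the conditional-binomial technique, not the statrs code (the function name and structure are hypothetical):

```python
import numpy as np

def multinomial_via_binomial(n, p, rng):
    """Draw one Multinomial(n, p) sample as a chain of binomial draws.

    Conditional on the counts already assigned, the count for category i
    is distributed Binomial(remaining_n, p[i] / remaining_p), so a fast
    binomial sampler yields a fast multinomial sampler for free.
    """
    counts = np.zeros(len(p), dtype=np.int64)
    remaining_n, remaining_p = n, 1.0
    for i, p_i in enumerate(p[:-1]):
        # min() guards against floating-point drift pushing the ratio past 1.
        counts[i] = rng.binomial(remaining_n, min(p_i / remaining_p, 1.0))
        remaining_n -= counts[i]
        remaining_p -= p_i
    counts[-1] = remaining_n  # whatever is left falls in the last category
    return counts

rng = np.random.default_rng(0)
assert multinomial_via_binomial(1000, [0.2] * 5, rng).sum() == 1000
```

The "optimize once" point is that any speedup to the binomial sampler (e.g. a rejection-based method instead of naive Bernoulli summation) then carries over to the multinomial sampler automatically.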

There is also a stochastic test added, but I'm doubtful that both the test and the sampling are correct, because it doesn't pass often enough. (Perhaps my choice of parameters in the test doesn't converge closely enough to a normal distribution?) A second pair of eyes on this would be helpful.

codecov bot commented Sep 11, 2024

Codecov Report

Attention: Patch coverage is 57.93103% with 122 lines in your changes missing coverage. Please review.

Project coverage is 93.08%. Comparing base (993a4b5) to head (66b8d7e).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/distribution/multinomial.rs | 55.14% | 122 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
- Coverage   93.73%   93.08%   -0.66%     
==========================================
  Files          53       53              
  Lines       12010    12196     +186     
==========================================
+ Hits        11258    11353      +95     
- Misses        752      843      +91     


@YeungOnion (Contributor, Author)

Perhaps I need proper test statistics to assess whether the samples are likely drawn from the distribution...

@FreezyLemon (Contributor)

> notice how the tests are written to all use dynamically sized vectors

Hm... the static types (e.g. SMatrix, SVector) have more type safety, so I generally agree that the dynamically sized types need more testing. But I'm not sure about testing only the dynamic types; we might miss problems with specific nalgebra code paths, or even incorrect trait bounds that would exclude the static types from being usable with the statrs API altogether.

Looking at #165, this would also exclude basically all multivariate tests from no_std environments without an allocator.

Not 100% sure the existing boiler macro can cover everything... the nalgebra-based distributions are a fair bit different, after all. But I think that, e.g., new_from_nalgebra should be preferred over plain new, since it works for both kinds of matrix/vector.

@YeungOnion (Contributor, Author)

I agree that we should be testing our API across size generics, and that tests should work on no_std where we support it. I'll revert that.

Do you have any other ideas to verify that samples are correctly generated?

@YeungOnion added this to the 0.18 milestone on Sep 18, 2024
@FreezyLemon (Contributor)

> Do you have any other ideas to verify that samples are correctly generated?

Not really, to be honest. Beyond very rudimentary testing (checking that samples fall in the correct domain/range of values), you'd probably need to implement proper statistical test methods...
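For what it's worth, one step up from range checks is a Pearson chi-square goodness-of-fit test on the pooled category counts. A sketch, assuming samples are collected as an m-by-k array (this is not code from the PR):

```python
import numpy as np
from scipy import stats

def chisq_gof_multinomial(samples, n, p):
    """Pearson chi-square test that multinomial draws match (n, p).

    Pools all m draws of size n: the total count in category i should be
    close to m * n * p[i]. Note this only checks the pooled marginals;
    per-component two-sample tests (as used later in this thread) are stricter.
    """
    samples = np.asarray(samples)
    observed = samples.sum(axis=0)
    expected = samples.shape[0] * n * np.asarray(p)
    return stats.chisquare(observed, f_exp=expected)

rng = np.random.default_rng(0)
draws = rng.multinomial(1000, [0.2] * 5, size=10_000)
print(chisq_gof_multinomial(draws, 1000, [0.2] * 5))  # large p-value expected
```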

@YeungOnion (Contributor, Author)

I'll rely on other implementations of statistical tests and RV sampling to verify my implementation for now. Once we've got tests and verification for them, I'll consider bootstrapping features.

@YeungOnion force-pushed the perf/multinomial-sampling branch from 82501a1 to c40567c on September 24, 2024 04:11
I'm still unsure whether I'm sampling incorrectly or using a bad verification technique; this should rely on a common GoF test instead of my homebrew statistics.
@YeungOnion force-pushed the perf/multinomial-sampling branch from c40567c to 7b4180c on September 24, 2024 04:30
Also adds a comment demonstrating why the unwrap succeeds.
@YeungOnion force-pushed the perf/multinomial-sampling branch from 6b20e22 to 66b8d7e on September 24, 2024 17:53
@YeungOnion (Contributor, Author)

Tests now use both static and dynamic allocation. They aren't broken out into separate tests, but they also aren't interleaved through the test suite, so it shouldn't be a big lift to conditionally compile them for a no_std option.

@YeungOnion mentioned this pull request on Sep 24, 2024
@YeungOnion (Contributor, Author)

Certainly not a thorough analysis, but it's safe to say the implementation is likely not correct.

>>> stats.cramervonmises_2samp(scipy_mnom_gen(), data)  # scipy vs data
CramerVonMisesResult(statistic=[4.46216824 4.365037   4.57871893 4.39759045 4.64458355], pvalue=[4.92114127e-11 7.65537633e-11 3.19479998e-11 6.57013333e-11 2.73471246e-11])
>>> stats.cramervonmises_2samp(scipy_mnom_gen(), scipy_mnom_gen())  # scipy vs scipy
CramerVonMisesResult(statistic=[0.1209659  0.05345861 0.15851277 0.10590302 0.13809556], pvalue=[0.49155904 0.8550833  0.36467217 0.55674849 0.42800996])

Tested with n=1000, p=5*[0.2], where scipy and statrs each generated 10k samples.
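For reproducibility, the comparison above presumably looked something like the sketch below. scipy_mnom_gen is reconstructed hypothetically (the actual script isn't in the thread), data stands in for the statrs-generated samples, and the test is applied per component; the output shown earlier suggests a vectorized call, but an explicit loop is version-agnostic:

```python
import numpy as np
from scipy import stats

N, P, M = 1000, [0.2] * 5, 10_000
rng = np.random.default_rng()

def scipy_mnom_gen():
    # Hypothetical stand-in: M multinomial draws with n=N, p=P.
    return rng.multinomial(N, P, size=M)

# `data` should be the M samples produced by statrs, loaded from wherever
# the Rust side wrote them; a scipy draw is used here only as a placeholder.
data = scipy_mnom_gen()

ref = scipy_mnom_gen()
results = [stats.cramervonmises_2samp(ref[:, i], data[:, i]) for i in range(len(P))]
print([r.pvalue for r in results])  # small p-values indicate a mismatch
```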

@YeungOnion modified the milestones: 0.18, Future, 0.19 on Dec 3, 2024