Add more Kolmogorov–Smirnov test #1504

Merged: 27 commits, merged into rust-random:master on Oct 17, 2024

Conversation

@JamboChen (Contributor) commented Oct 8, 2024

  • Added a CHANGELOG.md entry

Summary

Enhance Kolmogorov–Smirnov Test Coverage for Various Distributions

Motivation

Improve CDF test coverage

Details

I have implemented Kolmogorov–Smirnov tests for the following distributions:

I've noticed that some samplers may not pass the test with extreme parameters (e.g., (10000, 0.0001)), and I'm unsure if these should be retained in the test set. Additionally, I'm considering whether using a fixed set of seeds combined with the distribution's parameters is beneficial, as it may increase testing time.
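[For readers unfamiliar with the approach: a one-sample Kolmogorov–Smirnov test compares the sampler's empirical CDF against the analytic CDF. A minimal sketch of the statistic, illustrative only and not this PR's harness:]

```rust
/// One-sample Kolmogorov–Smirnov statistic: the largest absolute gap
/// between the empirical CDF of `samples` and the analytic `cdf`.
fn ks_statistic(mut samples: Vec<f64>, cdf: impl Fn(f64) -> f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len() as f64;
    samples
        .iter()
        .enumerate()
        .map(|(i, &x)| {
            let f = cdf(x);
            // The ECDF jumps from i/n to (i+1)/n at x; check both sides.
            (f - i as f64 / n).abs().max(((i + 1) as f64 / n - f).abs())
        })
        .fold(0.0, f64::max)
}
```

[The statistic is then compared against a critical value that shrinks like 1/√n; see the note on sample size later in the thread.]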

@dhardy (Member) commented Oct 8, 2024

@benjamin-lieser (in case you didn't see this already)

> I've noticed that some samplers may not pass the test with extreme parameters

For now, it would be helpful if you would comment out each failing parameter set.

> Additionally, I'm considering whether using a fixed set of seeds combined with the distribution's parameters is beneficial, as it may increase testing time.

Passing the set of tests with multiple seeds does slightly increase the robustness of those tests, though it may not be an efficient approach. Let's see how long the tests take (so far, it looks like Miri is still the slowest).

@benjamin-lieser (Collaborator)
Thanks a lot for the additional tests. I will have a look at the failing ones.

I was thinking that the tests are probably better placed in the unit test section for the distributions.

As it is right now, it just takes a new seed for each new parameter set; a constant seed could only improve things if we saved the output from the RNG, which does not seem worth it.
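[A sketch of the scheme described here; the RNG choice and helper name are hypothetical, not the harness's actual code:]

```rust
use rand::SeedableRng;
use rand_pcg::Pcg64Mcg;

/// Deterministic but distinct stream per parameter set: failures stay
/// reproducible without every case reusing the same sample sequence.
fn rng_for(case_index: u64) -> Pcg64Mcg {
    Pcg64Mcg::seed_from_u64(case_index)
}
```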

@JamboChen (Contributor, Author)
> I was thinking that the tests are probably better placed in the unit test section for the distributions.

@benjamin-lieser I think it's a good idea. However, the implementation related to Kolmogorov-Smirnov might need to be moved to rand_distr/src/unit_test.rs. If you agree, I'll start the relocation.

But it might be inconvenient to check which distributions haven't implemented the KS test.


let parameters = [
    0.1_f32, 1.0, 7.5,
    // 1.844E+19, // fail case

Collaborator:
This case fails because your cdf does not work for values bigger than 1e8. The problem is in inc_gamma, though: it just returns 1.0 when lambda > 1e8. Maybe you can find a different implementation.

Contributor Author:

It might not be related to the implementation of lgamma, but rather that the probability is too small to be stored in an f64. In fact, cdf(1e8 as i64, 1e8) outputs 0.5000265872649796, but cdf((1e8 - 1e5) as i64, 1e8) rounds to 0. I tried using f128, but encountered linker-related issues, and even if it worked, it wouldn't be usable on stable.

I will also check other implementations of the CDF to see if there is a possibility of underflow.

@benjamin-lieser (Collaborator) commented Oct 10, 2024:

cdf(2e9 as i64, 1e9) should be 1.0 but it is 0.0

cdf(2e7 as i64, 1e7) is correctly 1.0 because it is under the threshold where inc_gamma stops being correct

Contributor Author:

In the latest commit, I copied the gamma_lr implementation from statrs and used it to calculate the CDF:

println!("{}", cdf(1e9 as i64, 1e9)); // 0.5000077942086617
println!("{}", cdf(2e9 as i64, 1e9)); // 1
println!("{}", cdf((1e9 - 1e4) as i64, 1e9)); // 0.3759229845139799

For reference, here is the calculation in R:

> ppois(1e9, 1e9)
[1] 0.5000084
> ppois(2e9, 1e9)
[1] 1
> ppois(1e9-1e4, 1e9)
[1] 0.3759226
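[For context, the identity in play: the Poisson CDF equals the upper regularized incomplete gamma function. Assuming the ported gamma_lr(a, x) helper (lower regularized incomplete gamma, from statrs), the CDF reduces to the one-liner that appears later in the diff context:]

```rust
/// Poisson CDF via the regularized incomplete gamma function:
/// P(X <= k) = Q(k + 1, lambda) = 1 - P(k + 1, lambda),
/// where P is the lower regularized incomplete gamma (`gamma_lr`).
fn cdf(k: i64, lambda: f64) -> f64 {
    if k < 0 {
        return 0.0;
    }
    1.0 - gamma_lr(k as f64 + 1.0, lambda)
}
```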

@dhardy (Member) commented Oct 10, 2024

> I was thinking that the tests are probably better placed in the unit test section for the distributions.

I'd prefer we keep this separate; it's slow and requires a dev-dependency.

You can use multiple modules. tests/pdf.rs already uses sparkline.rs, but I realise now that it does so wrongly: sparkline gets used both as its own test and as a sub-module of pdf. Simply moving tests/sparkline.rs to tests/sparkline/mod.rs fixes this (or tests/common/sparkline.rs may be better; see integration tests).
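[For reference, the mechanics behind this: Cargo builds every file directly under tests/ as its own test crate, so tests/sparkline.rs runs standalone and again as a sub-module of pdf; files in subdirectories are not test roots, which is why either relocation fixes it. A sketch, file names as suggested above:]

```rust
// tests/pdf.rs -- every file directly under tests/ is built as its own
// test crate, but files in subdirectories are not; shared helpers are
// therefore pulled in as a module:
mod common; // resolves to tests/common/mod.rs (which declares `pub mod sparkline;`)
```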

@JamboChen (Contributor, Author)
If we have multiple testing methods in the future, we can place the algorithm implementations in tests/common/<test_impl>.rs and then use each test in files like tests/normal.rs. The directory structure might look like this:

rand_distr/
├── src/
└── tests/
    ├── common/
    │   ├── mod.rs
    │   ├── test_impl1.rs
    │   └── test_impl2.rs
    ├── normal.rs
    ├── binomial.rs

This approach allows tests to be distributed across their respective distributions, but it will result in a larger number of files.

@JamboChen (Contributor, Author)
If we run the rand_distr tests in release mode, the test time decreases from 2m30s to about 50s. While this is still quite long, it seems acceptable. Calculating the CDF for some distributions requires numerical computation functions. Reducing the values of the test parameters could speed up the tests, but it may not be worth it. The longest time is still on PowerPC, which barely improves in release mode and still takes about 15 minutes.

@dhardy (Member) commented Oct 16, 2024

Regarding the integrations, you're correct that this is not ideal. Also, testing using -r is not a good idea in general since e.g. unintentional integer-wrapping is not trapped.
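[To illustrate the -r concern: overflow checks are enabled by default only in debug profiles. A minimal example, not from this PR:]

```rust
fn next_id(x: u8) -> u8 {
    // `cargo test`: panics with "attempt to add with overflow" when x == 255.
    // `cargo test -r`: silently wraps to 0, unless the release profile
    // sets `overflow-checks = true`.
    x + 1
}
```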

My suggestion is that we move sparkline and KS tests under the benches repo which has its own workflow and is only run once per PR. We don't need to run the tests on multiple platforms since we also have value-stability tests on results.

Logically we might want to rename benches to something else (extra?), but benches is just as good as anything short I can think of.

Note: at some point, rand_distr will be moved to a new repo, including all these tests and probably all of benches (to keep them all in a single place).

@JamboChen (Contributor, Author)
> My suggestion is that we move sparkline and KS tests under the benches repo which has its own workflow and is only run once per PR.

I think that's a good idea. I can move the KS tests to the benches repo now, which will also prevent the push test from being affected by other PRs. However, for sparkline and directory structure adjustments, it might be better to open a new PR. Otherwise, I may need to modify the details of this PR.

    (1.0, 100.0),
];

println!("{}", cdf(5.0, 2.0, 1.5));

Collaborator:

There is still a debug print.

By the way, there is a dbg! macro for this.

    (10.0, 0.1),
    (0.1, 10.0),
    (15.0, 20.0),
    // (1000.0, 0.001), // Fail case

Collaborator:

This case does not really make sense because of the small k (the expected value is bigger than the largest finite f64); I would use (1000, 0.01) instead, which passes.

}

let parameters = [
    // 0.01, // Fail case

Collaborator:

This case should also be fine to drop (for now). The distribution has very high kurtosis, so there will be a significant amount of mass in the tails.

    (60, 10, 7),
    (70, 20, 50),
    (100, 50, 10),
    // (100, 50, 49), // Fail case

Collaborator:

This was a bug in the hypergeometric sampling; after the fix is merged, this test should be included.


    1.0 - gamma_lr(k as f64 + 1.0, lambda)
}
let parameters = [

Collaborator:

Can you make this a test over f64? Then 1e9 should also pass.

We have to think about how to test over different floating-point types, but I would do everything in f64 for now.

Contributor Author:

I didn't use f64 here because its sample method has both f64 and u64 implementations, and on Rust 1.61.0 it is not possible to specify which implementation to use. This causes the MSRV test to fail during compilation.

However, if I move the KS test to benches, it will only be tested in nightly, allowing it to be performed on f64.


Collaborator:

You can change the definition of test_discrete to replace the impl Trait with a generic.
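[A sketch of that suggestion; the signature is illustrative, not the harness's actual code, and poisson_cdf is a hypothetical helper. With impl Trait in argument position the caller cannot use a turbofish at all, whereas named generics can be pinned explicitly:]

```rust
use rand_distr::Distribution; // rand_distr re-exports rand's Distribution trait

// Naming the sample type as a generic parameter lets a call site pin
// which `Distribution` impl is used when a sampler provides several
// (e.g. Poisson, which on master samples both f64 and u64).
fn test_discrete<T, D, F>(seed: u64, dist: D, cdf: F)
where
    D: Distribution<T>,
    F: Fn(T) -> f64,
{
    let _ = (seed, dist, cdf); // body elided in this sketch
}

// Call site, pinning the sample type with a turbofish:
// test_discrete::<f64, _, _>(0, Poisson::new(1e9).unwrap(), |x| poisson_cdf(x, 1e9));
```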

    t
}

fn tf(h: f64, a: f64, ah: f64) -> f64 {

Collaborator:

What is this function? Where did the code come from?

Contributor Author:

The above code is mentioned in the paper cited in the comment of the owen_t function:

> TF(H,A,AH) (H ≥ 0, 0 ≤ A ≤ 1). This in turn selects the appropriate method (T1,…, T6) for computing T(H,A) based on the input values of H and A according to the ranges given in Figure 2.

The paper provides Fortran code at the end, and I just translated it into Rust.
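[For anyone cross-checking the translation: Owen's T function, which the skew-normal CDF needs, is T(h, a) = (1/2π) ∫₀^a exp(−h²(1+x²)/2) / (1+x²) dx, and tf is the paper's dispatcher over the region-specific methods T1–T6. A deliberately naive reference evaluation, illustrative only and not this PR's code:]

```rust
/// Naive midpoint-rule evaluation of Owen's T function,
/// T(h, a) = (1/2π) ∫_0^a exp(-h²(1+x²)/2) / (1+x²) dx.
/// Only useful as a slow cross-check against the tf dispatcher.
fn owen_t_naive(h: f64, a: f64) -> f64 {
    let n = 100_000;
    let dx = a / n as f64;
    let mut sum = 0.0;
    for i in 0..n {
        let x = (i as f64 + 0.5) * dx; // midpoint of each sub-interval
        sum += (-0.5 * h * h * (1.0 + x * x)).exp() / (1.0 + x * x);
    }
    sum * dx / (2.0 * std::f64::consts::PI)
}
```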

Member:

You could embed this function inside owen_t then. Or it would be neater to move the lot to a new file.

    (2.0, 3.5),
    (10.0, 1.0),
    (100.0, 50.0),
    // (10.0, 0.1), // Fail case

Collaborator:

This is also a very extreme distribution, concentrated around 1.0. I will look into it in more detail later; for now we can drop this case.

@dhardy (Member) commented Oct 17, 2024

> However, for sparkline and directory structure adjustments, it might be better to open a new PR. Otherwise, I may need to modify the details of this PR.

@JamboChen yes, ideally the restructure is in a different PR. Since this PR already exists, it may be easiest to merge this first (without any restructuring).

I'll let you resolve @benjamin-lieser's concerns first then give your changes a glance over.

@benjamin-lieser (Collaborator)
I would also decrease the sample size so the tests run quickly enough. This loses us some sensitivity, but it should be fine until the release, and we can think about a more permanent solution. I imagine some more exhaustive tests over the parameter space, maybe with a code-coverage tool to make sure we hit all the different sampling strategies.
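[For scale: the KS critical value shrinks like 1/√n, so quartering the sample size doubles the smallest CDF deviation the test can detect. The usual large-n approximation, standard statistics rather than code from this PR:]

```rust
/// Asymptotic one-sample KS critical value at significance `alpha`:
/// D_crit ≈ sqrt(-ln(alpha / 2) / 2) / sqrt(n), valid for large n.
fn ks_critical(alpha: f64, n: usize) -> f64 {
    (-0.5 * (alpha / 2.0).ln()).sqrt() / (n as f64).sqrt()
}
// ks_critical(0.05, 1_000_000) ≈ 0.00136; at n = 250_000 it is ≈ 0.00272.
```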

@dhardy (Member) commented Oct 17, 2024

@benjamin-lieser the total run-time isn't too bad here, if tested once per CI run. My inclination is that we should move this to benches, or perhaps better a dedicated repo, and test it in release mode in its own workflow.

For now, let's just get this merged, then move to a new repo and fix the CI jobs.

@JamboChen (Contributor, Author) commented Oct 17, 2024

Here are the runtimes for each test (as a reference for relative speed):

$ cargo test --package rand_distr --test cdf -- -Z unstable-options --report-time

running 22 tests
test hypergeometric ... ok <1.203s>
test binomial ... ok <1.363s>
test geometric ... ok <2.098s>
test poisson ... ok <2.204s>
test zeta ... ok <1.456s>
test zipf ... ok <1.723s>
test uniform ... ok <3.782s>
test skew_normal ... ok <4.268s>
test triangular ... ok <4.900s>
test log_normal ... ok <4.908s>
test exp ... ok <5.031s>
test gumbel ... ok <5.086s>
test beta ... ok <5.370s>
test frechet ... ok <5.432s>
test studend_t ... ok <5.527s>
test normal ... ok <5.546s>
test cauchy ... ok <5.796s>
test weibull ... ok <5.823s>
test gamma ... ok <5.849s>
test pareto ... ok <6.254s>
test fisher_f ... ok <6.407s>
test chi_squared ... ok <6.800s>

test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 6.80s

@dhardy (Member) left a comment:

Looks good (without looking too closely).

There are a couple of disabled tests; I'll note in #357.

@dhardy merged commit 1aa9a69 into rust-random:master on Oct 17, 2024
14 checks passed