std::rand: substitute an unbiased Exp1 temporarily. #10099
Conversation
The Ziggurat implementation of Exp1 is incorrect: it appears to have an expected value of around 0.994 rather than 1, and the density of a large number of sample means does not match the normal distribution expected by the central limit theorem (specifically, it is a normal distribution centred at 0.994 rather than 1).

This replaces Ziggurat with a "plain" inverse-CDF implementation for now. This implementation has the expected properties, although it is approximately 4 times slower (56 seconds rather than 16 to generate 1 billion samples).

cc #10084 (this does *not* fix it).
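For reference, the inverse-CDF approach is simple enough to sketch in a few lines. This is an illustration rather than the PR's exact code, and it assumes the modern `rand` crate (0.8-style API) since the old `std::rand` module no longer exists:

```rust
// Sketch of inverse-CDF sampling for Exp(1); not the PR's exact code.
// Assumes the `rand` crate (0.8-style API) rather than the old std::rand.
use rand::Rng;

// Exp(1) has CDF F(x) = 1 - e^(-x), so F^(-1)(u) = -ln(1 - u).
// For u uniform in [0, 1), 1 - u lies in (0, 1], so the log is finite.
fn exp1<R: Rng>(rng: &mut R) -> f64 {
    -(1.0 - rng.gen::<f64>()).ln()
}

fn main() {
    let mut rng = rand::thread_rng();
    let n = 1_000_000;
    let mean = (0..n).map(|_| exp1(&mut rng)).sum::<f64>() / n as f64;
    println!("sample mean = {mean}"); // should be close to 1.0
}
```

Since 1 - U is uniform whenever U is, `-u.ln()` is an equivalent one-liner; the `1.0 - u` form just guarantees a non-zero argument when `u` can be exactly 0.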
So if I understand this correctly, using the ziggurat tables leads to the wrong distribution, while this slower implementation produces the correct distribution; the idea is to eventually go back to the ziggurat tables for speed, once you've worked out what's going wrong with them.

If that's the case, I may be more inclined to hold off on ignoring ziggurat for a bit. It seems kind of odd to keep all those tables around when they're possibly incorrect and nothing is using them. If the bug with the tables can't be found soon, I'd be more in favor of removing them entirely and bringing them back once the bugs have been found.

It would also be nice to have unit tests for this kind of thing, but I suppose that's difficult to test, seeing how you're taking a billion samples.

Also, do you know what's up with that bump in the black line compared to the red line? Is that also something we should be worried about? I found it odd that it was also present in the previous graph (using the ziggurat tables).
Your understanding is correct. FWIW, I checked the Normal (also using Ziggurat) and plain
There's only 6 kB of tables (3 of them, each with 257 entries; at 8 bytes per `f64` entry that works out to 3 × 257 × 8 ≈ 6 kB).
It appears to just be an artefact of the randomness; rerunning the simulation a few more times produces plots that all look generally similar.
It's not just the huge number of samples, since one can just run the tests with a smaller sample size: these tests are inherently random, so every reasonable test has a non-zero chance of failure. Of course, if one is doing significance-level tests, one can choose the p-value such that the chance of failure is very small (when the implementation is correct), but this comes at the cost of having less power to detect an incorrect implementation, and compensating by increasing the sample size makes the tests very slow.

I'm planning to add Gamma next (I have an implementation that is essentially blocked on this) along with derived distributions, and to land that I'll write either a pile of probabilistic tests in a separate repo (on Rust CI, so that failures don't block other PRs) or a run-by-hand test file.
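To make that trade-off concrete, here is a sketch of one such probabilistic test (my illustration, not code from the PR; it assumes the modern `rand` crate): a z-test on the sample mean, using the fact that Exp(1) has mean 1 and variance 1, so by the CLT the mean of n draws is approximately N(1, 1/n).

```rust
// Illustrative z-test for an Exp(1) sampler (not from the PR).
// Exp(1) has mean 1 and variance 1, so the mean of n draws is
// approximately N(1, 1/n). A |z| > 4 threshold fails a *correct*
// sampler with probability ~6e-5, while the biased mean of ~0.994
// reported here gives z ~= -6 at n = 10^6 and fails essentially always.
use rand::Rng;

fn exp1<R: Rng>(rng: &mut R) -> f64 {
    -(1.0 - rng.gen::<f64>()).ln() // inverse-CDF Exp(1), as sketched above
}

fn main() {
    let mut rng = rand::thread_rng();
    let n = 1_000_000;
    let mean = (0..n).map(|_| exp1(&mut rng)).sum::<f64>() / n as f64;
    let std_err = 1.0 / (n as f64).sqrt(); // sqrt(var / n) with var = 1
    let z = (mean - 1.0) / std_err;
    assert!(z.abs() < 4.0, "mean {mean} is {z:.1} standard errors from 1");
}
```

Raising the threshold lowers the false-failure rate but weakens the test, which is exactly the power trade-off described above.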
Just wanted to say, excellent analysis of this problem! In the meantime, though, let's leave this open for a while and see if you manage to uncover the underlying problem (no rush, of course). If it does take longer than expected, we can probably merge this.
I think I've tracked down the actual bug in the Ziggurat implementation; I'll submit a PR with the correction once I've checked it properly.
Re-plot of the density estimate of 10,000 sample means, where each mean is over 100,000 samples: it clearly matches far more closely than the old implementation (cf. the plot in #10084; the scales are approximately the same).
Note: I'm going to investigate and fix the problem with the Ziggurat implementation; this change is just temporary so that the implementation is correct in the meantime.
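For reproducibility, here is a hedged sketch of how the data behind such a plot could be generated (my illustration, assuming the `rand` 0.8 crate; the original used `std::rand`): 10,000 sample means, each over 100,000 draws, printed one per line for external density estimation.

```rust
// Illustrative generation of the sample means behind the density plot:
// 10_000 means, each over 100_000 Exp(1) draws (assumes the `rand` crate).
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let (n_means, n_per_mean) = (10_000u32, 100_000u32);
    for _ in 0..n_means {
        let sum: f64 = (0..n_per_mean)
            .map(|_| -(1.0 - rng.gen::<f64>()).ln()) // inverse-CDF Exp(1)
            .sum();
        // One mean per line; by the CLT these are ~ N(1, 1/100_000),
        // i.e. standard deviation ~0.0032, so a 0.006 bias shifts the
        // whole density visibly.
        println!("{}", sum / n_per_mean as f64);
    }
}
```

Piping the output into R's `density()` or any kernel density estimator reproduces the kind of plot described above.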