-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance optimizations #13
Comments
This is probably because the algorithm only generates
I ran a benchmark and a hypothetical full-range API has the exact same performance as calling the function with a |
If I understand you correctly, this means that generating u16 is faster than usize (maybe two times?), but as fast as generating u32. I could get twice as fast by splitting the u32 into two u16. If this is true, this also means that using |
Yes, I think that's all correct with the current implementation. It is two times faster to generate a You don't need to transmute at all in your iterator. On beta and nightly (and stable in 12 days) you can write: repeat_with(|| rng.u32(..))
.flat_map(|v| array::IntoIter::new(v.to_ne_bytes()))
.take(10_000)
.collect() or on stable until then: repeat_with(|| rng.u32(..).to_ne_bytes())
.flat_map(|bytes| (0..4).map(move |i| bytes[i]))
.take(10_000)
.collect() |
One would have to check if the iterator implementations compile down to a transmute or something similarly fast. Also, maybe it's worth adding a convenience method for bulk-filling random data array? (Not sure about how the API should look like though.) Wasting half (or more) of the generated random values is something I think should easily be avoidable by storing the unused random bytes for future usage instead of throwing them away. Using a separate store for u8, u16 (and later on, u32) should make the branches easily predictable. Does this crate make any guarantees about the values that a certain seed returns across versions? |
I don't really think this is worth it. Especially if #14 gets merged the actual generation becomes really cheap, and caching values would require both adding branches and increasing the size of the RNG. Also, I don't know of any other RNG implementations that do this.
It's not explicitly specified anywhere, but I would assume not, otherwise there would never be any possibility to change the RNG algorithm. |
The new Rng algorithm is indeed faster, but not by orders of magnitude (about 15% maybe?). Interestingly, generating usize is still slower than u8/u16 for some reason. |
I have a theory the performance discrepancy might be due to way the code is written not the library. There is a branch in the middle of what looks like a hot loop and this can slow things down surprisingly. If there was only one code path the CPU can work ahead on computing the random index. With the if statement the CPU has to predict which path it will take to stay at full speed. If it predicts wrong it will slow down. Only generating usizes would likely help with performance also considering WyRand generates u64s internally. |
It seems https://github.com/wangyi-fudan/wyhash wyhash has evolved into https://github.com/Nicoshev/rapidhash could that be used to improve perf of this library further |
Randomly consistently shows up as a major bottleneck in my application. My basic problem is that I want to retrieve a lot of random elements of a Vec. I noticed a few things and would like to discuss them:
index = if len < 65536 { rng.u16(0..len) } else { rng.usize(0..len) }
and it did seem to help with the performance.The text was updated successfully, but these errors were encountered: