Should SIMD_LEN be configurable? #11
Comments
AVX512 is 512bit, shouldn't it be different depending on what the data type length is? |
This is a great point! I've tried [...] You can try it out here -- just modify the [...] I have a 7950x (which has great [...] If needed, there are probably several ways to choose [...] |
probably, not sure |
Interesting. It also seems to generate AVX-512 when you increase the data type size. When I bump it to u32, it still generates AVX-512 with a smaller SIMD_LEN. I originally picked 32 based on benchmarks on my machine (a 5900x, which has no AVX-512). |
I have a very weak machine that overheats and doesn't produce reproducible results, would you mind testing if the data type size matters for the SIMD_LEN variable too? |
@PaulDotSH Sure, no problem! |
Yea, I figured. I saw you mentioned [...] Now that I think of it... perhaps we just need to benchmark. It may be safe to just use 64 in all cases, but the proof is in the pudding. I'll run the benchmarks. |
+1 for letting users set the constant themselves. The best constant is probably a function of the bit width of the data type you're operating on, the target's vector register width, and LLVM's code generation heuristics, among other things, rather than a single general value. Users will likely want different constants depending on whether they're targeting baseline SSE2, AVX2, or AVX-512. Exposing the constant makes it easier to benchmark and tailor for individual use cases, since getting the desired codegen can be fiddly.

LLVM seems to pessimize performance when the constant is too large for the target's max vector width on x64. Playing with [...]

From my experience with targeting the Rust baseline (SSE2), the rule of thumb for chunk size tends to be 256 bits, or 2x the 128-bit SSE SIMD register size, as shown in the table below. Beyond these sizes, the performance tends to degrade.
I've seen similar behavior for AVX2 with chunk size targeting some multiple of YMM register width before it gets suboptimally unrolled, but I have no experience with AVX-512. Of course, these numbers should be verified with measurement and depend on the corresponding vector instruction existing for the data type size. The practical length choices are probably going to lean more towards 32 and 64 than 8 or 16 but it's worth being able to reach for that when needed. |
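A minimal sketch of the rule of thumb described above, assuming the chunk is sized in bits and the lane count is simply chunk bits divided by element bits. The helper `lanes_for` and the constant `SSE2_CHUNK_BITS` are hypothetical names for illustration and are not part of the crate; any number derived this way would still need to be checked against benchmarks.

```rust
use core::mem::size_of;

/// Hypothetical helper (not part of the crate under discussion): how many
/// elements of `T` fit in a chunk of `chunk_bits` bits, never less than 1.
pub const fn lanes_for<T>(chunk_bits: usize) -> usize {
    let elem_bits = size_of::<T>() * 8;
    if elem_bits == 0 {
        return 1; // zero-sized types: avoid dividing by zero
    }
    let lanes = chunk_bits / elem_bits;
    if lanes == 0 {
        1
    } else {
        lanes
    }
}

/// Rule of thumb quoted above for the SSE2 baseline: 2 x 128-bit registers.
pub const SSE2_CHUNK_BITS: usize = 2 * 128;
```

Under this assumption, `lanes_for::<u8>(SSE2_CHUNK_BITS)` evaluates to 32 and `lanes_for::<u32>(SSE2_CHUNK_BITS)` to 8, which shows how a single fixed length cannot match every element width.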
Now that we have multiversioning and know the target's max width and the type width, maybe we could just pick a good len for the user? I found: https://docs.rs/target-features/latest/target_features/struct.Target.html#method.suggested_simd_width

But I'm not quite sure how to make it work with:

    const BEST_LEN: usize = selected_target!().suggested_simd_width::<T>().unwrap();
    let (prefix, simd_data, suffix) = arr.as_simd::<BEST_LEN>();

Now I can get it to work with chunks_exact using let instead:

    let best_len = selected_target!().suggested_simd_width::<T>().unwrap();
    for chunk in arr.chunks_exact(best_len) {
        ...
    }

But the performance of as_simd is significantly better, so I don't really want to go with this. |
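One possible workaround, sketched here on stable Rust without `as_simd` or the `target-features` crate: if the suggested width is only available as an ordinary value, a `match` can select among a fixed set of const-generic instantiations. The functions `sum_chunked` and `sum_with_runtime_len` are hypothetical illustrations, not the crate's API, and whether this recovers the performance of `as_simd` would have to be measured.

```rust
/// Sums `data` in chunks of `N` elements, keeping `N` independent partial
/// sums so the compiler is free to vectorize the inner loop.
/// `N` must be non-zero, because `chunks_exact(0)` panics.
fn sum_chunked<const N: usize>(data: &[u32]) -> u32 {
    let mut acc = [0u32; N];
    let chunks = data.chunks_exact(N);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (a, &x) in acc.iter_mut().zip(chunk) {
            *a = (*a).wrapping_add(x);
        }
    }
    // Fold the per-lane partial sums, then the leftover elements.
    let head = acc.iter().fold(0u32, |s, &x| s.wrapping_add(x));
    tail.iter().fold(head, |s, &x| s.wrapping_add(x))
}

/// Maps a length known only as a value onto a const-generic instantiation.
/// Unsupported lengths fall back to 32, the crate's current SIMD_LEN.
fn sum_with_runtime_len(data: &[u32], best_len: usize) -> u32 {
    match best_len {
        4 => sum_chunked::<4>(data),
        8 => sum_chunked::<8>(data),
        16 => sum_chunked::<16>(data),
        _ => sum_chunked::<32>(data),
    }
}
```

The cost of this design is code size: every arm is a separate monomorphized copy of the kernel, so the set of supported lengths should stay small.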
We could have a function that can take a const from the user, and one that doesn't and is a simple wrapper with [...] |
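A rough sketch of that two-function shape, assuming the wrapper forwards to the const-generic version with the current value of 32. The names `max_value_with_len`, `max_value`, and `DEFAULT_SIMD_LEN` are hypothetical and chosen only for illustration; the crate's real methods and signatures may differ.

```rust
/// Assumed default, matching the value the crate currently hard-codes.
pub const DEFAULT_SIMD_LEN: usize = 32;

/// Variant that lets the caller choose the chunk length.
/// `SIMD_LEN` must be non-zero, because `chunks_exact(0)` panics.
pub fn max_value_with_len<const SIMD_LEN: usize>(data: &[u8]) -> Option<u8> {
    if data.is_empty() {
        return None;
    }
    // One running maximum per lane; u8::MIN never exceeds a real element.
    let mut lanes = [u8::MIN; SIMD_LEN];
    let chunks = data.chunks_exact(SIMD_LEN);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (m, &x) in lanes.iter_mut().zip(chunk) {
            *m = (*m).max(x);
        }
    }
    // Combine the per-lane maxima with the leftover elements.
    lanes.iter().chain(tail).copied().max()
}

/// Convenience wrapper for callers who do not want to pick a length.
pub fn max_value(data: &[u8]) -> Option<u8> {
    max_value_with_len::<DEFAULT_SIMD_LEN>(data)
}
```

Callers who have benchmarked their target could then write `max_value_with_len::<64>(&data)`, while everyone else keeps calling `max_value(&data)` unchanged.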
Currently SIMD_LEN is set to 32 for everything. Should we just let users pass it for every method? Something like this: