Surprising benchmark numbers #51

ralfbiedert · 2018-07-16T19:45:03Z

While working on #47 I noticed what looks like performance regressions in the cargo bench, in particular functions like map_simd and map_scalar, but quite a few others.

test tests::map_scalar                                ... bench:       2,022 ns/iter (+/- 264)
test tests::map_simd                                  ... bench:       6,898 ns/iter (+/- 392)

However, comparing #49 to the commit before the refactoring, the numbers are mostly unchanged.

I then assumed it's related to unfortunate default feature flags on my machine, but playing with avx2 and sse4.1 didn't have any effect either. I also have a first implementation of #48, and it actually looks like no fallbacks are emitted for map_simd. (Tried to cross check that with radare2, but have some problems locating the right symbol / disassembly for the benchmarks). Lastly, the functions map_scalar and map_simd differ a bit, but even when I make them equal (e.g., sqrt vs. rsqrt) the difference remains.

Is that a "known issue"?
Did rustc became so good in auto-vectorization?
Any suggestions how to extract the disassembly from tests::map_simd and tests::map_scalar?

Running on rustc 1.29.0-nightly (9fd3d7899 2018-07-07), MBP 2015, i7-5557U.

Update: I linked the latest faster version from my SVM library and I don't see these problems in 'production':

csvm_predict_sv1024_attr1024_problems1 ... bench:     232,109 ns/iter (+/- 20,808) [faster AVX2]
csvm_predict_sv1024_attr1024_problems1 ... bench:     942,925 ns/iter (+/- 64,156) [scalar]

Update 2 Seems to be related to some intrinsics. When I dissect the benchmark, I get

test tests::map_scalar                                ... bench:         558 ns/iter (+/- 55) [without .abs()]
test tests::map_scalar                                ... bench:         556 ns/iter (+/- 33) [with .abs()]
test tests::map_simd                                  ... bench:         144 ns/iter (+/- 17) [without .abs()]
test tests::map_simd                                  ... bench:         883 ns/iter (+/- 64) [with .abs()]

I now think that each intrinsic should have its own benchmark, e.g. intrinsic_abs_scalar, intrinsic_abs_simd, ...

Update 3 ... oh boy. I think that by "arcane magic" Rust imports and prefers std::simd::f32x4 and friends over the faster types and methods.

So when you do my_f32s.abs(), it calls std::simd::f32x4::abs, not faster::arch::current::intrin::abs.

The reason I think that's the problem is you can now easily do my_f32s.sqrte(), which isn't implemented in faster, but in std::simd.

What's more annoying is that it doesn't warn about any collision, and that std::simd is actually slower than "vanilla" Rust.

TODO:

Investigate import tree why that happens
Clean up imports if import problem
Have single-intrinsic benchmarks to detect bad intrinsics
Have Rust warn somehow if similar name conflict happens again?
Remove all usages of #![feature(stdsimd)] except in lib.rs

Update 4 Now one more thing makes sense ... I sometimes got use of unstable library feature 'stdsimd' in test cases and I didn't understand why. Probably because that's where the std::simd built-ins were used.

The text was updated successfully, but these errors were encountered:

ralfbiedert · 2018-07-17T09:17:58Z

Unless I'm mistaken this doesn't look like an easy fix: std::simd types such as float32x4 seem to impl functions like abs() directly. That means we can't easily re-implement them in traits, as by default the std::simd variant will be picked.

The solutions I see:

Use newtype pattern in vektor. Upside: Should fix all these issues. Downside: Lots of work. Might introduce other issues.
Remove faster::abs and similar functions provided by std::simd. Upside: Cleaner code, let std::simd do the work. Downside: Currently apparently much slower than faster.
File std::simd issue and request abs and similar should reside in trait, similar to faster. Upside: Allows custom implementation and composition. Downside: if std::simd::abs actually works as intended might not be needed.

Will put this on hold until further discussed (but will PR some current improvements made in process).

AdamNiederer · 2018-07-17T18:38:48Z

Hm, I do want faster to seamlessly integrate with std::simd (as they have much more manpower than I do), but I can definitely see these name conflicts becoming a long-term problem as std::simd adds features - especially because I likely won't know about them until rustc starts warning me.

A good argument for moving away from those types would be that we could implement std::simd's traits on our types, but that'd be a manual process and I'd likely have more work to do to get it compiling on stable.

ralfbiedert mentioned this issue Jul 17, 2018

Intrinsic benchmarks & Performance tracing #52

Merged

ralfbiedert mentioned this issue Jul 18, 2018

Portable math functions don't get inlined rust-lang/stdarch#523

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surprising benchmark numbers #51

Surprising benchmark numbers #51

ralfbiedert commented Jul 16, 2018 •

edited

Loading

ralfbiedert commented Jul 17, 2018

AdamNiederer commented Jul 17, 2018

Surprising benchmark numbers #51

Surprising benchmark numbers #51

Comments

ralfbiedert commented Jul 16, 2018 • edited Loading

ralfbiedert commented Jul 17, 2018

AdamNiederer commented Jul 17, 2018

ralfbiedert commented Jul 16, 2018 •

edited

Loading