Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surprising benchmark numbers #51

Open
ralfbiedert opened this issue Jul 16, 2018 · 2 comments
Open

Surprising benchmark numbers #51

ralfbiedert opened this issue Jul 16, 2018 · 2 comments

Comments

@ralfbiedert
Copy link
Contributor

ralfbiedert commented Jul 16, 2018

While working on #47 I noticed what looks like performance regressions in the cargo bench, in particular functions like map_simd and map_scalar, but quite a few others.

test tests::map_scalar                                ... bench:       2,022 ns/iter (+/- 264)
test tests::map_simd                                  ... bench:       6,898 ns/iter (+/- 392)

However, comparing #49 to the commit before the refactoring, the numbers are mostly unchanged.

I then assumed it's related to unfortunate default feature flags on my machine, but playing with avx2 and sse4.1 didn't have any effect either. I also have a first implementation of #48, and it actually looks like no fallbacks are emitted for map_simd. (Tried to cross check that with radare2, but have some problems locating the right symbol / disassembly for the benchmarks). Lastly, the functions map_scalar and map_simd differ a bit, but even when I make them equal (e.g., sqrt vs. rsqrt) the difference remains.

  • Is that a "known issue"?
  • Did rustc became so good in auto-vectorization?
  • Any suggestions how to extract the disassembly from tests::map_simd and tests::map_scalar?

Running on rustc 1.29.0-nightly (9fd3d7899 2018-07-07), MBP 2015, i7-5557U.

Update: I linked the latest faster version from my SVM library and I don't see these problems in 'production':

csvm_predict_sv1024_attr1024_problems1 ... bench:     232,109 ns/iter (+/- 20,808) [faster AVX2]
csvm_predict_sv1024_attr1024_problems1 ... bench:     942,925 ns/iter (+/- 64,156) [scalar]

Update 2 Seems to be related to some intrinsics. When I dissect the benchmark, I get

test tests::map_scalar                                ... bench:         558 ns/iter (+/- 55) [without .abs()]
test tests::map_scalar                                ... bench:         556 ns/iter (+/- 33) [with .abs()]
test tests::map_simd                                  ... bench:         144 ns/iter (+/- 17) [without .abs()]
test tests::map_simd                                  ... bench:         883 ns/iter (+/- 64) [with .abs()]

I now think that each intrinsic should have its own benchmark, e.g. intrinsic_abs_scalar, intrinsic_abs_simd, ...

Update 3 ... oh boy. I think that by "arcane magic" Rust imports and prefers std::simd::f32x4 and friends over the faster types and methods.

So when you do my_f32s.abs(), it calls std::simd::f32x4::abs, not faster::arch::current::intrin::abs.

The reason I think that's the problem is you can now easily do my_f32s.sqrte(), which isn't implemented in faster, but in std::simd.

What's more annoying is that it doesn't warn about any collision, and that std::simd is actually slower than "vanilla" Rust.

TODO:

  • Investigate import tree why that happens
  • Clean up imports if import problem
  • Have single-intrinsic benchmarks to detect bad intrinsics
  • Have Rust warn somehow if similar name conflict happens again?
  • Remove all usages of #![feature(stdsimd)] except in lib.rs

Update 4 Now one more thing makes sense ... I sometimes got use of unstable library feature 'stdsimd' in test cases and I didn't understand why. Probably because that's where the std::simd built-ins were used.

@ralfbiedert
Copy link
Contributor Author

Unless I'm mistaken this doesn't look like an easy fix: std::simd types such as float32x4 seem to impl functions like abs() directly. That means we can't easily re-implement them in traits, as by default the std::simd variant will be picked.

The solutions I see:

  • Use newtype pattern in vektor. Upside: Should fix all these issues. Downside: Lots of work. Might introduce other issues.
  • Remove faster::abs and similar functions provided by std::simd. Upside: Cleaner code, let std::simd do the work. Downside: Currently apparently much slower than faster.
  • File std::simd issue and request abs and similar should reside in trait, similar to faster. Upside: Allows custom implementation and composition. Downside: if std::simd::abs actually works as intended might not be needed.

Will put this on hold until further discussed (but will PR some current improvements made in process).

@AdamNiederer
Copy link
Owner

Hm, I do want faster to seamlessly integrate with std::simd (as they have much more manpower than I do), but I can definitely see these name conflicts becoming a long-term problem as std::simd adds features - especially because I likely won't know about them until rustc starts warning me.

A good argument for moving away from those types would be that we could implement std::simd's traits on our types, but that'd be a manual process and I'd likely have more work to do to get it compiling on stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants