
Suggestion for a low-effort way to take advantage of SIMD and other architecture specific tricks LLVM knows about #42432

Closed
pedrocr opened this issue Jun 4, 2017 · 7 comments


@pedrocr

pedrocr commented Jun 4, 2017

Issue #27731 already tracks the fine work being done to expose SIMD in ways that are explicit to the programmer. If you're able to write code in those specific ways, big gains can be had. However, there is something simpler that can be done for performance-sensitive code that sometimes greatly improves its speed: just tell LLVM to take advantage of those instructions. The speedup is free in developer time and can be quite large. I extracted a simple benchmark from one of the computationally expensive functions in rawloader, multiplying camera RGB values by a matrix to get XYZ:

https://github.com/pedrocr/rustc-math-bench
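The kernel under test is essentially a 3×3 matrix applied to every pixel. A minimal Rust sketch of that inner loop (names and matrix values are illustrative, not the actual rustc-math-bench code):

```rust
// Hypothetical sRGB-style camera-to-XYZ matrix, for illustration only.
const CAM_TO_XYZ: [[f32; 3]; 3] = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
];

/// Apply a 3x3 matrix to one RGB triple.
pub fn rgb_to_xyz(m: &[[f32; 3]; 3], rgb: [f32; 3]) -> [f32; 3] {
    let mut out = [0f32; 3];
    for i in 0..3 {
        for j in 0..3 {
            out[i] += m[i][j] * rgb[j];
        }
    }
    out
}

/// Transform a whole image stored as interleaved RGB floats in place.
/// This is the loop -march=native lets LLVM vectorize.
pub fn transform(image: &mut [f32]) {
    for px in image.chunks_exact_mut(3) {
        let out = rgb_to_xyz(&CAM_TO_XYZ, [px[0], px[1], px[2]]);
        px.copy_from_slice(&out);
    }
}

fn main() {
    let mut img = vec![0.5f32; 9]; // three gray pixels
    transform(&mut img);
    println!("{:?}", &img[0..3]);
}
```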

I programmed the same multiplication over a 100MP image in both C and Rust. Here are the results; all values are in ms/megapixel, run on an i5-6200U. The runbench script in the repository will compile and run the tests for you with no other interaction.

| Compiler | -O3 | -O3 -march=native |
|---|---|---|
| rustc 1.19.0-nightly (e0cc22b 2017-05-31) | 11.76 | 6.92 (-41%) |
| clang 3.8.0-2ubuntu4 | 13.31 | 5.69 (-57%) |
| gcc 5.4.0 20160609 | 7.77 | 4.70 (-40%) |

So Rust nightly is faster than clang (though that's probably LLVM 3.8 vs 4.0), and the reduction in runtime is quite worthwhile. The problem with doing this, of course, is that the binary is no longer portable to architectures below mine, and it's not optimized for architectures above it either.

My suggestion is to allow the developer to write something like `#[makefast] fn(...)`. Anything annotated like that gets compiled multiple times, once for each architecture level, and then at runtime, depending on the machine being used, the highest supported level gets used. Ideally the call sites would also be patched on program startup (or via ELF trickery) so the dispatch penalty disappears.
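As a sketch of what such an attribute could expand to, here is hand-written runtime dispatch using the feature detection that later landed in std (`is_x86_feature_detected!` and `#[target_feature]`); the function names and the choice of AVX2 are just for illustration:

```rust
// Compiled with AVX2 enabled for this one function; LLVM may
// auto-vectorize the body with wider instructions.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(v: &[f32]) -> f32 {
    v.iter().sum()
}

// Baseline version, compiled for the crate's default target.
fn sum_fallback(v: &[f32]) -> f32 {
    v.iter().sum()
}

/// Dispatch to the best available implementation at runtime.
pub fn sum(v: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe: we just verified the CPU supports AVX2.
            return unsafe { sum_avx2(v) };
        }
    }
    sum_fallback(v)
}

fn main() {
    println!("{}", sum(&[1.0, 2.0, 3.0, 4.0]));
}
```

The check runs on every call here; the proposal above is precisely about hoisting that check out, e.g. by patching call sites once at startup.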

@leonardo-m

Note: `0.0 as f32` is usually written `0f32` or `0.0_f32` or something like that.
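For reference, the three spellings denote the same value; the suffixed forms just avoid the cast:

```rust
fn main() {
    let a = 0.0 as f32; // cast from the default f64 literal
    let b = 0f32;       // integer-style literal with type suffix
    let c = 0.0_f32;    // float literal with type suffix
    assert!(a == b && b == c);
    println!("{} {} {}", a, b, c);
}
```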

@pedrocr
Author

pedrocr commented Jun 4, 2017

@leonardo-m pushed that fix now. I should probably use that in a bunch of other places in my code as well.

@pedrocr
Author

pedrocr commented Jun 4, 2017

I searched around, and this is how the GNU toolchain does runtime dispatching of different implementations of functions in a way that avoids the dispatch cost on every call:

http://www.agner.org/optimize/blog/read.php?i=167
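The lazy-resolution idea described there (GNU ifunc: the first call resolves the best implementation and later calls jump straight to it) can be sketched in Rust with a cached function pointer; the feature test here is a placeholder, and the transmute assumes function pointers are `usize`-sized, as on mainstream platforms:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Two interchangeable implementations; `impl_fancy` stands in for a
// hypothetical SIMD/popcnt-accelerated version.
fn impl_generic(x: u64) -> u64 { x.count_ones() as u64 }
fn impl_fancy(x: u64) -> u64 { x.count_ones() as u64 }

// Cached resolved pointer; 0 means "not resolved yet".
static FN_PTR: AtomicUsize = AtomicUsize::new(0);

fn resolve() -> fn(u64) -> u64 {
    // Placeholder feature test; real code would query the CPU.
    let chosen: fn(u64) -> u64 = if cfg!(target_arch = "x86_64") {
        impl_fancy
    } else {
        impl_generic
    };
    FN_PTR.store(chosen as usize, Ordering::Relaxed);
    chosen
}

/// First call resolves and caches; later calls pay only an atomic load.
pub fn popcount(x: u64) -> u64 {
    let p = FN_PTR.load(Ordering::Relaxed);
    let f: fn(u64) -> u64 = if p == 0 {
        resolve()
    } else {
        unsafe { std::mem::transmute(p) }
    };
    f(x)
}

fn main() {
    println!("{}", popcount(0b1011));
}
```

The ifunc mechanism goes one step further by having the dynamic linker store the resolved pointer directly in the PLT, so even the atomic load disappears.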

@parched
Contributor

parched commented Jun 5, 2017

Hi @pedrocr, I have been working on something almost exactly like what you describe, using a procedural macro. It's not finished, but I can publish it when I get home today if you would like to have a look / give feedback.

@pedrocr
Author

pedrocr commented Jun 5, 2017

@parched sounds very interesting indeed :)

@steveklabnik
Member

This is an RFC-level change to the language; I'd encourage you to move this to a thread on internals. Thanks!

@pedrocr
Author

pedrocr commented Jun 5, 2017
