Use sqrt intrinsic for floating point sqrt #5686

brson · 2013-04-03T00:59:01Z

sqrt is still using libc

huonw · 2013-04-03T05:36:16Z

I'm running tests on a patch that uses intrinsics for sqrt, sin, cos, exp, etc.

thestinger · 2013-04-03T05:37:31Z

@dbaupp: Nice! It would be awesome to have all of the useful LLVM intrinsics for integers/floats exposed.

huonw · 2013-04-03T06:22:54Z

Yay! 5x faster:

use core::float::sqrt;

fn main() {
    let mut sum: float = 0.0;
    for core::uint::range(0, 100000000) |i| {
        sum += sqrt(i as float);
    }
    io::println(fmt!("%f", sum));
}

# using cmath
$ rustc --opt-level=3 intrinsics-bench.rs  && time ./intrinsics-bench 
666666661666.567017

real    0m2.729s
user    0m2.712s
sys     0m0.008s

# using intrinsics
$ x86_64-unknown-linux-gnu/stage2/bin/rustc --opt-level=3 intrinsics-bench.rs  && time ./intrinsics-bench 
666666661666.567017

real    0m0.513s
user    0m0.500s
sys     0m0.008s

brson · 2013-04-03T15:10:14Z

@dbaupp mentioned on reddit that using some of these clobbers the stack canary. I hadn't considered that before and we need to think about how these are implemented. How many of the intrinsics that we are already using are just trotting all over the stack red zone? Perhaps LLVM's intrinsics don't emit the proper stack growth prologues.

huonw · 2013-04-03T15:36:51Z

(The very brief reddit thread, for future reference.)

Do I continue working on this? (i.e. open a pull request, since I've got the code basically organised.)

The (most) dangerous intrinsics seem to be sin, cos and exp, so I've left sin as libc and added sin_intr for the intrinsic etc.

brson · 2013-04-03T16:00:34Z

@dbaupp I'd like to know what assembly they generate. Do you have a test case that demonstrates all of them?

Achieves up to 5x speed up! However, the intrinsics seem to do bad things to the stack, especially sin, cos and exp (rust-lang#5686 has discussion). Also, add f{32,64,loat}::powi, and reorganise the delegation code so that functions have the #[inline(always)] annotation, and reduce the repetition of delegate!(..).

huonw · 2013-04-03T16:24:52Z

@brson: #5697

pcwalton · 2013-04-20T05:39:02Z

@huonw The necessary LLVM changes have been made.

However, I didn't realize that intrinsics can clobber the stack. I guess we need to write wrappers with the fixedstacksegment attribute for the problematic intrinsics (sin, cos) as well! This can be done entirely in the Rust compiler now, no need to patch LLVM.

huonw · 2013-04-20T05:50:08Z

@pcwalton Does fixed_stack_segment work on extern/intrinsic functions? i.e.

#[abi = "rust-intrinsic"]
pub extern "rust-intrinsic" {
   #[fixed_stack_segment]
   pub fn sinf32(x: f32) -> f32;
}

pcwalton · 2013-04-20T06:23:11Z

@huonw Extern functions automatically require fixed stack segments, if the #[fast_ffi] attribute is used. (Eventually this attribute will no longer be necessary.) However, I don't think that it will work for functions with the rust-intrinsic ABI, because I didn't realize that some intrinsic functions would need it.

@pcwalton

…lton This implements the fixed_stack_segment for items with the rust-intrinsic abi, and then uses it to make f32 and f64 use intrinsics where appropriate, but without overflowing stacks and killing canaries (cf. #5686 and #5697). Hopefully. @pcwalton, the fixed_stack_segment implementation involved mirroring its implementation in `base.rs` in `trans_closure`, but without adding the `set_no_inline` (reasoning: that would defeat the purpose of intrinsics), which is possibly incorrect. I'm a little hazy about how the underlying structure works, so I've annotated the 4 that have caused problems so far, but there's no guarantee that the other intrinsics are entirely well-behaved. Anyway, it has good results (the following are just summing the result of each function for 1 up to 100 million): ``` $ ./intrinsics-perf.sh f32 func new old speedup sin 0.80 2.75 3.44 cos 0.80 2.76 3.45 sqrt 0.56 2.73 4.88 ln 1.01 2.94 2.91 log10 0.97 2.90 2.99 log2 1.01 2.95 2.92 exp 0.90 2.85 3.17 exp2 0.92 2.87 3.12 pow 6.95 8.57 1.23 geometric mean: 2.97 $ ./intrinsics-perf.sh f64 func new old speedup sin 12.08 14.06 1.16 cos 12.04 13.67 1.14 sqrt 0.49 2.73 5.57 ln 4.11 5.59 1.36 log10 5.09 6.54 1.28 log2 2.78 5.10 1.83 exp 2.00 3.97 1.99 exp2 1.71 3.71 2.17 pow 5.90 7.51 1.27 geometric mean: 1.72 ``` So about 3x faster on average for f32, and 1.7x for f64. This isn't exactly apples to apples though, since this patch also adds #[inline(always)] to all the function definitions too, which possibly gives a speedup. (fwiw, GitHub is showing 93c0888 after d9c54f8 (since I cherry-picked the latter from #5697), but git's order is the other way.)

huonw · 2013-04-21T06:05:24Z

@brson #5975 fixed this.

brson · 2013-04-21T06:43:40Z

Thanks @huonw!

@phansch

Reorder sections of release documentation r? @phansch [Rendered](https://github.com/flip1995/rust-clippy/blob/release_doc/doc/release.md) changelog: none

@phansch

Replace all remaining occurrences of submodule with subtree r? @phansch I should have included this in rust-lang#5686, but forgot about it, so here we go again. changelog: none

huonw mentioned this issue Apr 3, 2013

Use more floating point intrinsics #5697

Closed

huonw mentioned this issue Apr 20, 2013

Allow intrinsics to use #[fixed_stack_segment], and use it for numeric intrinsics #5975

Closed

brson closed this as completed Apr 21, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use sqrt intrinsic for floating point sqrt #5686

Use sqrt intrinsic for floating point sqrt #5686

brson commented Apr 3, 2013

huonw commented Apr 3, 2013

thestinger commented Apr 3, 2013

huonw commented Apr 3, 2013

brson commented Apr 3, 2013

huonw commented Apr 3, 2013

brson commented Apr 3, 2013

huonw commented Apr 3, 2013

pcwalton commented Apr 20, 2013

huonw commented Apr 20, 2013

pcwalton commented Apr 20, 2013

huonw commented Apr 21, 2013

brson commented Apr 21, 2013

Use sqrt intrinsic for floating point sqrt #5686

Use sqrt intrinsic for floating point sqrt #5686

Comments

brson commented Apr 3, 2013

huonw commented Apr 3, 2013

thestinger commented Apr 3, 2013

huonw commented Apr 3, 2013

brson commented Apr 3, 2013

huonw commented Apr 3, 2013

brson commented Apr 3, 2013

huonw commented Apr 3, 2013

pcwalton commented Apr 20, 2013

huonw commented Apr 20, 2013

pcwalton commented Apr 20, 2013

huonw commented Apr 21, 2013

brson commented Apr 21, 2013