-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise floating point is_finite
(2x) and is_infinite
(1.6x).
#57353
Conversation
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override) |
@@ -161,6 +161,11 @@ impl f32 { | |||
self != self | |||
} | |||
|
|||
#[inline] | |||
fn abs_(self) -> f32 { | |||
f32::from_bits(self.to_bits() & 0x7fff_ffff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary, rather than using the existing abs
method (which uses the LLVM fabsf32
intrinsic directly)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's only available in std
. #50145
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm bikeshedding here, but given typical naming conventions, I think it'd make sense to simply call this abs
(which shouldn't cause any conflicts) and add a comment explaining why this method is private for discoverability, e.g.
// FIXME(#50145): `abs` is publicly unavailable in libcore due to concerns
// about portability, so this implementation is for private use internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about adding a comment, but I don't think that naming it abs
is appropriate. I'd call it something like abs_hack
so it's clear that it's not using the proper abs
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about abs_private
with a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can both rely on IEEE754 semantics to be made faster, by folding away the sign with an abs (left private for now), and then comparing to infinity, letting the NaN semantics of a direct float comparison handle NaN input properly. The `abs` bit-fiddling is simple (a single and), and so these new forms compile down to a few instructions, without branches, e.g. for f32: ```asm is_infinite: andps xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF ucomiss xmm0, dword ptr [rip + .LCPI2_1] ; 0x7F80_0000 setae al ret is_finite: andps xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF movss xmm1, dword ptr [rip + .LCPI1_1] ; 0x7F80_0000 ucomiss xmm1, xmm0 seta al ret ``` When used in loops/repeatedly, they get even better: the memory operations (loading the mask 0x7FFFFFFF for abs, and infinity 0x7F80_0000) are likely to be hoisted out of the individual calls, to be shared, and the `seta`/`setae` are likely to be collapsed into conditional jumps or moves (or similar). The old `is_infinite` did two comparisons, and the old `is_finite` did three (with a branch), and both of them had to check the flags after every one of those comparison. These functions have had that old implementation since they were added in rust-lang@6284190 7 years ago. Benchmark (`abs` is the new form, `std` is the old): ``` test f32_is_finite_abs ... bench: 55 ns/iter (+/- 10) test f32_is_finite_std ... bench: 118 ns/iter (+/- 5) test f32_is_infinite_abs ... bench: 53 ns/iter (+/- 1) test f32_is_infinite_std ... bench: 84 ns/iter (+/- 6) test f64_is_finite_abs ... bench: 52 ns/iter (+/- 12) test f64_is_finite_std ... bench: 128 ns/iter (+/- 25) test f64_is_infinite_abs ... bench: 54 ns/iter (+/- 5) test f64_is_infinite_std ... bench: 93 ns/iter (+/- 23) ``` ```rust #![feature(test)] extern crate test; use std::{f32, f64}; use test::Bencher; const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f32_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite())); } #[bench] fn f32_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY)); } #[bench] fn f32_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite())); } #[bench] fn f32_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY)); } const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f64_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite())); } #[bench] fn f64_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY)); } #[bench] fn f64_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite())); } #[bench] fn f64_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY)); } ```
6a4473a
to
6e742db
Compare
Nice catch with the nit. I've updated to fix it. |
@bors r+ |
📌 Commit 6e742db has been approved by |
⌛ Testing commit 6e742db with merge ab00b4b23bee84a75a5ee5ceec8d72340c4795f8... |
💔 Test failed - status-appveyor |
@bors retry |
…odrAus Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x). These can both rely on IEEE754 semantics to be made faster, by folding away the sign with an abs (left private for now), and then comparing to infinity, letting the NaN semantics of a direct float comparison handle NaN input properly. The `abs` bit-fiddling is simple (a single and), and so these new forms compile down to a few instructions, without branches, e.g. for f32: ```asm is_infinite: andps xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF ucomiss xmm0, dword ptr [rip + .LCPI2_1] ; 0x7F80_0000 setae al ret is_finite: andps xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF movss xmm1, dword ptr [rip + .LCPI1_1] ; 0x7F80_0000 ucomiss xmm1, xmm0 seta al ret ``` When used in loops/repeatedly, they get even better: the memory operations (loading the mask 0x7FFFFFFF for abs, and infinity 0x7F80_0000) are likely to be hoisted out of the individual calls, to be shared, and the `seta`/`setae` are likely to be collapsed into conditional jumps or moves (or similar). The old `is_infinite` did two comparisons, and the old `is_finite` did three (with a branch), and both of them had to check the flags after every one of those comparison. These functions have had that old implementation since they were added in rust-lang@6284190 7 years ago. Benchmark (`abs` is the new form, `std` is the old): ``` test f32_is_finite_abs ... bench: 55 ns/iter (+/- 10) test f32_is_finite_std ... bench: 118 ns/iter (+/- 5) test f32_is_infinite_abs ... bench: 53 ns/iter (+/- 1) test f32_is_infinite_std ... bench: 84 ns/iter (+/- 6) test f64_is_finite_abs ... bench: 52 ns/iter (+/- 12) test f64_is_finite_std ... bench: 128 ns/iter (+/- 25) test f64_is_infinite_abs ... bench: 54 ns/iter (+/- 5) test f64_is_infinite_std ... bench: 93 ns/iter (+/- 23) ``` ```rust #![feature(test)] extern crate test; use std::{f32, f64}; use test::Bencher; const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f32_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite())); } #[bench] fn f32_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY)); } #[bench] fn f32_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite())); } #[bench] fn f32_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY)); } const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f64_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite())); } #[bench] fn f64_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY)); } #[bench] fn f64_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite())); } #[bench] fn f64_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY)); } ```
…odrAus Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x). These can both rely on IEEE754 semantics to be made faster, by folding away the sign with an abs (left private for now), and then comparing to infinity, letting the NaN semantics of a direct float comparison handle NaN input properly. The `abs` bit-fiddling is simple (a single and), and so these new forms compile down to a few instructions, without branches, e.g. for f32: ```asm is_infinite: andps xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF ucomiss xmm0, dword ptr [rip + .LCPI2_1] ; 0x7F80_0000 setae al ret is_finite: andps xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF movss xmm1, dword ptr [rip + .LCPI1_1] ; 0x7F80_0000 ucomiss xmm1, xmm0 seta al ret ``` When used in loops/repeatedly, they get even better: the memory operations (loading the mask 0x7FFFFFFF for abs, and infinity 0x7F80_0000) are likely to be hoisted out of the individual calls, to be shared, and the `seta`/`setae` are likely to be collapsed into conditional jumps or moves (or similar). The old `is_infinite` did two comparisons, and the old `is_finite` did three (with a branch), and both of them had to check the flags after every one of those comparison. These functions have had that old implementation since they were added in rust-lang@6284190 7 years ago. Benchmark (`abs` is the new form, `std` is the old): ``` test f32_is_finite_abs ... bench: 55 ns/iter (+/- 10) test f32_is_finite_std ... bench: 118 ns/iter (+/- 5) test f32_is_infinite_abs ... bench: 53 ns/iter (+/- 1) test f32_is_infinite_std ... bench: 84 ns/iter (+/- 6) test f64_is_finite_abs ... bench: 52 ns/iter (+/- 12) test f64_is_finite_std ... bench: 128 ns/iter (+/- 25) test f64_is_infinite_abs ... bench: 54 ns/iter (+/- 5) test f64_is_infinite_std ... bench: 93 ns/iter (+/- 23) ``` ```rust #![feature(test)] extern crate test; use std::{f32, f64}; use test::Bencher; const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f32_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite())); } #[bench] fn f32_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs()== f32::INFINITY)); } #[bench] fn f32_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite())); } #[bench] fn f32_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY)); } const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597]; #[bench] fn f64_is_infinite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite())); } #[bench] fn f64_is_infinite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY)); } #[bench] fn f64_is_finite_std(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite())); } #[bench] fn f64_is_finite_abs(b: &mut Bencher) { b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY)); } ```
Rollup of 16 pull requests Successful merges: - #57351 (Don't actually create a full MIR stack frame when not needed) - #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).) - #57412 (Improve the wording) - #57436 (save-analysis: use a fallback when access levels couldn't be computed) - #57453 (lldb_batchmode.py: try `import _thread` for Python 3) - #57454 (Some cleanups for core::fmt) - #57461 (Change `String` to `&'static str` in `ParseResult::Failure`.) - #57473 (std: Render large exit codes as hex on Windows) - #57474 (save-analysis: Get path def from parent in case there's no def for the path itself.) - #57494 (Speed up item_bodies for large match statements involving regions) - #57496 (re-do docs for core::cmp) - #57508 (rustdoc: Allow inlining of reexported crates and crate items) - #57547 (Use `ptr::eq` where applicable) - #57557 (resolve: Mark extern crate items as used in more cases) - #57560 (hygiene: Do not treat `Self` ctor as a local variable) - #57564 (Update the const fn tracking issue to the new metabug) Failed merges: r? @ghost
These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.
The
abs
bit-fiddling is simple (a single and), and so these newforms compile down to a few instructions, without branches, e.g. for
f32:
When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFFFFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the
seta
/setae
are likely to be collapsed intoconditional jumps or moves (or similar).
The old
is_infinite
did two comparisons, and the oldis_finite
didthree (with a branch), and both of them had to check the flags after
every one of those comparison. These functions have had that old
implementation since they were added in
6284190
7 years ago.
Benchmark (
abs
is the new form,std
is the old):