Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i686 floating point behavior does not agree with unit tests in debug mode #73288

Closed
ecstatic-morse opened this issue Jun 12, 2020 · 30 comments · Fixed by #113053
Closed

i686 floating point behavior does not agree with unit tests in debug mode #73288

ecstatic-morse opened this issue Jun 12, 2020 · 30 comments · Fixed by #113053
Labels
A-floating-point Area: Floating point numbers and arithmetic A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-bug Category: This is a bug. O-x86_32 Target: x86 processors, 32 bit (like i686-*) T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Jun 12, 2020

Compiling and running the following code with -C opt-level=0 for 32-bit Linux on a recent nightly (2020-06-07) results in a failing assertion.

fn main() {
    let masked_nan2 = f32::NAN.to_bits() ^ 0x0055_5555;
    assert_eq!(f32::from_bits(masked_nan2).to_bits(), masked_nan2);
}
rustc +nightly --target=i686-unknown-linux-gnu test.rs && ./test

Output:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `2144687445`,
 right: `2140493141`', test.rs:3:5

However, this code is taken directly from a unit test for libstd added in #46012, so presumably it should succeed.

rust/src/libstd/f32.rs

Lines 1508 to 1514 in 5949391

let masked_nan1 = f32::NAN.to_bits() ^ 0x002A_AAAA;
let masked_nan2 = f32::NAN.to_bits() ^ 0x0055_5555;
assert!(f32::from_bits(masked_nan1).is_nan());
assert!(f32::from_bits(masked_nan2).is_nan());
assert_eq!(f32::from_bits(masked_nan1).to_bits(), masked_nan1);
assert_eq!(f32::from_bits(masked_nan2).to_bits(), masked_nan2);

@ecstatic-morse ecstatic-morse added C-bug Category: This is a bug. O-x86 labels Jun 12, 2020
@ecstatic-morse ecstatic-morse changed the title i686 floating point behavior does not agree with unit tests i686 floating point behavior does not agree with unit tests in debug mode Jun 12, 2020
@RalfJung
Copy link
Member

This is probably caused by x87 instructions... I think we have an issue about that somewhere already, not sure where though.
Cc @hanna-kruppe

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Jun 12, 2020

Indeed. After the mask, the value in masked_nan2 is the following. On x86, this value represents a signalling NaN.

0 11111111 00101010101010101010101
    exp           significand

The call to f32::from_bits puts the same value onto the top of the x87 FPU stack with an FLD instruction before returning. After this is called, the #IA flag (invalid arithmetic operand exception) is set in the x87 status register, and the value on top of the stack is an 80-bit quiet NaN (0x7fffc000010000000000). According to the reference manual, this occurs only if the operand to FLD is not also a double extended-precision float.

When the return value is popped off the FPU stack in main and stored in memory with an FSTP instruction, the signalling/quiet bit (the most-significant bit of the significand) is set. On x86, this represents a quiet NaN.

0 11111111 10101010101010101010101
    exp           significand

@hanna-kruppe
Copy link
Contributor

Independently of x87, the intent behind the test is on shaky ground. LLVM optimizations do not generally preserve NaN bit strings, see e.g. #55131.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Jun 12, 2020

Seems like we should update the docs for {f32,f64}::from_bits then? Currently, they claim

This is currently identical to transmute::<u32, f32>(v) on all platforms

and

this implementation favors preserving the exact bits

@hanna-kruppe
Copy link
Contributor

The first part isn't wrong, as far as I can tell. It literally is implemented as transmute, it's just that handling any float values in general does not always "preserves the exact bits".

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Jun 12, 2020

fn main() {
    let snan: u32 = 0xff << 23 | 1;

    let round_trip = unsafe {
        let float: f32 = std::mem::transmute(snan);
        std::mem::transmute(float)
    };

    assert_eq!(snan, round_trip);
    // assert_eq!(f32::from_bits(snan).to_bits(), snan);
}

@hanna-kruppe This succeeds in debug mode unless the last line is uncommented, so the docs are misleading at the very least.

@hanna-kruppe
Copy link
Contributor

I believe that difference is due to the transmute getting const-eval'd, the LLVM IR just stores the same u32 bit pattern to the stack several times. While this does end up making a difference today:

  1. Several changes to rustc could make the difference disappear again (enabling MIR inlining, not running const-prop in debug builds, forbidding const-prop from evaluating tranmsutes, etc.)
  2. Conversely, all sorts of harmless changes to the version that uses transmute directly can break it. For example, anything that stops constant propagation. Or introducing your own function boundaries.

It's not that the functions in std do something different from transmute, it's that -- contrary to all usual intuition -- function boundaries and IR optimizations matter for this behavior. This whole situation is an incredibly mess and headache, and picking on this one sentence from the docs is not going to help clarify it. If we want to give users proper guidelines about this, we need to go back to the drawing board and figure out what the rules actually are and should be.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Jun 12, 2020

I believe that difference is due to the transmute getting const-eval'd

If you mean MIR const propagated, it does not. Not sure about LLVM const propagation, but adding black_box and wrapping the round trip in a function call did not change behavior. I believe the fact that std::mem::transmute is direct re-export of the intrinsic accounts for the difference.

I'm not saying that this area isn't a mess, but the docs from_bits don't acknowledge this at all, and are trivially wrong in the case I've shown above. I'm going to update them to reflect this.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Jun 12, 2020

@Mark-Simulacrum Besides the unresolved problems around NaN payloads, this raises another issue: Should we be running unit tests for the standard library in debug mode on CI?

As far as I know, we don't distribute a non-optimized libstd, but maybe we should be building one and running its tests as part of the nopt CI jobs?

@Mark-Simulacrum
Copy link
Member

I am fairly certain (hopeful, I guess) that most of our builders do have debug assertions enabled... however, it sounds like what you actually want here is a debug libstd. I don't know how feasible that is from a performance perspective (I imagine running those tests is considerably slower?).

We do actually have a i686-gnu-nopt builder, and I think some other nopt builders, (https://github.com/rust-lang/rust/blob/master/src/ci/docker/i686-gnu-nopt/Dockerfile#L23). That's currently configured to not enable optimizations for tests, but that only affects compiletest-run tests. For tests like this which are heavily dependent on such, we can move them to run-pass tests to get benefits of debug testing. Note that debug assertions are currently disabled on that builder due to CI time concerns.

@Mark-Simulacrum
Copy link
Member

Basically, as with all such questions, it comes down to someone sitting down and timing how long a debug build of std/std tests + running those tests would take, then timing some current task (e.g. release std tests) and scaling that to CI time. If that's sufficiently small in CI timescales, we can add it, if not we'll have to not do so.

@RalfJung
Copy link
Member

RalfJung commented Jun 13, 2020

In terms of the bug here, should we have a proper meta-bug (maybe even labelled I-unsound) about LLVM FP semantics around NaN payloads being a mess? This is not immediately actionable, but it is certainly something we should track in a single place. Or should we convert either this or #55131 into that meta-bug?

Cc @rust-lang/lang -- the short version is, looking at the bits of NaNs in LLVM is broken. :/

@RalfJung RalfJung added T-lang Relevant to the language team, which will review and decide on the PR/issue. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Jun 13, 2020
@ecstatic-morse
Copy link
Contributor Author

@RalfJung I went ahead and opened #73328 to track documenting our guarantees around NaN. I think we can close this issue once we remove the tests that are relying on unspecified behavior. I can look into running libstd unit tests in debug mode on CI. Perhaps we just want to make the existing floating point unit tests into UI tests to see if there's any other issues like this?

@ecstatic-morse
Copy link
Contributor Author

This is a duplicate of #46948.

@RalfJung
Copy link
Member

RalfJung commented Jun 14, 2020

I can look into running libstd unit tests in debug mode on CI.

Don't our existing debug CI builders already do that?
Maybe not for i686 though.

EDIT: Ah I see, the point is that we do it for ui tests but not unit tests.
FWIW, unit tests are in general preferable because we can more easily run them in Miri.

@RalfJung
Copy link
Member

This is a duplicate of #46948.

So which of these would you prefer to close?

@nikomatsakis
Copy link
Contributor

the short version is, looking at the bits of NaNs in LLVM is broken. :/

From reading the thread, it sounds like the bits of a NaN can basically change arbitrarily, as a result of LLVM optimizations that are not guaranteed to preserve them? (And other things, I suppose)

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 14, 2020
…r=Mark-Simulacrum

Run standard library unit tests without optimizations in `nopt` CI jobs

This was discussed in rust-lang#73288 as a way to catch similar issues in the future. This builds an unoptimized standard library with the bootstrap compiler and runs the unit tests. This takes about 2 minutes on my laptop.

I confirmed that this method works locally, although there may be a better way of implementing it. It would be better to use the stage 2 compiler instead of the bootstrap one.

Notably, there are currently four `libstd` unit tests that fail in debug mode on `i686-unkown-linux-gnu` (a tier one target):

```
failures:
    f32::tests::test_float_bits_conv
    f32::tests::test_total_cmp
    f64::tests::test_float_bits_conv
    f64::tests::test_total_cmp
```

These are the tests that prompted rust-lang#73288 as well as the ones added in rust-lang#72568, which is currently broken due to rust-lang#73328.
@RalfJung
Copy link
Member

I always thought the x86-specific problems that arise due to x87 are side-stepped when SSE2 is enabled. But now @ecstatic-morse points out that this bug still remains. How is that possible, isn't SSE2 replacing the x87 mess?

@RalfJung
Copy link
Member

@ecstatic-morse wrote

Yes. The x86 calling convention mandates that floating point values are returned on the FPU stack. Values on the FPU stack are extended-precision, so storing them into an 8-byte f64 involves truncation and thus is an "arithmetic operation", which canonicalizes NaNs, according to the x86 manual.

So I guess that also alleviates any question about whether LLVM could adjust codegen to not have this problem.

But this also points to a way out, doesn't it? If we adopt a model where we say something like "floating-point operations may change the NaN bits in a non-deterministic way" (which is what wasm does so arguably LLVM has to support it), then we "just" have to also say that as a special exception, on x86-32, returning floating-point values from a function also is considered a "floating-point operation". (Does this also affect arguments passed to a function?)

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Sep 14, 2020

The x87 stack is only used for the return value, not for passing arguments. That formulation would be correct on i686 AFAIK.

An alternative would be to use a different calling convention for extern "Rust" functions on 32-bit x86. I think this would break a lot of tooling, however, and we would still have this problem across an FFI boundary. It's also a lot more work.

@programmerjake
Copy link
Member

If we adopt a model where we say something like "floating-point operations may change the NaN bits in a non-deterministic way"

the above is true with some important exceptions:
no operation (other than a bitcast from an integer or conversion from text to fp) produces a signaling NaN where the input did not have a signaling NaN, and all arithmetic operations (other than sign-bit manipulation) only produce quiet NaNs when they produce NaNs.

@RalfJung
Copy link
Member

RalfJung commented Sep 14, 2020

The wasm spec doesn't guarantee anything like that, does it? We cannot guarantee more than our compilation targets do.

Or maybe that's what the "arithmetic NaN" stuff is about, which I didn't really understand.

@programmerjake
Copy link
Member

The wasm spec doesn't guarantee anything like that, does it? We cannot guarantee more than our compilation targets do.

Or maybe that's what the "arithmetic NaN" stuff is about, which I didn't really understand.

arithmetic NaN is another name for quiet NaN on all but a few unusual platforms: HPPA and MIPS pre-2008.

See the definitions for PlatformProperties in the simple-soft-float crate, which I wrote.

@workingjubilee
Copy link
Member

Is it possible for us to adjust the Rust float calling convention for the pentium4-* ("i686") targets so that they always use the xmm registers for Rust code (essentially requiring they follow the x86-64 ABI conventions), never passing through the x87 stack-registers (unless a value is given to us from outside of Rust, which is of course not something we can meaningfully control)?

@bjorn3
Copy link
Member

bjorn3 commented Dec 5, 2021

It should be possible for the rust abi, though I don't know how to do this with LLVM. For the C abi I think the only option that doesn't break compatibility with existing C compilers would be to roundtrip between x87 and xmm registers to remove the excess precision.

@workingjubilee
Copy link
Member

I think that would be fine. If we implemented that strategy, that would resolve this bug for Rust for the i686 target, and any other unusual behavior would be functionally Not A Bug: if you want to be sure you are actually moving 64 bits across an FFI barrier, then you can pass it as a u64 and remake it as a float on the other side, otherwise we should follow whatever idiosyncratic behavior a given ABI mandates.

About the previous #73288 (comment) in this thread, I am actually somewhat skeptical as to how much tooling it would break (not that I think it would be zero) as opposed to how many issues would be closed (which would be at least one). For instance, any debuggers should be looking at all the registers, not just the x87 stack, and MSVC actually sets the x87 stack to a different precision than other C compilers, apparently, so we should be avoiding it in general if we want portable results: https://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/

@programmerjake
Copy link
Member

apparently the x87 stack can be avoided by using the fastcc calling convention or by marking the floating-point return value as inreg:
https://github.com/llvm/llvm-project/blob/e3c2694da98d9e6585b47cebfedce8473f679fff/llvm/lib/Target/X86/X86CallingConv.td#L267-L273

// X86-32 C return-value convention.
def RetCC_X86_32_C : CallingConv<[
  // The X86-32 calling convention returns FP values in FP0, unless marked
  // with "inreg" (used here to distinguish one kind of reg from another,
  // weirdly; this is really the sse-regparm calling convention) in which
  // case they use XMM0, otherwise it is the same as the common X86 calling
  // conv.

@workingjubilee
Copy link
Member

Belatedly: if that is the case, then that answers why this changes! Our functions wind up annotated with fastcc in LLVM after we run optimizations, but not before, so we actually change our calling conventions when calling into Rust code. We can close this bug simply by tuning rustc to be more insistent and consistent in passing arguments.

@workingjubilee workingjubilee added the A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. label Mar 3, 2023
@RalfJung
Copy link
Member

I propose that we close this issue by documenting this as a known non-compliance on x86-32 targets: #113053.

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2023

Closing in favor of a dedicated tracking issue that summarizes what we know about the issue: #115567.

@RalfJung RalfJung closed this as completed Aug 4, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 3, 2023
…bilee

add notes about non-compliant FP behavior on 32bit x86 targets

Based on ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look!

In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used.

I opened rust-lang#114479 to concisely describe and track the issue.

Cc `@workingjubilee` `@thomcc` `@chorman0773`  `@rust-lang/opsem`
Fixes rust-lang#73288
Fixes rust-lang#72327
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 3, 2023
Rollup merge of rust-lang#113053 - RalfJung:x86_32-float, r=workingjubilee

add notes about non-compliant FP behavior on 32bit x86 targets

Based on ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look!

In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used.

I opened rust-lang#114479 to concisely describe and track the issue.

Cc `@workingjubilee` `@thomcc` `@chorman0773`  `@rust-lang/opsem`
Fixes rust-lang#73288
Fixes rust-lang#72327
@Noratrieb Noratrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-floating-point Area: Floating point numbers and arithmetic A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. C-bug Category: This is a bug. O-x86_32 Target: x86 processors, 32 bit (like i686-*) T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants