Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and update i686 triples #31632

Closed
wants to merge 2 commits into from
Closed

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Feb 13, 2016

They now (should) match the behaviour of Clang and there is a brief comment in each documenting this.

As per discussion in #31110

Keep track of differences between Clang and rust choices about default
CPUs for 32-bit x86 targets.
Try to be consistently replicate Clang default CPU for i686 triples.
@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member

@bors: r+ 8c840ee

Thanks again for the investigation!

@MagaTailor
Copy link

@dhuseby
The FreeBSD change looks slightly suspect - why call this target i686 and not i486 then? I could understand why clang wants to default to P4 on i586 but the obverse sounds misleading.

Either way, to avoid any potential user disappointment, the current BSD stage0 snapshot was probably built with the old codegen settings and won't run on the newly supported "i686" systems.

@ranma42
Copy link
Contributor Author

ranma42 commented Feb 14, 2016

I reproduced locally the failure in #31646. This comparison from src/test/run-pass/issue-21634.rs

    if let Ok(mut x) = "3.1415".parse::<f64>() {
        assert_eq!(8.1415, { x += 5.0; x });
    }

fails because the f64 value 8.1415 is compared to the 80-bit result of the addition:

    1e17:       dd 44 24 5c             fldl   0x5c(%esp) ; load the result of parse
    1e1b:       d8 86 a7 20 00 00       fadds  0x20a7(%esi) ; add 5.0 from memory
    1e21:       dd 54 24 10             fstl   0x10(%esp) ; store the result
    1e25:       8d 44 24 10             lea    0x10(%esp),%eax
    1e29:       89 44 24 4c             mov    %eax,0x4c(%esp)
    1e2d:       8d 86 cf 20 00 00       lea    0x20cf(%esi),%eax
    1e33:       89 44 24 50             mov    %eax,0x50(%esp)
    1e37:       dd 86 9f 20 00 00       fldl   0x209f(%esi) ; load 8.1415
    1e3d:       d9 c9                   fxch   %st(1) ; exchange the two top elements of the FP stack (why???)
    1e3f:       da e9                   fucompp ; compare the two top elements and pop both

It is possible to fix this in several ways:

  • the value used for the test could be changed (for example to 3.125) so that adding 5.0 does not involve rounding
  • it might be possible to set the FPU control word so that results are always rounded to 53 bits of mantissa

Since it does not look like the purpose of the test was to check for the rounding behaviour, I think that the test should be fixed/made more robust.

@hanna-kruppe
Copy link
Contributor

This shouldn't be the only test that fails. src/libcoretest/num/dec2flt/mod.rs has a test that fails without SSE2. If it doesn't fail, that's a bug in the test, because that test was specifically written to break on x87. The different rounding in x87 breaks the fast path of float parsing (in that it stops being completely accurate and round-trip-safe).

Somehow you need to deal with this if you want a non-SSE2 target. In fact, the discussion of this problem prompted some platforms being changed to a pentium4 base CPU (though I don't know if 32 bit Darwin was among them or if it was already "yonah"). You're gonna have to deal with this somehow. The simplest possibility would be to disable the fast path on 32 bit Darwin, though this is a significant performance regression (at least an order of magnitude or two IIRC, there are float parsing benchmarks in libcoretest).

@bors
Copy link
Contributor

bors commented Feb 14, 2016

⌛ Testing commit 8c840ee with merge ec1f6c7...

@bors
Copy link
Contributor

bors commented Feb 14, 2016

💔 Test failed - auto-mac-32-opt

@@ -12,7 +12,9 @@ use target::Target;

pub fn target() -> Target {
let mut base = super::apple_base::opts();
base.cpu = "yonah".to_string();
// Use i686 as default CPU. Clang uses the same default.
base.cpu = "i686".to_string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the commit message that made this a yonah

Use more specific target CPUs on Darwin

Macs don't come with anything older than a Yonah (32bit) or Core2 (64bit), so we can default to those targets. Clang does the same.

I’m not sure why clang would change their default, but macs not existing with pre-yonah hardware seems like a pretty good reason to just use a yonah.

cc @dotdash

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above: Clang defaults to yonah on i386-apple-darwin, but not on i686-apple-darwin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranma42 but if there can’t possibly be such a combination of darwin+x86 which uses anything pre-yonah, why bother (EDIT: or, rather, restrict ourselves to) targeting a decade-older CPU?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranma42 Hm, where does clang do that distinction? Did you check the code or is there a command I could use to reproduce/check this (without owning a Mac that is ;-))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, found the command in the other PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm reading the code correctly, there's some special handling for Darwin that disables the automatic CPU selection for any x86 target except for the i386 one. I wonder whether that's actually intentional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa I do not know why Clang restricts itself to a decade-older CPU; we are about to do it in order to be consistent with Clang. I agree with you that it is surprising to have a sub-optimal default on Mac and that is the reason why I was suggesting to provide an i386-apple-darwin triple as default target for 32-bit Mac.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang in this regard seems... somewhat inconsistent? At the very least it seems fine to leave this as-is and perhaps document the oddity (to allow this PR to land)

@ranma42
Copy link
Contributor Author

ranma42 commented Feb 14, 2016

@rkruppe I confirm that num::dec2flt::fast_path_correct fails. I experimented with fldcw and I can confirm that setting the rounding mode to 64 bits (instead of the default 80 bits) makes the test pass. Would this be ok?
EDIT: Obviously the 64-bits rounding would only be enabled for the fast path of the parser and afterwards the original control word would be restored.

@hanna-kruppe
Copy link
Contributor

I'm of two minds regarding fiddling with the FPU control word. On the one hand, it's nice to not leave the performance on the table. On the other hand, it's a very low level trick that has clear disadvantages especially when the target does have SSE2 (among other things, it's slightly slower, might inhibit compiler optimizations, and might break if optimizations get better). On the gripping hand, if we had a way to guarantee that this code path is only taken on targets without SSE2, even as target specs evolve, then I'd feel a lot better about it.

@ranma42
Copy link
Contributor Author

ranma42 commented Feb 14, 2016

It might be possible to only touch the FPU control word on x86 targets when SIMD operations are not available (SSE for f32, SSE2 for f64) using cfg_target_feature.

@hanna-kruppe
Copy link
Contributor

Wait, cfg can check target CPU features? That'd be exactly what I wished for!

@ranma42
Copy link
Contributor Author

ranma42 commented Feb 14, 2016

I should mention that LLVM does a similar fiddling with the control word when casting floating point types to integer in order to ensure truncation (see the lines after 22872 in lib/Target/X86/X86ISelLowering.cpp, I cannot link it because the file is 1+ MB of C++ code 😭 )

@hanna-kruppe
Copy link
Contributor

Aw, target_feature="sse" does not actually work :( See #31662

@ranma42
Copy link
Contributor Author

ranma42 commented Feb 15, 2016

It does not work yet ;)
I am happy that this PR exposed other issues, we can fix them before they affect distros

bors added a commit that referenced this pull request Feb 15, 2016
bors added a commit that referenced this pull request Feb 15, 2016
@nagisa
Copy link
Member

nagisa commented Feb 16, 2016

Note, that IMHO claiming to follow “what Clang does” is a really brittle way forward. We should…

  1. Have some rules for what is acceptable common denominator to us (which might or might not match Clang’s lower-bound), so the PRs like this do not pop-up after every LLVM release modifying 10 or so target files;
  2. Decide what i386, i686 et al mean to us since their actual meaning seems not really match what i386, i686 et al really are (both here and elsewhere).

@ranma42
Copy link
Contributor Author

ranma42 commented Feb 16, 2016

I think it would be convenient if the meaning of rustc triples was consistent with that of Clang or with the GNU one as much as possible.
In any case, our choices (and their rationale) should be documented, in order to avoid PRs like this and to guide future choices when adding architectures and targets.
What is the best place to discuss the points listed by @nagisa ?

@nagisa
Copy link
Member

nagisa commented Feb 16, 2016

Internals forum would be a good place.

@alexcrichton
Copy link
Member

Closing due to inactivity, but feel free to resubmit with the tests fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants