Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression from Rust 1.37 to 1.38 when using unreachable_unchecked #74615

Open
nbdd0121 opened this issue Jul 22, 2020 · 7 comments · May be fixed by #134626
Open

Performance regression from Rust 1.37 to 1.38 when using unreachable_unchecked #74615

nbdd0121 opened this issue Jul 22, 2020 · 7 comments · May be fixed by #134626
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nbdd0121
Copy link
Contributor

nbdd0121 commented Jul 22, 2020

I tried this code when looking at the issue #73015.

use std::num::FpCategory;

fn conv(input: usize) -> FpCategory {
    match input {
        0b10000000 | 0b00000001 => FpCategory::Infinite,
        0b01000000 | 0b00000010 => FpCategory::Normal,
        0b00100000 | 0b00000100 => FpCategory::Subnormal,
        0b00010000 | 0b00001000 => FpCategory::Zero,
        0b100000000 | 0b1000000000 => FpCategory::Nan,
        _ => unsafe  { std::hint::unreachable_unchecked() },
    }
}

pub fn test(input: usize) -> bool {
    conv(input) == FpCategory::Zero
}

I expect all other match arms to be optimised out, so basically the code would be equivalent to

pub fn test(input: usize) -> bool {
    match input {
        0b00010000 | 0b00001000 => true
        _ => false,
    }
}

or maybe even

pub fn test(input: usize) -> bool {
    input & 0b00011000 != 0
}

On Rust 1.37 it is quite optimised but a huge chunk of code is generated for 1.38. Interestingly, if the unreachable_unchecked is replaced with a value such as FpCategory::Nan, test could still be optimised.

@rustbot modify labels: +I-slow +I-heavy +T-compiler +A-codegen

@rustbot rustbot added I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 22, 2020
@jonas-schievink jonas-schievink added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Jul 22, 2020
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 22, 2020
@LeSeulArtichaut LeSeulArtichaut added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 26, 2020
@LeSeulArtichaut
Copy link
Contributor

@MSxDOS
Copy link

MSxDOS commented Oct 10, 2020

As far as I can tell there's a regression in LLVM 9:

https://rust.godbolt.org/z/b9frfW
https://godbolt.org/z/5rc3v7

Check the assembly of these Rust (nightly) and C (clang trunk) examples then switch Rust to 1.37 and clang to 8.0.0. Looks familiar?

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 10, 2020
@nikic
Copy link
Contributor

nikic commented Oct 10, 2020

Probably LLVM should drop an unreachable default case if that allows folding a large number of cases into the default case.

@ssomers
Copy link
Contributor

ssomers commented Jan 27, 2021

#74693 (which replaced core::intrinsics::unreachable with panic) no longer helps performance, so I wonder if this issue was resolved in general.

@klensy
Copy link
Contributor

klensy commented Jan 27, 2021

Replacing hint with std::intrinsics::unreachable gives small size back.

@nbdd0121
Copy link
Contributor Author

I just checked the assembly produced on playground with nightly and it still generates redundant checks.

@workingjubilee
Copy link
Member

workingjubilee commented Sep 15, 2024

The three tests are now within an instruction of each other per https://godbolt.org/z/qccfs9cK5

optimized assembly for 1.37
complex_test:
        cmp     rdi, 8
        je      .LBB0_3
        cmp     rdi, 16
        je      .LBB0_3
        xor     eax, eax
        ret
.LBB0_3:
        mov     al, 1
        ret

simpler_test:
        cmp     rdi, 16
        je      .LBB1_3
        cmp     rdi, 8
        jne     .LBB1_2
.LBB1_3:
        mov     al, 1
        ret
.LBB1_2:
        xor     eax, eax
        ret

complex_test:
        test    dil, 24
        setne   al
        ret
optimized assembly for 1.38
complex_test:
        xor     eax, eax
        cmp     rdi, 127
        jg      .LBB0_5
        add     rdi, -1
        cmp     rdi, 63
        ja      .LBB0_8
        movabs  rcx, -9223372034707292149
        bt      rcx, rdi
        jb      .LBB0_7
        mov     eax, 32896
        bt      rax, rdi
        jae     .LBB0_8
        mov     al, 1
        ret
.LBB0_5:
        cmp     rdi, 128
        je      .LBB0_7
        cmp     rdi, 256
.LBB0_7:
        ret
.LBB0_8:
        ud2

simpler_test:
        cmp     rdi, 16
        je      .LBB1_3
        cmp     rdi, 8
        jne     .LBB1_2
.LBB1_3:
        mov     al, 1
        ret
.LBB1_2:
        xor     eax, eax
        ret

simplest_test:
        test    dil, 24
        setne   al
        ret
optimized assembly for 1.81
complex_test:
        rep       bsf rax, rdi
        add     rax, -3
        cmp     rax, 2
        setb    al
        ret

simpler_test:
        add     rdi, -8
        test    rdi, -9
        sete    al
        ret

simplest_test:
        test    dil, 24
        setne   al
        ret

I think we should declare victory here with a regression test.

@workingjubilee workingjubilee added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Sep 15, 2024
@veera-sivarajan veera-sivarajan linked a pull request Dec 21, 2024 that will close this issue
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 27, 2024
…r=Mark-Simulacrum

Add Four Codegen Tests

Closes rust-lang#74615
Closes rust-lang#123216
Closes rust-lang#49572
Closes rust-lang#93514

This PR adds four codegen tests. The FileCheck assertions were generated with the help of `update_test_checks.py` and `update_llc_test_checks.py` from the LLVM project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants