Performance regression from Rust 1.37 to 1.38 when using unreachable_unchecked #74615

nbdd0121 · 2020-07-22T04:31:21Z

I tried this code when looking at the issue #73015.

use std::num::FpCategory;

fn conv(input: usize) -> FpCategory {
    match input {
        0b10000000 | 0b00000001 => FpCategory::Infinite,
        0b01000000 | 0b00000010 => FpCategory::Normal,
        0b00100000 | 0b00000100 => FpCategory::Subnormal,
        0b00010000 | 0b00001000 => FpCategory::Zero,
        0b100000000 | 0b1000000000 => FpCategory::Nan,
        _ => unsafe  { std::hint::unreachable_unchecked() },
    }
}

pub fn test(input: usize) -> bool {
    conv(input) == FpCategory::Zero
}

I expect all other match arms to be optimised out, so basically the code would be equivalent to

pub fn test(input: usize) -> bool {
    match input {
        0b00010000 | 0b00001000 => true
        _ => false,
    }
}

or maybe even

pub fn test(input: usize) -> bool {
    input & 0b00011000 != 0
}

On Rust 1.37 it is quite optimised but a huge chunk of code is generated for 1.38. Interestingly, if the unreachable_unchecked is replaced with a value such as FpCategory::Nan, test could still be optimised.

@rustbot modify labels: +I-slow +I-heavy +T-compiler +A-codegen

The text was updated successfully, but these errors were encountered:

LeSeulArtichaut · 2020-07-26T20:26:32Z

Assigning P-medium as discussed as part of the Prioritization WG Procedure.

MSxDOS · 2020-10-10T02:38:45Z

As far as I can tell there's a regression in LLVM 9:

https://rust.godbolt.org/z/b9frfW
https://godbolt.org/z/5rc3v7

Check the assembly of these Rust (nightly) and C (clang trunk) examples then switch Rust to 1.37 and clang to 8.0.0. Looks familiar?

nikic · 2020-10-10T08:37:07Z

Probably LLVM should drop an unreachable default case if that allows folding a large number of cases into the default case.

ssomers · 2021-01-27T12:04:27Z

#74693 (which replaced core::intrinsics::unreachable with panic) no longer helps performance, so I wonder if this issue was resolved in general.

klensy · 2021-01-27T17:14:41Z

Replacing hint with std::intrinsics::unreachable gives small size back.

nbdd0121 · 2021-01-27T21:42:56Z

I just checked the assembly produced on playground with nightly and it still generates redundant checks.

workingjubilee · 2024-09-15T23:26:28Z

The three tests are now within an instruction of each other per https://godbolt.org/z/qccfs9cK5

optimized assembly for 1.37

complex_test:
        cmp     rdi, 8
        je      .LBB0_3
        cmp     rdi, 16
        je      .LBB0_3
        xor     eax, eax
        ret
.LBB0_3:
        mov     al, 1
        ret

simpler_test:
        cmp     rdi, 16
        je      .LBB1_3
        cmp     rdi, 8
        jne     .LBB1_2
.LBB1_3:
        mov     al, 1
        ret
.LBB1_2:
        xor     eax, eax
        ret

complex_test:
        test    dil, 24
        setne   al
        ret

optimized assembly for 1.38

complex_test:
        xor     eax, eax
        cmp     rdi, 127
        jg      .LBB0_5
        add     rdi, -1
        cmp     rdi, 63
        ja      .LBB0_8
        movabs  rcx, -9223372034707292149
        bt      rcx, rdi
        jb      .LBB0_7
        mov     eax, 32896
        bt      rax, rdi
        jae     .LBB0_8
        mov     al, 1
        ret
.LBB0_5:
        cmp     rdi, 128
        je      .LBB0_7
        cmp     rdi, 256
.LBB0_7:
        ret
.LBB0_8:
        ud2

simpler_test:
        cmp     rdi, 16
        je      .LBB1_3
        cmp     rdi, 8
        jne     .LBB1_2
.LBB1_3:
        mov     al, 1
        ret
.LBB1_2:
        xor     eax, eax
        ret

simplest_test:
        test    dil, 24
        setne   al
        ret

optimized assembly for 1.81

complex_test:
        rep       bsf rax, rdi
        add     rax, -3
        cmp     rax, 2
        setb    al
        ret

simpler_test:
        add     rdi, -8
        test    rdi, -9
        sete    al
        ret

simplest_test:
        test    dil, 24
        setne   al
        ret

I think we should declare victory here with a regression test.

…r=Mark-Simulacrum Add Four Codegen Tests Closes rust-lang#74615 Closes rust-lang#123216 Closes rust-lang#49572 Closes rust-lang#93514 This PR adds four codegen tests. The FileCheck assertions were generated with the help of `update_test_checks.py` and `update_llc_test_checks.py` from the LLVM project.

jonas-schievink added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Jul 22, 2020

rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 22, 2020

LeSeulArtichaut added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 26, 2020

This was referenced Aug 7, 2020

BTreeMap: enforce the panic rule imposed by replace #75071

Merged

Improve btree's unwrap_unchecked #74693

Closed

rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 10, 2020

ssomers mentioned this issue Jun 7, 2021

BTree: encapsulate LeafRange better & some debug asserts #85980

Merged

workingjubilee added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Sep 15, 2024

veera-sivarajan linked a pull request Dec 21, 2024 that will close this issue

Add Four Codegen Tests #134626

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression from Rust 1.37 to 1.38 when using unreachable_unchecked #74615

Performance regression from Rust 1.37 to 1.38 when using unreachable_unchecked #74615

nbdd0121 commented Jul 22, 2020 •

edited

Loading

LeSeulArtichaut commented Jul 26, 2020

MSxDOS commented Oct 10, 2020

nikic commented Oct 10, 2020

ssomers commented Jan 27, 2021 •

edited

Loading

klensy commented Jan 27, 2021

nbdd0121 commented Jan 27, 2021

workingjubilee commented Sep 15, 2024 •

edited

Loading

Performance regression from Rust 1.37 to 1.38 when using unreachable_unchecked #74615

Performance regression from Rust 1.37 to 1.38 when using unreachable_unchecked #74615

Comments

nbdd0121 commented Jul 22, 2020 • edited Loading

LeSeulArtichaut commented Jul 26, 2020

MSxDOS commented Oct 10, 2020

nikic commented Oct 10, 2020

ssomers commented Jan 27, 2021 • edited Loading

klensy commented Jan 27, 2021

nbdd0121 commented Jan 27, 2021

workingjubilee commented Sep 15, 2024 • edited Loading

nbdd0121 commented Jul 22, 2020 •

edited

Loading

ssomers commented Jan 27, 2021 •

edited

Loading

workingjubilee commented Sep 15, 2024 •

edited

Loading