Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to optimise std::cmp::{min,max} on bool #114653

Closed
Kmeakin opened this issue Aug 9, 2023 · 6 comments
Closed

Failure to optimise std::cmp::{min,max} on bool #114653

Kmeakin opened this issue Aug 9, 2023 · 6 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@Kmeakin
Copy link
Contributor

Kmeakin commented Aug 9, 2023

I tried this code: (godbolt)

// DOES NOT optimise to `a & b`
pub fn min_std(a: bool, b: bool) -> bool {
    std::cmp::min(a, b)
}

// DOES NOT optimise to `a | b`
pub fn max_std(a: bool, b: bool) -> bool {
    std::cmp::max(a, b)
}

// DOES optimise to `a & b`
pub fn min_if(a: bool, b: bool) -> bool {
    if (a < b) {
        return a;
    }
    b
}

// DOES optimise to `a | b` on x86_64
// DOES NOT optimise to `a | b` on AArch64
pub fn max_if(a: bool, b: bool) -> bool {
    if (a > b) {
        return a;
    }
    b
}

pub fn min_and(a: bool, b: bool) -> bool {
    a & b
}
pub fn max_or(a: bool, b: bool) -> bool {
    a | b
}

I expected to see this happen:
All implementations of max should optimise to a & b. All implementations of min should optimise to a | b

Instead, this happened:
The generated assembly code is suboptimal for min_std, max_std, max_ternary and max_if (on AArch64)

min_std and max_std can be fixed by overriding the default implementation of min and max for bool:

impl Ord for bool {
    fn min(self, other: Self) -> Self { a & b }
    fn max(self, other: Self) -> Self { a | b }
}

But it might be worth investigating the LLVM-IR that rustc generates in min_if/max_if too to see why it isn't optimised to llvm.umin()/llvm.umax()

Meta

rustc --version --verbose:

rustc 1.73.0-nightly (f88a8b71c 2023-08-08)
binary: rustc
commit-hash: f88a8b71cebb730cbd5058c45ebcae1d4d9be377
commit-date: 2023-08-08
host: x86_64-unknown-linux-gnu
release: 1.73.0-nightly
LLVM version: 17.0.0
Backtrace

<backtrace>

@Kmeakin Kmeakin added the C-bug Category: This is a bug. label Aug 9, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 9, 2023
@lenawanel
Copy link
Contributor

lenawanel commented Aug 9, 2023

here is the llvm ir this generates:

@example::min_and = unnamed_addr alias i1 (i1, i1), ptr @example::min_if

; example::min_std
define noundef zeroext i1 @example::min_std(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr personality ptr @rust_eh_personality {
start:
  %.neg.i = sext i1 %a to i8
  %0 = zext i1 %b to i8
  %1 = xor i8 %0, -1
  %switch.i = icmp eq i8 %1, %.neg.i
  %_0.0.i = select i1 %switch.i, i1 %b, i1 %a
  ret i1 %_0.0.i
}

; example::max_std
define noundef zeroext i1 @example::max_std(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr personality ptr @rust_eh_personality {
start:
  %.neg.i = sext i1 %a to i8
  %0 = zext i1 %b to i8
  %1 = xor i8 %0, -1
  %switch.i = icmp eq i8 %1, %.neg.i
  %_0.0.i = select i1 %switch.i, i1 %a, i1 %b
  ret i1 %_0.0.i
}

; example::min_if
define noundef zeroext i1 @example::min_if(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %a.b = and i1 %a, %b
  ret i1 %a.b
}

; example::max_if
define noundef zeroext i1 @example::max_if(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %0 = xor i1 %b, true
  %_3 = and i1 %0, %a
  %a.b = select i1 %_3, i1 %a, i1 %b
  ret i1 %a.b
}

; example::max_or
define noundef zeroext i1 @example::max_or(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %_0 = or i1 %a, %b
  ret i1 %_0
}

declare noundef i32 @rust_eh_personality(i32 noundef, i32 noundef, i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

it first extends the i1s to i8s, negating b and sign extending a, and then finally does an icmp, which seems very convoluted to me.
(it's also interesting that it optimized max_if way better than min_if)

@lenawanel
Copy link
Contributor

https://github.com/rust-lang/rust/tree/master/library/core/src#L1397 explains the zero and sign extensions. ({min,max} call {min,max}_by with Ord::cmp under the hood, and Ord::cmp sign extends them both.)

@Kmeakin
Copy link
Contributor Author

Kmeakin commented Aug 9, 2023

@rustbot label +I-slow

@rustbot rustbot added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Aug 9, 2023
@Kmeakin
Copy link
Contributor Author

Kmeakin commented Aug 9, 2023

Corresponding LLVM issue: llvm/llvm-project#64537

@Kmeakin Kmeakin changed the title Failure to optimise std::min::{min,max} on bool Failure to optimise std::cmp::{min,max} on bool Aug 10, 2023
@saethlin saethlin added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 12, 2023
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Aug 15, 2023
…r=cuviper

Optimizing the rest of bool's Ord implementation

After coming across issue rust-lang#66780, I realized that the other functions provided by Ord (`min`, `max`, and `clamp`) were similarly inefficient for bool. This change provides implementations for them in terms of boolean operators, resulting in much simpler assembly and faster code.
Fixes issue rust-lang#114653

[Comparison on Godbolt](https://rust.godbolt.org/z/5nb5P8e8j)

`max` assembly before:
```assembly
example::max:
        mov     eax, edi
        mov     ecx, eax
        neg     cl
        mov     edx, esi
        not     dl
        cmp     dl, cl
        cmove   eax, esi
        ret
```
`max` assembly after:
```assembly
example::max:
        mov     eax, edi
        or      eax, esi
        ret
```
`clamp` assembly before:
```assembly
example::clamp:
        mov     eax, esi
        sub     al, dl
        inc     al
        cmp     al, 2
        jae     .LBB1_1
        mov     eax, edi
        sub     al, sil
        movzx   ecx, dil
        sub     dil, dl
        cmp     dil, 1
        movzx   edx, dl
        cmovne  edx, ecx
        cmp     al, -1
        movzx   eax, sil
        cmovne  eax, edx
        ret
.LBB1_1:
        ; identical assert! code
```
`clamp` assembly after:
```assembly
example::clamp:
        test    edx, edx
        jne     .LBB1_2
        test    sil, sil
        jne     .LBB1_3
.LBB1_2:
        or      dil, sil
        and     dil, dl
        mov     eax, edi
        ret
.LBB1_3:
        ; identical assert! code
```
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 16, 2023
…r=cuviper

Optimizing the rest of bool's Ord implementation

After coming across issue rust-lang#66780, I realized that the other functions provided by Ord (`min`, `max`, and `clamp`) were similarly inefficient for bool. This change provides implementations for them in terms of boolean operators, resulting in much simpler assembly and faster code.
Fixes issue rust-lang#114653

[Comparison on Godbolt](https://rust.godbolt.org/z/5nb5P8e8j)

`max` assembly before:
```assembly
example::max:
        mov     eax, edi
        mov     ecx, eax
        neg     cl
        mov     edx, esi
        not     dl
        cmp     dl, cl
        cmove   eax, esi
        ret
```
`max` assembly after:
```assembly
example::max:
        mov     eax, edi
        or      eax, esi
        ret
```
`clamp` assembly before:
```assembly
example::clamp:
        mov     eax, esi
        sub     al, dl
        inc     al
        cmp     al, 2
        jae     .LBB1_1
        mov     eax, edi
        sub     al, sil
        movzx   ecx, dil
        sub     dil, dl
        cmp     dil, 1
        movzx   edx, dl
        cmovne  edx, ecx
        cmp     al, -1
        movzx   eax, sil
        cmovne  eax, edx
        ret
.LBB1_1:
        ; identical assert! code
```
`clamp` assembly after:
```assembly
example::clamp:
        test    edx, edx
        jne     .LBB1_2
        test    sil, sil
        jne     .LBB1_3
.LBB1_2:
        or      dil, sil
        and     dil, dl
        mov     eax, edi
        ret
.LBB1_3:
        ; identical assert! code
```
@Qi-Hu
Copy link

Qi-Hu commented Sep 14, 2023

I made a PR to llvm-project to optimize the max_if case: llvm/llvm-project#66394

@Qi-Hu
Copy link

Qi-Hu commented Sep 18, 2023

I made a PR to llvm-project to optimize the max_if case: llvm/llvm-project#66394

Actually this feature has already been implemented by 074f23e3e199. This issue can be closed.

@Kmeakin Kmeakin closed this as completed Apr 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

5 participants