Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split_at fails to optimize bounds check #74938

Closed
lcnr opened this issue Jul 30, 2020 · 19 comments · Fixed by #125347
Closed

split_at fails to optimize bounds check #74938

lcnr opened this issue Jul 30, 2020 · 19 comments · Fixed by #125347
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@lcnr
Copy link
Contributor

lcnr commented Jul 30, 2020

https://rust.godbolt.org/z/E1PnPj

const N: usize = 3;
const T = u8;

pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
    let len = slice.len() / N;
    slice.split_at(len * N)
}

results in the following assembly

example::split_mutiple:
        push    rax
        mov     r8, rdx
        mov     rcx, rdi
        movabs  rdx, -6148914691236517205
        mov     rax, r8
        mul     rdx
        shr     rdx
        lea     rdx, [rdx + 2*rdx]
        mov     rax, r8
        sub     rax, rdx
        mov     rdi, r8
        sub     rdi, rax
        jb      .LBB0_1  ; Note that this check can never be actually hit
        mov     qword ptr [rcx], rsi
        add     rsi, rdi
        mov     qword ptr [rcx + 8], rdi
        mov     qword ptr [rcx + 16], rsi
        mov     qword ptr [rcx + 24], rax
        mov     rax, rcx
        pop     rcx
        ret
.LBB0_1:
        lea     rdx, [rip + .L__unnamed_1]
        mov     rsi, r8
        call    qword ptr [rip + core::slice::slice_index_len_fail@GOTPCREL]
        ud2

When using const N: usize = 4 the check is correctly optimized away:

example::split_mutiple:
        mov     rax, rdi
        mov     rcx, rdx
        and     rcx, -4
        and     edx, 3
        mov     qword ptr [rdi], rsi
        add     rsi, rcx
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi + 16], rsi
        mov     qword ptr [rdi + 24], rdx
        ret

This happens both on all stable versions I have tested and on the most recent nightly.

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 30, 2020
@bugadani
Copy link
Contributor

bugadani commented Jul 31, 2020

In fact, this has nothing to do with slices: https://rust.godbolt.org/z/7oG6qE change from 3 to 4 (or other powers of 2) to see everything optimized away.

Edit: reduced it some more: https://rust.godbolt.org/z/bEoj8x

@tesuji
Copy link
Contributor

tesuji commented Jul 31, 2020

There are no optimization in GCC and Clang either: https://godbolt.org/z/M88ndP

#include <stddef.h>

const size_t N = 3;

int foo(size_t len) {
	size_t newlen = (len / N) * N;
	return newlen <= len;
}

Should foo be optimized to only return true?

@tesuji
Copy link
Contributor

tesuji commented Jul 31, 2020

So this is more as LLVM part: @rustbot modify labels: +A-LLVM +A-mir-opt
Of course this might be improved with MIR-opt. But how hard it would be? I don't know.

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jul 31, 2020
@bugadani
Copy link
Contributor

So this is more as LLVM part

What's interesting is that the power of 2 optimization is done by rustc somewhere - the same MIR is translated to radically different LLVM-IR when N is 2^n.

Trying to find where this happens without knowledge of the compiler is rather difficult, though :)

@tesuji
Copy link
Contributor

tesuji commented Jul 31, 2020

Clang could do that, change N=4 in my snippet about to see. But still GCC doesn't do that.

@rustbot rustbot added the A-mir-opt Area: MIR optimizations label Jul 31, 2020
@nikic
Copy link
Contributor

nikic commented Jul 31, 2020

https://github.com/llvm/llvm-project/blob/63d3aeb529a7b0fb95c2092ca38ad21c1f5cfd74/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp#L311 needs to set a nuw flag. Then it will probably fold.

@bugadani
Copy link
Contributor

bugadani commented Jul 31, 2020

Cool. So if I understand correctly, the following is happening:

The bounds check boils down to basically (length / N) * N <= length.
This can be transformed to length - length % N <= length (fair enough)
This is further transformed to (as the llvm ir shows) length % N <= length; (I don't see how this transform is equivalent, but OK)

And for some reason, LLVM can't seem to deduce that this comparison will always be false. Adding this as an assumption will make the check be optimized:

https://rust.godbolt.org/z/vM1KPM

@xldenis
Copy link
Contributor

xldenis commented Jul 31, 2020

@bugadani, what's weird is that if you feed this IR straight into llc it does optimize it away: https://rust.godbolt.org/z/o6d6nM. Are we not calling a magic pass on the IR in rustc?

Ignore me, I missed that the assume caused the IR to have ret true

However, it does optimize if a udiv is inserted instead of a urem, InstructionSimplify has the following opt: x udiv y <=u x I don't know why the same doesn't exist for urem? maybe no one considered it until now?

LLVM doesn't know about this optimization but it is apparently valid (thanks John Regehr) so I'll submit a patch to LLVM.

@xldenis
Copy link
Contributor

xldenis commented Aug 2, 2020

I've submitted a patch to LLVM adding this optimization: https://reviews.llvm.org/D85092

@tesuji
Copy link
Contributor

tesuji commented Aug 2, 2020

Does anyone want to file a bug against GCC? I registered an account two days ago but they hasn't accepted my account yet.

nikic pushed a commit to llvm/llvm-project that referenced this issue Aug 4, 2020
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: rust-lang/rust#74938

Differential Revision: https://reviews.llvm.org/D85092
hanswinderix pushed a commit to hanswinderix/llvm-project that referenced this issue Aug 5, 2020
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: rust-lang/rust#74938

Differential Revision: https://reviews.llvm.org/D85092
@xldenis
Copy link
Contributor

xldenis commented Aug 5, 2020

^ as you can see above this optimization has landed in LLVM master, so whenever LLVM is bumped we should get this for free.

@tesuji
Copy link
Contributor

tesuji commented Aug 5, 2020

Does it land with LLVM 11?

@xldenis
Copy link
Contributor

xldenis commented Aug 5, 2020

it's not in RC-1 which was released a week ago, I don't know if they will include it in a future release candidate or not.

@xldenis
Copy link
Contributor

xldenis commented Aug 7, 2020

This will be solved in LLVM 12 as my commit missed the cutoff (it was before the issue was even filed)

@jrmuizel
Copy link
Contributor

I filed the gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97529

@jrmuizel
Copy link
Contributor

jrmuizel commented Mar 2, 2021

I confirmed that with #81451 the bounds check is removed.

arichardson pushed a commit to arichardson/llvm-project that referenced this issue Mar 22, 2021
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: rust-lang/rust#74938

Differential Revision: https://reviews.llvm.org/D85092
@xldenis
Copy link
Contributor

xldenis commented Jun 25, 2021

Can we close this issue?

@mati865
Copy link
Contributor

mati865 commented Jan 14, 2022

@lcnr could you test again and close this issue?

@lcnr
Copy link
Contributor Author

lcnr commented Jan 14, 2022

this has been fixed, would be nice to add a codegen test for this, asserting that split_multiple can't panic

@lcnr lcnr added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Jan 14, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: rust-lang/rust#74938

Differential Revision: https://reviews.llvm.org/D85092
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
tesuji added a commit to tesuji/rustc that referenced this issue May 20, 2024
tesuji added a commit to tesuji/rustc that referenced this issue May 20, 2024
tesuji added a commit to tesuji/rustc that referenced this issue Jun 8, 2024
tesuji added a commit to tesuji/rustc that referenced this issue Jun 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 10, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 11, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 13, 2024
@bors bors closed this as completed in 7ac6c2f Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.