Suboptimal code generation for omittable bound checks #91109

loskutov · 2021-11-21T15:37:04Z

Compiler Explorer link: https://godbolt.org/z/KMTfY18Wf

I tried this code:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0..n {
        assert!(d[i].len() == n);
        for j in 0..n {
            d[i][j] = 0;
        }
    }
}

With -C llvm-args=-enable-constraint-elimination, I expected the bound checks to be optimized out as they're performed manually by assert!. However, both assert and bound checks were present in the compiled code. The inner loop looks really dumb in terms of code generation:

.LBB0_5:
        cmp     rsi, rdx
        je      .LBB0_9
        mov     dword ptr [rcx + 4*rdx], 0
        add     rdx, 1
        cmp     rsi, rdx
        jne     .LBB0_5

rsi != rdx is the loop invariant, but it's verified on each iteration.

At first I suspected the issue to be entirely on the LLVM side, but bound checks are omitted in equivalent C++ code (which is also present at the Compiler Explorer link), and the whole inner loop is replaced with a single memset call.

Meta

rustc --version --verbose:

rustc 1.58.0-nightly (a77da2d45 2021-11-19)
binary: rustc
commit-hash: a77da2d454e6caa227a85b16410b95f93495e7e0
commit-date: 2021-11-19
host: x86_64-unknown-linux-gnu
release: 1.58.0-nightly
LLVM version: 13.0.0

The text was updated successfully, but these errors were encountered:

the8472 · 2021-11-21T16:11:38Z

To get memset as in the clang version you can replace

        for j in 0..n {
            d[i][j] = 0;
        }

with

d[i][0..n].fill(0);

loskutov · 2021-11-21T16:31:14Z

The code example is pretty synthetic and doesn't make much sense on its own; however, similar effects can be observed in slightly more complicated programs, e.g. Floyd—Warshall algorithm.

My point is that the missed optimization is pretty important in real-life programs, as it's not uncommon to be unable to avoid array indexing, and bounds-checking overhead is often claimed to be negligible as the checks are likely to be optimized out. And of course I expect them to be optimized out when they are optimized out by clang in an equivalent C++ program.

leonardo-m · 2021-11-21T17:44:20Z

There are multiple ways to solve the problem, like:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0 .. n {
        let row = &mut d[i][.. n];
        for j in 0 .. n {
            row[j] = 0;
        }
    }
}

What's the best way to solve this problem depends on the specific situation.

loskutov · 2021-11-21T18:25:25Z

True. However, the issue is about poor codegen for this particular piece of code, not about looking for a way to rewrite it.

leonardo-m · 2021-11-21T18:32:58Z

Your original code contains an assert, its purpose is to tell something to the optimizer. Often that helps, but you can't be sure the compiler will use the information given by the assert to its fullest. It's a "best effort" for the compiler. So you are waiting for the compiler to do what you want based on "best efforts".

scottmcm · 2021-11-27T09:14:08Z

Thanks for including the vector::at C++ version!

I suggest that you prefer re-slicing to asserts for avoiding bounds checks. For example, this one works:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0..n {
        let di = &mut d[i][..n];
        for j in 0..n {
            di[j] = 0;
        }
    }
}

https://rust.godbolt.org/z/f19nasGPh

And that's a general strategy for improving bound-check-elimination. It's how the the new reverse implementation works, for example (#90821).

loskutov added the C-bug Category: This is a bug. label Nov 21, 2021

the8472 added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Nov 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suboptimal code generation for omittable bound checks #91109

Suboptimal code generation for omittable bound checks #91109

loskutov commented Nov 21, 2021

the8472 commented Nov 21, 2021

loskutov commented Nov 21, 2021

leonardo-m commented Nov 21, 2021

loskutov commented Nov 21, 2021

leonardo-m commented Nov 21, 2021

scottmcm commented Nov 27, 2021

Suboptimal code generation for omittable bound checks #91109

Suboptimal code generation for omittable bound checks #91109

Comments

loskutov commented Nov 21, 2021

Meta

the8472 commented Nov 21, 2021

loskutov commented Nov 21, 2021

leonardo-m commented Nov 21, 2021

loskutov commented Nov 21, 2021

leonardo-m commented Nov 21, 2021

scottmcm commented Nov 27, 2021