Inefficient codegen when accessing a vector with literal indices #50759
There's not much that can be done, because Rust is preserving the semantics that, depending on which byte you access, you get a different panic. If you add an up-front `assert!` on the length, the individual checks collapse into the single comparison visible below:

```asm
100001cfd: movq %rdi, %r15          ; save main() Result out param
100001d00: leaq 307925(%rip), %rsi  ; load str constant
100001d07: leaq -96(%rbp), %rbx     ; create read() out param
100001d0b: movl $9, %edx            ; length of str constant
100001d10: movq %rbx, %rdi          ; load read() out param
100001d13: callq -1544 <__ZN3std2fs4read17h5248aa1330795a98E>
100001d18: cmpq $1, -96(%rbp)       ; check Result discriminant
100001d1d: jne 20 <__ZN6_507594main17hb99f5070155b7525E+0x43>
100001d1f: movq 8(%rbx), %rax       ; move Error case into out param
100001d23: movq 16(%rbx), %rcx
100001d27: movq %rcx, 8(%r15)
100001d2b: movq %rax, (%r15)
100001d2e: jmp 149 <__ZN6_507594main17hb99f5070155b7525E+0xD8>
100001d33: movq 24(%rbx), %rax      ; move Vec into local stack variable h
100001d37: movq %rax, -32(%rbp)
100001d3b: movq 8(%rbx), %rax
100001d3f: movq 16(%rbx), %rcx
100001d43: movq %rcx, -40(%rbp)
100001d47: movq %rax, -48(%rbp)
100001d4b: cmpq $15, -32(%rbp)      ; assert length of vec (be = below/eq)
100001d50: jbe 128 <__ZN6_507594main17hb99f5070155b7525E+0xE6>
100001d56: movq -48(%rbp), %r14     ; get Vec ptr
100001d5a: movq (%r14), %rax        ; copy 16 bytes
100001d5d: movq 8(%r14), %rcx
100001d61: movq %rax, -112(%rbp)
100001d65: movq %rcx, -104(%rbp)
```
I'd also say that this is working as intended: given a short `h`, if you don't assert beforehand, the code can panic at different points, with a different panic message depending on `h.len()`.
I agree that this is somewhat expected, but LLVM could still do better nonetheless. What you get for this code is essentially a long list of compare-and-branch bounds checks, one per index.
Interesting idea, but it seems a bit difficult to generalize this even a little (if you want to avoid regressions). While each of the branches is unlikely individually, they might be statistically independent for all LLVM knows, and with a large number of independent unlikely branches it quickly becomes rather likely that at least one of them is taken. For the specific case of "a bunch of constants checked against a common upper bound" you could pick the maximum of the constants and reason that, since all the other constants are smaller, a single check against the maximum subsumes the rest.
Today's nightly gives:
No real difference as far as I can tell. It still has the 16 separate bounds checks at the start.
My sense is that this isn't likely to lead to changes. The only proposed way of improving the generated asm is to inject a length comparison that skips the individual comparisons on the happy path, but as mentioned it's not clear how LLVM or rustc could gather the information needed to decide that this is the right thing to do in any particular case. Users also have an easy way (a manual `assert!`) to get LLVM to drop the individual bounds checks, which seems preferable to depending on an inherently flaky optimization in presumably critical code (otherwise the predictable branches are probably close to free). I'm going to go ahead and close.
The `let b = ...` statement gets compiled to a separate bounds check per index (1.26.0 stable, release mode). I was really hoping the optimizer would be able to do a single length check followed by the equivalent of a memcpy instead.