rustc (>= 1.20.0) fails to optimize moves in trivial cases #63631
Note that the corresponding C++ code is NOT expected to be optimized, due to the braindead aliasing model in C++. Rust's aliasing rules are more reasonable and allow more optimization opportunities, including the case presented above. In Rust, the ultimate cause of the misoptimization can still be aliasing-related, given the long history of poorly optimized (or entirely broken) code for LLVM.
Edit: However, relying on LLVM's move optimizations is unacceptable, in my opinion, even if the root cause here is an LLVM bug. As a Rust user, I expect rustc to eliminate unnecessary moves in the emitted LLVM IR (at least in easy cases), given the importance of move semantics as a core feature of the Rust language. Another pragmatic argument is that rustc has extensive context knowledge, so it can optimize moves more efficiently than LLVM.
TLDR: LLVM may resolve this and similar optimization issues in the future, but rustc should still offer basic move optimizations because move semantics are a core feature of Rust.
After bisecting this example, I found that the regression starts with PR #42313:
pub fn got() -> Vec<u32> {
    let mut res = Vec::new();
    let s1 = vec![1, 2, 3, 4];
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1);
    res
}
pub fn expect() -> Vec<u32> {
    let mut res = Vec::new();
    let s1 = vec![1, 2, 3, 4];
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res
}
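Both functions should be observably equivalent, because `let s1 = s1;` only moves the vector's (pointer, capacity, length) handle and never touches the heap buffer. A minimal sketch to confirm this at run time; the helper `buffer_ptr_survives_rebinding` is a hypothetical name introduced here for illustration and is not part of the report:

```rust
/// Version with redundant rebindings, as in the report.
pub fn got() -> Vec<u32> {
    let mut res = Vec::new();
    let s1 = vec![1, 2, 3, 4];
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1);
    res
}

/// Reference version without rebindings.
pub fn expect() -> Vec<u32> {
    let mut res = Vec::new();
    let s1 = vec![1, 2, 3, 4];
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res
}

/// Hypothetical helper: moving a `Vec` relocates only its handle,
/// so the heap buffer pointer is the same before and after the move.
pub fn buffer_ptr_survives_rebinding() -> bool {
    let s1 = vec![1u32, 2, 3, 4];
    let before = s1.as_ptr();
    let s1 = s1; // move into a new binding
    before == s1.as_ptr()
}
```

Since the results are identical, any difference in the generated code is pure overhead from the moves.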
I managed to create a simplified test case using only primitive types and borrows:
#[no_mangle]
pub fn got() {
    let x: u64 = 0x0123456789ABCDEF;
    show(&x); let x = x;
    show(&x); let x = x;
    show(&x);
}
#[no_mangle]
pub fn expect() {
    let x: u64 = 0x0123456789ABCDEF;
    show(&x);
    show(&x);
    show(&x);
}
#[no_mangle]
#[inline(never)]
fn show(x: &u64) { println!("(0x{:x})0x{:x}", x as *const _ as usize, x); }
Optimized assembly produced by:
$ rustc -C opt-level=3 -Z mir-opt-level=3 --crate-type=dylib poc.rs
$ r2 -qc 's sym.got;af;afv-*;pdf;s sym.expect;af;afv-*;pdf' libpoc.so
┌ 69: sym.got ();
│ 0x000476e0 4156 push r14
│ 0x000476e2 53 push rbx
│ 0x000476e3 4883ec18 sub rsp, 0x18
│ 0x000476e7 49beefcdab8967452301 movabs r14, 0x123456789abcdef
│ 0x000476f1 4c89742408 mov qword [rsp + 8], r14
│ 0x000476f6 488b1d5b410b00 mov rbx, qword [reloc.show]
│ 0x000476fd 488d7c2408 lea rdi, [rsp + 8]
│ 0x00047702 ffd3 call rbx
│ 0x00047704 4c893424 mov qword [rsp], r14
│ 0x00047708 4889e7 mov rdi, rsp
│ 0x0004770b ffd3 call rbx
│ 0x0004770d 488b0424 mov rax, qword [rsp]
│ 0x00047711 4889442410 mov qword [rsp + 0x10], rax
│ 0x00047716 488d7c2410 lea rdi, [rsp + 0x10]
│ 0x0004771b ffd3 call rbx
│ 0x0004771d 4883c418 add rsp, 0x18
│ 0x00047721 5b pop rbx
│ 0x00047722 415e pop r14
└ 0x00047724 c3 ret
┌ 54: sym.expect ();
│ 0x00047730 4156 push r14
│ 0x00047732 53 push rbx
│ 0x00047733 50 push rax
│ 0x00047734 48b8efcdab8967452301 movabs rax, 0x123456789abcdef
│ 0x0004773e 48890424 mov qword [rsp], rax
│ 0x00047742 4c8b350f410b00 mov r14, qword [reloc.show]
│ 0x00047749 4889e3 mov rbx, rsp
│ 0x0004774c 4889df mov rdi, rbx
│ 0x0004774f 41ffd6 call r14
│ 0x00047752 4889df mov rdi, rbx
│ 0x00047755 41ffd6 call r14
│ 0x00047758 4889df mov rdi, rbx
│ 0x0004775b 41ffd6 call r14
│ 0x0004775e 4883c408 add rsp, 8
│ 0x00047762 5b pop rbx
│ 0x00047763 415e pop r14
└ 0x00047765 c3 ret
Rust Playground link. Running
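The extra stack slots can also be observed at run time, without a disassembler, by collecting the addresses instead of printing them. This is only a sketch: `addr`, `got_addrs`, `expect_addrs` and `distinct` are hypothetical names introduced here, and how many distinct addresses the rebinding version yields depends on the compiler version and optimization level.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for `show`: returns the address instead of printing it.
#[inline(never)]
pub fn addr(x: &u64) -> usize {
    x as *const u64 as usize
}

/// Rebinding version: each `let x = x;` may or may not get a fresh stack slot.
pub fn got_addrs() -> [usize; 3] {
    let x: u64 = 0x0123456789ABCDEF;
    let a = addr(&x); let x = x;
    let b = addr(&x); let x = x;
    let c = addr(&x);
    [a, b, c]
}

/// Reference version: a single binding, hence a single address.
pub fn expect_addrs() -> [usize; 3] {
    let x: u64 = 0x0123456789ABCDEF;
    [addr(&x), addr(&x), addr(&x)]
}

/// Counts how many different addresses were observed.
pub fn distinct(addrs: &[usize]) -> usize {
    addrs.iter().collect::<BTreeSet<_>>().len()
}
```

If the moves were optimized away, `distinct(&got_addrs())` would be 1, like the reference version; the assembly above uses three separate slots (`rsp + 8`, `rsp`, `rsp + 0x10`).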
Still an issue: https://godbolt.org/z/sKWPx7bon, 1.60.0-nightly (17d29dc 2022-01-21), example for the u64 type:
got:
push rbx
sub rsp, 32
movabs rax, 81985529216486895
mov qword ptr [rsp + 8], rax
mov rbx, qword ptr [rip + show@GOTPCREL]
lea rdi, [rsp + 8]
call rbx
mov rax, qword ptr [rsp + 8]
mov qword ptr [rsp + 16], rax
lea rdi, [rsp + 16]
call rbx
mov rax, qword ptr [rsp + 16]
mov qword ptr [rsp + 24], rax
lea rdi, [rsp + 24]
call rbx
add rsp, 32
pop rbx
ret
expect:
push r14
push rbx
push rax
movabs rax, 81985529216486895
mov qword ptr [rsp], rax
mov r14, qword ptr [rip + show@GOTPCREL]
mov rbx, rsp
mov rdi, rbx
call r14
mov rdi, rbx
call r14
mov rdi, rbx
call r14
add rsp, 8
pop rbx
pop r14
ret
On the LLVM side, one of the problems here is that this does not optimize: https://llvm.godbolt.org/z/d8KP7rqKe. The pointer is not captured before the call, and the pointer is readonly at the call, so the optimization would be safe. But LLVM currently doesn't distinguish between a capture before the call and a capture at the call.
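To make the terminology concrete: a call captures a pointer when it keeps a copy of the address that outlives the call. The sketch below illustrates the distinction in Rust rather than LLVM IR; `read_only`, `capturing`, `stashed_address` and the `STASH` static are hypothetical names introduced here, not anything from rustc or LLVM.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical global that lets an address escape from a call.
static STASH: AtomicUsize = AtomicUsize::new(0);

/// Reads through the pointer and then forgets it: read-only and non-capturing.
#[inline(never)]
pub fn read_only(x: &u64) -> u64 {
    *x
}

/// Stores the address in a global, so it stays observable after the call:
/// this is a capture.
#[inline(never)]
pub fn capturing(x: &u64) -> u64 {
    STASH.store(x as *const u64 as usize, Ordering::Relaxed);
    *x
}

/// Returns the last address recorded by `capturing`.
pub fn stashed_address() -> usize {
    STASH.load(Ordering::Relaxed)
}
```

As long as only callees like `read_only` see the address, no later code can tell whether a copy of the value lives in the same stack slot or a new one, which is presumably why reusing the slot would be safe in that case.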
The example code below generates extra stack copies of String (meta)data in function got(), which is expected to produce optimized code identical to expect(). To verify the issue quickly, compare the sub rsp, $FRAME_SIZE instructions, which initialize the stack frames at the beginning of the got() & expect() functions compiled with -C opt-level=3 (or measure & compare the generated code sizes). Rust Playground link.
rustc versions before 1.20.0 produce the expected optimized assembly.
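The frame-size comparison can be automated with a small text scan over an assembly listing. This is a rough sketch, assuming Intel-syntax output like the listings in this thread; `frame_size` is a hypothetical helper name, and it only looks for the first `sub rsp, N` instruction.

```rust
/// Hypothetical helper: returns the immediate of the first `sub rsp, N`
/// instruction in an Intel-syntax listing, or `None` if there is none.
/// Accepts both decimal (`32`) and hexadecimal (`0x18`) immediates.
pub fn frame_size(asm: &str) -> Option<u64> {
    const NEEDLE: &str = "sub rsp,";
    for line in asm.lines() {
        if let Some(idx) = line.find(NEEDLE) {
            // Take the first token after the comma as the immediate.
            let imm = line[idx + NEEDLE.len()..].split_whitespace().next()?;
            return match imm.strip_prefix("0x") {
                Some(hex) => u64::from_str_radix(hex, 16).ok(),
                None => imm.parse().ok(),
            };
        }
    }
    None
}
```

Note that a function may reserve a small frame without any `sub rsp` at all: the expect() listings in this thread use `push rax` to reserve 8 bytes, so the helper returns `None` for them, while the got() listings yield 32 (`sub rsp, 32`) and 24 (`sub rsp, 0x18`).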