-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ScalarPair for tagged enums #49420
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @petrochenkov (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
src/librustc/ty/layout.rs
Outdated
FieldPlacement::Arbitrary { ref offsets, .. } => offsets, | ||
_ => bug!(), | ||
}; | ||
let mut fields = field_layouts.iter().zip(offsets).filter(|p| !p.0.is_zst()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi nox ❤️
tidy error: /checkout/src/librustc/ty/layout.rs:1682: line longer than 100 chars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Thank you.
@@ -139,6 +139,12 @@ pub fn return_slice(x: &[u16]) -> &[u16] { | |||
x | |||
} | |||
|
|||
// CHECK: i64 @enum_id(i64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this pre-existing? You'd have to take in two arguments and return { i32, i32 }
for the ScalarPair
to be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I changed it to Result<i64, i64>
.
@@ -139,6 +139,12 @@ pub fn return_slice(x: &[u16]) -> &[u16] { | |||
x | |||
} | |||
|
|||
// CHECK: { i64, i64 } @enum_id(i64 %x.0, i64 %x.1) | |||
#[no_mangle] | |||
pub fn enum_id(x: Result<i64, i64>) -> Result<i64, i64> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does Option
around this create problems? Does it require the combination of your 2 PRs? Can we wait to land one and actually have an Option<Result<i64, i64>>
test too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My other PR landed, let me rebase this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, with the combination of the two optimisations, Option<Result<i64, i64>>
also ends up returned as { i64, i64 }
.
r? @eddyb |
Cc @rkruppe. For #49437, this produces: ; bar::foo
; Function Attrs: norecurse nounwind readnone uwtable
define { i32, i32 } @_ZN3bar3foo17h361da50dc86c3b65E(i32, i32) unnamed_addr #0 {
start:
%switch = icmp eq i32 %0, 0
%. = zext i1 %switch to i32
%2 = insertvalue { i32, i32 } undef, i32 %., 0
%3 = insertvalue { i32, i32 } %2, i32 %1, 1
ret { i32, i32 } %3
}
; bar::bar
; Function Attrs: norecurse nounwind readnone uwtable
define { i32, i32 } @_ZN3bar3bar17h01d34a35402b7c22E(i32 %x.0, i32 %x.1) unnamed_addr #0 {
start:
%switch.i = icmp ne i32 %x.0, 0
%. = zext i1 %switch.i to i32
%0 = insertvalue { i32, i32 } undef, i32 %., 0
%1 = insertvalue { i32, i32 } %0, i32 %x.1, 1
ret { i32, i32 } %1
} For #38349, this produces: ; Function Attrs: norecurse nounwind readnone uwtable
define { i32, i32 } @id_result(i32, i32) unnamed_addr #0 {
start:
%switch = icmp ne i32 %0, 0
%. = zext i1 %switch to i32
%2 = insertvalue { i32, i32 } undef, i32 %., 0
%3 = insertvalue { i32, i32 } %2, i32 %1, 1
ret { i32, i32 } %3
} @eddyb says it's an LLVM limitation: it can't put range metadata on arguments. |
That matches my knowledge. We could add an |
Ping from triage, @eddyb — it looks like there are new commits for you. |
src/librustc/ty/layout.rs
Outdated
} | ||
} | ||
if let Some((prim, _)) = common_prim { | ||
Abi::ScalarPair(discr.clone(), scalar_unit(prim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to do the dance where you use scalar_pair
and check that the offset
inside common_prim
matches that computed by scalar_pair
itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what I do line 1717?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You check that they all have the same offset, but that might still not be the same as the offset that scalar_pair
would compute for the second scalar (the one from common_prim
).
Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem. Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Can someone reproduce this on Linux? I can't reproduce the failure, but it seems to me like the generated code with this PR is way worse. |
Disregard my last comment, it took me an embarrassingly long time to notice the test changed upstream. |
Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem. Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
I'll try to reduce the failing test that causes some memset optimisation to not trigger in LLVM. |
☔ The latest upstream changes (presumably #49981) made this pull request unmergeable. Please resolve the merge conflicts. |
Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem. Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@bors-servo r=eddyb pub fn enum_id_1(x: Option<Result<u16, u16>>) -> Option<Result<u16, u16>> { |
@nox: 🔑 Insufficient privileges: Not in reviewers |
@bors r+ |
📌 Commit 3ca6ad9 has been approved by |
⌛ Testing commit 3ca6ad9 with merge d1f3dcff5e7374bcc97d02309d41aef96a36c02f... |
💔 Test failed - status-appveyor |
This comment has been minimized.
This comment has been minimized.
@bors retry AppVeyor problem:
|
☀️ Test successful - status-appveyor, status-travis |
mem::swap the obvious way for types smaller than the SIMD optimization's block size LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail in some cases, so doing this helps SRoA work in cases where it currently doesn't. Found in the `replace_with` RFC discussion. Examples of the improvements: <details> <summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary> ```rust type Demo = [u16; 3]; pub fn swap_demo(x: &mut Demo, y: &mut Demo) { std::mem::swap(x, y); } ``` nightly: ```asm _ZN4blah9swap_demo17ha1732a9b71393a7eE: .seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE sub rsp, 32 .seh_stackalloc 32 .seh_endprologue movzx eax, word ptr [rcx + 4] mov word ptr [rsp + 4], ax mov eax, dword ptr [rcx] mov dword ptr [rsp], eax movzx eax, word ptr [rdx + 4] mov word ptr [rcx + 4], ax mov eax, dword ptr [rdx] mov dword ptr [rcx], eax movzx eax, word ptr [rsp + 4] mov word ptr [rdx + 4], ax mov eax, dword ptr [rsp] mov dword ptr [rdx], eax add rsp, 32 ret .seh_handlerdata .section .text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE .seh_endproc ``` this PR: ```asm _ZN4blah9swap_demo17ha1732a9b71393a7eE: mov r8d, dword ptr [rcx] movzx r9d, word ptr [rcx + 4] movzx eax, word ptr [rdx + 4] mov word ptr [rcx + 4], ax mov eax, dword ptr [rdx] mov dword ptr [rcx], eax mov word ptr [rdx + 4], r9w mov dword ptr [rdx], r8d ret ``` </details> <details> <summary>`replace_with` optimizes down much better</summary> Inspired by rust-lang/rfcs#2490, ```rust fn replace_with<T, F>(x: &mut Option<T>, f: F) where F: FnOnce(Option<T>) -> Option<T> { *x = f(x.take()); } pub fn inc_opt(mut x: &mut Option<i32>) { replace_with(&mut x, |i| i.map(|j| j + 1)); } ``` Rust 1.26.0: ```asm _ZN4blah7inc_opt17heb0acb64c51777cfE: mov rax, qword ptr [rcx] movabs r8, 4294967296 add r8, rax shl rax, 32 movabs rdx, -4294967296 and rdx, r8 xor r8d, r8d test rax, rax cmove rdx, rax setne r8b or rdx, r8 mov qword ptr [rcx], rdx ret ``` Nightly (better thanks to ScalarPair, maybe?): ```asm _ZN4blah7inc_opt17h66df690be0b5899dE: mov r8, qword ptr [rcx] mov rdx, r8 shr rdx, 32 xor eax, eax test r8d, r8d setne al add edx, 1 mov dword ptr [rcx], eax mov dword ptr [rcx + 4], edx ret ``` This PR: ```asm _ZN4blah7inc_opt17h1426dc215ecbdb19E: xor eax, eax cmp dword ptr [rcx], 0 setne al mov dword ptr [rcx], eax add dword ptr [rcx + 4], 1 ret ``` Where that add is beautiful -- using an addressing mode to not even need to explicitly go through a register -- and the remaining imperfection is well-known (rust-lang#49420 (comment)). </details>
mem::swap the obvious way for types smaller than the SIMD optimization's block size LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail in some cases, so doing this helps SRoA work in cases where it currently doesn't. Found in the `replace_with` RFC discussion. Examples of the improvements: <details> <summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary> ```rust type Demo = [u16; 3]; pub fn swap_demo(x: &mut Demo, y: &mut Demo) { std::mem::swap(x, y); } ``` nightly: ```asm _ZN4blah9swap_demo17ha1732a9b71393a7eE: .seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE sub rsp, 32 .seh_stackalloc 32 .seh_endprologue movzx eax, word ptr [rcx + 4] mov word ptr [rsp + 4], ax mov eax, dword ptr [rcx] mov dword ptr [rsp], eax movzx eax, word ptr [rdx + 4] mov word ptr [rcx + 4], ax mov eax, dword ptr [rdx] mov dword ptr [rcx], eax movzx eax, word ptr [rsp + 4] mov word ptr [rdx + 4], ax mov eax, dword ptr [rsp] mov dword ptr [rdx], eax add rsp, 32 ret .seh_handlerdata .section .text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE .seh_endproc ``` this PR: ```asm _ZN4blah9swap_demo17ha1732a9b71393a7eE: mov r8d, dword ptr [rcx] movzx r9d, word ptr [rcx + 4] movzx eax, word ptr [rdx + 4] mov word ptr [rcx + 4], ax mov eax, dword ptr [rdx] mov dword ptr [rcx], eax mov word ptr [rdx + 4], r9w mov dword ptr [rdx], r8d ret ``` </details> <details> <summary>`replace_with` optimizes down much better</summary> Inspired by rust-lang/rfcs#2490, ```rust fn replace_with<T, F>(x: &mut Option<T>, f: F) where F: FnOnce(Option<T>) -> Option<T> { *x = f(x.take()); } pub fn inc_opt(mut x: &mut Option<i32>) { replace_with(&mut x, |i| i.map(|j| j + 1)); } ``` Rust 1.26.0: ```asm _ZN4blah7inc_opt17heb0acb64c51777cfE: mov rax, qword ptr [rcx] movabs r8, 4294967296 add r8, rax shl rax, 32 movabs rdx, -4294967296 and rdx, r8 xor r8d, r8d test rax, rax cmove rdx, rax setne r8b or rdx, r8 mov qword ptr [rcx], rdx ret ``` Nightly (better thanks to ScalarPair, maybe?): ```asm _ZN4blah7inc_opt17h66df690be0b5899dE: mov r8, qword ptr [rcx] mov rdx, r8 shr rdx, 32 xor eax, eax test r8d, r8d setne al add edx, 1 mov dword ptr [rcx], eax mov dword ptr [rcx + 4], edx ret ``` This PR: ```asm _ZN4blah7inc_opt17h1426dc215ecbdb19E: xor eax, eax cmp dword ptr [rcx], 0 setne al mov dword ptr [rcx], eax add dword ptr [rcx + 4], 1 ret ``` Where that add is beautiful -- using an addressing mode to not even need to explicitly go through a register -- and the remaining imperfection is well-known (rust-lang#49420 (comment)). </details>
No description provided.