rust can't serialize 11 fields efficiently #45068
Comments
Wild guess: A larger function trips the inlining thresholds differently, causing a pass ordering issue. |
Running llvm trunk opt doesn't fix it. @sunfishcode any ideas what might be happening here? |
I'm looking into it (building a compiler with the appropriate patch now). In the meantime, just to be clear, the problem here is the
updating the length field of the |
Exactly. |
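For readers following along, here is a minimal sketch of the kind of writer being discussed, assuming it bumps the `Vec`'s length on every write. `LenUpdatingWriter`, `Fields`, and `bake_bytes` are hypothetical names chosen for illustration; this is not the code from the issue.

```rust
use std::io::{self, Write};
use std::ptr;

/// Hypothetical writer that stores a new length into the Vec on every write.
struct LenUpdatingWriter<'a>(&'a mut Vec<u8>);

impl<'a> Write for LenUpdatingWriter<'a> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Make sure there is room, then read the current length.
        self.0.reserve(buf.len());
        let old_len = self.0.len();
        unsafe {
            // Copy the bytes into the spare capacity past the current end.
            let dst = self.0.as_mut_ptr().add(old_len);
            ptr::copy_nonoverlapping(buf.as_ptr(), dst, buf.len());
            // The length is stored back after each copy; the next write
            // has to load it again.
            self.0.set_len(old_len + buf.len());
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical three-field struct standing in for the larger ones in the issue.
struct Fields {
    a: u32,
    b: u32,
    c: u32,
}

/// Serializes each field in turn through the length-updating writer.
fn bake_bytes(out: &mut Vec<u8>, f: &Fields) -> io::Result<()> {
    let mut w = LenUpdatingWriter(out);
    w.write_all(&f.a.to_le_bytes())?;
    w.write_all(&f.b.to_le_bytes())?;
    w.write_all(&f.c.to_le_bytes())?;
    Ok(())
}
```

Each field write is followed by a store to the length, so unless the optimizer can show that those stores do not alias the data being copied, it cannot merge the per-field copies.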
It's due to the hard-coded limit in CaptureTracking here. The code does this:
and the key question is whether that store writes to the memory read by the following load. The |
Is there something we can do to make llvm's job easier? |
Could we up the limit as a compiler option here? For example, MemorySSA has a compiler option for this: http://llvm.org/doxygen/MemorySSA_8cpp.html#a5926ddc0f7c4225c6ced440baa2fb7a3 I know this is the kind of thing we can't select a good value for in general, but for WebRender we could select a value that we know always produces good serde codegen for us. |
@jrmuizel Doing this optimization ahead of time in rustc on MIR would help. It's easier for rustc to tell that pointers don't alias. Unfortunately, MIR optimizations are missing a lot of basic infrastructure to enable this kind of thing; for example, they can't do inlining or SROA yet… |
I'll write the LLVM patch to add an option if nobody objects. |
Patch is up: https://reviews.llvm.org/D38648 |
In the meantime I can work around this: instead of modifying the length on every write, we just move the pointer and compute the length at the end.
use std::io::{self, Write};
use std::ptr;

struct UnsafeVecWriter(*mut u8);
impl Write for UnsafeVecWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // The caller must guarantee room for buf.len() more bytes at self.0.
        unsafe {
            ptr::copy_nonoverlapping(buf.as_ptr(), self.0, buf.len());
            // Advance the raw pointer; no length field is touched here.
            self.0 = self.0.offset(buf.len() as isize);
        }
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
} |
Instead of changing the length every write we just adjust the pointer. This avoids rust-lang/rust#45068. However we now need to ensure that we set the length when we are done.
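As a rough illustration of "set the length when we are done", the sketch below reserves capacity up front, writes through the raw pointer, and calls `Vec::set_len` once at the end. `append_fields` is a hypothetical helper, and the writer is repeated here (using `add` rather than `offset`) so the block compiles on its own; this is not the WebRender code.

```rust
use std::io::{self, Write};
use std::mem;
use std::ptr;

struct UnsafeVecWriter(*mut u8);

impl Write for UnsafeVecWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // The caller must guarantee room for buf.len() more bytes at self.0.
        unsafe {
            ptr::copy_nonoverlapping(buf.as_ptr(), self.0, buf.len());
            self.0 = self.0.add(buf.len());
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical helper: appends `fields` to `vec` as little-endian bytes.
fn append_fields(vec: &mut Vec<u8>, fields: &[u32]) {
    let extra = fields.len() * mem::size_of::<u32>();
    // Reserve all the space first so the raw writes stay in bounds.
    vec.reserve(extra);
    let old_len = vec.len();
    unsafe {
        let start = vec.as_mut_ptr().add(old_len);
        let mut w = UnsafeVecWriter(start);
        for f in fields {
            w.write_all(&f.to_le_bytes()).unwrap();
        }
        // Compute how far the pointer moved and fix up the length once.
        let written = w.0 as usize - start as usize;
        debug_assert_eq!(written, extra);
        vec.set_len(old_len + written);
    }
}
```

Calling `append_fields(&mut v, &[1, 2])` on a vector holding one byte leaves it nine bytes long. The trade-off is that forgetting the final `set_len`, or under-reserving, is undefined behavior or silently lost data, which is why the capacity is computed before any write happens.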
@jrmuizel Does WR have good serialization codegen now? |
Instead of changing the length every write we just adjust the pointer. This avoids rust-lang/rust#45068. However we now need to ensure that we set the length when we are done. With this patch only 1.2% of WebRender display list building is spent in serialization.
It's pretty good but could be better. If I run opt -O2 on the IR, it improves. Before:
movl (%rsi), %ecx
movl %ecx, (%rax)
movl 4(%rsi), %ecx
movl %ecx, 4(%rax)
movl 8(%rsi), %ecx
movl %ecx, 8(%rax)
movl 12(%rsi), %ecx
movl %ecx, 12(%rax)
movl 16(%rsi), %ecx
movl %ecx, 16(%rax)
movl 20(%rsi), %ecx
movl %ecx, 20(%rax)
movl 24(%rsi), %ecx
movl %ecx, 24(%rax)
movl 28(%rsi), %ecx
movl %ecx, 28(%rax)
movl 32(%rsi), %ecx
movl %ecx, 32(%rax)
movl 36(%rsi), %ecx
movl %ecx, 36(%rax)
movl 40(%rsi), %ecx
movl %ecx, 40(%rax)
After:
movups (%rsi), %xmm0
movups %xmm0, (%rax)
movups 16(%rsi), %xmm0
movups %xmm0, 16(%rax)
movl 32(%rsi), %ecx
movl %ecx, 32(%rax)
movl 36(%rsi), %ecx
movl %ecx, 36(%rax)
movl 40(%rsi), %ecx
movl %ecx, 40(%rax) |
Seems like a pass ordering issue. Perhaps the new pass manager in LLVM will help someday… Or we could fiddle with the pass ordering ourselves. |
Make the writer even more unsafe. Instead of changing the length every write we just adjust the pointer. This avoids rust-lang/rust#45068. However we now need to ensure that we set the length when we are done.
FWIW, it looks like the movups vs movl 24(%rsi), %ecx problem can be solved by building with opt-level=3 |
I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1413285 on Mozilla side about this. |
https://rust.godbolt.org/z/hfG734fGf Replacing tuple with array and iterating over it eliminates bad codegen, even with |
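A minimal sketch of the tuple-versus-array idea, under the assumption that the fields all share one type. `bake_tuple` and `bake_array` are hypothetical names and this is not the code behind the godbolt link; whether the generated code actually differs depends on the compiler version and flags and was not measured here.

```rust
/// Serializes a tuple field by field (the shape that scaled badly in this issue).
fn bake_tuple(out: &mut Vec<u8>, t: &(u32, u32, u32)) {
    out.extend_from_slice(&t.0.to_le_bytes());
    out.extend_from_slice(&t.1.to_le_bytes());
    out.extend_from_slice(&t.2.to_le_bytes());
}

/// Serializes the same data stored as an array by iterating over it.
fn bake_array(out: &mut Vec<u8>, a: &[u32; 3]) {
    for x in a {
        out.extend_from_slice(&x.to_le_bytes());
    }
}
```

Both functions produce identical bytes for the same values, so the array form is a drop-in replacement only when every field has the same type.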
Using noalias (#45012) lets Rust generate much better code for serializing the 10 fields in good_bake_bytes(); however, codegen falls back to terrible with the 11 fields of bad_bake_bytes().