Unnecessary memcpy caused by ordering of unwrap #56172
Improve code quality for push_new_empty_item (servo/webrender#3341). LLVM's ability to eliminate memcpys across basic blocks is poor, so we want to avoid putting branches in code where we want memcpy elimination to happen. rust-lang/rust#56172 has a reduced test case of this happening. This change lifts the branch caused by unwrap() above the creation of the SpecificDisplayItem. It saves a memcpy of 127 bytes and reduces pop_reference_frame by 18 instructions.
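A minimal sketch of the reordering described above. It assumes hypothetical stand-ins for the WebRender types (a two-variant `SpecificDisplayItem`, a `DisplayItem` wrapper and a `push` helper), so it shows only the shape of the change, not the actual push_new_empty_item code. Whether the copy is really elided depends on the LLVM version and optimization level.

```rust
// Hypothetical stand-ins for the WebRender types; the real
// SpecificDisplayItem has many more variants.
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub enum SpecificDisplayItem {
    PopStackingContext,
    // A large payload makes the whole enum large, so copying it is costly.
    Rect([u64; 16]),
}

pub struct DisplayItem {
    pub item: SpecificDisplayItem,
    pub clip_id: u32,
}

#[inline(never)]
fn push(list: &mut Vec<DisplayItem>, di: DisplayItem) {
    list.push(di);
}

/// Before: the item is built first, then `unwrap()` adds a branch (the panic
/// path) between constructing the item and handing it off.
pub fn push_unwrap_late(list: &mut Vec<DisplayItem>, clip: Option<u32>) {
    let item = SpecificDisplayItem::PopStackingContext;
    let clip_id = clip.unwrap();
    push(list, DisplayItem { item, clip_id });
}

/// After: the branch is lifted above construction, so building the item and
/// passing it on happen in the same basic block.
pub fn push_unwrap_early(list: &mut Vec<DisplayItem>, clip: Option<u32>) {
    let clip_id = clip.unwrap();
    let item = SpecificDisplayItem::PopStackingContext;
    push(list, DisplayItem { item, clip_id });
}
```

Both functions behave identically; the difference only shows up in the generated code (for example with `rustc -O --emit=asm`).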
cc @rust-lang/wg-codegen
IIRC @pcwalton found that LLVM only elides copies in the same basic block (and
Yes, cf. @jrmuizel's PR on WebRender (servo/webrender#3341).
There were some attempts to make memcpyopt work across basic blocks, but they were reverted due to regressions; see https://bugs.llvm.org/show_bug.cgi?id=35519.
It might be worth noting that the memcpy here is especially pointless because it copies uninitialized memory. When the memcpy optimizations fold a memset into a memcpy, they check whether the memcpy copies more memory than was memset; if so, and the remainder is uninitialized, the memcpy for the uninitialized part is dropped. In this case, SROA splits the alloca for the

Another option would be to add an optimization that drops memcpys from uninitialized sources to a pass that already does cross-BB memory dependence analysis, but a quick look didn't reveal an obvious place to do that. Given how common enums like this are, where only some variants have a payload, that might be worth a try though.
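To illustrate the kind of enum meant here, the following sketch (the `Item` enum and the `consume`, `build_empty` and `item_size` functions are hypothetical) constructs the payload-less variant of an enum whose other variant has a large payload. Typically only the discriminant is written, yet any whole-value copy of the value still spans `size_of::<Item>()` bytes.

```rust
use std::mem::size_of;

// Hypothetical enum in the shape discussed above: one variant carries a
// large payload, the other carries none.
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub enum Item {
    Empty,
    Big([u8; 128]),
}

#[inline(never)]
pub fn consume(item: &Item) -> bool {
    matches!(item, Item::Empty)
}

pub fn build_empty() -> bool {
    // Constructing `Item::Empty` only has to write the discriminant; the
    // 128 payload bytes are left uninitialized. A whole-value copy of
    // `item` (a memcpy between stack slots) would still move
    // size_of::<Item>() bytes, nearly all of them uninitialized.
    let item = Item::Empty;
    consume(&item)
}

pub fn item_size() -> usize {
    size_of::<Item>()
}
```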
This may get resolved by #72632, which does the necessary work on MIR.
The approach from #72632 breaks if you assign the same source to multiple destinations, because there is no simple chain that can be reduced to a single destination. I think you need copy propagation (replacing the destination with the source, instead of the other way around) to handle that. The following is not properly optimized by #72632, but is handled when the memcpy pass runs before the inliner. Of course, that approach doesn't handle every case either, hence the proposal for an optimization that catches copies from uninitialized memory.

```rust
#[inline(never)]
pub fn f(clip: Option<&bool>) {
    let item = SpecificDisplayItem::PopStackingContext;
    clip.unwrap();
    // The same source `item` is copied into two different destinations.
    do_item(&DI {
        item,
    });
    do_item(&DI {
        item,
    });
}
```

In fact, #72632 even stops the patched LLVM (MemCpyOpt before Inliner) from optimizing this version, because SROA can no longer split the alloca, so there is no memcpy that copies only uninitialized memory. For the modified

Edit: I'm not trying to criticize the work that went into #72632. I didn't notice the comment on the PR (and didn't realize it had progressed so far) when I started looking into this. After being made aware of it, I wanted to see for myself that it works, and got confused by the way that optimization pass works: for some reason I had always assumed it did copy propagation and was just named oddly. So I tried to figure out why it does things the way it does, and noticed that it breaks this modified example.
Fixed by #82806
In the following code, `f` and `g` have these lines swapped:

Unwrapping later causes `f` to have an additional memcpy.

Compiles to:

Ideally, `f` and `g` should compile to the same thing.