Unnecessary memcpy caused by ordering of unwrap #56172

jrmuizel · 2018-11-22T21:33:57Z

In the following code f and g have these lines swapped:

    let item = SpecificDisplayItem::PopStackingContext;
    clip.unwrap();

Unwrapping later causes f to have an additional memcpy.

#[inline(never)]
pub fn f(clip: Option<&bool>) {
    let item = SpecificDisplayItem::PopStackingContext;
    clip.unwrap();
    do_item(&DI {
            item,
    });
}

#[inline(never)]
pub fn g(clip: Option<&bool>) {
    clip.unwrap();
    let item = SpecificDisplayItem::PopStackingContext;
    do_item(&DI {
            item,
    });
}

pub enum SpecificDisplayItem {
    PopStackingContext,
    Other([f64; 22]),
}

struct DI {
    item: SpecificDisplayItem,
}


fn do_item(di: &DI) { unsafe { ext(di) } }
extern {
    fn ext(di: &DI);
}

Compiles to:

example::f:
        sub     rsp, 360
        test    rdi, rdi
        je      .LBB0_1
        mov     qword ptr [rsp], 0
        lea     rdi, [rsp + 8]
        lea     rsi, [rsp + 184]
        mov     edx, 176
        call    memcpy@PLT
        mov     rdi, rsp
        call    ext@PLT
        add     rsp, 360
        ret
.LBB0_1:
        lea     rdi, [rip + .Lbyte_str.2]
        call    core::panicking::panic@PLT
        ud2

example::g:
        sub     rsp, 184
        test    rdi, rdi
        je      .LBB1_1
        mov     qword ptr [rsp], 0
        mov     rdi, rsp
        call    ext@PLT
        add     rsp, 184
        ret
.LBB1_1:
        lea     rdi, [rip + .Lbyte_str.2]
        call    core::panicking::panic@PLT
        ud2

Ideally, f and g should compile to the same thing.

The text was updated successfully, but these errors were encountered:

LLVM's ability to eliminate memcpy's across basic blocks is bad. Because of this, we want to make sure that we avoid putting branches in code where we want the memcpy elimination to happen. rust-lang/rust#56172 has a reduced test case of this happening. This change lifts the branch caused by unwrap() above the creation of the SpecificDisplayItem. It ends up saving a memcpy of 127 bytes along with reducing pop_reference_frame by 18 instructions.

Improve code quality for push_new_empty_item. LLVM's ability to eliminate memcpy's across basic blocks is bad. Because of this, we want to make sure that we avoid putting branches in code where we want the memcpy elimination to happen. rust-lang/rust#56172 has a reduced test case of this happening. This change lifts the branch caused by unwrap() above the creation of the SpecificDisplayItem. It ends up saving a memcpy of 127 bytes along with reducing pop_reference_frame by 18 instructions.  --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/3341)

lqd · 2018-11-23T10:27:53Z

cc @rust-lang/wg-codegen

eddyb · 2018-11-23T10:31:05Z

IIRC @pcwalton found that LLVM only elides copies in the same basic block (and unwrap can panic, so it terminates a basic block, splitting the code before and after into separate basic blocks).

nox · 2018-11-23T10:32:26Z

Yes, cf. @jrmuizel's PR on Webrender:

LLVM's ability to eliminate memcpy's across basic blocks is bad.

nikic · 2018-12-01T11:57:35Z

There were some attempts to make memcpyopt work across BBs, but they were reverted due to regressions, see https://bugs.llvm.org/show_bug.cgi?id=35519.

dotdash · 2020-06-26T14:58:55Z

It might be worth noting that the memcpy here is especially pointless because it copies uninitialized memory. When the memcpy optimizations fold a memset into a memcpy it checks whether the memcpy copies more memory than what has been memset, and if so and the remainder is uninitialized, the memcpy for the uninitialized part is dropped.

In this case here, SROA splits the alloca for the item, into the discriminant part, and 176 uninitialized bytes. But since it runs without memory dependency analysis, is has no easy way to drop the memcpy for the uninitialized part. And before the MemCpy pass can kill the memcpy, the inliner comes and breaks the code into multiple basic blocks. So for this constellation, it would help to run the MemCpy pass before the inliner, but I have no real idea what tradeoff that makes.

Another option would be to add an optimization that drops memcpys from uninitialized sources to a pass that does cross-bb memory dependence analysis anyway, but a quick look didn't reveal any obvious place to do that. Given the rather common use of enums like this, where only some variants have a payload, that might be worth a try though.

oli-obk · 2020-06-26T15:23:27Z

This may get resolved with #72632 by doing the necessary work on MIR.

dotdash · 2020-06-26T15:58:08Z

The approach from #72632 breaks if you assign the same source to multiple destinations, because there's no simple chain that can be reduced to a single destination. I think you need to do copy-propagation (replacing the destination with the source, instead of the other way around) to handle that.

The following doesn't get properly optimized by #72632, but is handled by the memcpy pass being run before the inliner. That approach of course also doesn't handle all the cases, thus the proposal for the optimization to catch copies from uninitialized memory.

#[inline(never)]
pub fn f(clip: Option<&bool>) {
    let item = SpecificDisplayItem::PopStackingContext;
    clip.unwrap();
    do_item(&DI {
            item,
    });
   do_item(&DI {
            item,
    });
}}

In fact #72632 even stops the patched (MemCpyOpt before Inliner) LLVM from optimizing this version, because SROA can no longer split the alloca and so there's no memcpy that copies only uninitialized memory. For the modified f function #72632 still produces better code than nightly, but if you apply the same change to g, then nightly produces a properly optimized version, while dest-prop causes the lifetimes of the two DI instances to overlap, forcing double stack usage and a memcpy.

Edit: I'm not trying to criticize the work that went into #72632. I didn't notice the comment on the PR (and didn't realize it had progressed so far) when I started to look into this. After being made aware of it, I wanted to see that it works for myself and got confused by the way that optimization pass works because I (for some reason) always assumed it would do copy propagation and was just named weirdly, so I tried to figure out why it does things the way it does, and noticed that it breaks this modified example.

jrmuizel · 2021-03-12T20:07:57Z

Fixed by #82806

jrmuizel mentioned this issue Nov 22, 2018

Improve code quality for push_new_empty_item. servo/webrender#3341

Merged

nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Dec 1, 2018

jrmuizel mentioned this issue Jun 10, 2020

Implement a generic Destination Propagation optimization on MIR #72632

Merged

jonas-schievink added A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-mir-opt-nrvo Fixed by NRVO labels Jun 10, 2020

jrmuizel mentioned this issue Jun 10, 2020

Tracking issue for Rust codegen optimizations wanted by Gecko jrmuizel/gecko-rust-codegen-opt#1

Open

tgnottingham mentioned this issue Oct 14, 2020

An extra memcpy with -Zmir-opt-level=2 #77613

Closed

jrmuizel mentioned this issue Mar 6, 2021

Enable MemorySSA in MemCpyOpt #82806

Merged

jrmuizel closed this as completed Mar 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unnecessary memcpy caused by ordering of unwrap #56172

Unnecessary memcpy caused by ordering of unwrap #56172

jrmuizel commented Nov 22, 2018

lqd commented Nov 23, 2018

eddyb commented Nov 23, 2018

nox commented Nov 23, 2018

nikic commented Dec 1, 2018

dotdash commented Jun 26, 2020

oli-obk commented Jun 26, 2020

dotdash commented Jun 26, 2020 •

edited

Loading

jrmuizel commented Mar 12, 2021

Unnecessary memcpy caused by ordering of unwrap #56172

Unnecessary memcpy caused by ordering of unwrap #56172

Comments

jrmuizel commented Nov 22, 2018

lqd commented Nov 23, 2018

eddyb commented Nov 23, 2018

nox commented Nov 23, 2018

nikic commented Dec 1, 2018

dotdash commented Jun 26, 2020

oli-obk commented Jun 26, 2020

dotdash commented Jun 26, 2020 • edited Loading

jrmuizel commented Mar 12, 2021

dotdash commented Jun 26, 2020 •

edited

Loading