Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary on-stack copy when calling Box<FnOnce()> closure #61042

Closed
pftbest opened this issue May 22, 2019 · 2 comments
Closed

Unnecessary on-stack copy when calling Box<FnOnce()> closure #61042

pftbest opened this issue May 22, 2019 · 2 comments
Labels
A-closures Area: Closures (`|…| { … }`) C-enhancement Category: An issue proposing an enhancement or a PR with one. F-unsized_locals `#![feature(unsized_locals)]` I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@pftbest
Copy link
Contributor

pftbest commented May 22, 2019

When calling a boxed closure the captured data is copied from heap to the stack before the call. It happens both with trait object and a concrete type inside the box.

#[inline(never)]
pub fn do_call<F>(b: Box<F>) where F: FnOnce() {
//pub fn do_call(b: Box<dyn FnOnce()>) {
    b();
}

pub unsafe fn foo(large: [u32; 1000]) {
    do_call(
        Box::new(move || {
            dummy(&large);
        })
    );
}

extern "C" {
    fn dummy(data: &[u32; 1000]);
}
example::do_call:
        push    r14
        push    rbx
        sub     rsp, 4008
        mov     rbx, rdi
        lea     r14, [rsp + 8]
        mov     edx, 4000
        mov     rdi, r14
        mov     rsi, rbx
        call    qword ptr [rip + memcpy@GOTPCREL]
        mov     rdi, r14
        call    qword ptr [rip + dummy@GOTPCREL]
        mov     esi, 4000
        mov     edx, 4
        mov     rdi, rbx
        call    qword ptr [rip + __rust_dealloc@GOTPCREL]
        add     rsp, 4008
        pop     rbx
        pop     r14
        ret

Godbolt link: https://godbolt.org/z/srCEBf

I think this extra memcpy is unnecessary, and it should be possible to call boxed closures without moving it out from the heap.

Similar C++ code doesn't do any extra copies in do_call (even when compiled without optimizations -O0), godbolt link: https://godbolt.org/z/VY8OLn

@jonas-schievink jonas-schievink added A-closures Area: Closures (`|…| { … }`) C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 22, 2019
@qnighy
Copy link
Contributor

qnighy commented May 22, 2019

This is because the current implementation of unsized_locals #48055 is really naive. An unstable book article also explains this.

I've wanted to fix this too but I don't have enough time recently. It may require some sort of MIR optimization based on liveness or lifetimes.

@nikic
Copy link
Contributor

nikic commented Mar 12, 2021

I have no idea what change caused this, but the memcpy is no longer present since rust 1.44: https://godbolt.org/z/E7o6bs

@nikic nikic closed this as completed Mar 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-closures Area: Closures (`|…| { … }`) C-enhancement Category: An issue proposing an enhancement or a PR with one. F-unsized_locals `#![feature(unsized_locals)]` I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants