Incomplete optimization with opt-level=z compared to clang for possible pre-compiled expressions #102312

Open
arctic-penguin opened this issue Sep 26, 2022 · 3 comments
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

arctic-penguin commented Sep 26, 2022

I noticed that rustc does not apply some optimizations when opt-level=z is supplied, whereas clang applies them to the equivalent C code at -Oz.

See here: https://godbolt.org/z/1955WjcT8

I tried this code:

#[no_mangle]
fn iterate() -> i32 {
    let mut result = 0;
    for i in 0..=100 {
        result += i;
    }
    result
}

With opt-level=3

iterate:
        mov     eax, 5050
        ret

With opt-level=z

iterate:
        xor     ecx, ecx
        xor     edx, edx
        xor     eax, eax
.LBB0_1:
        test    dl, dl
        jne     .LBB0_3
        lea     esi, [rcx + 1]
        cmp     ecx, 100
        sete    dl
        cmove   esi, ecx
        add     eax, ecx
        mov     ecx, esi
        jmp     .LBB0_1
.LBB0_3:
        ret

I would expect opt-level=z and opt-level=3 to have the same output for this fairly simple case.
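For readers following the opt-level=z listing, here is a rough hand-written Rust equivalent of what that assembly does, as a sketch rather than the compiler's actual intermediate form. The function name `iterate_manual` and the variable names are hypothetical; `done` plays the role of the `dl` register, `i` of `ecx`, and `result` of `eax`.

```rust
// Hand-written model of the opt-level=z assembly above (illustrative only).
pub fn iterate_manual() -> i32 {
    let mut i = 0; // ecx: current value
    let mut done = false; // dl: set once the inclusive end has been yielded
    let mut result = 0; // eax: running sum

    while !done {
        // lea/cmove: advance unless we are already at the inclusive end
        let next = if i == 100 { i } else { i + 1 };
        // sete: remember that the last element has been reached
        done = i == 100;
        result += i;
        i = next;
    }
    result
}
```

The extra flag is what separates this loop from a plain counted loop: the end value 100 has to be yielded once, so the loop cannot terminate on the counter comparison alone.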

In contrast, clang 15.0.0 does this:

int something() {
    int result = 0;
    for (int i=0; i<=100; i++) {
        result += i;
    }
    return result;
}

with -O3

something:                              # @something
        mov     eax, 5050
        ret

with -Oz

something:                              # @something
        mov     eax, 5050
        ret

Meta

rustc --version --verbose:

1.64.0 (godbolt.org), I assume that's a55dd71d5

I understand that the C code is far easier to optimize, but the Rust-produced assembly is nevertheless about 7x as long.

@arctic-penguin arctic-penguin added the C-bug Category: This is a bug. label Sep 26, 2022

Rageking8 commented

@rustbot label +T-compiler +A-codegen +I-slow

@rustbot rustbot added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 26, 2022

the8472 commented Sep 26, 2022

This is a known issue with RangeInclusive. Either use a regular Range or iterate via iter.for_each() instead of a for _ in iter loop.

#45222
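
The two workarounds suggested above might look like the following sketch. The function names `iterate_range` and `iterate_for_each` are hypothetical, and `#[no_mangle]` is omitted for brevity. Both compute the same sum as the original `iterate`; whether either one folds to a constant at opt-level=z depends on the compiler version, so it is worth checking the output on Compiler Explorer.

```rust
// Workaround 1: a half-open Range. `0..101` covers the same values as `0..=100`.
pub fn iterate_range() -> i32 {
    let mut result = 0;
    for i in 0..101 {
        result += i;
    }
    result
}

// Workaround 2: keep the RangeInclusive, but use internal iteration
// via `for_each` instead of a `for` loop.
pub fn iterate_for_each() -> i32 {
    let mut result = 0;
    (0..=100).for_each(|i| result += i);
    result
}
```

Note that the half-open form only works when the upper bound plus one is representable in the integer type, so it cannot replace an inclusive range that ends at `i32::MAX`.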

the8472 commented Sep 26, 2022

It's a bit surprising that 1.64 did manage to optimize it on O3 (but not O2), and that nightly and beta again fail to optimize it even on O3.


4 participants