Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range.contains failed to be inlined/optimized #90609

Closed
senevoldsen opened this issue Nov 5, 2021 · 7 comments
Closed

Range.contains failed to be inlined/optimized #90609

senevoldsen opened this issue Nov 5, 2021 · 7 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@senevoldsen
Copy link

I was suggested on Stack Overflow (https://stackoverflow.com/questions/69844819/rust-range-contains-failed-to-be-inlined-optimized) to ask here.

I am aware that optimization in complex situations can fail to apply. However, rather straightforward inlining "in the small" should still apply.

I was running my code through Clippy and it suggested changing the following:

const SPECIAL_VALUE: u8 = 0; // May change eventually.

pub fn version1(value: u8) -> bool {
    (value >= 1 && value <= 9) || value == SPECIAL_VALUE
}

Into

pub fn version2(value: u8) -> bool {
    (1..=9).contains(&value) || value == SPECIAL_VALUE
}

Since it is more readable. Unfortunately the resulting assembly output is twice as long, even with optimization level 3. Manually inlining it (2-nestings down), gives almost the same code as version1 and is as efficient.

pub fn manually_inlined(value: u8) -> bool {
    (1 <= value && value <= 9) || value == SPECIAL_VALUE
}

If I remove the || value == SPECIAL_VALUE they all resolve with the same (though with 1 more instruction added to decrement the parameter value before a compare). Also if I change SPECIAL_VALUE to something not adjacent to the range they all resolve to same assembly code as version2, which is the reason why I kept it 0 unless I eventually have to change it.

I have a link to Godbolt with the code here: https://rust.godbolt.org/z/d9PWYEKc8

Why is the compiler failing to properly inline/optimize version2? Is it an "optimization bug"? Or am I misunderstanding some semantics of Rust, maybe something with the borrowing prevents the optimization, but can't the compiler assume no mutation of value due to the aliasing and referencing rules? Because the optimization is applied in version1 it would suggest LLVM knows that because the value is unsigned it can simplify the comparison. So it may be that there is a missed optimization opportunity in the Rust frontend?

Trying to do something similar in C++ gives the optimum short assembly in GCC but not in Clang https://godbolt.org/z/erYPYsvhf

@the8472
Copy link
Member

the8472 commented Nov 5, 2021

Are you sure you posted the right links? Inlining does happen in the godbolt examples, there are no call instructions.

@senevoldsen
Copy link
Author

Maybe I was confusing/incorreect with my use of the term 'inlining'. Yes, the assembly has no call instruction, but the optimizer fails to "merge" the checks when I use contains, but I manually inline it in the source code it does see it is just 0 <= value <= 9.

@nikic
Copy link
Contributor

nikic commented Nov 5, 2021

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 5, 2021
@nikic
Copy link
Contributor

nikic commented Nov 9, 2021

https://reviews.llvm.org/D113510

@nikic nikic self-assigned this Nov 9, 2021
@senevoldsen
Copy link
Author

From what I can gather @nikic made an improvement to LLVM optimizer that fixes this issue (and perhaps more), and this will eventually make it into Rust? Can this issue then be closed? And if so does you @nikic want to answer the linked S.O. post?

@ocstl
Copy link

ocstl commented Feb 1, 2022

While I believe this issue has been resolved, I'm seeing an odd discrepancy between versions 1.52 and later versions depending on whether the special value is added at the left or right end of the range (see the first two rows of https://rust.godbolt.org/z/Ea149cjYo).

It looks like, starting with version 1.53, there is an improvement at the left end of the range in only one of the two function versions (the one without contains).

@nikic
Copy link
Contributor

nikic commented Feb 19, 2022

Fixed by the LLVM upgrade in #93577.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

4 participants