Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

At opt-levels <= 1 the arithmetic operation methods do not get inlined, preventing other optimisations #75598

Closed
nagisa opened this issue Aug 16, 2020 · 5 comments · Fixed by #84061
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@nagisa
Copy link
Member

nagisa commented Aug 16, 2020

Consider code like this:

#![feature(core_intrinsics)]

pub fn method(a: usize, b: usize) -> usize {
    a.wrapping_sub(b)
}

pub fn intrinsic(a: usize, b: usize) -> usize {
    std::intrinsics::wrapping_sub(a, b)
}

compiler explorer

at -Copt-level=1 and lower, the generated assembly for versions with the method call will generate a function call, rather than direct operation. Once the inlining fails, other optimisations that could be done are inhibited, especially at -Copt-level=1`.

More generally speaking, I wonder if we might want to make these methods have a special annotation that would make the compiler generate the instructions directly much like it does for intrinsics right now. These seem like basic enough that #[inline(always)] might not be good enough (it being just a hint) and also possibly more expensive than necessary (something needs to do the inlining still).

@nagisa nagisa added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 16, 2020
@alecmocatta
Copy link
Contributor

Relatedly, #74362 is an instance where not inlining a function that wraps an intrinsic inhibits DCE.

@bugadani
Copy link
Contributor

bugadani commented Sep 20, 2020

#[inline(always)] might not be good enough

My naive assumption is that inline(always) would inline shorter functions more eagerly, since the relative cost of a function call is higher and also shorter functions don't contribute as much to binary bloat. Is this not the case?

@Lokathor
Copy link
Contributor

Lokathor commented Mar 7, 2021

Using the compiler explorer link, it seems like simply tagging the wrapping_ operations as #[inline(always)] will make them get inlined even at opt-level=0

@AngelicosPhosphoros
Copy link
Contributor

These seem like basic enough that #[inline(always)] might not be good enough (it being just a hint) and also possibly more expensive than necessary (something needs to do the inlining still).

This is not true for opt-level=1, AFAIK. In this mode, LLVM runs always-inline which inlines functions without much thinking.
In higher optimization levels this pass doesn't run and inline(always) becomes hint.

@AngelicosPhosphoros
Copy link
Contributor

Example of prevented optimization.
godbolt

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 17, 2021
…nline-always-arithmetic, r=nagisa

Add some #[inline(always)] to arithmetic methods of integers

I tried to add it only to methods which return results of intrinsics and don't have any branching.
Branching could made performance of debug builds (`-Copt-level=0`) worse.
Main goal of changes is allowing wider optimizations in `-Copt-level=1`.

Closes: rust-lang#75598

r? `@nagisa`
@bors bors closed this as completed in f8a12c6 Apr 18, 2021
hkratz added a commit to rusticstuff/rust that referenced this issue Jul 15, 2021
This is a follow-up change to the fix for rust-lang#75598. It simplifies the implementation of wrapping_neg() for all integer types by just calling 0.wrapping_sub(self) and always inlines it. This leads to much less assembly code being emitted for opt-level≤1.
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 4, 2021
…m-ou-se

Make wrapping_neg() use wrapping_sub(), #[inline(always)]

This is a follow-up change to the fix for rust-lang#75598. It simplifies the implementation of wrapping_neg() for all integer types by just calling 0.wrapping_sub(self) and always inlines it. This leads to much less assembly code being emitted for opt-level≤1 and thus much better performance for debug-compiled code.

Background is [this discussion on the internals forum](https://internals.rust-lang.org/t/why-does-rust-generate-10x-as-much-unoptimized-assembly-as-gcc/14930).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants