Suboptimal inlining decisions #49541

glandium · 2018-03-31T11:59:51Z

I suspect this can happen in more cases, but here is how I observed this:

pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}

This compiles to:

  sub rsp, 56
  lea rdx, [rsp + 8]
  mov edi, 1
  mov esi, 1
  call __rust_alloc@PLT
  test rax, rax
  je .LBB2_1
  mov byte ptr [rax], 0
  mov edx, 1
  add rsp, 56
  ret
.LBB2_1:
  (snip oom handling)

Which is pretty much to the point.

Now duplicate the function, so that there are two functions calling into_boxed_slice(), and the compiler decides not to inline it at all anymore. This:

- adds the full-blown Vec::into_boxed_slice implementation (63 lines of assembly)
- adds ptr::drop_in_place
- changes the function above to:

  sub rsp, 56
  lea rdx, [rsp + 8]
  mov edi, 1
  mov esi, 1
  call __rust_alloc@PLT
  test rax, rax
  je .LBB4_1
  mov byte ptr [rax], 0
  mov qword ptr [rsp + 8], rax
  mov qword ptr [rsp + 16], 1
  mov qword ptr [rsp + 24], 1
  lea rdi, [rsp + 8]
  call <alloc::vec::Vec<T>>::into_boxed_slice
  add rsp, 56
  ret
.LBB4_1:
  (snip oom handling)

The threshold to stop inlining seems pretty low for this particular case, and even if it might make sense for some uses across the codebase to not be inlined, when the result of inlining is clearly beneficial, it would be good if we could still inline the calls where it's a win.
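For reference, the two-function reproduction described above can be sketched as follows. The name `bar` is an assumption (the report does not name the second function), and whether the call is inlined will depend on the compiler version and optimization level, so this only shows the shape of the source, not a guaranteed codegen outcome.

```rust
// First caller: on its own, this compiled to the short listing shown first.
pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}

// Hypothetical duplicate: a second call site for into_boxed_slice.
// Per the report, adding it was enough for the call to stop being inlined.
pub fn bar() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}
```

Both functions return the same one-byte boxed slice; the only difference the report is about is in the generated code.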


glandium · 2018-03-31T12:18:45Z

Random thought: maybe this calls for an annotation to force-inline at the caller level.

nox · 2018-03-31T12:55:05Z

Cc @rust-lang/wg-codegen

oli-obk · 2018-03-31T13:30:31Z

I'd rather not consider more annotations. We already need to tell ppl that, yes there is #[inline(always)], no it will not make your code faster, it's a scalpel for edge cases.

We could probably more aggressively inline in the presence of constant information.

nox · 2018-03-31T13:55:51Z

> Now duplicate the function, so that you now have two functions calling into_boxed_slice(), and the compiler decides not to inline it at all anymore.

I wonder how that interacts with #49479.

hanna-kruppe · 2018-03-31T14:26:42Z

MergeFunctions could help in the case where the entire callers are identical (edit:) and it runs before inlining. Which is not a very interesting case IMO for something like into_boxed_slice which probably is called from many different places.

I believe the root cause of this issue is that LLVM's inlining cost heuristic special cases functions with internal linkage and just one call site (because in those cases, inlining doesn't increase code size since you can eliminate the function entirely). If there are multiple call sites, IIRC it doesn't do anything to account for the fact that inlining all call sites would still allow eliminating the function. Probably because it can't actually know whether all call sites will inline it. Seems difficult to solve in general.
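To make the described behaviour concrete, here is a toy model of such a heuristic. It is not LLVM's code: the `Callee` struct, the `should_inline` function, and all the numbers are invented for illustration. It only assumes, as stated above, that a callee with internal linkage and exactly one call site gets a large bonus, and that nothing comparable applies once there are two or more call sites.

```rust
/// Hypothetical summary of a callee as seen by a toy inliner.
pub struct Callee {
    /// Estimated cost of inlining one copy (made-up units).
    pub cost: i32,
    /// True if the function is not visible outside the compilation unit.
    pub internal_linkage: bool,
    /// Number of call sites in the compilation unit.
    pub call_sites: u32,
}

// Illustrative constants only; not taken from any real compiler.
const THRESHOLD: i32 = 225;
const SINGLE_CALL_SITE_BONUS: i32 = 15_000;

/// Decide whether to inline at one call site.
pub fn should_inline(callee: &Callee) -> bool {
    let mut cost = callee.cost;
    // Inlining the only call to an internal function lets the original
    // body be deleted, so code size does not grow: apply a big bonus.
    if callee.internal_linkage && callee.call_sites == 1 {
        cost -= SINGLE_CALL_SITE_BONUS;
    }
    // With several call sites no bonus applies, even though inlining
    // all of them would also allow deleting the function.
    cost <= THRESHOLD
}
```

In this model, a callee with cost 300 is inlined when it has one call site and rejected as soon as a second one appears, which mirrors the jump observed in the report.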

oli-obk · 2018-03-31T14:31:44Z

If inlining decreases the code size of the caller, we should always inline regardless of heuristics. Idk how to detect such cases before actually doing the inlining, though.

hanna-kruppe · 2018-03-31T14:33:45Z

That too is part of the inlining heuristic. Of course, it's only a heuristic, and it could always be better, but tuning it further is notoriously fickle.

nox · 2018-03-31T14:51:40Z

All the functions involved in conversions between Vec<T> and Box<[T]> and between String and Box<str> should at least be annotated with #[inline], IMO. And for this purpose, shouldn't <Vec<T>>::shrink_to_fit be #[inline] and have its own self.capacity() != self.len check?
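A minimal sketch of that suggestion, on a stand-in type rather than the real `Vec<T>`: the `Buf` wrapper and the `shrink_slow` helper are hypothetical names, and the actual standard library code may be structured differently. The idea is that the cheap `capacity() != len()` check lives in a small `#[inline]` function, while the reallocation stays out of line.

```rust
/// Hypothetical wrapper standing in for Vec<u8>.
pub struct Buf(pub Vec<u8>);

impl Buf {
    /// Small wrapper: only the comparison is meant to be inlined.
    #[inline]
    pub fn shrink_to_fit(&mut self) {
        // Hoisted check: when the buffer is already exact, callers
        // pay for a compare and nothing else.
        if self.0.capacity() != self.0.len() {
            self.shrink_slow();
        }
    }

    /// Out-of-line slow path that actually reallocates.
    #[cold]
    #[inline(never)]
    fn shrink_slow(&mut self) {
        self.0.shrink_to_fit();
    }

    /// Entry point of the conversion, marked for inlining into callers.
    #[inline]
    pub fn into_boxed_slice(mut self) -> Box<[u8]> {
        self.shrink_to_fit();
        self.0.into_boxed_slice()
    }
}
```

With a constant-sized input such as `vec![0]`, an optimizer that inlines the wrapper has a chance to see that capacity and length are equal and drop the slow-path call entirely, though that outcome is not guaranteed.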

glandium · 2018-03-31T21:06:57Z

@oli-obk

> I'd rather not consider more annotations. We already need to tell ppl that, yes there is #[inline(always)], no it will not make your code faster, it's a scalpel for edge cases.

The problem is that there is no scalpel for edge cases when the called function is not annotated at all, which is the case here.

@rkruppe

> If there are multiple call sites, IIRC it doesn't do anything to account for the fact that inlining all call sites would still allow eliminating the function. Probably because it can't actually know whether all call sites will inline it.

One way to look at it is that part of the problem is this all or nothing property of inlining. Either the function is always inlined or not (AIUI). I'd argue it should be decided case by case. There may be both cases where it makes sense for the function not to be inlined and cases where it doesn't, in the same codebase.


hanna-kruppe · 2018-04-01T11:30:32Z

@glandium You misunderstand: inlining in LLVM is a per-call-site decision (though it is true that most of the heuristic only looks at the function to inline, not at the call site).

Inline most of the code paths for conversions with boxed slices

This helps with the specific problem described in rust-lang#49541, obviously without making any large change to how inlining works in the general case.

Everything involved in the conversions is made `#[inline]`, except for the `<Vec<T>>::into_boxed_slice` entry point which is made `#[inline(always)]` after checking that duplicating the function mentioned in the issue prevented its inlining if I only annotate it with `#[inline]`.

For the record, that function was:

```rust
pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}
```

To help the inliner's job, we also hoist a `self.capacity() != self.len` check in `<Vec<T>>::shrink_to_fit` and mark it as `#[inline]` too.

steveklabnik · 2020-06-20T19:35:54Z

Triage; no change

oli-obk added the WG-llvm Working group: LLVM backend code generation label Mar 31, 2018

nox mentioned this issue Apr 1, 2018

Inline most of the code paths for conversions with boxed slices #49555

Merged

XAMPPRocky added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 29, 2018
