Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

instance_def_size_estimate is not that good at estimating #69382

Open
andjo403 opened this issue Feb 22, 2020 · 4 comments
Open

instance_def_size_estimate is not that good at estimating #69382

andjo403 opened this issue Feb 22, 2020 · 4 comments
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@andjo403
Copy link
Contributor

when trying to see what the problem was with #66617
I noticed that the llvm-ir did not match the estimate e.g. core::ptr::drop_in_place
actual number of llvm-ir lines 17170 and the estimated size 24 so 715x larger then estimated.
this results in that when the CGUs shall be merged the CGU that contains these functions is merged multipel times and result in one CGU that is mush larger then the other and due to this the compile time is longer then needed.

have seen that especially core::ptr::drop_in_place have many x to smal estimates in real crates also.

I used a debug build of rustc to be able to trace and used the example from #66617
RUSTC_LOG=rustc_mir::monomorphize::partitioning=debug rustc +stage2 --emit=llvm-ir -Cno-prepopulate-passes main.rs to get llvm-ir and see the estimates.

I expected the estimate to be nere the actual number of llvm-ir lines for the functions.

one strange thing is that there is a lot of estimates that is 0 and that is not that common to have so feels like faults.
also some estimates is larger then the lines of llvm-ir e.g. actual: 5370 Estimated: 9129 diff: -3759 alloc::alloc::box_free

the rustc version is built from 03d2f5c

Some more not that good estimates

actual:    1074 Estimated:   1432  diff:   -358 core::ptr::unique::Unique<T>::cast
actual:      46 Estimated:     34  diff:     12 core::ptr::slice_from_raw_parts_mut
actual:      14 Estimated:     10  diff:      4 core::alloc::Layout::from_size_align_unchecked
actual:      68 Estimated:     48  diff:     20 <alloc::vec::Vec<T> as core::ops::index::IndexMut<I>>::index_mut
actual:      20 Estimated:     13  diff:      7 core::num::<impl usize>::saturating_mul
actual:      22 Estimated:     34  diff:    -12 <alloc::vec::Vec<T> as core::ops::deref::DerefMut>::deref_mut
actual:       5 Estimated:      8  diff:     -3 <alloc::alloc::Global as core::alloc::AllocRef>::dealloc
actual:       5 Estimated:      3  diff:      2 core::alloc::Layout::align
actual:       6 Estimated:     10  diff:     -4 <alloc::raw_vec::RawVec<T,A> as core::ops::drop::Drop>::drop
actual:      10 Estimated:      6  diff:      4 alloc::raw_vec::RawVec<T,A>::ptr
actual:    5370 Estimated:   9129  diff:  -3759 alloc::alloc::box_free
actual:      14 Estimated:     24  diff:    -10 core::slice::<impl core::ops::index::IndexMut<I> for [T]>::index_mut
actual:      18 Estimated:     32  diff:    -14 alloc::vec::Vec<T>::as_mut_ptr
actual:      18 Estimated:     10  diff:      8 main2::main
actual:      10 Estimated:     18  diff:     -8 core::ptr::mut_ptr::<impl *mut T>::is_null
actual:      10 Estimated:     18  diff:     -8 core::ptr::const_ptr::<impl *const T>::is_null
actual:     362 Estimated:    728  diff:   -366 core::ptr::unique::Unique<T>::as_ptr
actual:       6 Estimated:     13  diff:     -7 core::ptr::unique::Unique<T>::new_unchecked
actual:     716 Estimated:   1611  diff:   -895 core::ptr::unique::Unique<T>::as_ref
actual:      44 Estimated:    100  diff:    -56 core::slice::from_raw_parts_mut
actual:      13 Estimated:     30  diff:    -17 core::ptr::non_null::NonNull<T>::new_unchecked
actual:      13 Estimated:     32  diff:    -19 <alloc::vec::Vec<T> as core::ops::drop::Drop>::drop
actual:       7 Estimated:     18  diff:    -11 std::rt::lang_start
actual:       4 Estimated:     12  diff:     -8 core::ptr::non_null::NonNull<T>::as_ptr
actual:       5 Estimated:      1  diff:      4 core::ops::function::FnOnce::call_once{{vtable.shim}}
actual:      10 Estimated:      0  diff:     10 core::mem::size_of
actual:      10 Estimated:      0  diff:     10 core::mem::align_of
actual:      23 Estimated:      1  diff:     22 core::ops::function::FnOnce::call_once
actual:   17170 Estimated:     24  diff:  17146 core::ptr::drop_in_place

@rustbot modify labels to +I-compiletime

cc @michaelwoerister

@andjo403 andjo403 added the C-bug Category: This is a bug. label Feb 22, 2020
@rustbot rustbot added the I-compiletime Issue: Problems and improvements with respect to compile times. label Feb 22, 2020
@jonas-schievink jonas-schievink added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-codegen Area: Code generation labels Feb 22, 2020
@andjo403
Copy link
Contributor Author

the estimates here is also part of what I talked about in #64913 (comment)
and fixing this will result in the same perf problem that was triggered in #65281 where the CGU merging maybe results in perf changes due to the merging by luck happened to be better before.

@andjo403
Copy link
Contributor Author

andjo403 commented Feb 22, 2020

and when building with optimization (-O) even more functions have an estimated size of 0. tried to compile librustc_macros with -O and more then 71% of the functions hade 0 as estimated size so almost only the name of the function that affects what CGUs that gets merged.

@michaelwoerister
Copy link
Member

There is definitely room for improvement here. The size-estimate is just a rough heuristic to get rid of the worst kinds of LLVM work mis-schedulings.

I think if we had something that predicated the number of LLVM instructions, that would probably be a big improvement. But first I would try to come up with proper requirements. What are the properties we need from a heuristic like this? It's probably more "compile-time impact" than "size" what we are interested in. And how does this interact with LLVM inlining, MIR inlining, etc.

@tmiasko
Copy link
Contributor

tmiasko commented Feb 24, 2021

Triage: The instance estimates are still based on a total number of statements (a drop shim often consist entirely of drop terminators and receive estimate of zero).

tmiasko added a commit to tmiasko/rust that referenced this issue Jul 1, 2021
For example, drop glue generated for struct below, doesn't have any
statements, only terminators. Previously it received an estimate of 0,
the new estimate is 13 (6+5 drop terminators, +1 resume, +1 return).

struct S {
    a: String,
    b: String,
    c: String,
    d: String,
    e: String,
    f: String,
}

Originally reported in rust-lang#69382 (comment)
JohnTitor added a commit to JohnTitor/rust that referenced this issue Jul 1, 2021
…ebank

Include terminators in instance size estimate

For example, drop glue generated for struct below, doesn't have any
statements, only terminators. Previously it received an estimate of 0,
the new estimate is 13 (6+5 drop terminators, +1 resume, +1 return).

```rust
struct S {
    a: String,
    b: String,
    c: String,
    d: String,
    e: String,
    f: String,
}
```

Originally reported in rust-lang#69382 (comment)
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 2, 2021
Include terminators in instance size estimate

For example, drop glue generated for struct below, doesn't have any
statements, only terminators. Previously it received an estimate of 0,
the new estimate is 13 (6+5 drop terminators, +1 resume, +1 return).

```rust
struct S {
    a: String,
    b: String,
    c: String,
    d: String,
    e: String,
    f: String,
}
```

Originally reported in rust-lang#69382 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants