Document the usage of cargo-llvm-lines and -Ztimings. #905

Merged (2 commits) on Oct 4, 2020
80 changes: 79 additions & 1 deletion src/profiling.md
@@ -12,5 +12,83 @@ Depending on what you're trying to measure, there are several different approaches
See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function-level performance data or even just more details than the above approaches:
  - Consider using a native code profiler such as [perf](profiling/with_perf.html)
  - or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
    full-featured graphical interface.

- If you want a nice visual representation of the compile times of your crate graph,
  you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
  e.g. `cargo -Ztimings build`.
  You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build`.
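
For example, a minimal workflow on an ordinary crate might look like the following (a sketch; `-Ztimings` is unstable and needs a nightly toolchain, and the exact name and location of the report file may vary between cargo versions):
```
# A sketch, assuming a nightly toolchain (the flag is unstable):
cargo +nightly -Ztimings build
# This produces an HTML report (cargo-timing.html) showing when each crate
# started and finished compiling and how well the build parallelized.

# On the compiler itself, pass the flag through x.py as described above:
CARGOFLAGS="-Ztimings" ./x.py build
```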

## Optimizing rustc's bootstrap times with `cargo-llvm-lines`

Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
number of lines of LLVM IR across all instantiations of a generic function.
Since most of the time compiling rustc is spent in LLVM, the idea is that by
reducing the amount of code passed to LLVM, compiling rustc gets faster.

Example usage:
```
cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc

# Single crate, e.g. rustc_middle
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle-a539a639bdab6513.ll > llvm-lines-middle.txt
# Specify all crates of the compiler. (Relies on the glob support of your shell.)
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt
```

Example output:
```
Lines Copies Function name
----- ------ -------------
11802479 (100%) 52848 (100%) (TOTAL)
1663902 (14.1%) 400 (0.8%) rustc_query_system::query::plumbing::get_query_impl::{{closure}}
683526 (5.8%) 10579 (20.0%) core::ptr::drop_in_place
568523 (4.8%) 528 (1.0%) rustc_query_system::query::plumbing::get_query_impl
472715 (4.0%) 1134 (2.1%) hashbrown::raw::RawTable<T>::reserve_rehash
306782 (2.6%) 1320 (2.5%) rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}}
212800 (1.8%) 514 (1.0%) rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
194813 (1.7%) 124 (0.2%) rustc_query_system::query::plumbing::force_query_impl
158488 (1.3%) 1 (0.0%) rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings
119768 (1.0%) 418 (0.8%) core::ops::function::FnOnce::call_once
119644 (1.0%) 1 (0.0%) rustc_target::spec::load_specific
104153 (0.9%) 7 (0.0%) rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}}
81173 (0.7%) 1 (0.0%) rustc_middle::ty::query::stats::query_stats
80306 (0.7%) 2029 (3.8%) core::ops::function::FnOnce::call_once{{vtable.shim}}
78019 (0.7%) 1611 (3.0%) stacker::grow::{{closure}}
69720 (0.6%) 3286 (6.2%) <&T as core::fmt::Debug>::fmt
56327 (0.5%) 186 (0.4%) rustc_query_system::query::plumbing::incremental_verify_ich
49714 (0.4%) 14 (0.0%) rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label
```
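
To see whether a refactor actually helps, one possible workflow (just a sketch reusing the commands above; `diff <(...)` relies on bash process substitution) is to keep the previous report and compare the top entries after rebuilding:
```
# Keep the report from before the change, then rebuild and regenerate it.
cp llvm-lines.txt llvm-lines-before.txt
./x.py clean
RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines-after.txt
# Compare the total and the biggest offenders.
diff <(head -n 20 llvm-lines-before.txt) <(head -n 20 llvm-lines-after.txt)
```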

Since this doesn't seem to work with incremental compilation or `x.py check`,
you will be compiling rustc _a lot_.
I recommend changing a few settings in `config.toml` to make it bearable:
```
[rust]
# A debug build takes _a third_ as long on my machine,
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so no point in compiling debug checks.
debug = false

# Using a single codegen unit gives less output, but is slower to compile.
codegen-units = 0 # num_cpus
```

The llvm-lines output is affected by several options.
`optimize = false` increases it from 2.1GB to 3.5GB and `codegen-units = 0` to 4.1GB.

MIR optimizations have little impact. Compared to the default `RUSTFLAGS="-Zmir-opt-level=1"`,
level 0 adds 0.3GB and level 2 removes 0.2GB.
Inlining currently only happens in LLVM, but this might change in the future.
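
Such a comparison can be reproduced roughly as follows (a sketch, using the same stage0 build as above; `-Zmir-opt-level` is an unstable rustc flag):
```
./x.py clean
RUSTFLAGS="--emit=llvm-ir -Zmir-opt-level=2" ./x.py build --stage 0 compiler/rustc
# Total size of the emitted LLVM IR, corresponding to the sizes quoted above.
du -ch ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll | tail -n 1
```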