Don't rebuild LLVM for BOLT optimization #107521

Closed
wants to merge 4 commits into from


nikic
Contributor

@nikic nikic commented Jan 31, 2023

Currently, we perform a separate rustc/LLVM build with BOLT instrumentation. However, BOLT instrumentation works on compiled artifacts, so I don't think there is any reason to do a full rebuild. Instead, we should perform the full build, and then at the end instrument libLLVM.so, profile, and optimize libLLVM.so, without performing any rustc/LLVM rebuilds.

cc @Kobzol @Mark-Simulacrum

@rustbot
Collaborator

rustbot commented Jan 31, 2023

r? @pietroalbini

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Jan 31, 2023
@Kobzol
Contributor

Kobzol commented Jan 31, 2023

I think that it's a good idea to move BOLT optimization outside of bootstrap (or, at least, outside of the LLVM build step) to avoid messing with bootstrap caches, so I like this direction.

I would suggest a slightly different workflow though, and perform the intermediate build steps manually, instead of running dist in the middle of the pipeline:

  1. Stage 3: build PGO optimized LLVM with relocations
  2. Stash away a copy of the built LLVM library and then BOLT instrument it. This can be a bit tricky, as there are a lot of (hard!) links used here; it bit me before.
  3. Gather profiles
  4. (do not delete LLVM directory, just keep it)
  5. BOLT optimize the stashed uninstrumented copy and replace the instrumented LLVM library with it.
  6. Run dist, which shouldn't need to rebuild LLVM again

I hope that this would work, as it's still a bit unclear to me how exactly the built LLVM is used (is rustc only linked to it? or is it used to actually build rustc? or does this differ for the individual stages of the rustc build - 0/1/2?)

We could optimize this further by building a PGO optimized LLVM sooner, in stage 2, and then stash it away, to avoid one rustc rebuild.
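
The workflow above could be sketched roughly as follows. This is a minimal sketch, not the actual pipeline script: the helper names (`stash_copy`, `bolt_instrument_cmd`, `bolt_optimize_cmd`) and all paths are hypothetical, and only the basic `llvm-bolt` options `-instrument`, `-instrumentation-file`, `-data` and `-o` are assumed. The functions only build command lines; nothing is executed.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def stash_copy(lib: Path, stash_dir: Path) -> Path:
    """Copy the uninstrumented library aside (a real copy, not a hard link)."""
    stash_dir.mkdir(parents=True, exist_ok=True)
    stashed = stash_dir / lib.name
    shutil.copy2(lib, stashed)
    return stashed


def bolt_instrument_cmd(stashed: Path, lib: Path, profile_prefix: Path) -> List[str]:
    """Command line that writes an instrumented build of `stashed` to `lib`."""
    return [
        "llvm-bolt", str(stashed),
        "-instrument",
        f"-instrumentation-file={profile_prefix}",
        "-o", str(lib),
    ]


def bolt_optimize_cmd(stashed: Path, lib: Path, merged_profile: Path) -> List[str]:
    """Command line that optimizes `stashed` with a profile and writes it to `lib`."""
    return [
        "llvm-bolt", str(stashed),
        f"-data={merged_profile}",
        "-o", str(lib),
    ]


# Demo with a placeholder file; a real run would point at the built libLLVM.so.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lib = root / "libLLVM.so"  # hypothetical location
    lib.write_bytes(b"placeholder")
    stashed = stash_copy(lib, root / "stash")
    print(" ".join(bolt_instrument_cmd(stashed, lib, root / "prof.fdata")))
    print(" ".join(bolt_optimize_cmd(stashed, lib, root / "merged.fdata")))
```

Both commands read from the stashed copy and write to the original path, so the optimized library is always derived from the uninstrumented build (step 5 above).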

nikic and others added 2 commits January 31, 2023 21:05
We need to instrument and optimize the stage2 libLLVM.so file.
@nikic
Contributor Author

nikic commented Jan 31, 2023

Tried to implement something along those lines but can't test locally right now, so...

@bors try

@bors
Contributor

bors commented Jan 31, 2023

⌛ Trying commit 212cfa1 with merge 3b5a8f7fa72f048331b445d463f6758214ee06cd...

@bors
Contributor

bors commented Feb 1, 2023

☀️ Try build successful - checks-actions
Build commit: 3b5a8f7fa72f048331b445d463f6758214ee06cd (3b5a8f7fa72f048331b445d463f6758214ee06cd)


@nikic
Contributor Author

nikic commented Feb 1, 2023

Let's verify whether we actually end up using the optimized LLVM...

@rust-timer build 3b5a8f7fa72f048331b445d463f6758214ee06cd

From the build log, it looks like we do save one LLVM build (3 instead of 4), but we do still appear to rebuild rustc in the dist stage.

New build stats:

2023-02-01T00:13:46.2607661Z ---------------------------------------------------------
2023-02-01T00:13:46.2608080Z Build rustc (LLVM PGO generate):        1733.73s (21.89%)
2023-02-01T00:13:46.2608511Z Gather profiles (LLVM PGO):              604.11s ( 7.63%)
2023-02-01T00:13:46.2608897Z Build rustc (rustc PGO generate):        688.46s ( 8.69%)
2023-02-01T00:13:46.2609276Z Gather profiles (rustc PGO):            1056.09s (13.34%)
2023-02-01T00:13:46.2609697Z Build rustc (rustc PGO use, LLVM PGO use):      1533.21s (19.36%)
2023-02-01T00:13:46.2610092Z Bolt instrument LLVM:                    258.50s ( 3.26%)
2023-02-01T00:13:46.2610532Z Gather profiles (LLVM BOLT):             717.22s ( 9.06%)
2023-02-01T00:13:46.2610912Z Bolt optimize LLVM:                       37.05s ( 0.47%)
2023-02-01T00:13:46.2611264Z Dist rustc:                             1290.43s (16.30%)
2023-02-01T00:13:46.2611601Z Total duration:                         7918.80s
2023-02-01T00:13:46.2612196Z ---------------------------------------------------------

Build stats from a previous merge (https://pipelines.actions.githubusercontent.com/serviceHosts/1c7a4aeb-bdca-499c-8aff-37881d0d775b/_apis/pipelines/1/runs/26914/signedlogcontent/31?urlExpires=2023-02-01T08%3A40%3A53.6122741Z&urlSigningMethod=HMACV1&urlSignature=YZMD8t53%2FZHCOazFEnkffUUqJSDVxYM5HjufP3dvF6g%3D):


2023-02-01T07:46:52.1086106Z ---------------------------------------------------------
2023-02-01T07:46:52.1086531Z Build rustc (LLVM PGO):                 1460.78s (14.52%)
2023-02-01T07:46:52.1086928Z Gather profiles (LLVM PGO):              515.45s ( 5.12%)
2023-02-01T07:46:52.1087340Z Build rustc (rustc PGO):                 740.59s ( 7.36%)
2023-02-01T07:46:52.1087741Z Gather profiles (rustc PGO):            1131.28s (11.24%)
2023-02-01T07:46:52.1088130Z Build rustc (LLVM BOLT):                2189.50s (21.76%)
2023-02-01T07:46:52.1088534Z Gather profiles (LLVM BOLT):             891.15s ( 8.86%)
2023-02-01T07:46:52.1088910Z Final build:                            3134.70s (31.15%)
2023-02-01T07:46:52.1089562Z Total duration:                        10063.45s
2023-02-01T07:46:52.1090202Z ---------------------------------------------------------

So this does look worthwhile, but it's not the best we can do.
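
As a quick check of the two logs above, the following snippet compares the totals and the BOLT-specific steps. The numbers are copied from the logs; treating "Build rustc (LLVM BOLT)" as the counterpart of the new "Bolt instrument LLVM" plus "Bolt optimize LLVM" steps is an assumption made for illustration, and the two runs came from different CI jobs, so the comparison is only approximate.

```python
# Total durations in seconds, copied from the two build logs.
previous_total = 10063.45
new_total = 7918.80

saved = round(previous_total - new_total, 2)
saved_pct = round(100 * saved / previous_total, 1)
print(f"total saved: {saved}s ({saved_pct}%)")

# BOLT-specific steps: one instrumented rebuild before,
# in-place instrumentation plus optimization after.
bolt_before = 2189.50
bolt_after = round(258.50 + 37.05, 2)
print(f"BOLT steps: {bolt_before}s before, {bolt_after}s after")
```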

@rust-timer
Collaborator

Finished benchmarking commit (3b5a8f7fa72f048331b445d463f6758214ee06cd): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.2% | [0.7%, 8.4%] | 209 |
| Regressions ❌ (secondary) | 3.8% | [0.8%, 9.0%] | 216 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.2% | [0.7%, 8.4%] | 209 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.8% | [1.0%, 4.8%] | 56 |
| Regressions ❌ (secondary) | 3.4% | [1.4%, 5.4%] | 30 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.8% | [1.0%, 4.8%] | 56 |

@nikic
Contributor Author

nikic commented Feb 1, 2023

Okay, looks like the optimized LLVM did not get picked up :(

@Kobzol
Contributor

Kobzol commented Feb 1, 2023

I also had some problems with locating the right LLVM file. I think that the file being used for dist might not actually be the one that you use, but a (hard?)link to this file in some other directory. The hard link is broken, since llvm-bolt creates a new file at the original path, and that might be the reason why it doesn't work.

We should print the output of get_built_llvm_lib_path() in the previous bootstrap BOLT implementation to see what the correct LLVM lib path to optimize is.
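
The hard-link effect described above can be reproduced in isolation. This is a minimal sketch with placeholder files: the file names are hypothetical, and `os.replace` merely stands in for any tool that writes its output as a new file at the original path (which is what is claimed about llvm-bolt above, not something this snippet verifies).

```python
import os
import tempfile
from pathlib import Path


def replace_and_compare(workdir: Path):
    original = workdir / "libLLVM.so"
    linked = workdir / "stage1-libLLVM.so"
    original.write_bytes(b"unoptimized")
    os.link(original, linked)  # hard link: both names share one inode

    # Simulate a tool that creates a new file at the original path.
    new_file = workdir / "libLLVM.so.tmp"
    new_file.write_bytes(b"optimized")
    os.replace(new_file, original)

    same_inode = original.stat().st_ino == linked.stat().st_ino
    return original.read_bytes(), linked.read_bytes(), same_inode


with tempfile.TemporaryDirectory() as tmp:
    result = replace_and_compare(Path(tmp))
    print(result)  # (b'optimized', b'unoptimized', False)
```

After the replacement, the second name still refers to the old contents, which would explain a dist step picking up an unoptimized library.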

@nikic
Contributor Author

nikic commented Feb 1, 2023

Running ./x.py build -vv locally I see:

Copy "/home/npopov/repos/rust/build/x86_64-unknown-linux-gnu/llvm/build/lib/libLLVM-15-rust-dev.so" to "/home/npopov/repos/rust/build/x86_64-unknown-linux-gnu/stage1/lib/libLLVM-15-rust-dev.so"

So it looks like the libLLVM is taken from llvm/build/lib, while I'm overwriting the one in llvm/lib here.
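
One way to see which copies of the library exist, and which of them are hard links to each other, is to group the matching files by inode. This is a hedged sketch: `group_by_inode` is a hypothetical helper, and the directory layout in the demo is synthetic; it only mimics the `llvm/lib`, `llvm/build/lib` and `stage1/lib` paths mentioned above and says nothing about how bootstrap actually links them.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def group_by_inode(root: Path, pattern: str = "libLLVM*.so") -> Dict[Tuple[int, int], List[Path]]:
    """Group files matching `pattern` under `root` by (device, inode)."""
    groups = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        st = path.stat()
        groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)


# Synthetic layout: one independent copy and one hard-linked pair.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    name = "libLLVM-15-rust-dev.so"
    for sub in ("llvm/lib", "llvm/build/lib", "stage1/lib"):
        (root / sub).mkdir(parents=True)
    (root / "llvm/lib" / name).write_bytes(b"copy A")
    (root / "llvm/build/lib" / name).write_bytes(b"copy B")
    os.link(root / "llvm/build/lib" / name, root / "stage1/lib" / name)

    group_sizes = sorted(len(paths) for paths in group_by_inode(root).values())
    print(group_sizes)  # [1, 2]
```

Files in the same group are the same underlying file, so overwriting one in place affects the others, while a file in a group of its own is not picked up elsewhere.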

@nikic
Contributor Author

nikic commented Feb 1, 2023

Let's see if this works...

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 1, 2023
@Kobzol
Contributor

Kobzol commented Feb 1, 2023

One disadvantage of moving BOLT outside of bootstrap is that the paths to these files will be a bit fragile in the script (another one is that it will be more complicated to perform a BOLT build manually/locally, but that's not a big deal). If we change a path in bootstrap, it will also need to be changed in the Python script. But I suppose that a mismatch should become visible pretty quickly, because perf should regress if the script fails to optimize some file.

@bors
Contributor

bors commented Feb 1, 2023

⌛ Trying commit 912d55f with merge f5bc65f89910eb1a56709b791877543ed32989da...

@bors
Contributor

bors commented Feb 1, 2023

☀️ Try build successful - checks-actions
Build commit: f5bc65f89910eb1a56709b791877543ed32989da (f5bc65f89910eb1a56709b791877543ed32989da)

@rust-timer
Collaborator

Finished benchmarking commit (f5bc65f89910eb1a56709b791877543ed32989da): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.0% | [2.5%, 5.6%] | 2 |
| Improvements ✅ (primary) | -1.6% | [-1.6%, -1.6%] | 1 |
| Improvements ✅ (secondary) | -3.8% | [-3.8%, -3.8%] | 1 |
| All ❌✅ (primary) | -1.6% | [-1.6%, -1.6%] | 1 |

Cycles

This benchmark run did not return any relevant results for this metric.

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 1, 2023
@Kobzol
Contributor

Kobzol commented Feb 1, 2023

Great, now it looks like it worked! I still wonder if it's a good idea to move BOLT completely outside of bootstrap, because of some of the mentioned problems (brittle LLVM path, no straightforward access to BOLT profiles, difficult to use BOLT locally). I think that having it as a different build step in bootstrap might be better.

But I wouldn't block this PR on it, it speeds up CI.

@@ -2220,10 +2220,6 @@ impl Step for ReproducibleArtifacts {
tarball.add_file(path, ".", 0o644);
added_anything = true;
}
if let Some(path) = builder.config.llvm_bolt_profile_use.as_ref() {
Contributor

@Kobzol Kobzol Feb 1, 2023

Another disadvantage of putting BOLT into Python is that we lose reproducibility of the profiles here :( I wonder if we could add a bootstrap step on top of LLVM that would create the BOLTed libraries in another place (in order not to interfere with the regular LLVM build step, and to sidestep its cache), and thus still keep BOLT as a first class citizen, with proper access to LLVM library paths, and with storage of profiles, while still avoiding the extra LLVM (re)build.

@nikic nikic changed the title Don't rebuild LLVM for BOLT optimization (WIP) Don't rebuild LLVM for BOLT optimization Feb 1, 2023
@nikic nikic marked this pull request as ready for review February 1, 2023 16:48
@nikic
Contributor Author

nikic commented Feb 1, 2023

@Kobzol I think the ideal here might be to have a callback step in bootstrap which allows post-processing the artifacts. We could then hook into there to optimize the final stage2 artifacts, which avoids some dependence on details like what gets copied where, and would also make sure that there is no additional rustc rebuild. This would probably also be convenient for optimizing rustc itself with BOLT, as otherwise it might be tricky to convince the build system to actually use the optimized binaries.

This could either be a generic callback (run an extra command after assembling the stage2 compiler), or it could be something BOLT-specific, where we would perform BOLT instrumentation and optimization in bootstrap, and only make the callback provide the merged profile data file. Not sure which of those would be better, really.

Anyway, I'm on PTO from tomorrow, so I'd only be able to experiment with something like that once I get back.

@nikic
Contributor Author

nikic commented Feb 16, 2023

Closing this in favor of #107723.

@nikic nikic closed this Feb 16, 2023