Don't rebuild LLVM for BOLT optimization #107521

nikic · 2023-01-31T16:43:06Z

Currently, we perform a separate rustc/LLVM build with BOLT instrumentation. However, BOLT instrumentation works on compiled artifacts, so I don't think there is any reason to do a full rebuild. Instead, we should perform the full build and then, at the end, instrument libLLVM.so, gather profiles, and optimize libLLVM.so, without performing any further rustc/LLVM rebuilds.

cc @Kobzol @Mark-Simulacrum

rustbot · 2023-01-31T16:43:12Z

r? @pietroalbini

(rustbot has picked a reviewer for you, use r? to override)

Kobzol · 2023-01-31T17:06:30Z

I think that it's a good idea to move BOLT optimization outside of bootstrap (or, at least, outside of the LLVM build step) to avoid messing with bootstrap caches, so I like this direction.

I would suggest a slightly different workflow though, and perform the intermediate build steps manually, instead of running dist in the middle of the pipeline:

1. Stage 3: build PGO-optimized LLVM with relocations.
2. Stash away a copy of the built LLVM library and then BOLT-instrument it. This can be a bit tricky, as a lot of (hard!) links are used here; this has bitten me before.
3. Gather profiles.
4. (Do not delete the LLVM directory, just keep it.)
5. BOLT-optimize the stashed uninstrumented copy and replace the instrumented LLVM library with it.
6. Run dist, which shouldn't need to rebuild LLVM again.
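
The steps above could be sketched in the pipeline's Python script roughly as follows. This is a minimal sketch under stated assumptions, not the code from this PR: the helper names (`stash_copy`, `bolt_instrument_cmd`, `bolt_optimize_cmd`, `install_in_place`) and all paths are hypothetical, and only the basic `llvm-bolt` options (`-instrument`, `-instrumentation-file=`, `-data=`, `-o`) are used. A real invocation would add tuning flags and would first merge the gathered profiles (for example with `merge-fdata`).

```python
import shutil
from pathlib import Path


def stash_copy(lib: Path, stash_dir: Path) -> Path:
    """Copy the uninstrumented library aside (a real copy, not a link)."""
    stash_dir.mkdir(parents=True, exist_ok=True)
    stashed = stash_dir / lib.name
    shutil.copy2(lib, stashed)
    return stashed


def bolt_instrument_cmd(stashed: Path, out: Path, profile_prefix: Path) -> list[str]:
    """Command line that writes an instrumented build of `stashed` to `out`."""
    return [
        "llvm-bolt", str(stashed),
        "-instrument",
        f"-instrumentation-file={profile_prefix}",
        "-o", str(out),
    ]


def bolt_optimize_cmd(stashed: Path, profile: Path, out: Path) -> list[str]:
    """Command line that optimizes `stashed` using a merged profile."""
    return ["llvm-bolt", str(stashed), f"-data={profile}", "-o", str(out)]


def install_in_place(new_lib: Path, lib: Path) -> None:
    """Overwrite the contents of `lib` without replacing the file itself.

    Opening an existing file with "wb" truncates it but keeps the same inode,
    so hard links to `lib` observe the new contents.
    """
    with open(new_lib, "rb") as src, open(lib, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

The commands are only built here, not executed, so the sketch can be read and tried without BOLT installed. Writing into the existing file is one possible way to sidestep the hard-link issue; whether that is the right choice for the real pipeline is an assumption.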

I hope that this would work, though it's still a bit unclear to me how exactly the built LLVM is used (is rustc only linked to it? Or is it used to actually build rustc? Or does this differ between the individual stages of the rustc build, 0/1/2?).

We could optimize this further by building a PGO optimized LLVM sooner, in stage 2, and then stash it away, to avoid one rustc rebuild.

We need to instrument and optimize the stage2 libLLVM.so file.

nikic · 2023-01-31T21:56:37Z

Tried to implement something along those lines but can't test locally right now, so...

@bors try

bors · 2023-01-31T21:56:46Z

⌛ Trying commit 212cfa1 with merge 3b5a8f7fa72f048331b445d463f6758214ee06cd...

bors · 2023-02-01T00:14:13Z

☀️ Try build successful - checks-actions
Build commit: 3b5a8f7fa72f048331b445d463f6758214ee06cd (3b5a8f7fa72f048331b445d463f6758214ee06cd)

nikic · 2023-02-01T08:55:47Z

Let's verify whether we actually end up using the optimized LLVM...

@rust-timer build 3b5a8f7fa72f048331b445d463f6758214ee06cd

From the build log, it looks like we do save one LLVM build (3 instead of 4), but we do still appear to rebuild rustc in the dist stage.

New build stats:

2023-02-01T00:13:46.2607661Z ---------------------------------------------------------
2023-02-01T00:13:46.2608080Z Build rustc (LLVM PGO generate):        1733.73s (21.89%)
2023-02-01T00:13:46.2608511Z Gather profiles (LLVM PGO):              604.11s ( 7.63%)
2023-02-01T00:13:46.2608897Z Build rustc (rustc PGO generate):        688.46s ( 8.69%)
2023-02-01T00:13:46.2609276Z Gather profiles (rustc PGO):            1056.09s (13.34%)
2023-02-01T00:13:46.2609697Z Build rustc (rustc PGO use, LLVM PGO use):      1533.21s (19.36%)
2023-02-01T00:13:46.2610092Z Bolt instrument LLVM:                    258.50s ( 3.26%)
2023-02-01T00:13:46.2610532Z Gather profiles (LLVM BOLT):             717.22s ( 9.06%)
2023-02-01T00:13:46.2610912Z Bolt optimize LLVM:                       37.05s ( 0.47%)
2023-02-01T00:13:46.2611264Z Dist rustc:                             1290.43s (16.30%)
2023-02-01T00:13:46.2611601Z Total duration:                         7918.80s
2023-02-01T00:13:46.2612196Z ---------------------------------------------------------

Build stats from a previous merge (https://pipelines.actions.githubusercontent.com/serviceHosts/1c7a4aeb-bdca-499c-8aff-37881d0d775b/_apis/pipelines/1/runs/26914/signedlogcontent/31?urlExpires=2023-02-01T08%3A40%3A53.6122741Z&urlSigningMethod=HMACV1&urlSignature=YZMD8t53%2FZHCOazFEnkffUUqJSDVxYM5HjufP3dvF6g%3D):


2023-02-01T07:46:52.1086106Z ---------------------------------------------------------
2023-02-01T07:46:52.1086531Z Build rustc (LLVM PGO):                 1460.78s (14.52%)
2023-02-01T07:46:52.1086928Z Gather profiles (LLVM PGO):              515.45s ( 5.12%)
2023-02-01T07:46:52.1087340Z Build rustc (rustc PGO):                 740.59s ( 7.36%)
2023-02-01T07:46:52.1087741Z Gather profiles (rustc PGO):            1131.28s (11.24%)
2023-02-01T07:46:52.1088130Z Build rustc (LLVM BOLT):                2189.50s (21.76%)
2023-02-01T07:46:52.1088534Z Gather profiles (LLVM BOLT):             891.15s ( 8.86%)
2023-02-01T07:46:52.1088910Z Final build:                            3134.70s (31.15%)
2023-02-01T07:46:52.1089562Z Total duration:                        10063.45s
2023-02-01T07:46:52.1090202Z ---------------------------------------------------------

So this does look worthwhile, but it's not the best we can do.
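
To put a number on "worthwhile", the two reported totals can be compared directly. The snippet below is only illustrative arithmetic on the totals from the two logs above; the variable names are made up for illustration.

```python
# Total durations in seconds, copied from the two build logs above.
old_total = 10063.45  # previous merge, with the separate BOLT rebuild
new_total = 7918.80   # this try build

saved = old_total - new_total
saved_pct = 100.0 * saved / old_total

print(f"saved {saved:.2f}s ({saved_pct:.1f}% of the previous total)")
```

Note that the individual steps are not directly comparable between the two runs, since the step names and their contents changed; only the totals are compared here.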

rust-timer · 2023-02-01T10:27:56Z

Finished benchmarking commit (3b5a8f7fa72f048331b445d463f6758214ee06cd): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.2%	[0.7%, 8.4%]	209
Regressions ❌ (secondary)	3.8%	[0.8%, 9.0%]	216
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	3.2%	[0.7%, 8.4%]	209

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.8%	[1.0%, 4.8%]	56
Regressions ❌ (secondary)	3.4%	[1.4%, 5.4%]	30
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.8%	[1.0%, 4.8%]	56

nikic · 2023-02-01T10:41:42Z

Okay, looks like the optimized LLVM did not get picked up :(

Kobzol · 2023-02-01T12:00:32Z

I also had some problems with locating the right LLVM file. I think that the file being used for dist might not actually be the one that you use, but a (hard?)link to this file in some other directory. The hard link is broken, since llvm-bolt creates a new file at the original path, and that might be the reason why it doesn't work.
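
The hard-link problem described here can be reproduced in isolation. The following is a small self-contained sketch: the file names are placeholders, and `os.replace` stands in for any tool that writes a fresh file at the original path, which is how llvm-bolt is described as behaving above. After the replacement, the old hard link still refers to the old contents.

```python
import os
import tempfile
from pathlib import Path


def replace_breaks_hard_link() -> tuple[bytes, bytes, bool]:
    """Return (contents at original path, contents via hard link, same file?)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "libLLVM.so"      # placeholder for the built library
        linked = root / "stage-libLLVM.so"  # placeholder for the hard-linked copy
        original.write_bytes(b"unoptimized")
        os.link(original, linked)           # both names now share one inode

        # Simulate a tool that creates a *new* file and moves it over the
        # original path instead of rewriting the existing file in place.
        fresh = root / "libLLVM.so.new"
        fresh.write_bytes(b"optimized")
        os.replace(fresh, original)

        return (
            original.read_bytes(),
            linked.read_bytes(),
            os.path.samefile(original, linked),
        )


at_path, via_link, same = replace_breaks_hard_link()
print(at_path, via_link, same)
```

If the file that dist picks up is the hard-linked name rather than the original path, it would keep seeing the unoptimized contents, which would match the perf results above.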

We should print the output of get_built_llvm_lib_path() in the previous bootstrap BOLT implementation to see which LLVM lib path is the correct one to optimize.

nikic · 2023-02-01T12:10:20Z

Running ./x.py build -vv locally I see:

Copy "/home/npopov/repos/rust/build/x86_64-unknown-linux-gnu/llvm/build/lib/libLLVM-15-rust-dev.so" to "/home/npopov/repos/rust/build/x86_64-unknown-linux-gnu/stage1/lib/libLLVM-15-rust-dev.so"

So it looks like the libLLVM is taken from llvm/build/lib, while I'm overwriting the one in llvm/lib here.

nikic · 2023-02-01T12:13:23Z

Let's see if this works...

@bors try @rust-timer queue

Kobzol · 2023-02-01T12:13:32Z

One disadvantage of moving BOLT outside of bootstrap is that the paths to these files will be a bit fragile in the script (another one is that it will be more complicated to perform a BOLT build manually/locally, but that's not a big deal). If the paths change in bootstrap, they will also need to be changed in the Python script. But I suppose that a mismatch should be visible pretty quickly, because perf should regress if the script fails to optimize some file.

bors · 2023-02-01T12:13:32Z

⌛ Trying commit 912d55f with merge f5bc65f89910eb1a56709b791877543ed32989da...

bors · 2023-02-01T14:32:09Z

☀️ Try build successful - checks-actions
Build commit: f5bc65f89910eb1a56709b791877543ed32989da (f5bc65f89910eb1a56709b791877543ed32989da)

rust-timer · 2023-02-01T15:51:34Z

Finished benchmarking commit (f5bc65f89910eb1a56709b791877543ed32989da): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.0%	[2.5%, 5.6%]	2
Improvements ✅ (primary)	-1.6%	[-1.6%, -1.6%]	1
Improvements ✅ (secondary)	-3.8%	[-3.8%, -3.8%]	1
All ❌✅ (primary)	-1.6%	[-1.6%, -1.6%]	1

Cycles

This benchmark run did not return any relevant results for this metric.

Kobzol · 2023-02-01T16:35:28Z

Great, now it looks like it worked! I still wonder whether it's a good idea to move BOLT completely outside of bootstrap, because of some of the problems mentioned (brittle LLVM path, no straightforward access to BOLT profiles, difficult to use BOLT locally). I think that having it as a separate build step in bootstrap might be better.

But I wouldn't block this PR on it, since it speeds up CI.

Kobzol · 2023-02-01T16:28:47Z

src/bootstrap/dist.rs

@@ -2220,10 +2220,6 @@ impl Step for ReproducibleArtifacts {
            tarball.add_file(path, ".", 0o644);
            added_anything = true;
        }
-        if let Some(path) = builder.config.llvm_bolt_profile_use.as_ref() {


Another disadvantage of putting BOLT into Python is that we lose reproducibility of the profiles here :( I wonder if we could add a bootstrap step on top of LLVM that would create the BOLTed libraries in another place (in order not to interfere with the regular LLVM build step, and to sidestep its cache), and thus still keep BOLT as a first class citizen, with proper access to LLVM library paths, and with storage of profiles, while still avoiding the extra LLVM (re)build.

nikic · 2023-02-01T16:58:47Z

@Kobzol I think the ideal here might be to have a callback step in bootstrap that allows post-processing the artifacts. We could then hook into it to optimize the final stage2 artifacts, which avoids some dependence on details like what gets copied where, and would also make sure that there is no additional rustc rebuild. This would probably also be convenient for optimizing rustc itself with BOLT, as otherwise it might be tricky to convince the build system to actually use the optimized binaries.

This could either be a generic callback (run an extra command after assembling the stage2 compiler), or it could be something BOLT-specific, where we would perform BOLT instrumentation and optimization in bootstrap, and only make the callback provide the merged profile data file. Not sure which of those would be better, really.
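
The generic variant of this idea can be illustrated with a toy model. This is purely a sketch of the control flow, written in Python for brevity: bootstrap itself is Rust, and `assemble_stage2`, `PostProcess`, and `fake_bolt` are hypothetical names that do not correspond to any existing bootstrap API.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical hook type: receives the directory holding the assembled artifacts.
PostProcess = Callable[[Path], None]


def assemble_stage2(sysroot: Path, post_process: Optional[PostProcess] = None) -> Path:
    """Stand-in for the step that assembles the stage2 compiler."""
    lib_dir = sysroot / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    (lib_dir / "libLLVM.so").write_bytes(b"unoptimized")
    if post_process is not None:
        # The hook runs after assembly and before anything is packaged, so
        # later steps only ever see the post-processed files.
        post_process(sysroot)
    return sysroot


def fake_bolt(sysroot: Path) -> None:
    """Placeholder for instrumenting, profiling, and optimizing libLLVM.so."""
    (sysroot / "lib" / "libLLVM.so").write_bytes(b"optimized")
```

Because the hook receives the final artifact location from the build system, the caller does not need to know which intermediate directory the library was copied from. The BOLT-specific variant would instead keep instrumentation and optimization inside the build step and use the callback only to supply the merged profile.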

Anyway, I'm on PTO from tomorrow, so I'd only be able to experiment with something like that once I get back.

nikic · 2023-02-16T08:03:43Z

Closing this in favor of #107723.

Don't rebuild LLVM for bolt optimization

238be98

rustbot assigned pietroalbini Jan 31, 2023

nikic and others added 2 commits January 31, 2023 21:05

Use correct libLLVM.so file

b30065b

We need to instrument and optimize the stage2 libLLVM.so file.

Keep dist run at the end

212cfa1

Replace libLLVM in llvm/build/lib

912d55f

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 1, 2023

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 1, 2023

Kobzol reviewed Feb 1, 2023

nikic changed the title ~~Don't rebuild LLVM for BOLT optimization (WIP)~~ Don't rebuild LLVM for BOLT optimization Feb 1, 2023

nikic marked this pull request as ready for review February 1, 2023 16:48

Kobzol mentioned this pull request Feb 6, 2023

Apply BOLT optimizations without rebuilding LLVM #107723

Merged

nikic closed this Feb 16, 2023
