-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't rebuild LLVM for BOLT optimization #107521
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
I think that it's a good idea to move BOLT optimization outside of bootstrap (or, at least, outside of the LLVM build step) to avoid messing with bootstrap caches, so I like this direction. I would suggest a slightly different workflow though, and perform the intermediate build steps manually, instead of running
I hope that this would work, as it's still a bit unclear to me how is the built LLVM exactly used (is rustc only linked to it? or is it used to actually build We could optimize this further by building a PGO optimized LLVM sooner, in stage 2, and then stash it away, to avoid one rustc rebuild. |
We need to instrument and optimize the stage2 libLLVM.so file.
Tried to implement something along those lines but can't test locally right now, so... @bors try |
⌛ Trying commit 212cfa1 with merge 3b5a8f7fa72f048331b445d463f6758214ee06cd... |
☀️ Try build successful - checks-actions |
1 similar comment
☀️ Try build successful - checks-actions |
Let's verify whether we actually end up using the optimized LLVM... @rust-timer build 3b5a8f7fa72f048331b445d463f6758214ee06cd From the build log, it looks like we do save one LLVM build (3 instead of 4), but we do still appear to rebuild rustc in the dist stage. New build stats:
Build stats from a previous merge (https://pipelines.actions.githubusercontent.com/serviceHosts/1c7a4aeb-bdca-499c-8aff-37881d0d775b/_apis/pipelines/1/runs/26914/signedlogcontent/31?urlExpires=2023-02-01T08%3A40%3A53.6122741Z&urlSigningMethod=HMACV1&urlSignature=YZMD8t53%2FZHCOazFEnkffUUqJSDVxYM5HjufP3dvF6g%3D):
So this does look worthwhile, but it's not the best we can do. |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (3b5a8f7fa72f048331b445d463f6758214ee06cd): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
Okay, looks like the optimized LLVM did not get picked up :( |
I also had some problems with locating the right LLVM file. I think that the file being used for We should print the output of |
Running
So it looks like the libLLVM is taken from |
Let's see if this works... @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
One disadvantage of moving BOLT outside of bootstrap is that the paths to these files will be a bit fragile in the script (another one is that it will be more complicated to perform a BOLT build manually/locally, but that's not a big deal). If we change it in bootstrap, it will also need to be changed in the Python script. But I suppose that it should be visible pretty quickly, because perf. should regress if it fails to optimize some file. |
⌛ Trying commit 912d55f with merge f5bc65f89910eb1a56709b791877543ed32989da... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (f5bc65f89910eb1a56709b791877543ed32989da): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
Great, now it looks that it has worked! I still wonder if it's a good idea to move BOLT completely outside of bootstrap, because some of the mentioned problems (brittle LLVM path, no straightforward access to BOLT profiles, difficult to use BOLT locally). I think that having it as a different build step in bootstrap might be better. But I wouldn't block this PR on it, it speeds up CI. |
@@ -2220,10 +2220,6 @@ impl Step for ReproducibleArtifacts { | |||
tarball.add_file(path, ".", 0o644); | |||
added_anything = true; | |||
} | |||
if let Some(path) = builder.config.llvm_bolt_profile_use.as_ref() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another disadvantage of putting BOLT into Python is that we lose reproducibility of the profiles here :( I wonder if we could add a bootstrap step on top of LLVM that would create the BOLTed libraries in another place (in order not to interfere with the regular LLVM build step, and to sidestep its cache), and thus still keep BOLT as a first class citizen, with proper access to LLVM library paths, and with storage of profiles, while still avoiding the extra LLVM (re)build.
@Kobzol I think the ideal here might be to have a callback step in bootstrap which allow post-processing the artifacts. We could then hook into there to optimize the final stage2 artifacts, which avoids some dependence on details like what gets copied where, and would also make sure that there is no additional rustc rebuild. This would probably also be convenient for optimizing rustc itself with BOLT, as otherwise it might be tricky to convince the build system to actually use the optimized binaries. This could either be a generic callback (run an extra command after assembling the stage2 compiler), or it could be something BOLT-specific, where we would perform BOLT instrumentation and optimization in bootstrap, and only make the callback provide the merged profile data file. Not sure which of those would be better, really. Anyway, I'm on PTO from tomorrow, so I'd only be able to experiment with something like that once I get back. |
Closing this in favor of #107723. |
Currently, we perform a separate rustc/LLVM build with BOLT instrumentation. However BOLT implementation works on compiled artifacts, so I don't think there is any reason to do a full rebuild. Instead, we should perform the full build, and then at the end instrument libLLVM.so, profile, and optimize libLLVM.so, without performing any rustc/LLVM rebuilds.
cc @Kobzol @Mark-Simulacrum