Enforce the compiler-builtins partitioning scheme #135395

Open
wants to merge 3 commits into
base: master

Conversation

saethlin
Member

@saethlin saethlin commented Jan 12, 2025

compiler-builtins needs every intrinsic in its own CGU. Currently, the compiler-builtins crate puts every intrinsic in its own inline module, and library/Cargo.toml uses a profile override so that when we build the sysroot, compiler-builtins is built with more codegen-units than we have intrinsics, so partitioning never merges two intrinsics together. This approach does not work with -Zbuild-std because the profile override gets ignored. It is kludgey anyway: our own standard library should not be fighting our own compiler to override its behavior. We should change the compiler's behavior to do the right thing in the first place.

So that's what this PR does. There's some light refactoring of the CGU partitioning code, then in 3 places I've added a check for is_compiler_builtins:

  • There's a special case now in cross_crate_inlinable; every function in compiler-builtins that is not #[no_mangle] is made cross-crate-inlinable, which ensures we do not run into problems inlining helpers into intrinsics, such as #73135 (compiler-builtins: Int trait functions are not inlined on wasm).
  • When building compiler-builtins, the name of the CGU that a MonoItem is given is just the MonoItem's symbol name. This puts every GloballyShared item in its own CGU.
  • Then when building compiler-builtins, we skip CGU merging.

That should ensure that we have one object file per intrinsic, and if optimizations are enabled, there should be no extra CGUs full of helper functions. (That is what currently happens in the precompiled standard library we distribute: my nightly libcompiler_builtins.rlib for x86_64-unknown-linux-gnu has 174 CGUs, and with this PR we have 150.)
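The naming and merging rules described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Item`, `partition`, and the "merge the two smallest CGUs" rule are hypothetical stand-ins invented for illustration, not rustc's partitioning code.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a monomorphized item.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Item {
    pub symbol_name: String,
    /// Module the item lives in; the toy uses it as the default CGU name.
    pub module: String,
}

/// Assigns items to codegen units and returns a map from CGU name to items.
pub fn partition(
    items: &[Item],
    is_compiler_builtins: bool,
    max_cgus: usize,
) -> BTreeMap<String, Vec<Item>> {
    let mut cgus: BTreeMap<String, Vec<Item>> = BTreeMap::new();
    for item in items {
        // For compiler-builtins, the CGU name is just the symbol name,
        // so every item starts out in a CGU of its own.
        let name = if is_compiler_builtins {
            item.symbol_name.clone()
        } else {
            item.module.clone()
        };
        cgus.entry(name).or_default().push(item.clone());
    }

    // For compiler-builtins, merging is skipped entirely.
    if is_compiler_builtins {
        return cgus;
    }

    // Toy merging rule (an assumption): fold the smallest CGU into the
    // next smallest until the requested limit is met.
    while cgus.len() > max_cgus.max(1) {
        let mut by_size: Vec<(usize, String)> =
            cgus.iter().map(|(k, v)| (v.len(), k.clone())).collect();
        by_size.sort();
        let smallest = by_size[0].1.clone();
        let next = by_size[1].1.clone();
        let moved = cgus.remove(&smallest).unwrap();
        cgus.get_mut(&next).unwrap().extend(moved);
    }
    cgus
}
```

In this model, passing `is_compiler_builtins = true` yields one CGU per symbol regardless of `max_cgus`, which is the property the PR wants to guarantee without relying on a Cargo profile override.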

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 12, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 12, 2025
…oboet

Add #[inline] to copy_from_slice

I'm doing cooked things to CGU partitioning for compiler-builtins (rust-lang#135395) and this was the lone symbol in my compiler-builtins rlib that wasn't an intrinsic. Adding `#[inline]` makes it go away.

Perf report indicates a marginal but chaotic effect on compile time, marginal improvement in codegen. As expected.
@saethlin saethlin force-pushed the compiler-builtins-cgus branch from 7cf1a94 to b371e7f Compare January 12, 2025 22:01
@rustbot rustbot added the A-run-make Area: port run-make Makefiles to rmake.rs label Jan 12, 2025
@saethlin saethlin force-pushed the compiler-builtins-cgus branch from b371e7f to 50dbf9c Compare January 12, 2025 22:58
@saethlin
Member Author

r? bjorn3

@saethlin saethlin marked this pull request as ready for review January 12, 2025 23:14
@rustbot
Collaborator

rustbot commented Jan 12, 2025

This PR modifies tests/run-make/. If this PR is trying to port a Makefile
run-make test to use rmake.rs, please update the
run-make port tracking issue
so we can track our progress. You can either modify the tracking issue
directly, or you can comment on the tracking issue and link this PR.

cc @jieyouxu

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

// See https://github.com/rust-lang/rust/issues/73135
if tcx.is_compiler_builtins(rustc_span::def_id::LOCAL_CRATE) {
    return true;
}
Member

Is there a way to make inlining inside the crate more likely without causing MIR for all functions in compiler-builtins to get encoded in the crate metadata?

Member Author

I think what you're pointing out here is that these functions are not reachable as MIR, so we don't need to encode MIR for them. The problem as I see it is that our notion of reachable uses this worklist/visited algorithm that tracks items in a path-independent way:

while let Some(search_item) = self.worklist.pop() {
    if !scanned.insert(search_item) {
        continue;
    }
    self.propagate_node(&self.tcx.hir_node_by_def_id(search_item), search_item);
}

Also we already have an issue for the inverse inefficiency, emitting object code when we only need MIR: #119214

I put a hack in this place specifically because the compiler is designed around this function being either true or false for whatever reason, past the first few checks. I'm not aware of anywhere else we could make a small localized change to get the behavior we want.

The only other place I could think of putting a hack is MonoItem::instantiation_mode, but that doesn't work: we get linker errors because instantiation mode needs to agree with exported_symbols, and those disagree because exported_symbols is based on reachable_set. I really think the inaccuracy of the reachable_set analysis is the root problem here, and it's net better to implement this in a non-invasive way that will be fixed automatically if reachable_set gets improved.
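To make "path-independent" concrete, here is a minimal worklist/visited sketch over a plain graph. It is an illustration only: the `u32` node ids, the `edges` map, and the `reachable` function are assumptions made for this example and do not reflect the real reachable_set implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Computes the set of nodes reachable from `roots`.
/// The result is a single flat set: it records *that* a node was reached,
/// not *through which path* (or which kind of edge) it was reached.
pub fn reachable(edges: &HashMap<u32, Vec<u32>>, roots: &[u32]) -> HashSet<u32> {
    let mut worklist: Vec<u32> = roots.to_vec();
    let mut scanned: HashSet<u32> = HashSet::new();

    while let Some(item) = worklist.pop() {
        // A node already in `scanned` is skipped, so a second path that
        // reaches it contributes no additional information.
        if !scanned.insert(item) {
            continue;
        }
        if let Some(successors) = edges.get(&item) {
            worklist.extend(successors.iter().copied());
        }
    }
    scanned
}
```

Because each node is visited at most once, an analysis of this shape cannot later distinguish "reachable only as object code" from "reachable as MIR" unless that distinction is tracked separately.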

Member Author

@saethlin saethlin Jan 14, 2025

Also, if I back up to my merge-base, x build library, then ar x the stage1-std libcompiler_builtins.rlib and run du -sch * I get:

808K	lib.rmeta
5.7M	total

Then with my changes:

968K	lib.rmeta
4.1M	total

So even though it's not perfect, this PR is still a net win.

@bjorn3
Member

bjorn3 commented Jan 14, 2025

@bors r+ rollup=never

@bors
Contributor

bors commented Jan 14, 2025

📌 Commit 50dbf9c has been approved by bjorn3

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 14, 2025
@bors
Contributor

bors commented Jan 15, 2025

⌛ Testing commit 50dbf9c with merge 192c456...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 15, 2025
…jorn3

Enforce the compiler-builtins partitioning scheme

@rust-log-analyzer

This comment has been minimized.

@bors
Contributor

bors commented Jan 15, 2025

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 15, 2025
@saethlin saethlin force-pushed the compiler-builtins-cgus branch from 50dbf9c to b3841ed Compare January 15, 2025 22:46
@saethlin
Member Author

Test failed because it was checking for exactly 1 text section in each object file, but on some platforms, we have object files that have no text sections because they are just a static. I made the test permit 0 or 1 text sections. thumbv6m-none-eabi has 1319 CGUs in a debug build, phew.
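The relaxed check can be sketched as follows. This is a toy model, not the actual rmake.rs test: the `Symbol` and `SectionKind` types are hypothetical, and the only detail taken from this thread is the shape of the assertion (`global_text_symbols <= 1`).

```rust
/// Hypothetical classification of the section a symbol is defined in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Text,
    Data,
}

/// Hypothetical, simplified view of one symbol in an object file.
#[derive(Debug, Clone)]
pub struct Symbol {
    pub name: String,
    pub section: SectionKind,
    pub is_global: bool,
}

/// Counts the globally visible symbols defined in a text section.
pub fn global_text_symbols(symbols: &[Symbol]) -> usize {
    symbols
        .iter()
        .filter(|s| s.is_global && s.section == SectionKind::Text)
        .count()
}

/// An object passes if it holds at most one global text symbol.
/// Zero is allowed because an object may contain only a static.
pub fn is_well_partitioned(symbols: &[Symbol]) -> bool {
    global_text_symbols(symbols) <= 1
}
```

Under this model, an object holding a single static passes, while an object with two global text symbols fails.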

@bors r=bjorn3

@bors
Contributor

bors commented Jan 15, 2025

📌 Commit b3841ed has been approved by bjorn3

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 15, 2025
@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 17, 2025
@raoulstrackx
Contributor

Thanks for updating the test @saethlin ! Unfortunately, the SGX platform with its LVI mitigations still causes issues:

09:40:58 --- stdout -------------------------------
09:40:58 Testing compiler_builtins CGU partitioning for x86_64-fortanix-unknown-sgx
09:40:58 Testing with profile debug and -Ccodegen-units=1
09:40:58 Inspecting object compiler_builtins-8fb8e81dc17de2d2.compiler_builtins.40632dfb9afeccfb-cgu.0000.rcgu.o
09:40:58 symbol: "_ZN17compiler_builtins4math4libm14rem_pio2_large14rem_pio2_large17haa2e3db50d99586cE"
09:40:58 symbol: "__llvm_lvi_thunk_r11"
09:40:58 ------------------------------------------
09:40:58 --- stderr -------------------------------
09:40:58 
09:40:58 thread 'main' panicked at /home/jenkins/workspace/rust-sgx-ci/rust/tests/run-make/compiler-builtins-partitioning/rmake.rs:103:9:
09:40:58 assertion failed: global_text_symbols <= 1
09:40:58 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
09:40:58 ------------------------------------------

@saethlin
Member Author

@raoulstrackx I advise you try filing an issue. If you are complaining that the tests do not pass in a configuration that we do not test, then that configuration either needs to be added to the test suite, the test needs to be ignored on that target, or the test needs to be improved. I am not going to hold back this change because the tests do not pass on a configuration that we do not test.

@saethlin

This comment was marked as outdated.

@rustbot

This comment was marked as outdated.

@saethlin
Member Author

@rustbot ping rfl

It looks like RFL has been relying on compiler-builtins being compiled into a single object file, and with this change we will always compile compiler-builtins to multiple object files. I can't tell from the diagnostic: #135395 (comment) whether this needs a compiler change or not. Please advise?

@tgross35
Contributor

@rustbot ping rust-for-linux

Silent error on unrecognized team name?

(I don't know the answer to the question unfortunately)

@alex
Member

alex commented Jan 17, 2025

(It's been forever since I looked at this code, so hopefully someone will correct me if I got that wrong)

I think that's correct that we expect a single .o, in the same way we might expect to have a single .o for any library: https://github.com/torvalds/linux/blob/master/rust/Makefile#L420-L422

However, we use our own compiler-builtins: https://github.com/torvalds/linux/blob/master/rust/compiler_builtins.rs

@bjorn3
Member

bjorn3 commented Jan 17, 2025

RFL uses --emit obj and then links the raw objects rather than rlibs. --emit obj can only output to a known location (as passed in through -o) when a single codegen unit is used. When multiple codegen units are used the emitted object files have unpredictable names and thus Kbuild wouldn't be able to find them.

@bjorn3
Member

bjorn3 commented Jan 17, 2025

> However, we use our own compiler-builtins: https://github.com/torvalds/linux/blob/master/rust/compiler_builtins.rs

Which is marked with #![compiler_builtins] and thus gets the same special casing as the official compiler-builtins crate. Removing this attribute however would cause downstream crates to complain about a missing compiler-builtins I believe.

@workingjubilee
Member

hm. based on the PR's motivation, it seems likely the way that RFL is doing this is increasing bloat for the kernel, making the engineering choices that RFL has made contrary to what RFL wants? do I understand everything correctly?

@saethlin
Member Author

It is entirely unclear to me what effect this PR has on RFL's compiler-builtins crate in practice, because I haven't used this PR to compile it and I certainly don't know what linkage options or other processing they are using with that object file.

@bjorn3
Member

bjorn3 commented Jan 17, 2025

This PR is forcing multiple codegen units for compiler-builtins, but RFL needs all crates to use a single codegen unit each.

@tgross35
Contributor

Could these new settings be ignored if --emit obj is passed? Or maybe some kind of -Zno-compiler-builtins to specify that the symbols will be linked in manually, though that has to go through the process.

To be clear, the policy is that the RFL job can be disabled if it is getting in the way of things. Much nicer if we could figure out a workaround first though.

@saethlin
Member Author

saethlin commented Jan 17, 2025

I'd like to actually understand why RFL wants exactly the opposite CGU partitioning of compiler-builtins from every other user of Rust before we continue adding hacks for RFL.

The diff in this PR is already kept deliberately small to minimize its impact on the compiler, both your suggestions would approximately double its size.

@ojeda
Contributor

ojeda commented Jan 17, 2025

> It is entirely unclear to me what effect this PR has on RFL's compiler-builtins crate in practice, because I haven't used this PR to compile it and I certainly don't know what linkage options or other processing they are using with that object file.

If it helps, we currently do something like:

RUSTC_BOOTSTRAP=1 rustc --edition=2021 -Cpanic=abort -Cembed-bitcode=n -Clto=n -Cforce-unwind-tables=n -Ccodegen-units=1 -Csymbol-mangling-version=v0 -Crelocation-model=static -Zfunction-sections=n --target=aarch64-unknown-none -Ctarget-feature="-neon" -Cforce-unwind-tables=n -Copt-level=2 -Cdebug-assertions=n -Coverflow-checks=y -Cforce-frame-pointers=y -Cdebuginfo=1 --cfg no_fp_fmt_parse --emit=obj=core.o --emit=metadata=libcore.rmeta --crate-type rlib -L. --crate-name core `rustc --print sysroot`/lib/rustlib/src/rust/library/core/src/lib.rs --sysroot=/dev/null

llvm-objcopy --redefine-sym __addsf3=__rust__addsf3 --redefine-sym __eqsf2=__rust__eqsf2 --redefine-sym __extendsfdf2=__rust__extendsfdf2 --redefine-sym __gesf2=__rust__gesf2 --redefine-sym __lesf2=__rust__lesf2 --redefine-sym __ltsf2=__rust__ltsf2 --redefine-sym __mulsf3=__rust__mulsf3 --redefine-sym __nesf2=__rust__nesf2 --redefine-sym __truncdfsf2=__rust__truncdfsf2 --redefine-sym __unordsf2=__rust__unordsf2 --redefine-sym __adddf3=__rust__adddf3 --redefine-sym __eqdf2=__rust__eqdf2 --redefine-sym __ledf2=__rust__ledf2 --redefine-sym __ltdf2=__rust__ltdf2 --redefine-sym __muldf3=__rust__muldf3 --redefine-sym __unorddf2=__rust__unorddf2 --redefine-sym __muloti4=__rust__muloti4 --redefine-sym __multi3=__rust__multi3 --redefine-sym __udivmodti4=__rust__udivmodti4 --redefine-sym __udivti3=__rust__udivti3 --redefine-sym __umodti3=__rust__umodti3 --redefine-sym __ashrti3=__rust__ashrti3 --redefine-sym __ashlti3=__rust__ashlti3 --redefine-sym __lshrti3=__rust__lshrti3 core.o

RUSTC_BOOTSTRAP=1 rustc --edition=2021 -Cpanic=abort -Cembed-bitcode=n -Clto=n -Cforce-unwind-tables=n -Ccodegen-units=1 -Csymbol-mangling-version=v0 -Crelocation-model=static -Zfunction-sections=n --target=aarch64-unknown-none -Ctarget-feature="-neon" -Cforce-unwind-tables=n -Copt-level=2 -Cdebug-assertions=n -Coverflow-checks=y -Cforce-frame-pointers=y -Cdebuginfo=1 --emit=obj=compiler_builtins.o --emit=metadata=libcompiler_builtins.rmeta --crate-type rlib -L. --crate-name compiler_builtins compiler_builtins.rs --sysroot=/dev/null

llvm-objcopy -w -W '__*' compiler_builtins.o

Cc @nbdd0121

> Could these new settings be ignored if --emit obj is passed?

Something like that sounds nice, but if there is something that we should be doing to improve our builds, happy to adjust.

> Much nicer if we could figure out a workaround first though.

Yeah, if the breakage is unintentional and the PR not urgent, then it is best to figure out first.

@saethlin
Member Author

It's not clear to me why, in the commands above, you need everything in one object file as opposed to operating on an archive file (if llvm-objcopy can do that) or looping over all the object files.
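The "loop over all the object files" option could look roughly like the sketch below. It is hypothetical and is not how Kbuild works: `object_files`, `objcopy_args`, and `weaken_all` are names invented here, and the only detail taken from this thread is the `llvm-objcopy -w -W '__*'` invocation shown above.

```rust
use std::ffi::OsString;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Collects every `*.o` file directly inside `dir`, sorted for determinism.
pub fn object_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut objects = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "o") {
            objects.push(path);
        }
    }
    objects.sort();
    Ok(objects)
}

/// Builds the argument list for weakening all `__*` symbols in one object.
pub fn objcopy_args(object: &Path) -> Vec<OsString> {
    vec![
        OsString::from("-w"),
        OsString::from("-W"),
        OsString::from("__*"),
        object.as_os_str().to_os_string(),
    ]
}

/// Runs llvm-objcopy once per object file instead of once on a single file.
pub fn weaken_all(dir: &Path) -> io::Result<()> {
    for object in object_files(dir)? {
        let status = Command::new("llvm-objcopy")
            .args(objcopy_args(&object))
            .status()?;
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("llvm-objcopy failed on {}", object.display()),
            ));
        }
    }
    Ok(())
}
```

The sketch sidesteps the naming problem raised earlier in the thread by scanning a directory rather than relying on a single `-o` path, which presumes the build system can tolerate outputs whose names are not known in advance.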

@nbdd0121
Contributor

nbdd0121 commented Jan 17, 2025

The RfL build system doesn't handle rlibs; it handles object files only. So we require CGU=1 and emit object files.

The reason that compiler_builtins wants as many CGUs as possible is that if an intrinsic is already present and linked in from libgcc, then we use it instead of linking in our compiler_builtins. I believe that fundamentally there's nothing in compiler_builtins that requires it to have as many CGUs as possible. So in essence I see it as a hack to always ignore CGU settings and always emit as many CGUs as possible, and I disagree with the statement that restoring the correct meaning of the options is a hack.

RfL doesn't have libgcc linked, so it has no reason to have multiple CGUs. Why deal with archive files or multiple .o files if one suffices? I believe this is also not needed for embedded use cases.

@nbdd0121
Contributor

> hm. based on the PR's motivation, it seems likely the way that RFL is doing this is increasing bloat for the kernel, making the engineering choices that RFL has made contrary to what RFL wants? do I understand everything correctly?

I don't get it. If anything a single CGU is a size reduction.

@saethlin
Member Author

Why does RFL need to use the #![compiler_builtins] attribute? What happens if you remove it?

@nbdd0121
Contributor

nbdd0121 commented Jan 18, 2025

Then it doesn't compile, as Rust injects compiler_builtins as a dependency, which creates a self-dependency:

let names: &[Symbol] = if attr::contains_name(pre_configured_attrs, sym::no_core) {
    return 0;
} else if attr::contains_name(pre_configured_attrs, sym::no_std) {
    if attr::contains_name(pre_configured_attrs, sym::compiler_builtins) {
        &[sym::core]
    } else {
        &[sym::core, sym::compiler_builtins]
    }
} else {
    &[sym::std]
};
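The decision in that excerpt can be restated as a small standalone function. This is a simplified model for illustration: `CrateAttrs` and `injected_crates` are hypothetical names, and plain strings stand in for rustc's `Symbol` values.

```rust
/// Hypothetical summary of the crate-level attributes that matter here.
#[derive(Debug, Clone, Copy, Default)]
pub struct CrateAttrs {
    pub no_core: bool,
    pub no_std: bool,
    pub compiler_builtins: bool,
}

/// Returns the crates that would be injected as implicit dependencies.
pub fn injected_crates(attrs: CrateAttrs) -> &'static [&'static str] {
    if attrs.no_core {
        // #![no_core]: nothing is injected at all.
        &[]
    } else if attrs.no_std {
        if attrs.compiler_builtins {
            // The compiler_builtins crate itself only gets `core`.
            &["core"]
        } else {
            // Every other no_std crate also depends on compiler_builtins,
            // which is where the self-dependency comes from once the
            // attribute is removed from the crate named compiler_builtins.
            &["core", "compiler_builtins"]
        }
    } else {
        &["std"]
    }
}
```

Read this way, a `#![no_std]` crate without `#![compiler_builtins]` always receives `compiler_builtins` as a dependency, so the only other way to avoid the injection is `#![no_core]`.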

This patch would make RfL compile, although now we need #![no_core]

diff --git a/rust/Makefile b/rust/Makefile
--- a/rust/Makefile
+++ b/rust/Makefile
@@ -438,6 +438,7 @@ $(obj)/core.o: scripts/target.json
 endif

 $(obj)/compiler_builtins.o: private rustc_objcopy = -w -W '__*'
+$(obj)/compiler_builtins.o: private rustc_target_flags = --extern core
 $(obj)/compiler_builtins.o: $(src)/compiler_builtins.rs $(obj)/core.o FORCE
        +$(call if_changed_rule,rustc_library)

diff --git a/rust/compiler_builtins.rs b/rust/compiler_builtins.rs
--- a/rust/compiler_builtins.rs
+++ b/rust/compiler_builtins.rs
@@ -19,11 +19,13 @@
 //! [`compiler_builtins`]: https://github.com/rust-lang/compiler-builtins
 //! [`compiler-rt`]: https://compiler-rt.llvm.org/

-#![allow(internal_features)]
-#![feature(compiler_builtins)]
-#![compiler_builtins]
+#![feature(no_core)]
 #![no_builtins]
-#![no_std]
+// We need `no_core` to avoid pulling in `compiler_builtins` (which creates a self-recursion)
+// since all crates get injected with `core` and `compiler_builtins` dependencies.
+#![no_core]
+
+use core::{concat, panic, stringify};

 macro_rules! define_panicking_intrinsics(
     ($reason: tt, { $($ident: ident, )* }) => {

@bjorn3
Member

bjorn3 commented Jan 18, 2025

> This patch would make RfL compile, although now we need #![no_core]

Wouldn't that need to be applied to literally every crate rather than just compiler-builtins?

@bjorn3
Member

bjorn3 commented Jan 18, 2025

By the way, a while back I looked into getting RFL to link rlibs rather than raw object files (among other things, to make incr comp possible and to increase parallelism during compilation). Unfortunately, llvm-ar doesn't accept adding the contents of a regular archive to a thin archive. (binutils ar allows this by having the thin archive reference individual members of the regular archive, but neither llvm-ar nor lld supports this.) Linux builds a thin archive for each directory, and a single directory may contain multiple crates, which thus need to be merged into a single thin archive.

@nbdd0121
Contributor

> Wouldn't that need to be applied to literally every crate rather than just compiler-builtins?

No, there's still a crate named compiler_builtins, so no_std crates will use it, even though it does not have the #![compiler_builtins] attribute.

@bjorn3
Member

bjorn3 commented Jan 18, 2025

That seems like a bug. A user defined compiler-builtins crate shouldn't be able to take precedence. Because of this you can do --extern compiler_builtins=/path/to/libcore.rlib on stable to prevent linking of compiler-builtins.

@nbdd0121
Contributor

Well, I don't see why the user shouldn't be allowed to override compiler_builtins if they wish to do so.

@bjorn3
Member

bjorn3 commented Jan 18, 2025

compiler-builtins is an internal implementation detail. We don't provide any guarantees about which functions it needs to define for linking to succeed.

Kobzol pushed a commit to Kobzol/rustc-dev-guide that referenced this pull request Jan 20, 2025
Add #[inline] to copy_from_slice
@raoulstrackx
Contributor

> @raoulstrackx I advise you try filing an issue. If you are complaining that the tests do not pass in a configuration that we do not test, then that configuration either needs to be added to the test suite, the test needs to be ignored on that target, or the test needs to be improved. I am not going to hold back this change because the tests do not pass on a configuration that we do not test.

@saethlin sorry for not being more elaborate. I wanted to ask to ignore this test for the x86_64-fortanix-unknown-sgx target. This was previously done by adding //@ ignore-sgx to the rmake.rs file. I know it's annoying. These LVI mitigations are causing multiple issues, but we really need them.

Labels
A-run-make Area: port run-make Makefiles to rmake.rs S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.