-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the bazel-out structure to avoid busybox symlinks. #4505
Conversation
@@ -75,3 +105,35 @@ symlink_file = rule( | |||
"symlink_label": attr.label(allow_single_file = True), | |||
}, | |||
) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you make any changes below here? I ask because I don't see any changes compared to the code that was previously at the top of the file, but if there aren't any, it's bizarre that the diff is treating it as new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this was just shuffling as I was trying to figure out organization.
Rather than using a python script to intercept commands targeting the busybox, this *copies* the busybox into the installation (precluding a symlink), and arranges for symlinks pointing at it to always have the desired basename. This won't support a symlink called `my_special_carbon` getting the same busybox treatment as a symlink called `carbon`, but that seems OK to me. And it avoids the complexity and overhead of the Python script. Notably, a symlink from `my_special_clang` to `clang++` won't get the implicit C++ language behavior that a symlink whose filename is spelled exactly `clang++` does. So this kind of filename constraint on symlinks seems precedented and might be a tolerable burden. And in reality, it is only *inside Bazel* that these fancier symlink spellings won't work. In an installation, even they should work.
This restores the symlinks for the installation, but teaches the busybox info search to look for a relative path to the busybox binary itself before walking through symlinks. This let's it find the tree structure when directly invoking `prefix_root/bin/carbon` or similar, either inside of a Bazel rule or from the command line, and mirrors how we expect the installed tree to look. This works even when Bazel resolves the symlink target fully, and potentially to something nonsensical like a CAS file. In order to make a convenient Bazel target that can be used with `bazel run //toolchain`, this adds an override to explicitly set the desired argv[0] to use when selecting a mode for the busybox and a busybox binary. Currently, the workaround uses an environment variable because that required the least amount of plumbing, and seems a useful override mechanism generally, but I'm open to other approaches. This should allow a few things to work a bit more nicely: - It should handle sibling symlinks like `clang++` to `clang` or `ld.lld` to `lld`, where that symlink in turn points at the busybox. We want to use *initial* `argv[0]` value to select the mode there. - It avoids bouncing through Python (or other subprocesses) when invoking the `carbon` binary in Bazel rules, which will be nice for building the example code and benchmarking. It does come at a cost of removing one feature: the initial symlink can't be some unrelated alias like `my_carbon_symlink` -- we expect the *first* argv[0] name to have the meaningful filename for selecting a busybox mode. It also trades the complexity of the Python script for some complexity in the busybox search in order to look for a relative `carbon-busybox` binary. On the whole, I think that tradeoff is worthwhile, but it isn't free.
This restores the symlinks for the installation, but teaches the busybox info search to look for a relative path to the busybox binary itself before walking through symlinks. This let's it find the tree structure when directly invoking `prefix_root/bin/carbon` or similar, either inside of a Bazel rule or from the command line, and mirrors how we expect the installed tree to look. This works even when Bazel resolves the symlink target fully, and potentially to something nonsensical like a CAS file. In order to make a convenient Bazel target that can be used with `bazel run //toolchain`, this adds an override to explicitly set the desired argv[0] to use when selecting a mode for the busybox and a busybox binary. Currently, the workaround uses an environment variable because that required the least amount of plumbing, and seems a useful override mechanism generally, but I'm open to other approaches. This should allow a few things to work a bit more nicely: - It should handle sibling symlinks like `clang++` to `clang` or `ld.lld` to `lld`, where that symlink in turn points at the busybox. We want to use *initial* `argv[0]` value to select the mode there. - It avoids bouncing through Python (or other subprocesses) when invoking the `carbon` binary in Bazel rules, which will be nice for building the example code and benchmarking. It does come at a cost of removing one feature: the initial symlink can't be some unrelated alias like `my_carbon_symlink` -- we expect the *first* argv[0] name to have the meaningful filename for selecting a busybox mode. It also trades the complexity of the Python script for some complexity in the busybox search in order to look for a relative `carbon-busybox` binary. On the whole, I think that tradeoff is worthwhile, but it isn't free.
This restores the symlinks for the installation, but teaches the busybox info search to look for a relative path to the busybox binary itself before walking through symlinks. This let's it find the tree structure when directly invoking `prefix_root/bin/carbon` or similar, either inside of a Bazel rule or from the command line, and mirrors how we expect the installed tree to look. This works even when Bazel resolves the symlink target fully, and potentially to something nonsensical like a CAS file. In order to make a convenient Bazel target that can be used with `bazel run //toolchain`, this adds an override to explicitly set the desired argv[0] to use when selecting a mode for the busybox and a busybox binary. Currently, the workaround uses an environment variable because that required the least amount of plumbing, and seems a useful override mechanism generally, but I'm open to other approaches. This should allow a few things to work a bit more nicely: - It should handle sibling symlinks like `clang++` to `clang` or `ld.lld` to `lld`, where that symlink in turn points at the busybox. We want to use *initial* `argv[0]` value to select the mode there. - It avoids bouncing through Python (or other subprocesses) when invoking the `carbon` binary in Bazel rules, which will be nice for building the example code and benchmarking. It does come at a cost of removing one feature: the initial symlink can't be some unrelated alias like `my_carbon_symlink` -- we expect the *first* argv[0] name to have the meaningful filename for selecting a busybox mode. It also trades the complexity of the Python script for some complexity in the busybox search in order to look for a relative `carbon-busybox` binary. On the whole, I think that tradeoff is worthwhile, but it isn't free.
As described in symlink_helpers.bzl, copied here for visibility:
Symlinking busybox things needs special logic.
This is because Bazel doesn't cache the actual symlink, resulting in essentially resolved symlinks being produced in place of the expected tool. As a consequence, we can't rely on the symlink name when dealing with busybox entries.
An example repro of this using a local build cache is:
We could in theory get reasonable behavior with
ctx.actions.declare_symlink
, but that's disallowed in our.bazelrc
for cross-environment compatibility.The particular approach here uses the Python script as a launching pad so that the busybox still receives an appropriate location in argv[0], allowing it to find other files in the lib directory. Arguments are inserted to get equivalent behavior as if symlink resolution had occurred.
The underlying bug is noted at: bazelbuild/bazel#23620