Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the bazel-out structure to avoid busybox symlinks. #4505

Merged
merged 2 commits into from
Nov 8, 2024

Conversation

jonmeow
Copy link
Contributor

@jonmeow jonmeow commented Nov 8, 2024

As described in symlink_helpers.bzl, copied here for visibility:

Symlinking busybox things needs special logic.

This is because Bazel doesn't cache the actual symlink, resulting in essentially resolved symlinks being produced in place of the expected tool. As a consequence, we can't rely on the symlink name when dealing with busybox entries.

An example repro of this using a local build cache is:

bazel build //toolchain
bazel clean
bazel build //toolchain

We could in theory get reasonable behavior with ctx.actions.declare_symlink, but that's disallowed in our .bazelrc for cross-environment compatibility.

The particular approach here uses the Python script as a launching pad so that the busybox still receives an appropriate location in argv[0], allowing it to find other files in the lib directory. Arguments are inserted to get equivalent behavior as if symlink resolution had occurred.

The underlying bug is noted at: bazelbuild/bazel#23620

@@ -75,3 +105,35 @@ symlink_file = rule(
"symlink_label": attr.label(allow_single_file = True),
},
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you make any changes below here? I ask because I don't see any changes compared to the code that was previously at the top of the file, but if there aren't any, it's bizarre that the diff is treating it as new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this was just shuffling as I was trying to figure out organization.

@jonmeow jonmeow added this pull request to the merge queue Nov 8, 2024
Merged via the queue into carbon-language:trunk with commit 6dce164 Nov 8, 2024
8 checks passed
@jonmeow jonmeow deleted the install-hacks branch November 8, 2024 22:14
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Nov 14, 2024
Rather than using a python script to intercept commands targeting the
busybox, this *copies* the busybox into the installation (precluding
a symlink), and arranges for symlinks pointing at it to always have the
desired basename.

This won't support a symlink called `my_special_carbon` getting the same
busybox treatment as a symlink called `carbon`, but that seems OK to me.
And it avoids the complexity and overhead of the Python script.

Notably, a symlink from `my_special_clang` to `clang++` won't get the
implicit C++ language behavior that a symlink whose filename is spelled
exactly `clang++` does. So this kind of filename constraint on symlinks
seems precedented and might be a tolerable burden.

And in reality, it is only *inside Bazel* that these fancier symlink
spellings won't work. In an installation, even they should work.
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Nov 21, 2024
This restores the symlinks for the installation, but teaches the busybox
info search to look for a relative path to the busybox binary itself
before walking through symlinks. This let's it find the tree structure
when directly invoking `prefix_root/bin/carbon` or similar, either
inside of a Bazel rule or from the command line, and mirrors how we
expect the installed tree to look. This works even when Bazel resolves
the symlink target fully, and potentially to something nonsensical like
a CAS file.

In order to make a convenient Bazel target that can be used with `bazel
run //toolchain`, this adds an override to explicitly set the desired
argv[0] to use when selecting a mode for the busybox and a busybox
binary. Currently, the workaround uses an environment variable because
that required the least amount of plumbing, and seems a useful override
mechanism generally, but I'm open to other approaches.

This should allow a few things to work a bit more nicely:
- It should handle sibling symlinks like `clang++` to `clang` or
  `ld.lld` to `lld`, where that symlink in turn points at the busybox.
  We want to use *initial* `argv[0]` value to select the mode there.
- It avoids bouncing through Python (or other subprocesses) when
  invoking the `carbon` binary in Bazel rules, which will be nice for
  building the example code and benchmarking.

It does come at a cost of removing one feature: the initial symlink
can't be some unrelated alias like `my_carbon_symlink` -- we expect the
*first* argv[0] name to have the meaningful filename for selecting
a busybox mode.

It also trades the complexity of the Python script for some complexity
in the busybox search in order to look for a relative `carbon-busybox`
binary. On the whole, I think that tradeoff is worthwhile, but it isn't
free.
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Nov 21, 2024
This restores the symlinks for the installation, but teaches the busybox
info search to look for a relative path to the busybox binary itself
before walking through symlinks. This let's it find the tree structure
when directly invoking `prefix_root/bin/carbon` or similar, either
inside of a Bazel rule or from the command line, and mirrors how we
expect the installed tree to look. This works even when Bazel resolves
the symlink target fully, and potentially to something nonsensical like
a CAS file.

In order to make a convenient Bazel target that can be used with `bazel
run //toolchain`, this adds an override to explicitly set the desired
argv[0] to use when selecting a mode for the busybox and a busybox
binary. Currently, the workaround uses an environment variable because
that required the least amount of plumbing, and seems a useful override
mechanism generally, but I'm open to other approaches.

This should allow a few things to work a bit more nicely:
- It should handle sibling symlinks like `clang++` to `clang` or
  `ld.lld` to `lld`, where that symlink in turn points at the busybox.
  We want to use *initial* `argv[0]` value to select the mode there.
- It avoids bouncing through Python (or other subprocesses) when
  invoking the `carbon` binary in Bazel rules, which will be nice for
  building the example code and benchmarking.

It does come at a cost of removing one feature: the initial symlink
can't be some unrelated alias like `my_carbon_symlink` -- we expect the
*first* argv[0] name to have the meaningful filename for selecting
a busybox mode.

It also trades the complexity of the Python script for some complexity
in the busybox search in order to look for a relative `carbon-busybox`
binary. On the whole, I think that tradeoff is worthwhile, but it isn't
free.
chandlerc added a commit to chandlerc/carbon-lang that referenced this pull request Nov 21, 2024
This restores the symlinks for the installation, but teaches the busybox
info search to look for a relative path to the busybox binary itself
before walking through symlinks. This let's it find the tree structure
when directly invoking `prefix_root/bin/carbon` or similar, either
inside of a Bazel rule or from the command line, and mirrors how we
expect the installed tree to look. This works even when Bazel resolves
the symlink target fully, and potentially to something nonsensical like
a CAS file.

In order to make a convenient Bazel target that can be used with `bazel
run //toolchain`, this adds an override to explicitly set the desired
argv[0] to use when selecting a mode for the busybox and a busybox
binary. Currently, the workaround uses an environment variable because
that required the least amount of plumbing, and seems a useful override
mechanism generally, but I'm open to other approaches.

This should allow a few things to work a bit more nicely:
- It should handle sibling symlinks like `clang++` to `clang` or
  `ld.lld` to `lld`, where that symlink in turn points at the busybox.
  We want to use *initial* `argv[0]` value to select the mode there.
- It avoids bouncing through Python (or other subprocesses) when
  invoking the `carbon` binary in Bazel rules, which will be nice for
  building the example code and benchmarking.

It does come at a cost of removing one feature: the initial symlink
can't be some unrelated alias like `my_carbon_symlink` -- we expect the
*first* argv[0] name to have the meaningful filename for selecting
a busybox mode.

It also trades the complexity of the Python script for some complexity
in the busybox search in order to look for a relative `carbon-busybox`
binary. On the whole, I think that tradeoff is worthwhile, but it isn't
free.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants