Replace #4505 with a different set of workarounds #4527
base: trunk
Maybe I'm missing something, but the use of This change tries to create a symlink chain such as:
While that's typical output, Bazel could use its CAS (content-addressable storage) cache. In this model, Bazel's promise is to provide something which matches the checksum of the regular file, not the file/symlink structure (this is bazelbuild/bazel#23620). This is also true of
Note that things like remote caching can be a lot firmer about removing indirections, even more than the local cache. That's why we need bazelbuild/bazel#23620 to be fixed if we want to use symlinks with Bazel, since it's about retaining symlink structure. As far as alternative workarounds:
So that's why I went with something like the 2nd workaround: scripts rather than symlinks. I believe Google's Bazel support makes much more eager decisions about when to reuse cache content, and how to structure symlinks. If you want to go this route, perhaps it would make sense to verify that it works robustly using your access to Google's infrastructure?
Oh, and to be sure, I believe that the main difference prior to #4505 is in changing the name "run_carbon" to "run/carbon". But since it's going back to symlinks, I do think that this is brittle. Maybe I could have been clearer in person, but I think any solution using symlinks is going to require a split implementation: one for this repository on GitHub, one for Google.
Yes, it was trying to couple this change with the change to prevent a symlink for the busybox itself. But thanks for the context on worrying about remote caches forcing even the original busybox to be a symlink into some CAS or other thing that loses all the interesting context. That was what I was missing that #4505 was trying to work around. Sorry if it wasn't clear -- this PR was mostly a question, I wasn't at all confident. I have another idea that I'm going to play with, but I'm not sure it will work. But now I know how to test that.
42849ee to d751db4
This restores the symlinks for the installation, but teaches the busybox info search to look for a relative path to the busybox binary itself before walking through symlinks. This lets it find the tree structure when directly invoking `prefix_root/bin/carbon` or similar, either inside of a Bazel rule or from the command line, and mirrors how we expect the installed tree to look. This works even when Bazel resolves the symlink target fully, and potentially to something nonsensical like a CAS file.

In order to make a convenient Bazel target that can be used with `bazel run //toolchain`, this adds an override to explicitly set the desired argv[0] to use when selecting a mode for the busybox and a busybox binary. Currently, the workaround uses an environment variable because that required the least amount of plumbing, and seems a useful override mechanism generally, but I'm open to other approaches.

This should allow a few things to work a bit more nicely:
- It should handle sibling symlinks like `clang++` to `clang` or `ld.lld` to `lld`, where that symlink in turn points at the busybox. We want to use the *initial* `argv[0]` value to select the mode there.
- It avoids bouncing through Python (or other subprocesses) when invoking the `carbon` binary in Bazel rules, which will be nice for building the example code and benchmarking.

It does come at a cost of removing one feature: the initial symlink can't be some unrelated alias like `my_carbon_symlink` -- we expect the *first* argv[0] name to have the meaningful filename for selecting a busybox mode.

It also trades the complexity of the Python script for some complexity in the busybox search in order to look for a relative `carbon-busybox` binary. On the whole, I think that tradeoff is worthwhile, but it isn't free.
d751db4 to f150e8b
Ok, after this helpful feedback and some offline discussions, I think a new attempt. The core of this is to look for a relative Should get us symlink overhead but be resilient to all the perplexities of symlinks (I hope!). PTAL!
(Also, just flagging that I updated the PR description with new context.)
Understood you intend things to change, I still would appreciate tests demonstrating the behavior that's not intended to work anymore. It would make it clearer that it's an intended break, not an accidental one. (suggestions for specific things below)
  MakeDir(prefix);
  MakeDir(prefix / "lib");
  MakeDir(prefix / "lib/carbon");
I'd been minimalist about this, but you could switch `MakeDir` to `std::filesystem::create_directories` so that you don't need the repeated calls.
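
As a rough illustration of that suggestion, a single `std::filesystem::create_directories` call creates every missing component of a path. `MakeInstallDirs` below is a hypothetical stand-in for the test fixture's setup, not a helper from this PR.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical stand-in for the test's install-tree setup: one call creates
// `prefix`, `prefix/lib`, and `prefix/lib/carbon`.
inline fs::path MakeInstallDirs(const fs::path& prefix) {
  fs::path lib_carbon = prefix / "lib" / "carbon";
  // Creates every missing component; an already-existing tree is not an error.
  fs::create_directories(lib_carbon);
  assert(fs::is_directory(lib_carbon));
  return lib_carbon;
}
```

Calling it twice on the same prefix is harmless, which keeps test setup order-independent.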
@@ -147,8 +148,8 @@ TEST_F(BusyboxInfoTest, AbsoluteSymlink) {

  auto info = GetBusyboxInfo(target.string());
  ASSERT_TRUE(info.ok()) << info.error();
  EXPECT_THAT(info->bin_path, Eq(busybox));
  EXPECT_THAT(info->mode, Eq("carbon"));
  EXPECT_TRUE(info->bin_path.is_absolute());
This test is trying to verify that absolute symlinks are correctly handled. That was in specific response to your concern that path concatenation for an absolute symlink would concatenate instead of replacing (also, `is_absolute` won't validate the difference). IIUC, the behavior here changed and now is a subset of `LayerSymlinksInstallTree` due to your logic changes; had you considered adjusting the file structure in this test so that it continues to test absolute symlink handling?
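
To make the concatenate-versus-replace distinction concrete, here is a small sketch assuming POSIX-style paths. `ResolveLinkTarget` and `NaiveConcat` are hypothetical helpers written for illustration; neither comes from this PR.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: where does `link` point, given its stored `target`?
inline fs::path ResolveLinkTarget(const fs::path& link,
                                  const fs::path& target) {
  assert(!target.empty());
  // operator/ discards the left-hand side when the right-hand side is
  // absolute, so absolute targets replace the directory instead of appending.
  return link.parent_path() / target;
}

// The bug being guarded against: string concatenation always appends.
inline fs::path NaiveConcat(const fs::path& link, const fs::path& target) {
  return fs::path(link.parent_path().string() + "/" + target.string());
}
```

For an absolute target, `NaiveConcat` still yields an absolute path, just the wrong one, so an `is_absolute` assertion cannot tell the two apart. Comparing against the expected path can.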
  EXPECT_TRUE(info->bin_path.is_absolute());
  EXPECT_THAT(info->mode, Eq(std::nullopt));

  info = GetBusyboxInfo(clang_target.string());
  ASSERT_TRUE(info.ok()) << info.error();
  EXPECT_TRUE(info->bin_path.is_absolute());
  EXPECT_THAT(info->mode, Eq("clang"));

  info = GetBusyboxInfo(clangplusplus_target.string());
  ASSERT_TRUE(info.ok()) << info.error();
  EXPECT_TRUE(info->bin_path.is_absolute());
`is_absolute` isn't going to test much; a return value of "/bin/carbon" would satisfy this check. Is this test doing enough that you're confident this is the correct absolute path?
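
One possible shape for a stronger check is sketched below. `SameLexicalPath` is a hypothetical helper, and the example paths are made up for illustration: it compares the returned path against the expected one after lexical cleanup, so results such as `prefix/bin/../lib/carbon/carbon-busybox` still match.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical comparison helper: true when both paths name the same location
// after purely lexical cleanup of "." and ".." components.
inline bool SameLexicalPath(const fs::path& actual, const fs::path& expected) {
  assert(expected.is_absolute());
  return actual.lexically_normal() == expected.lexically_normal();
}
```

With this, "/bin/carbon" passes `is_absolute()` but fails the comparison against the expected install path. Lexical normalization does not consult the filesystem, so when a `..` component crosses a symlinked directory, `std::filesystem::equivalent` on the real files is the safer comparison.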
TEST_F(BusyboxInfoTest, ExtraSymlink) {
  MakeFile(dir_ / "carbon-busybox");
  MakeSymlink(dir_ / "carbon", "carbon-busybox");
  auto target = MakeSymlink(dir_ / "c", "carbon");
  MakeSymlink(dir_ / "c", "carbon-busybox");
  auto target = MakeSymlink(dir_ / "carbon", "c");

  auto info = GetBusyboxInfo(target.string());
  ASSERT_TRUE(info.ok()) << info.error();
  EXPECT_THAT(info->bin_path, Eq(dir_ / "carbon-busybox"));
  EXPECT_THAT(info->mode, Eq("carbon"));
  EXPECT_THAT(info->mode, Eq(std::nullopt));
}
Would it make sense to add a new test which tests the original symlink order, and that "c" is now the result? i.e., something like `OriginalSymlinkNameRetained`
@@ -166,5 +167,58 @@ TEST_F(BusyboxInfoTest, NotBusyboxSymlink) {
  EXPECT_FALSE(info.ok());
}

TEST_F(BusyboxInfoTest, LayerSymlinksInstallTree) {
Yesterday, one of the concerns I mentioned was incorrect install selection. You'd said you're okay with that, but would it be worth adding a test that demonstrates the expected result?
(note, you could share install tree creation via a helper with `LayerSymlinksInstallTree`)
e.g., choosing "/opt" because that's a thing:
TEST_F(BusyboxInfoTest, SymlinkAtSameLevelReturnsWrongInstall) {
  // Just enough of the system install that incorrect lookups can find the
  // busybox.
  MakeDir(dir_ / "lib");
  MakeDir(dir_ / "lib/carbon");
  MakeFile(dir_ / "lib/carbon/carbon-busybox");
  // What it would look like if someone also did an install in "opt".
  MakeDir(dir_ / "opt");
  MakeDir(dir_ / "opt/lib");
  MakeDir(dir_ / "opt/lib/carbon");
  MakeFile(dir_ / "opt/lib/carbon/carbon-busybox");
  MakeDir(dir_ / "opt/lib/carbon/llvm");
  MakeDir(dir_ / "opt/lib/carbon/llvm/bin");
  MakeSymlink(dir_ / "opt/lib/carbon/llvm/bin/clang", "../../carbon-busybox");
  MakeDir(dir_ / "opt/bin");
  MakeSymlink(dir_ / "opt/bin/carbon", "../lib/carbon/carbon-busybox");
  // A symlink one level above "bin", so the relative lookup escapes "opt".
  auto target = MakeSymlink(dir_ / "opt/carbon", "bin/carbon");
  auto info = GetBusyboxInfo(target.string());
  ASSERT_TRUE(info.ok()) << info.error();
  EXPECT_THAT(info->bin_path, Eq(dir_ / "opt/../lib/carbon/carbon-busybox"));
  EXPECT_THAT(info->mode, Eq("clang"));
}
Co-authored-by: Jon Ross-Perkins <[email protected]>