Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctx.actions.symlink(target_file = ...) uses absolute paths as destinations, breaking under --incompatible_sandbox_hermetic_tmp=true + /tmp/... output bases #20518

Closed
rrbutani opened this issue Dec 13, 2023 · 12 comments
Assignees
Labels
team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@rrbutani
Copy link
Contributor

rrbutani commented Dec 13, 2023

Issue

Another issue like #20515 1: when building Bazel at HEAD with Bazel 7.0.0 and an output base on /tmp, I get errors about proto paths when actions for proto_library targets are run. For example:

bazel-7.0.0-linux-x86_64 build @@protobuf~21.7//:any_proto
INFO: Invocation ID: 1c5d3a4f-a8b4-43cb-b910-fef6fa142418
INFO: Analyzed target @@protobuf~21.7//:any_proto (0 packages loaded, 0 targets configured).
SUBCOMMAND: # @@protobuf~21.7//:any_proto [action 'Generating Descriptor Set proto_library @@protobuf~21.7//:any_proto', configuration: 47445fac1fec9271fc0db38c9d43ff67189b592b2ddbef4782fa0b297ddd5b14, execution platform: //:default_host_platform, mnemonic: GenProtoDescriptorSet]
(cd /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/execroot/_main && \
  exec env - \
    PATH=/usr/local/bin:/usr/bin:/bin \
  bazel-out/k8-opt-exec-ST-fad1763555eb/bin/external/protobuf~21.7/protoc --direct_dependencies google/protobuf/any.proto '--direct_dependencies_violation_msg=%s is imported, but @@protobuf~21.7//:any_proto doesn'\''t directly depend on a proto_library that '\''srcs'\'' it.' '--descriptor_set_out=bazel-out/k8-fastbuild/bin/external/protobuf~21.7/any_proto-descriptor-set.proto.bin' -Ibazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto -I. bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto)
# Configuration: 47445fac1fec9271fc0db38c9d43ff67189b592b2ddbef4782fa0b297ddd5b14
# Execution platform: //:default_host_platform
ERROR: /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/external/protobuf~21.7/BUILD.bazel:263:14: Generating Descriptor Set proto_library @@protobuf~21.7//:any_proto failed: (Exit 1): protoc failed: error executing GenProtoDescriptorSet command (from target @@protobuf~21.7//:any_proto) bazel-out/k8-opt-exec-ST-fad1763555eb/bin/external/protobuf~21.7/protoc --direct_dependencies google/protobuf/any.proto ... (remaining 5 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
Could not make proto path relative: bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto: No such file or directory

If --incompatible_sandbox_hermetic_tmp=false is passed, the above command succeeds.

Details

Running with --sandbox_debug and poking around the failing actions execroot reveals that unlike #20515 the file in question here is a symlink into the sandbox-execroot directory (as it should be):

ls /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/execroot/_main/bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto -l
lrwxrwxrwx 1 rrbutani users 128 Dec 12 15:44 /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/execroot/_main/bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto -> /tmp/bazel-execroot/_main/bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto

However if we follow that symlink (while running under the sandbox invocation from --sandbox_debug) we arrive, once again, at symlink to a host-execroot path:

/tmp/bazel-rrbutani/install/89a68939cbf63eb54205fdf943a58b15/linux-sandbox \
    -W /tmp/bazel-working-directory/_main \
    -t 15 \
    -w /tmp/bazel-execroot/_main \
    -w /tmp \
    -w /dev/shm \
    -M /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/execroot \
        -m /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/_hermetic_tmp/bazel-execroot \
    -M /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/execroot \
        -m /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/_hermetic_tmp/bazel-working-directory \
    -M /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/_hermetic_tmp \
        -m /tmp \
    -S /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/stats.out \
    -D /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/sandbox/linux-sandbox/6208/debug.out \
    -- ls -l --color /tmp/bazel-execroot/_main/bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto
lrwxrwxrwx 1 rrbutani users 105 Dec 12 16:10 /tmp/bazel-execroot/_main/bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto -> /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/external/protobuf~21.7/src/google/protobuf/any.proto

Looking at this file (bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto) in the execroot (note: the actual execroot under the output base, not the action's execroot) corroborates this:

ls /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/execroot/_main/bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto -l
lrwxrwxrwx 1 rrbutani users 105 Dec 12 16:10 /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/execroot/_main/bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto -> /tmp/bazel-rrbutani/705c9771f27ae01ebf0993be3b99a716/external/protobuf~21.7/src/google/protobuf/any.proto

Cause

aquery on @@protobuf~21.7//:any_proto reveals the action generating bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto:

action 'Symlinking virtual .proto sources for @@protobuf~21.7//:any_proto'
  Mnemonic: Symlink
  Target: @com_google_protobuf//:any_proto
  Configuration: k8-fastbuild
  Execution platform: //:default_host_platform
  ActionKey: 7a39b61d5744a8903119e17b3ddc7e56e330f240d2bead98c3ba081a81b5c9ac
  Inputs: [external/protobuf~21.7/src/google/protobuf/any.proto]
  Outputs: [bazel-out/k8-fastbuild/bin/external/protobuf~21.7/_virtual_imports/any_proto/google/protobuf/any.proto]

This then leads back to proto_library's starlark implementation function (via _process_srcs)

virtual_srcs = []
for src in srcs:
# Remove strip_import_prefix
if not src.short_path.startswith(full_strip_import_prefix):
fail(".proto file '%s' is not under the specified strip prefix '%s'" %
(src.short_path, full_strip_import_prefix))
import_path = src.short_path[len(full_strip_import_prefix):]
# Add import_prefix
virtual_src = ctx.actions.declare_file(_join(virtual_imports, import_prefix, import_path))
ctx.actions.symlink(
output = virtual_src,
target_file = src,
progress_message = "Symlinking virtual .proto sources for %{label}",
)
virtual_srcs.append(virtual_src)
return proto_path, virtual_srcs

... which in turn leads to SymlinkAction (via here) which appears to always create symlinks that have absolute path destinations (IIUC):

@Override
public ActionResult execute(ActionExecutionContext actionExecutionContext)
throws ActionExecutionException {
maybeVerifyTargetIsExecutable(actionExecutionContext);
Path srcPath;
if (inputPath == null) {
srcPath = actionExecutionContext.getInputPath(getPrimaryInput());
} else {
srcPath = actionExecutionContext.getExecRoot().getRelative(inputPath);
}
Path outputPath = getOutputPath(actionExecutionContext);
try {
// Delete the empty output directory created prior to the action execution when the output is
// a tree artifact. All other actions that produce tree artifacts expect it to exist prior to
// their execution. It's not worth complicating ActionOutputDirectoryHelper just to avoid this
// small amount of overhead.
outputPath.delete();
outputPath.createSymbolicLink(srcPath);
} catch (IOException e) {
String message =
String.format(
"failed to create symbolic link '%s' to '%s' due to I/O error: %s",
getPrimaryOutput().getExecPathString(), printInputs(), e.getMessage());
DetailedExitCode code = createDetailedExitCode(message, Code.LINK_CREATION_IO_EXCEPTION);
throw new ActionExecutionException(message, e, this, false, code);
}
updateInputMtimeIfNeeded(actionExecutionContext);
return ActionResult.EMPTY;
}


I am not sure how to go about solving this in a way that's not likely to cause other breakage. As best as I can tell actions are not keyed on --incompatible_sandbox_hermetic_tmp; having ctx.actions.symlink behave differently based on this option seems fraught. If relative symlink destinations are not viable, perhaps we can rewrite these symlinks during sandbox spawn setup? (i.e. instead of making a symlink within the action's execroot that points to the symlink within the execroot, just have the action-execroot symlink point to the destination, adjusted to be sandbox-execroot-relative).

Note: #14224 is somewhat related to this though this specific case (proto_library) involves an invocation of ctx.actions.symlink(...) with target_file instead of target_path.

Footnotes

  1. in Paths to Param Files in Command Lines Refer to Host Execroot Paths (instead of sandbox execroot paths) With --incompatible_sandbox_hermetic_tmp=true #20515, many of the actions were already in my disk cache (with results produced by a previous invocation with --incompatible_sandbox_hermetic_tmp disabled); this masked other errors like this one

@rrbutani
Copy link
Contributor Author

Given that issues like this and #20515 exist, can we maybe — for now — disable --incompatible_sandbox_hermetic_tmp if an output base under /tmp is used?

Alternatively, I think bind mounting the execroot directory to its host path — provided this does not collide with special directories like /tmp/bazel-execroot, etc — after the hermetic tmp dir is mounted onto /tmp would be (though unsatisfying) an effective stopgap solution to these issues.

cc: @fmeum

@lberki
Copy link
Contributor

lberki commented Dec 13, 2023

Oof, this is pretty bad, I have no idea how this escaped our test battery :( (probably because we don't exercise the case where the output base is under /tmp enough)

For now, I'll classify this issue as extremely embarrassing but not a fire drill because it only affects people who have their output bases under /tmp and there is a clear workaround (--noincompatible_sandbox_hermetic_tmp)

I don't immediately have any clever ideas how to fix this, other than by running readlink or readlink -f on every symlink on the action inputs then special-casing them in some way.

@lberki
Copy link
Contributor

lberki commented Dec 13, 2023

@oquenchil do you have any clever ideas here?

@fmeum
Copy link
Collaborator

fmeum commented Dec 13, 2023

@lberki I remember one of your comments (fmeum@a2fdd4a#r82080580) where you suggested one could dig out the action generating an artifact and downcast it to a concrete action class. Wouldn't this help here? If we could check that the artifact is generated by SymlinkAction, we could have it stage the underlying artifact or at least fix up the path.

@tjgq
Copy link
Contributor

tjgq commented Dec 13, 2023

If we could check that the artifact is generated by SymlinkAction, we could have it stage the underlying artifact or at least fix up the path.

The issue likely isn't exclusive to SymlinkAction; it's also possible to do the equivalent of ctx.actions.symlink via ctx.actions.(run|run_shell).

@fmeum
Copy link
Collaborator

fmeum commented Dec 18, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Dec 18, 2023
@fmeum
Copy link
Collaborator

fmeum commented Dec 18, 2023

@bazel-io fork 7.0.1

@fmeum
Copy link
Collaborator

fmeum commented Dec 18, 2023

@lberki For Bazel 7.0.1, should we just disable hermetic tmp if the output base lies under /tmp? Then we can take our time to figure out a solution. This being broken by default in subtle ways is arguably worse than the previous non-hermetic tmp was.

@lberki
Copy link
Contributor

lberki commented Dec 19, 2023

#20603 is a draft fix for this.

@tjgq tjgq removed the untriaged label Jan 2, 2024
copybara-service bot pushed a commit that referenced this issue Jan 2, 2024
…sted themselves when the output base was under `/tmp`:

1. Virtual action inputs didn't work. This was a fairly simple omission in the path transformation logic.
2. Artifacts which are resolved symlinks (i.e. declared using `declare_file`) did not work. This is fixed by keeping track of the realpath of the symlink in `FileArtifactValue` / `TreeArtifactValue`.

Fixes #20515 and #20518.

RELNOTES: None.
PiperOrigin-RevId: 595145191
Change-Id: I833025928374c78bc719d8e3be1efb2b2950b9f1
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 2, 2024
…sted themselves when the output base was under `/tmp`:

1. Virtual action inputs didn't work. This was a fairly simple omission in the path transformation logic.
2. Artifacts which are resolved symlinks (i.e. declared using `declare_file`) did not work. This is fixed by keeping track of the realpath of the symlink in `FileArtifactValue` / `TreeArtifactValue`.

Fixes bazelbuild#20515 and bazelbuild#20518.

RELNOTES: None.
PiperOrigin-RevId: 595145191
Change-Id: I833025928374c78bc719d8e3be1efb2b2950b9f1
@lberki
Copy link
Contributor

lberki commented Jan 2, 2024

Fixed in fb6658c .

@lberki lberki closed this as completed Jan 2, 2024
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 2, 2024
…sted themselves when the output base was under `/tmp`:

1. Virtual action inputs didn't work. This was a fairly simple omission in the path transformation logic.
2. Artifacts which are resolved symlinks (i.e. declared using `declare_file`) did not work. This is fixed by keeping track of the realpath of the symlink in `FileArtifactValue` / `TreeArtifactValue`.

Fixes bazelbuild#20515 and bazelbuild#20518.

RELNOTES: None.
PiperOrigin-RevId: 595145191
Change-Id: I833025928374c78bc719d8e3be1efb2b2950b9f1
iancha1992 pushed a commit that referenced this issue Jan 3, 2024
…manifested themselves when the output base was under /tmp (#20718)

1. Virtual action inputs didn't work. This was a fairly simple omission
in the path transformation logic.
2. Artifacts which are resolved symlinks (i.e. declared using
`declare_file`) did not work. This is fixed by keeping track of the
realpath of the symlink in `FileArtifactValue` / `TreeArtifactValue`.

Fixes #20515 and #20518.

RELNOTES: None.
Commit
fb6658c

PiperOrigin-RevId: 595145191
Change-Id: I833025928374c78bc719d8e3be1efb2b2950b9f1

Co-authored-by: Googler <[email protected]>
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Jan 5, 2024
…sted themselves when the output base was under `/tmp`:

1. Virtual action inputs didn't work. This was a fairly simple omission in the path transformation logic.
2. Artifacts which are resolved symlinks (i.e. declared using `declare_file`) did not work. This is fixed by keeping track of the realpath of the symlink in `FileArtifactValue` / `TreeArtifactValue`.

Fixes bazelbuild#20515 and bazelbuild#20518.

RELNOTES: None.
PiperOrigin-RevId: 595145191
Change-Id: I833025928374c78bc719d8e3be1efb2b2950b9f1
@iancha1992
Copy link
Member

@bazel-io fork 7.1.0

github-merge-queue bot pushed a commit that referenced this issue Jan 8, 2024
…manifested themselves when the output base was under /tmp (#20766)

1. Virtual action inputs didn't work. This was a fairly simple omission
in the path transformation logic.
2. Artifacts which are resolved symlinks (i.e. declared using
`declare_file`) did not work. This is fixed by keeping track of the
realpath of the symlink in `FileArtifactValue` / `TreeArtifactValue`.

Fixes #20515 and #20518.

RELNOTES: None.
Commit
fb6658c

PiperOrigin-RevId: 595145191
Change-Id: I833025928374c78bc719d8e3be1efb2b2950b9f1

Co-authored-by: Googler <[email protected]>
@lberki lberki added this to the 7.0.1 release blockers milestone Jan 16, 2024
@iancha1992
Copy link
Member

A fix for this issue has been included in Bazel 7.0.1 RC2. Please test out the release candidate and report any issues as soon as possible. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Local-Exec Issues and PRs for the Execution (Local) team type: bug
Projects
None yet
Development

No branches or pull requests

8 participants