The associated file is either missing or is an invalid symlink #20408

Closed
UebelAndre opened this issue Dec 1, 2023 · 11 comments
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@UebelAndre
Contributor

UebelAndre commented Dec 1, 2023

Description of the bug:

I have custom rules that create symlinks which are intentionally broken at build time; these are then fed into another rule that composes them into an artifact in which the links are valid. I'm trying to upgrade to Bazel 6.3.0 and am running into the following error when I get remote cache hits using --remote_download_toplevel:

ERROR: /Users/User/Code/directory_bug/BUILD.bazel:9:14: Error while validating output TreeArtifact File:[[<execution_root>]bazel-out/darwin_arm64-fastbuild/bin]directory_bug : Failed to resolve relative path symlink_directory inside TreeArtifact /private/var/tmp/_bazel_user/68b71d322ac1da8a4878040e6895dbc8/execroot/directory_bug/bazel-out/darwin_arm64-fastbuild/bin/directory_bug. The associated file is either missing or is an invalid symlink.
ERROR: /Users/User/Code/directory_bug/BUILD.bazel:9:14: Action directory_bug failed: not all outputs were created or valid

This did not occur on Bazel 6.2.1. Is there an incompatibility flag that I can use to restore the previous behavior?

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Using the following workspace with a remote cache configured, I can reproduce the bug on Bazel 6.3.0. Building on Bazel 6.2.1 does not show the issue.

directory_bug.zip (shasum):
e430bf04f9d2e5e4a3a82f54993514d05d4224b2decf65b353175f0fb1bc2a3a

.bazelrc

build --remote_download_toplevel

BUILD.bazel

load("//:defs.bzl", "directory_bug")

sh_binary(
    name = "action",
    srcs = ["action.sh"],
    visibility = ["//visibility:public"],
)

directory_bug(
    name = "directory_bug",
)

action.sh

#!/usr/bin/env bash

set -euo pipefail

OUT_DIR="${OUT_DIR}"

mkdir -p "${OUT_DIR}/a"
touch "${OUT_DIR}/a/a.txt"

ln -s a "${OUT_DIR}/symlink_directory"

mkdir -p "${OUT_DIR}/symlink"
ln -s ../a "${OUT_DIR}/symlink/relative"

defs.bzl

"""Example"""

def _directory_bug_impl(ctx):
    out_dir = ctx.actions.declare_directory(ctx.label.name)

    ctx.actions.run(
        outputs = [out_dir],
        executable = ctx.executable._executable,
        env = {
            "OUT_DIR": out_dir.path,
        },
    )

    return [DefaultInfo(files = depset([out_dir]))]

directory_bug = rule(
    implementation = _directory_bug_impl,
    attrs = {
        "_executable": attr.label(
            executable = True,
            cfg = "exec",
            default = Label("//:action"),
        ),
    },
)
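To make the shape of the tree artifact concrete, here is a minimal Python sketch, assuming a POSIX filesystem. It recreates the layout that action.sh writes into a temporary directory and prints where each relative symlink resolves. `build_tree` and `resolve_links` are hypothetical helpers for illustration, not part of Bazel or the repro. Both links end up at `a`, because `ln -s ../a` is resolved relative to the directory that contains the link (`symlink/`), not relative to the tree root.

```python
import os
import tempfile


def build_tree(out_dir: str) -> None:
    """Recreate the layout written by action.sh (hypothetical helper)."""
    # mkdir -p "${OUT_DIR}/a" && touch "${OUT_DIR}/a/a.txt"
    os.makedirs(os.path.join(out_dir, "a"), exist_ok=True)
    open(os.path.join(out_dir, "a", "a.txt"), "w").close()
    # ln -s a "${OUT_DIR}/symlink_directory": target is relative to out_dir.
    os.symlink("a", os.path.join(out_dir, "symlink_directory"))
    # ln -s ../a "${OUT_DIR}/symlink/relative": target is relative to out_dir/symlink.
    os.makedirs(os.path.join(out_dir, "symlink"), exist_ok=True)
    os.symlink("../a", os.path.join(out_dir, "symlink", "relative"))


def resolve_links(out_dir: str) -> dict:
    """Map each symlink under out_dir to its resolved target, both relative to out_dir."""
    root = os.path.realpath(out_dir)
    resolved = {}
    # os.walk does not descend into symlinked directories by default, but it
    # still lists them in dirnames, so both lists are checked.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                resolved[os.path.relpath(path, root)] = os.path.relpath(
                    os.path.realpath(path), root
                )
    return resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_tree(tmp)
        for link, target in sorted(resolve_links(tmp).items()):
            print(f"{link} -> {target}")
```

On Linux or macOS this prints `symlink/relative -> a` followed by `symlink_directory -> a`.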

Which operating system are you running Bazel on?

Linux, MacOS

What is the output of bazel info release?

release 6.3.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

This is a regression, but I do not know which commit introduced it.

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

sgowroji added the team-Remote-Exec label on Dec 1, 2023
tjgq added a commit to tjgq/bazel that referenced this issue Dec 1, 2023
…ilding without the bytes.

This is the same bug as bazelbuild#19143, except that the fix in 3a48457 missed the case
where the symlink occurs inside an output directory.

Fixes bazelbuild#20408.
@tjgq
Contributor

tjgq commented Dec 1, 2023

I have custom rules that create symlinks that at the time of the build are intentionally broken links

Are you sure the provided example is representative of this scenario? None of the symlinks created by action.sh dangle. (If they did, I'd expect the test case to fail with every Bazel version, since dangling symlinks in tree artifacts aren't supported - see #15454 for a feature request that would allow them.)
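One way to check the distinction drawn here is a minimal sketch, assuming a POSIX filesystem. The hypothetical helper `find_dangling_symlinks` walks a directory and reports symlinks whose targets do not exist. Run against an OUT_DIR produced by action.sh, it should report nothing, while a link such as `ln -s missing broken` would be flagged. This only approximates what Bazel's tree-artifact validation rejects; it is not Bazel's implementation.

```python
import os
import tempfile


def find_dangling_symlinks(root: str) -> list:
    """Return paths (relative to root) of symlinks whose targets do not exist.

    Hypothetical helper for illustration; not Bazel's tree-artifact validation.
    """
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it, so it is
            # False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(os.path.relpath(path, root))
    return sorted(dangling)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a"))
        # Resolves, like the links written by action.sh.
        os.symlink("a", os.path.join(tmp, "symlink_directory"))
        # Dangles: an intentionally broken link (hypothetical example).
        os.symlink("missing", os.path.join(tmp, "broken"))
        print(find_dangling_symlinks(tmp))  # ['broken']
```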

Assuming that the example is as intended: this is the same bug as #19143, except that the fix didn't handle the case where the symlink is tucked inside an output directory. Similar to the original issue, it regressed between 6.2.0 and 6.3.0, but has been fixed in 7.0.0.

I have a draft PR with a fix at #20409 (against 6.4.0) but I don't know whether further 6.x releases are planned. /cc @meteorcloudy

@UebelAndre
Contributor Author

Are you sure the provided example is representative of this scenario?

My description of the problem may be off, but this is the exact scenario I have in my repo.

Assuming that the example is as intended: this is the same bug as #19143, except that the fix didn't handle the case where the symlink is tucked inside an output directory. Similar to the original issue, it regressed between 6.2.0 and 6.3.0, but has been fixed in 7.0.0.

So this issue is fixed in 7.0? When I try building with 7.0.0-pre.20231018.3, I run into the following crash:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unexpected Exception 'Cannot get node id for DirectoryArtifactValue{mtime=1701445982914}' when closing BEP transports, this is a bug.
	at com.google.devtools.build.lib.buildeventservice.BuildEventServiceModule.waitForBuildEventTransportsToClose(BuildEventServiceModule.java:518)
	at com.google.devtools.build.lib.buildeventservice.BuildEventServiceModule.closeBepTransports(BuildEventServiceModule.java:606)
	at com.google.devtools.build.lib.buildeventservice.BuildEventServiceModule.afterCommand(BuildEventServiceModule.java:621)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.afterCommand(BlazeRuntime.java:627)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:688)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:244)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:621)
	at io.grpc.Context$1.run(Context.java:566)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException: Cannot get node id for DirectoryArtifactValue{mtime=1701445982914}
	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:592)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:571)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:111)
	at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:247)
	at com.google.devtools.build.lib.buildeventservice.BuildEventServiceModule.waitForBuildEventTransportsToClose(BuildEventServiceModule.java:504)
	... 11 more
Caused by: java.lang.UnsupportedOperationException: Cannot get node id for DirectoryArtifactValue{mtime=1701445982914}
	at com.google.devtools.build.lib.remote.RemoteActionFileSystem$2.getNodeId(RemoteActionFileSystem.java:634)
	at com.google.devtools.build.lib.vfs.DigestUtils$CacheKey.<init>(DigestUtils.java:67)
	at com.google.devtools.build.lib.vfs.DigestUtils.manuallyComputeDigest(DigestUtils.java:193)
	at com.google.devtools.build.lib.vfs.DigestUtils.getDigestWithManualFallback(DigestUtils.java:160)
	at com.google.devtools.build.lib.remote.util.DigestUtil.compute(DigestUtil.java:72)
	at com.google.devtools.build.lib.remote.util.DigestUtil.compute(DigestUtil.java:67)
	at com.google.devtools.build.lib.remote.ByteStreamBuildEventArtifactUploader.readPathMetadata(ByteStreamBuildEventArtifactUploader.java:236)
	at com.google.devtools.build.lib.remote.ByteStreamBuildEventArtifactUploader.lambda$doUpload$8(ByteStreamBuildEventArtifactUploader.java:410)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableMap$MapSubscriber.onNext(FlowableMap.java:64)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.fastPath(FlowableFromIterable.java:185)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:129)
	at io.reactivex.rxjava3.internal.subscribers.BasicFuseableSubscriber.request(BasicFuseableSubscriber.java:153)
	at io.reactivex.rxjava3.internal.jdk8.FlowableCollectWithCollectorSingle$CollectorSingleObserver.onSubscribe(FlowableCollectWithCollectorSingle.java:102)
	at io.reactivex.rxjava3.internal.subscribers.BasicFuseableSubscriber.onSubscribe(BasicFuseableSubscriber.java:67)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable.subscribe(FlowableFromIterable.java:69)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable.subscribeActual(FlowableFromIterable.java:47)
	at io.reactivex.rxjava3.core.Flowable.subscribe(Flowable.java:15917)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableMap.subscribeActual(FlowableMap.java:38)
	at io.reactivex.rxjava3.core.Flowable.subscribe(Flowable.java:15917)
	at io.reactivex.rxjava3.internal.jdk8.FlowableCollectWithCollectorSingle.subscribeActual(FlowableCollectWithCollectorSingle.java:71)
	at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
	at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
	at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
	at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
	at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
	at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
	at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
	at io.reactivex.rxjava3.internal.operators.single.SingleUsing.subscribeActual(SingleUsing.java:83)
	at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
	at io.reactivex.rxjava3.internal.operators.single.SingleSubscribeOn$SubscribeOnObserver.run(SingleSubscribeOn.java:89)
	at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:38)
	at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:25)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	... 3 more

@tjgq
Contributor

tjgq commented Dec 1, 2023

The fix I described is for the Error while validating output TreeArtifact issue (which I believe only exists in 6.x).

The Cannot get node id for DirectoryArtifactValue issue in 7.0.0-pre.20231018.3 looks like #20246. Can you try with 7.0.0rc4 or 7.0.0rc5, which should both contain the fix for that one?

@UebelAndre
Contributor Author

The fix I described is for the Error while validating output TreeArtifact issue (which I believe only exists in 6.x).

The Cannot get node id for DirectoryArtifactValue issue in 7.0.0-pre.20231018.3 looks like #20246. Can you try with 7.0.0rc4 or 7.0.0rc5, which should both contain the fix for that one?

I get the same error on 7.0.0rc4 and 7.0.0rc5 😞

@tjgq
Contributor

tjgq commented Dec 1, 2023

Ok, so that's a different bug in 7.0.0rc5.

Can you double check that the repro in the first post of this issue is correct, and also include the Bazel flags you're using? It looks like you're using a build event service, but I played around with some BEP-related flags and couldn't reproduce it.

@UebelAndre
Contributor Author

There may not be a dangling symlink in this case, but this is an exact copy of a failure I see in my larger repo.

I've been testing with the following .bazelrc:

build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_cache=grpcs://remote.buildbuddy.io
build --remote_download_toplevel # Helps remove network bottleneck if caching is enabled
build --remote_timeout=3600
build --remote_header=x-buildbuddy-api-key=####################

All I did to set up a remote cache and BES was go to https://www.buildbuddy.io/ and use their free tier.

Building is simply:

bazel clean && bazel build ...

Run it once to populate the remote cache (this succeeds); the second invocation then hits the error.

@tjgq
Contributor

tjgq commented Dec 2, 2023

Thanks, I managed to reproduce it. I'll look into a fix next week.

@tjgq
Contributor

tjgq commented Dec 2, 2023

I filed a separate issue for the 7.0.0rc5 bug at #20415 since this discussion started out with a different one and it gets confusing.

Let's keep this one open until #20409 is merged into a 6.x release.

tjgq added a commit to tjgq/bazel that referenced this issue Dec 7, 2023
…ilding without the bytes.

This is the same bug as bazelbuild#19143, except that the fix in 3a48457 missed the case
where the symlink occurs inside an output directory.

Fixes bazelbuild#20408.
Wyverald pushed a commit that referenced this issue Dec 7, 2023
…ilding without the bytes. (#20409)

This is the same bug as #19143, except that the fix in 3a48457 missed
the case where the symlink occurs inside an output directory.

Fixes #20408.
@bcsgh

bcsgh commented Dec 20, 2023

I'm not following exactly what the issue is here, but I'm getting a very similar error message from 7.0.0 (which I don't think I was getting from 6.4.0) while trying to build something that depends on an oci_pull(). Building with --spawn_strategy=local makes the problem go away.

If this is unrelated, I'd be happy to open a new issue, but I'd prefer to avoid spamming.

WORKSPACE

(Just what seems to be the important bit.)

[...]
oci_pull(
    name = "ubuntu_20_04",
    # https://gallery.ecr.aws/docker/library/ubuntu
    digest = "sha256:218bb51abbd1864df8be26166f847547b3851a89999ca7bfceb85ca9b5d2e95d",
    image = "public.ecr.aws/docker/library/ubuntu",#:20.04",
)

Error:

ERROR: /home/bcs/.cache/bazel/_bazel_bcs/e7abb5ed16b1626c39b8bceb3e9fad2b/external/ubuntu_20_04_single/BUILD.bazel:16:18: Error while validating output TreeArtifact File:[[<execution_root>]bazel-out/k8-fastbuild/bin]external/ubuntu_20_04_single/blobs/sha256 : Failed to resolve relative path 218bb51abbd1864df8be26166f847547b3851a89999ca7bfceb85ca9b5d2e95d inside TreeArtifact /home/bcs/.cache/bazel/_bazel_bcs/e7abb5ed16b1626c39b8bceb3e9fad2b/execroot/tbd_server/bazel-out/k8-fastbuild/bin/external/ubuntu_20_04_single/blobs/sha256. The associated file is either missing or is an invalid symlink.
ERROR: /home/bcs/.cache/bazel/_bazel_bcs/e7abb5ed16b1626c39b8bceb3e9fad2b/external/ubuntu_20_04_single/BUILD.bazel:16:18: Copying files to directory ubuntu_20_04_single/blobs/sha256 failed: not all outputs were created or valid

Bazel version:

$ /usr/bin/bazel version
Build label: 7.0.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Dec 11 16:51:49 2023 (1702313509)
Build timestamp: 1702313509
Build timestamp as int: 1702313509

@alexeagle
Contributor

@bcsgh I think your issue is bazel-contrib/rules_oci#425 and not a bug in Bazel.

@iancha1992
Member

A fix for this issue has been included in Bazel 6.5.0 RC1. Please test out the release candidate and report any issues as soon as possible. Thanks!

8 participants