Build failures and internal errors when switching from 7.0.0rc3 to 7.0.0rc4 #20246
Comments
Looking at the list of commits between rc3 and rc4, there were quite a few execution-log-related commits (for example c767aa4). cc @tjgq To pinpoint where the problem is, @rsalvador it would help if you ran a bazelisk bisect in your project, or alternatively put together a minimal repro. Either would help us address this as quickly as possible.
There may be a problem with my machine/environment: the first build problem is now also happening with rc3.
@bazel-io fork 7.0.0
The first error:
happens because we have directory dependencies between actions and with Regarding the internal error:
bazelisk bisect found that it is due to this commit: c456082, the error occurs during builds using |
@rsalvador We have found another regression in the Gerrit Code Review build machinery that was hard to reproduce due to Bazel caching; I had to wipe the cache entirely. See this issue for more details.
The initial run of |
@rsalvador Thanks for the bisect! A couple of followup questions as I try to repro this:
|
@tjgq is this rule: https://github.com/aspect-build/rules_rollup/blob/main/rollup/defs.bzl
let me know if you need the |
This is an attempt to fix bazelbuild#20246 purely from guesswork.
Note the salient features of the stack trace in the bug report:
1. The crash occurs while attempting to obtain a digest for a file.
2. DigestUtil#getDigestWithManualFallback falls back to computing the digest manually, implying that RAFS#getFastDigest returned null.
3. RAFS#stat() produces a FileStatus with a missing getNodeId() implementation.
(3) implies that RAFS#statInMemory was successful, while (2) implies that it wasn't. One possibility is that the file in question is a symlink, so getFastDigest fails to retrieve the metadata for the symlink itself, while stat() follows the symlink and successfully returns the metadata for its target.
PiperOrigin-RevId: 583987445
Change-Id: I65e586ea84635a279208e24c421f54ae46ee21b8
@rsalvador I have a tentative fix, but some guesswork is involved and I'm not sure it's the right one. Would it be possible for you to build a custom Bazel from https://github.com/tjgq/bazel/tree/execlog-digest-crash-fix (clone and check out that branch, then run |
@tjgq that fixed it, thx!
@rsalvador Thanks for confirming! I've sent the patch for internal review. For future me: this can also be reproed with |
The methods are documented as such in FileSystem.
If we don't do this, there will be a discrepancy between getFastDigest and stat, as the latter can follow symlinks. This can manifest as a crash (see bazelbuild#20246) as the digest computation will take the missing fast digest for a symlink as a signal to compute the digest manually; this would fail when the symlink target is an in-memory file, which doesn't have an associated inode as required to compute the cache key (see DigestUtils#manuallyComputeDigest).
Fixes bazelbuild#20246.
PiperOrigin-RevId: 584297990
Change-Id: I65e586ea84635a279208e24c421f54ae46ee21b8
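The discrepancy described in this commit message can be illustrated with a toy model. The following is a minimal Python sketch, not Bazel's Java code: ToyFileSystem, InMemoryFile, Symlink, and all function names are hypothetical, and the RuntimeError merely stands in for the crash caused by the missing inode.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union


@dataclass
class InMemoryFile:
    """A file that exists only in memory: its digest is known, but it has no inode."""
    content: bytes

    @property
    def digest(self) -> str:
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class Symlink:
    """A symlink whose target is another path in the same toy file system."""
    target: str


Node = Union[InMemoryFile, Symlink]


class ToyFileSystem:
    """Hypothetical stand-in for a file system that holds in-memory files."""

    def __init__(self, nodes: Dict[str, Node]) -> None:
        self.nodes = nodes

    def resolve(self, path: str) -> str:
        # Follow symlinks until a regular node is reached (no cycle detection here).
        while isinstance(self.nodes[path], Symlink):
            path = self.nodes[path].target
        return path

    def stat(self, path: str) -> Node:
        # stat follows symlinks, so it succeeds for a symlink to an in-memory file.
        return self.nodes[self.resolve(path)]

    def fast_digest_no_follow(self, path: str) -> Optional[str]:
        # Pre-fix behavior (assumed): inspect only the path itself, so a symlink yields None.
        node = self.nodes[path]
        return node.digest if isinstance(node, InMemoryFile) else None

    def fast_digest_follow(self, path: str) -> Optional[str]:
        # Post-fix behavior (assumed): resolve symlinks first, consistent with stat.
        node = self.nodes[self.resolve(path)]
        return node.digest if isinstance(node, InMemoryFile) else None


def digest_with_manual_fallback(
    fs: ToyFileSystem, path: str, fast_digest: Callable[[str], Optional[str]]
) -> str:
    digest = fast_digest(path)
    if digest is not None:
        return digest
    # A missing fast digest triggers the manual path, which needs an inode for
    # its cache key. stat succeeds, but an in-memory file has no inode to offer.
    fs.stat(path)
    raise RuntimeError(f"no inode available for {path}")
```

With a symlink that points at an in-memory file, passing fs.fast_digest_no_follow to digest_with_manual_fallback raises, while passing fs.fast_digest_follow returns the target's digest, matching what stat sees.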
…c3 to 7.0.0rc4 (#20278)
The methods are documented as such in FileSystem.
If we don't do this, there will be a discrepancy between getFastDigest and stat, as the latter can follow symlinks. This can manifest as a crash (see #20246) as the digest computation will take the missing fast digest for a symlink as a signal to compute the digest manually; this would fail when the symlink target is an in-memory file, which doesn't have an associated inode as required to compute the cache key (see DigestUtils#manuallyComputeDigest).
Fixes #20246.
Commit aab19f7
PiperOrigin-RevId: 584297990
Change-Id: I65e586ea84635a279208e24c421f54ae46ee21b8
Co-authored-by: Googler <[email protected]>
Description of the bug:
Bazel 7.0.0rc3 built our big monorepo without problems, but 7.0.0rc4 fails with errors, e.g.:
and
We can probably fix the first error in the generator, but the internal error may point to a regression.
The internal error goes away if we don't use the --execution_log_json_file and --noexecution_log_sort flags.
Which category does this issue belong to?
Core
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
We can't provide a minimal example; it happens deep into the build of a very big monorepo.
Which operating system are you running Bazel on?
MacOS Version 14.1
What is the output of bazel info release?
release 7.0.0rc4
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
Yes, it is a regression.
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response