-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically retry the build if encountered remote cache eviction error #17711
Conversation
But only for the blobs that were detected as evicted, right? If other blobs were evicted (which is probably the case because of TTL being reached), then a re-run could fail on similar blobs. If my understanding is correct, I'm currently recommending people perform |
True for 6.1. But with #17639 which I couldn't get into 6.1, Bazel knows when blobs will be expired and will discard them in advance. I don't know whether it is good to always discard all remote metadata when remote cache eviction happens. |
I've been running into the issue I laid out, now that I'm on 6.1. I once had to restart the build 6 times to get it to finally complete. Until we have the ttl based discarding, maybe it's best to discard all of it? I'm not sure either. The above behavior though is influencing my current recommendations for others to do just that. |
Is this related to what was discussed Jan 27 in the review of design doc with the following comments?
|
Thinking more, I agree. We should remote ALL metadata before the restart, so the next build is guaranteed to work. |
I probably won't have the chance to change that for 6.1. For later releases, we will have TTL based discarding.
Yes, i agree. The reason I didn't do that before was because of performance: we could still hit skyframe cache / local action cache if we keep other remote metadata. However, if remote cache evicted blobs, it's probably means it is under high load and Bazel won't know what else it evicted. Following restarts could still fail for the same reason. Additionally, with TTL based discarding, we can avoid most of these errors. So the performance penalty in the real remote cache eviction case should be acceptable. I will draft a PR for that. |
b566619
to
1a43dcb
Compare
Thanks @coeuvre! |
f997f5b
to
7795c48
Compare
7795c48
to
470424c
Compare
@bazel-io flag |
@bazel-io fork 6.2.0 |
With bazelbuild#17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With bazelbuild#17462 and bazelbuild#17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with bazelbuild#17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of bazelbuild#16660. Closes bazelbuild#17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
With bazelbuild#17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With bazelbuild#17462 and bazelbuild#17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with bazelbuild#17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of bazelbuild#16660. Closes bazelbuild#17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
With bazelbuild#17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With bazelbuild#17462 and bazelbuild#17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with bazelbuild#17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of bazelbuild#16660. Closes bazelbuild#17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
With bazelbuild#17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With bazelbuild#17462 and bazelbuild#17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with bazelbuild#17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of bazelbuild#16660. Closes bazelbuild#17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
With bazelbuild#17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With bazelbuild#17462 and bazelbuild#17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with bazelbuild#17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of bazelbuild#16660. Closes bazelbuild#17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
With bazelbuild#17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With bazelbuild#17462 and bazelbuild#17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with bazelbuild#17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of bazelbuild#16660. Closes bazelbuild#17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
…ror (#18171) With #17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With #17462 and #17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with #17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of #16660. Closes #17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
Hey @coeuvre, we do have users who prefer to set the invocation id manually so that they would be able to know the build URL in BES before triggering the build. This helps our users integrate the BES/RBE system against their internal CI system. In those cases, the UUID is generated externally and passed to Bazel with Now that For a simple fix, I would suggest document the anti-affinity between I don't think automatically retry with a new UUID is desirable for this use case as users would prefer to know the invocation ID for each retries to integrate with their system. A more complicated fix is to make WDYT? |
I agree if you set The fix you described sounds plausible, but I am not sure how well it can integrate with other tools/systems. |
Another thing I found today was that this field should be increased on retry attempt
Currently we are setting it to 1 always https://cs.opensource.google/bazel/bazel/+/a767fe78e8ab28c685c8ec580ba4dd47689b05cb:src/main/java/com/google/devtools/build/lib/buildeventservice/BuildEventServiceProtoUtil.java;l=89-98 |
With bazelbuild#17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With bazelbuild#17462 and bazelbuild#17747, Bazel is able to continue the build without bazel clean or bazel shutdown. However, even with bazelbuild#17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases. Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if `--experimental_remote_cache_eviction_retries` is set, or **build rewinding**. ``` $ bazel build --experimental_remote_cache_eviction_retries=5 ... INFO: Invocation ID: b7348bfa-9446-4c72-a888-0a0ad012f225 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt ERROR: .../workspace/a/BUILD:8:8: Executing genrule //a:bar failed: Failed to fetch blobs because they do not exist remotely: Missing digest: b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/4 Target //a:bar failed to build Use --verbose_failures to see the command lines of failed build steps. INFO: Elapsed time: 0.447s, Critical Path: 0.05s INFO: 2 processes: 2 internal. ERROR: Build did NOT complete successfully Found remote cache eviction error, retrying the build... INFO: Invocation ID: 983f60dc-8bb9-4b82-aa33-a378469ce140 Loading: Loading: Loading: 0 packages loaded Analyzing: target //a:bar (0 packages loaded, 0 targets configured) INFO: Analyzed target //a:bar (0 packages loaded, 0 targets configured). INFO: Found 1 target... [0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt Target //a:bar up-to-date: bazel-bin/a/bar.out INFO: Elapsed time: 0.866s, Critical Path: 0.35s INFO: 3 processes: 1 internal, 1 processwrapper-sandbox, 1 remote. INFO: Build completed successfully, 3 total actions $ ``` Part of bazelbuild#16660. Closes bazelbuild#17711. PiperOrigin-RevId: 520610524 Change-Id: I20d43d1968767a03250b9c8f8a6dda4e056d4f52
Created #18591 to address #17711 (comment). |
With #17358, Bazel will exit with code 39 if remote cache evicts blobs during the build. With #17462 and #17747, Bazel is able to continue the build without bazel clean or bazel shutdown.
However, even with #17639 and following changes to extend the lease, remote cache can still evict blobs in some rare cases.
Based on above changes, this PR makes bazel retry the invocation if it encountered the remote cache eviction error during previous invocation if
--experimental_remote_cache_eviction_retries
is set, or build rewinding.Part of #16660.