Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel crashes with InterruptedException when certain benign errors happen #14787

Closed
wisechengyi opened this issue Feb 10, 2022 · 12 comments
Closed
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@wisechengyi
Copy link
Contributor

wisechengyi commented Feb 10, 2022

Slack discussion: https://bazelbuild.slack.com/archives/CA31HN1T3/p1644524174413049

Description of the problem:

bazel error with piped

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

% bazel query tools/... | head
Loading: 0 packages loaded
//tools:bzl_srcs
//tools:embedded_tools_srcs
//tools:srcs
//tools:upload_bazel_mirror
//tools/allowlists:srcs
//tools/allowlists/config_feature_flag:srcs
//tools/allowlists/function_transition_allowlist:srcs
//tools/android:aar_embedded_jars_extractor
//tools/android:aar_embedded_jars_extractor_test
//tools/android:aar_embedded_proguard_extractor

Cannot write to standard output; exiting...

Loading: 0 packages loaded
Loading: 0 packages loaded

What operating system are you running Bazel on?

macos, but also observed similar issue on linux

$ ./bazel query tools/... | head
INFO: Invocation ID: 8fed028a-18a6-4a6c-a0b2-5a0dcfed396b
Loading: 0 packages loaded
...

Cannot write to standard output; exiting...

Loading: 0 packages loaded
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: java.lang.InterruptedException: DefaultPromise@502c56e6(incomplete)
	at io.netty.channel.pool.FixedChannelPool.close(FixedChannelPool.java:454)
	at com.google.devtools.build.lib.remote.http.HttpCacheClient.close(HttpCacheClient.java:736)
	at com.google.devtools.build.lib.remote.disk.DiskAndRemoteCacheClient.close(DiskAndRemoteCacheClient.java:71)
	at com.google.devtools.build.lib.remote.RemoteCache.deallocate(RemoteCache.java:454)
	at io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86)
	at io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76)
	at com.google.devtools.build.lib.remote.RemoteActionContextProvider.afterCommand(RemoteActionContextProvider.java:217)
	at com.google.devtools.build.lib.remote.RemoteModule.afterCommand(RemoteModule.java:862)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.afterCommand(BlazeRuntime.java:626)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:609)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:231)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
	at io.grpc.Context$1.run(Context.java:579)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.InterruptedException: DefaultPromise@502c56e6(incomplete)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:244)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:35)
	at io.netty.channel.pool.FixedChannelPool.close(FixedChannelPool.java:451)
	... 16 more

What's the output of bazel info release?

release 5.0.0

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

Replace this line with your answer.

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

% git remote get-url origin ; git rev-parse master ; git rev-parse HEAD
https://github.com/bazelbuild/bazel
1deb3f7aa22ec322786360085f9eb723e624d7c7
1deb3f7aa22ec322786360085f9eb723e624d7c7

Have you found anything relevant by searching the web?

None yet

Other info

writing to a file works.

% bazel query tools/... > xxx && wc -l xxx
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
     282 xxx
% echo $?
0
@brentleyjones
Copy link
Contributor

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Feb 10, 2022
@brentleyjones
Copy link
Contributor

Does this work in 4.2.2?

@wisechengyi
Copy link
Contributor Author

wisechengyi commented Feb 10, 2022

was on 4.2.1, and it seems to have this issue as well.

$ ./bazel version
Build label: 4.2.1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Aug 30 15:17:47 2021 (1630336667)
Build timestamp: 1630336667
Build timestamp as int: 1630336667
$ ./bazel query tools/... | head
INFO: Invocation ID: 5793bfb8-1943-4b50-8ac3-d200bd81902c
Loading: 0 packages loaded
//tools/transitive-deps-updater/src/test/scala/com/twitter/transitiveDepsUpdater:tests
//tools/transitive-deps-updater/src/test/resources:resources
//tools/transitive-deps-updater/src/test/resources:resources-scala
//tools/transitive-deps-updater/src/test/resources:resources-py
//tools/transitive-deps-updater/src/main/scala/com/twitter/transitiveDepsUpdater:transitiveDepsUpdater
//tools/transitive-deps-updater/src/main/scala/com/twitter/transitiveDepsUpdater:transitiveDepsUpdater.semanticdb
//tools/toolchains:testing_toolchains_with_all_deps_impl
//tools/toolchains:testing_toolchain
//tools/toolchains:specs2_junit_toolchain_impl
//tools/toolchains:specs2_junit_toolchain

Cannot write to standard output; exiting...

Loading: 0 packages loaded
INFO: Build Event Protocol files produced successfully.
Loading: 0 packages loaded

however exit code seems to be 0.

@ulfjack
Copy link
Contributor

ulfjack commented Feb 10, 2022

Bazel writes progress and error messages to stderr while writing 'normal' query output to stdout. That looks weird when you redirect stdout (and keep stderr), but it is working as expected!

Try running it like so:
bazel query --noshow_progress tools/... | head

Or like so:
bazel query tools/... 2>/dev/null | head

Or like so (if you want to see the progress messages):
bazel query tools/... 2>&1 | head

This is 'standard' Unix behavior, although it is a bit unusual for a tool to generate that much stderr if it runs successfully.

@wisechengyi
Copy link
Contributor Author

wisechengyi commented Feb 10, 2022

Ok to clarify, the issue isn't

Cannot write to standard output; exiting...

but the exception which then causes bazel to crash:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: java.lang.InterruptedException: DefaultPromise@6fac5e26(incomplete)
	at io.netty.channel.pool.FixedChannelPool.close(FixedChannelPool.java:454)
	at com.google.devtools.build.lib.remote.http.HttpCacheClient.close(HttpCacheClient.java:736)
	at com.google.devtools.build.lib.remote.disk.DiskAndRemoteCacheClient.close(DiskAndRemoteCacheClient.java:71)
	at com.google.devtools.build.lib.remote.RemoteCache.deallocate(RemoteCache.java:454)
	at io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86)
	at io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76)
	at com.google.devtools.build.lib.remote.RemoteActionContextProvider.afterCommand(RemoteAction
...

and then subsequent bazel invocation will have to start a new server.

So it seems to be remain as a serious issue.

And the issue did start with release 5.0.0

@Wyverald
Copy link
Member

Hmm. This might be a new thing in 5.0.0 after all. @coeuvre does this look like an issue with remote execution?

@eed3si9n
Copy link

I regularly run bazel query ... | head so initially I thought pipe was doing something wrong, but now I see that this also happens when I have a typo in a custom bzl file. (OP @wisechengyi and I work in the same team)

ERROR: /xxx/build_rules/targets/something_library.bzl:63:19: name 'merged' is not defined (did you mean 'merged0'?)

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: java.lang.InterruptedException: DefaultPromise@2eb04551(incomplete)
	at io.netty.channel.pool.FixedChannelPool.close(FixedChannelPool.java:454)
	at com.google.devtools.build.lib.remote.http.HttpCacheClient.close(HttpCacheClient.java:736)
	at com.google.devtools.build.lib.remote.disk.DiskAndRemoteCacheClient.close(DiskAndRemoteCacheClient.java:71)
	at com.google.devtools.build.lib.remote.RemoteCache.deallocate(RemoteCache.java:454)
	at io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86)
	at io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76)
	at com.google.devtools.build.lib.remote.RemoteActionContextProvider.afterCommand(RemoteActionContextProvider.java:217)
	at com.google.devtools.build.lib.remote.RemoteModule.afterCommand(RemoteModule.java:862)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.afterCommand(BlazeRuntime.java:626)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:619)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:231)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
	at io.grpc.Context$1.run(Context.java:579)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.InterruptedException: DefaultPromise@2eb04551(incomplete)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:244)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:35)
	at io.netty.channel.pool.FixedChannelPool.close(FixedChannelPool.java:451)
	... 16 more

Maybe this is more to do with some deferred finalizer running in a thread-backed future, and it needs to ignore some exceptions?

@Wyverald Wyverald added P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug under investigation labels Feb 11, 2022
@Wyverald Wyverald changed the title pipe breaking with bazel output release 5.0.0 Bazel crashes with InterruptedException when certain benign errors happen Feb 11, 2022
@wisechengyi
Copy link
Contributor Author

To add, unsurprisingly we are also seeing the error when interrupting bazel with ctrl-c.

% bazel build strato/...
Starting local Bazel server and connecting to it...
^C
Bazel caught interrupt signal; shutting down.

INFO: Invocation ID: 48473608-372c-4ac2-b1f1-fc41cf67fe5d
ERROR: command interrupted while syncing package loading
INFO: Invocation ID: 48473608-372c-4ac2-b1f1-fc41cf67fe5d
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: java.lang.InterruptedException: DefaultPromise@788c8811(incomplete)
	at io.netty.channel.pool.FixedChannelPool.close(FixedChannelPool.java:454)
	at com.google.devtools.build.lib.remote.http.HttpCacheClient.close(HttpCacheClient.java:736)
	at com.google.devtools.build.lib.remote.disk.DiskAndRemoteCacheClient.close(DiskAndRemoteCacheClient.java:71)
	at com.google.devtools.build.lib.remote.RemoteCache.deallocate(RemoteCache.java:454)
	at io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86)
	at io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76)
	at com.google.devtools.build.lib.remote.RemoteActionContextProvider.afterCommand(RemoteActionContextProvider.java:217)
	at com.google.devtools.build.lib.remote.RemoteModule.afterCommand(RemoteModule.java:862)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.afterCommand(BlazeRuntime.java:626)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:619)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:231)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
	at io.grpc.Context$1.run(Context.java:579)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.InterruptedException: DefaultPromise@788c8811(incomplete)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:244)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:35)
	at io.netty.channel.pool.FixedChannelPool.close(FixedChannelPool.java:451)
	... 16 more

@Wyverald
Copy link
Member

Wyverald commented Mar 2, 2022

@coeuvre could you take a look at this? If this is indeed a 5.0.0 regression, we could look at fixing it in 5.1.0.

@coeuvre
Copy link
Member

coeuvre commented Mar 2, 2022

Sorry for the late reply, somehow I missed this from my inbox. This is indeed a 5.0.0 regression and I will fix it in 5.1.0.

@coeuvre coeuvre added P1 I'll work on this now. (Assignee required) release blocker and removed P2 We'll consider working on this in future. (Assignee optional) under investigation potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone labels Mar 2, 2022
@Wyverald
Copy link
Member

Wyverald commented Mar 2, 2022

@bazel-io fork 5.1

brentleyjones pushed a commit to brentleyjones/bazel that referenced this issue Mar 8, 2022
Also cancels tasks submitted during `afterCommand` if interrupted.

Fixes bazelbuild#14787.

Closes bazelbuild#14992.

PiperOrigin-RevId: 433205726
(cherry picked from commit a73aa12)
Wyverald pushed a commit that referenced this issue Mar 8, 2022
…14999)

Also cancels tasks submitted during `afterCommand` if interrupted.

Fixes #14787.

Closes #14992.

PiperOrigin-RevId: 433205726
(cherry picked from commit a73aa12)

Co-authored-by: Chi Wang <[email protected]>
@coryan
Copy link

coryan commented Jul 12, 2024

I am seeing this problem with Bazel 7.2.1:

bazel version
Bazelisk version: v1.18.0
Build label: 7.2.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Jun 25 15:53:05 2024 (1719330785)
Build timestamp: 1719330785
Build timestamp as int: 1719330785

Then you get:

$ bazel query --noshow_progress "..." | head -2
//:accessapproval
//:accessapproval_mocks

Cannot write to standard output; exiting...
$ echo $?
0

However, if your script has set -o pipefail:

$ set -o pipefail

$ bazel query --noshow_progress "..." | head -2
//:accessapproval
//:accessapproval_mocks

Cannot write to standard output; exiting...

$ echo $?
255

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants