Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker Spawn Runner throws "Worker process did not return a WorkResponse" with empty error message #11948

Open
ThomasCJY opened this issue Aug 13, 2020 · 8 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@ThomasCJY
Copy link
Contributor

Background
This is a follow-up for #11691. Previously, after enabling spawn worker for android resource merging actions, the error logs cannot be correctly displayed in console so I made this change to fix this issue. However, a recent update which is included in the 3.1.0+ bazel releases introduced another regression where all the std output and std err has been hijacked for spawn workers and worker response will always be null. This can be reproduced on this test branch:

Test branch (link)

Repro steps

  1. bazelisk build //src/test:main_activity_test check the output.
    • default version is 3.4.1, where logs will be empty
  2. update bazel version to be 3.0.0. Retry step 1. See the error output is different where stdout is printed.
  3. Remove build --config=android_workers from .bazelrc, retry step 1. Check the correct error message.

The output for step 1 looks like this, where both stdout and stderr are empty.

ERROR: /Users/jchen/01_Code/rules_jvm_external/examples/android_local_test/src/test/BUILD:6:19 Merging manifest for //src/test:main_activity_test failed: Worker process did not return a WorkResponse:

---8<---8<--- Start of log, file at /private/var/tmp/_bazel_jchen/0f8337efc7290202461be411a0f2bc48/bazel-workers/worker-7-ManifestMerger.log ---8<---8<---
(empty)
---8<---8<--- End of log ---8<---8<---

The output for step 2 will have stdout. We can apply #11691 to show the error log.

---8<---8<--- Start of log ---8<---8<---
Warning: 
See http://g.co/androidstudio/manifest-merger for more information about the manifest merger.
anifest merger.
---8<---8<--- Start of log ---8<---8<---

Ideally, after applying #11691 and fixed the regression, we would get:

---8<---8<--- Start of log ---8<---8<---
Error: /private/var/tmp/_bazel_jchen/0f8337efc7290202461be411a0f2bc48/sandbox/darwin-sandbox/83/execroot/__main__/src/test/AndroidManifest.xml:6:5-74 Error:
        uses-sdk:minSdkVersion 11 cannot be smaller than version 16 declared in library [@maven//:com_google_android_gms_play_services_ads_base] /var/folders/89/gngzz1cn74n0yt9fxth8czhw0000gp/T/manifest_merge_tmp345140065745544520/bazel-out/darwin-fastbuild/bin/external/maven/com_google_android_gms_play_services_ads_base_processed_manifest/AndroidManifest.xml
        Suggestion: use tools:overrideLibrary="com.google.android.gms.ads_base" to force usage
Aug 13, 2020 2:58:13 PM com.google.devtools.build.android.ManifestMergerAction main
SEVERE: Error during merging manifests
com.google.devtools.build.android.AndroidManifestProcessor$ManifestProcessingException: Manifest merger failed : uses-sdk:minSdkVersion 11 cannot be smaller than version 16 declared in library [@maven//:com_google_android_gms_play_services_ads_base] /var/folders/89/gngzz1cn74n0yt9fxth8czhw0000gp/T/manifest_merge_tmp345140065745544520/bazel-out/darwin-fastbuild/bin/external/maven/com_google_android_gms_play_services_ads_base_processed_manifest/AndroidManifest.xml
        Suggestion: use tools:overrideLibrary="com.google.android.gms.ads_base" to force usage
        at com.google.devtools.build.android.AndroidManifestProcessor.mergeManifest(AndroidManifestProcessor.java:186)
        at com.google.devtools.build.android.ManifestMergerAction.main(ManifestMergerAction.java:217)
        at com.google.devtools.build.android.ResourceProcessorBusyBox$Tool$5.call(ResourceProcessorBusyBox.java:93)
        at com.google.devtools.build.android.ResourceProcessorBusyBox.processRequest(ResourceProcessorBusyBox.java:233)
        at com.google.devtools.build.android.ResourceProcessorBusyBox.main(ResourceProcessorBusyBox.java:177)
---8<---8<--- Start of log ---8<---8<---
@meisterT
Copy link
Member

cc @larsrc-google

@nkoroste
Copy link
Contributor

This is super helpful for debugging, thanks @ThomasCJY

@gregestren gregestren added team-Local-Exec Issues and PRs for the Execution (Local) team type: bug untriaged labels Aug 17, 2020
nkoroste pushed a commit to nkoroste/bazel that referenced this issue Aug 18, 2020
… issue

**Background**
https://jira.sc-corp.net/browse/APP-60122
More details see upstream ticket: bazelbuild#11948

This issue is a regression from Bazel 3.1.0+. This PR is a temp work around.
---
Automatic squash commit from https://github.sc-corp.net/Snapchat/bazel/pull/63
Cooled by jchen
@tetromino tetromino added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Aug 20, 2020
@tetromino
Copy link
Contributor

tetromino commented Aug 20, 2020

I can see two steps towards fixing this.

  1. Remove the stderr redirect to buffer in ResourceProcessorBusyBox, but keep the stdout redirect (because printing arbitrary error messages to stdout violates the worker protocol).
  2. Take better care to dump the buffer in the correct format on worker failure.

Would fix no. 1 by itself - which is quite simple - be sufficient for you?

@tetromino
Copy link
Contributor

cc @susinmotion

@ThomasCJY
Copy link
Contributor Author

@tetromino yeah fixing no.1 will be sufficient in the current stage. Thanks!

larsrc-google referenced this issue Jan 28, 2021
In certain circumstances the various android compilation tools could output to stdout while operating as a persistent worker which would lead to a corruption of the response protobuf and ultimately lead to a failed build.

This PR solves that problem by redirecting all stdout and stderr emitted by the worker to an in-memory buffer that is later attached to the proto response as an output. In this way, all future tools added to the tool wrapper can benefit from having their output logged rather than cause a build failure.

#10527
#9207

Closes #10911.

PiperOrigin-RevId: 307252644
@larsrc-google
Copy link
Contributor

https://stackoverflow.com/questions/3340342/writing-to-the-real-stdout-after-system-setout talks about how to properly redirect stdout, it looks like what the current code does is incorrect.

I like the idea of redirecting stdout, though, and will add it to WorkRequestHandler.

nkoroste pushed a commit to nkoroste/bazel that referenced this issue Feb 23, 2021
… issue

**Background**
https://jira.sc-corp.net/browse/APP-60122
More details see upstream ticket: bazelbuild#11948

This issue is a regression from Bazel 3.1.0+. This PR is a temp work around.
---
Automatic squash commit from https://github.sc-corp.net/Snapchat/bazel/pull/63
Cooled by jchen
@nkoroste
Copy link
Contributor

nkoroste commented May 4, 2022

I can't seem to repro this in Bazel 5.1.1 - not sure if anything changed?

@lukaciko
Copy link

lukaciko commented Jul 1, 2022

We still see this issue with Bazel 5.1.1. The issue goes away when not using a persistent worker. I can give more info if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug
Projects
None yet
Development

No branches or pull requests

7 participants