-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for wrapping system streams in WorkRequestHandler #14201
Add support for wrapping system streams in WorkRequestHandler #14201
Conversation
@larsrc-google mind taking a look at this? |
|
||
private final InputStream originalInputStream; | ||
private final PrintStream originalOutputStream; | ||
private final PrintStream originalErrorStream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping System.err
isn't actually be necessary for a wrapper worker io.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Also System.in
is already handled properly. Only System.out
needs to be wrapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also System.in is already handled properly. Only System.out needs to be wrapped.
The motivation behind wrapping all of the system streams in a WorkerIO
class is that there is a clear and consistent place to access the root system streams while setting up the work request handler. For System.in
and System.err
we can just return the original streams when calling getOriginalInputStream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon further reflection, what you do makes sense, but not in the way you use it in the example. Instead of wrapping the whole WorkRequestHandler, wrapping the inside of the WorkRequestCallback would be clearer. You wouldn't even need to have the accessors to the original streams. That prevents the tool from attempting to read from stdin and corrupting the communication.
That change would also mean that the WorkerIO doesn't need to reset the buffer, as it'll be thrown away once a request is done. Plus in case processRequest
in your example throws an exception the WorkerIO will be closed properly.
A definite benefit of either approach is that stdout and stderr get merged into a single stream, that will be easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrapping the inside of the WorkRequestCallback would be clearer
By this do you mean calling capture
from within the implementation of WorkRequestHandler
, or in the lambda that gets passed into the WorkRequestHandler
? I like idea of this being hidden inside the WorkRequestCallback
because we hide WorkerIO#readCapturedAsUtf8String
entirely which is kind of an awkward API to need to expose.
That change would also mean that the WorkerIO doesn't need to reset the
Would each work request call WorkerIO#capture
swapping in it's own buffer?
A definite benefit of either approach is that stdout and stderr get merged into a single stream, that will be easier to understand.
Wrapping the stderr
could be a bit error-prone because if there's ever an unrecoverable error that causes the worker to not send a work response, you just won't see an exception come out of the worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does your current solution deal with multiplex workers?
@larsrc-google I took another pass at building this into the WorkRequestHandler
so that it's a bit more transparent, and with the multiplex worker cases in mind. The new approach uses the existing WorkerIO
implementation, but wraps the system streams when you call WorkRequestHandler#processRequests
.
The WorkerIO
instance gets passed down into respondToRequest
so that there is a single source of truth for the wrapped streams, and so that we don't have to deal with multiple WorkerIO
instances being created inside of respondToRequest
which doesn't really make sense in a multiplex-world where respondToRequest
may get called concurrently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I've been so focused on the implementation details that I forgot the larger problem: Redirecting the stderr messages into the output
field of the response changes semantics. The output
field is for shorter messages to be displayed to the user, stderr
goes into the logs. See https://docs.bazel.build/versions/main/creating-workers.html#work-responses. It's known that multiplex workers only have one log between them, but the messages shown to the user should not interleave.
And of course, since the multiplex threads still share one System.err
instance regardless of what we set, we can't possibly disentangle that. But we can still fix the main issue of stdout getting corrupted by setting it to the same stream as stderr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure i'm understanding next steps here - You'd prefer that we avoid capturing the tool output streams and instead just swap the System.err
in for System.out
to not pollute the work response output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Having System.err and System.out go to the same place (the logs) is good. Changing the work response output would cause a lot of noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed the changes for swapping System.err
in for System.out
and updated the tests.
Sorry for not reacting earlier, need to update mail filters. This is a good thing to do, something I have considered but not gotten around to yet. One question: Why an optional wrapper? Why not just make it part of WorkRequestHandler? Can you imagine a case where a user of WorkRequestHandler would not want to redirect stdout? |
No reason other than it felt easier to reason about merging this into Bazel as an isolated tool first, instead of making changes to
I can't, although the java workers seem to be managing fine without wrapped system streams. |
04ac34c
to
cc18fc2
Compare
@larsrc-google is there a path forward for merging some form of this? |
901c960
to
2856b3b
Compare
2856b3b
to
008280a
Compare
|
||
// Redirect System.out to System.err | ||
System.setOut(originalErrorStream); | ||
System.setErr(originalErrorStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a lot simpler. I don't see any reason to do anything to System.err, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
9a11aa4
to
911ea81
Compare
@@ -311,6 +315,8 @@ public WorkRequestHandler build() { | |||
* returns. If {@code in} reaches EOF, it also returns. | |||
*/ | |||
public void processRequests() throws IOException { | |||
WorkerIO workerIO = WorkerIO.redirectSystemStreams(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at #14623, you're not the only one with this idea. You might be right in making it optional, as some workers do their own handling. Fortunately that could easily be done by adding a method to the builder.
911ea81
to
fdc6e07
Compare
3da7148
to
6e5c7cb
Compare
Any updates on this? |
Sorry for the silence, I was out sick for quite a while. I believe being optional is not as big a deal as I indicated, so if you can fix the conflicts, we should be able to merge this. |
6e5c7cb
to
2d3689f
Compare
@larsrc-google I have this a rebase. Please give it another once over to make sure that the conflicts got resolved correctly against the new changes that were in master. |
There are often [places](https://github.com/bazelbuild/bazel/blob/ea19c17075478092eb77580e6d3825d480126d3a/src/tools/android/java/com/google/devtools/build/android/ResourceProcessorBusyBox.java#L188) where persistent workers need to swap out the standard system streams to avoid tools poisoning the worker communication streams by writing logs/exceptions to it. This pull request extracts that pattern into an optional WorkerIO wrapper can be used to swap in and out the standard streams without the added boilerplate. Closes #14201. PiperOrigin-RevId: 498983983 Change-Id: Iefb956d38a5887d9e5bbf0821551eb0efa14fce9
@bazel-io flag |
(Above was from a discussion with @larsrc-google: #17448 (comment)) |
@bazel-io fork 6.1.0 |
This pull request provides a custom `DiagnosticsHandler` implementation that redirects info and warning messages to the original `System#err` stream so that the noisy D8 messages are written to the worker log file instead of directly to the console. In addition this PR also removes the custom system stream redirection, and instead uses the baked in `WorkerIO` implementation that automatically wraps the system stream for us that was added here #14201. #17647 Closes #17648. PiperOrigin-RevId: 560600558 Change-Id: I9d2d162b62fe5dfd00e8614ba73caf1cac5f02d7
There are often places where persistent workers need to swap out the standard system streams to avoid tools poisoning the worker communication streams by writing logs/exceptions to it.
This pull request extracts that pattern into an optional WorkerIO wrapper can be used to swap in and out the standard streams without the added boilerplate.