-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor: fix Windows blocking on pipe close #4400
Conversation
Sending the Ctrl-Break signal to PowerShell <6 causes it to drop into debug mode. Closing its output pipe at that point will block indefinitely and prevent the process from being killed by Nomad. See the upstream powershell issue for details: PowerShell/PowerShell#4254
l.rotatorWriter.Close() | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error was never being checked, so there's no harm in elliding it in favor of logging errors directly from this method.
go func() { | ||
defer close(closeDone) | ||
err := l.processOutReader.Close() | ||
if err != nil && !strings.Contains(err.Error(), "file already closed") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this logging meant to be temporary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not? I don't think there's any way to know when it would be safe to remove.
The contains check is just to prevent spamming the logs since we close the pipe multiple times. We could probably try to fix that, but I'm not sure it's worth the effort as there's no harm in multiple Closes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What value is logging the error giving if we don't change the outcome in the call site anyway by ignoring the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly "file already closed" is the only error I've ever seen returned from Close()
, so I don't expect to ever see this code hit.
I put it in in case it helped debug future issues similar to this. I have never seen anything like this behavior before (eg the blocking on Close), so I have little idea where extra logging might be helpful in the future or just noise.
// Wait up to the close tolerance before we force close | ||
select { | ||
case <-l.hasFinishedCopied: | ||
case <-time.After(processOutputCloseTolerance): | ||
} | ||
err := l.processOutReader.Close() | ||
|
||
// Closing the read side of a pipe may block on Windows if the process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We added this close for processOutreader in 4150296 to fix this failing test TestExecutor_Start_Wait. Removing this entirely doesn't affect that test (I tested with -count 50 locally). Wondering if we can remove that l.processOutReader.Close()
line 893 entirely. That would simplify all this even more.
That test was failing because without the channel blocked wait on hasFinishedCopied
, a very short lived command would never get its standard output read and stored in the log file. Now that we added a wait on that channel and a grace period of 2 seconds, I don't see why we need to call close on the processOutreader again in line 893.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of line 893 (now 890) is that it is to propagate errors from the io.Copy destination (rotator
) back to the source (processOutReader
).
So if io.Copy
fails to write, it needs to signal to processOutReader
that i will never be read again by Close()
ing it.
I think this will handle cases like running-out-of-disk where we can no longer write anything, so we signal that to the process by closing its output (and likely causing the process to crash).
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Sending the Ctrl-Break signal to PowerShell <6 causes it to drop into
debug mode. Closing its output pipe at that point will block
indefinitely and prevent the process from being killed by Nomad.
See the upstream powershell issue for details:
PowerShell/PowerShell#4254
Sample executor logging when the blocking behavior is hit:
And the process is force killed properly.
This PR fixes a bug introduced by the combination of #4153, #4336, and PowerShell/PowerShell#4254