Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: fix Windows blocking on pipe close #4400

Merged
merged 2 commits into from
Jun 11, 2018

Conversation

schmichael
Copy link
Member

@schmichael schmichael commented Jun 8, 2018

Sending the Ctrl-Break signal to PowerShell <6 causes it to drop into
debug mode. Closing its output pipe at that point will block
indefinitely and prevent the process from being killed by Nomad.

See the upstream powershell issue for details:
PowerShell/PowerShell#4254

Sample executor logging when the blocking behavior is hit:

2018/06/08 21:56:32.242562 [INFO] executor: launching command powershell.exe -Command echo Hello; sleep 1000000
2018/06/08 21:56:43.063844 [INFO] executor: sent Ctrl-Break to process 8044
2018/06/08 21:56:52.068681 [WARN] executor: timed out waiting for read-side of process output pipe to close
2018/06/08 21:56:56.070628 [WARN] executor: timed out waiting for read-side of process output pipe to close

And the process is force killed properly.

This PR fixes a bug introduced by the combination of #4153, #4336, and PowerShell/PowerShell#4254

Sending the Ctrl-Break signal to PowerShell <6 causes it to drop into
debug mode. Closing its output pipe at that point will block
indefinitely and prevent the process from being killed by Nomad.

See the upstream powershell issue for details:
PowerShell/PowerShell#4254
l.rotatorWriter.Close()
return err
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error was never being checked, so there's no harm in elliding it in favor of logging errors directly from this method.

go func() {
defer close(closeDone)
err := l.processOutReader.Close()
if err != nil && !strings.Contains(err.Error(), "file already closed") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logging meant to be temporary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not? I don't think there's any way to know when it would be safe to remove.

The contains check is just to prevent spamming the logs since we close the pipe multiple times. We could probably try to fix that, but I'm not sure it's worth the effort as there's no harm in multiple Closes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What value is logging the error giving if we don't change the outcome in the call site anyway by ignoring the error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly "file already closed" is the only error I've ever seen returned from Close(), so I don't expect to ever see this code hit.

I put it in in case it helped debug future issues similar to this. I have never seen anything like this behavior before (eg the blocking on Close), so I have little idea where extra logging might be helpful in the future or just noise.

// Wait up to the close tolerance before we force close
select {
case <-l.hasFinishedCopied:
case <-time.After(processOutputCloseTolerance):
}
err := l.processOutReader.Close()

// Closing the read side of a pipe may block on Windows if the process
Copy link
Contributor

@preetapan preetapan Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added this close for processOutreader in 4150296 to fix this failing test TestExecutor_Start_Wait. Removing this entirely doesn't affect that test (I tested with -count 50 locally). Wondering if we can remove that l.processOutReader.Close() line 893 entirely. That would simplify all this even more.

That test was failing because without the channel blocked wait on hasFinishedCopied, a very short lived command would never get its standard output read and stored in the log file. Now that we added a wait on that channel and a grace period of 2 seconds, I don't see why we need to call close on the processOutreader again in line 893.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of line 893 (now 890) is that it is to propagate errors from the io.Copy destination (rotator) back to the source (processOutReader).

So if io.Copy fails to write, it needs to signal to processOutReader that i will never be read again by Close()ing it.

I think this will handle cases like running-out-of-disk where we can no longer write anything, so we signal that to the process by closing its output (and likely causing the process to crash).

@schmichael schmichael merged commit c0507cb into master Jun 11, 2018
@schmichael schmichael deleted the b-fix-powershell-shutdown branch June 11, 2018 18:05
@github-actions
Copy link

github-actions bot commented Mar 2, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants