executor: fix Windows blocking on pipe close #4400

schmichael · 2018-06-08T21:51:52Z

Sending the Ctrl-Break signal to PowerShell <6 causes it to drop into
debug mode. Closing its output pipe at that point will block
indefinitely and prevent the process from being killed by Nomad.

See the upstream powershell issue for details:
PowerShell/PowerShell#4254

Sample executor logging when the blocking behavior is hit:

2018/06/08 21:56:32.242562 [INFO] executor: launching command powershell.exe -Command echo Hello; sleep 1000000
2018/06/08 21:56:43.063844 [INFO] executor: sent Ctrl-Break to process 8044
2018/06/08 21:56:52.068681 [WARN] executor: timed out waiting for read-side of process output pipe to close
2018/06/08 21:56:56.070628 [WARN] executor: timed out waiting for read-side of process output pipe to close

The process is then force-killed properly.

This PR fixes a bug introduced by the combination of #4153, #4336, and PowerShell/PowerShell#4254.
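Below is a minimal sketch of the close-with-timeout pattern this PR applies. It is not Nomad's actual code: `closeWithTimeout` is a hypothetical helper, and a printf-style `logf` callback stands in for the executor's logger. The potentially blocking `Close()` runs in its own goroutine. If that call has not returned within the timeout, the caller logs a warning and moves on, so the process can still be force-killed. The abandoned goroutine leaks, which is the accepted cost of never blocking the caller.

```go
package main

import (
	"errors"
	"io"
	"log"
	"os"
	"time"
)

// closeWithTimeout is a hypothetical helper: it runs c.Close() in its own
// goroutine and waits at most timeout for it to return. It reports whether
// Close returned in time. If Close blocks forever, that goroutine is leaked
// rather than blocking the caller.
func closeWithTimeout(c io.Closer, timeout time.Duration, logf func(format string, v ...interface{})) bool {
	closeDone := make(chan struct{})
	go func() {
		defer close(closeDone)
		if err := c.Close(); err != nil {
			logf("[WARN] executor: error closing read-side of process output pipe: %v", err)
		}
	}()

	select {
	case <-closeDone:
		return true
	case <-time.After(timeout):
		logf("[WARN] executor: timed out waiting for read-side of process output pipe to close")
		return false
	}
}

// hangingCloser simulates a pipe whose Close never returns.
type hangingCloser struct{}

func (hangingCloser) Close() error { select {} }

// failingCloser simulates a Close that returns an error immediately.
type failingCloser struct{}

func (failingCloser) Close() error { return errors.New("close failed") }

func main() {
	const timeout = 100 * time.Millisecond

	// An ordinary pipe closes immediately.
	r, w, err := os.Pipe()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()
	log.Printf("pipe closed in time: %v", closeWithTimeout(r, timeout, log.Printf))

	// A Close that never returns is abandoned after the timeout.
	log.Printf("hanging close returned in time: %v", closeWithTimeout(hangingCloser{}, timeout, log.Printf))
}
```

Errors from a `Close` that does return are logged from inside the goroutine. This matches the "log instead of return" approach discussed in the review.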


schmichael · 2018-06-08T21:52:23Z

client/driver/executor/executor.go

 	l.rotatorWriter.Close()
-	return err


This error was never being checked, so there's no harm in eliding it in favor of logging errors directly from this method.
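Here is a sketch of that signature change. It uses a hypothetical `logCollector` type with a `logf` callback; neither is Nomad's real type. `Close()` no longer returns an error, and the reader's `Close` failure is logged where it happens instead.

```go
package main

import (
	"errors"
	"io"
	"log"
)

// logCollector is a hypothetical stand-in for the executor's output
// plumbing; only the Close behaviour discussed in this review is modelled.
type logCollector struct {
	processOutReader io.Closer
	rotatorWriter    io.Closer
	logf             func(format string, v ...interface{})
}

// Close returns nothing: the old error return was never checked by callers,
// so the reader's Close error is logged here instead.
func (l *logCollector) Close() {
	if err := l.processOutReader.Close(); err != nil {
		l.logf("[WARN] executor: error closing read-side of process output pipe: %v", err)
	}
	l.rotatorWriter.Close() // result ignored, mirroring the diff context above
}

// failingCloser and nopCloser are test doubles for the two closers.
type failingCloser struct{}

func (failingCloser) Close() error { return errors.New("close failed") }

type nopCloser struct{}

func (nopCloser) Close() error { return nil }

func main() {
	c := &logCollector{
		processOutReader: failingCloser{},
		rotatorWriter:    nopCloser{},
		logf:             log.Printf,
	}
	c.Close() // the failure is logged; the call site has nothing to check
}
```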

preetapan · 2018-06-08T22:02:38Z

client/driver/executor/executor.go

+	go func() {
+		defer close(closeDone)
+		err := l.processOutReader.Close()
+		if err != nil && !strings.Contains(err.Error(), "file already closed") {


Is this logging meant to be temporary?

Probably not? I don't think there's any way to know when it would be safe to remove.

The contains check is just to prevent spamming the logs since we close the pipe multiple times. We could probably try to fix that, but I'm not sure it's worth the effort as there's no harm in multiple Closes.

What value does logging the error give us if the call site ignores it and the outcome doesn't change anyway?

Honestly "file already closed" is the only error I've ever seen returned from Close(), so I don't expect to ever see this code hit.

I put it in in case it helps debug future issues similar to this one. I have never seen anything like this behavior before (e.g. the blocking on Close), so I have little idea where extra logging will be helpful in the future and where it will just be noise.
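Below is a sketch of the duplicate-close filter discussed above, using the hypothetical helpers `isAlreadyClosed`, `closeQuietly` and `mustPipe`. Closing an `*os.File` a second time returns an error whose text contains "file already closed". Matching on that string, as the diff does, suppresses the expected noise from repeated closes while still logging anything unexpected.

```go
package main

import (
	"io"
	"log"
	"os"
	"strings"
)

// isAlreadyClosed reports whether err is the error returned by closing an
// *os.File a second time. It matches on the message text like the diff does.
func isAlreadyClosed(err error) bool {
	return err != nil && strings.Contains(err.Error(), "file already closed")
}

// closeQuietly closes c and logs only unexpected errors, so closing the same
// pipe more than once does not spam the log.
func closeQuietly(c io.Closer, logf func(format string, v ...interface{})) {
	if err := c.Close(); err != nil && !isAlreadyClosed(err) {
		logf("[WARN] executor: error closing read-side of process output pipe: %v", err)
	}
}

// mustPipe returns both ends of a new OS pipe or panics.
func mustPipe() (*os.File, *os.File) {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	return r, w
}

func main() {
	r, w := mustPipe()
	defer w.Close()

	closeQuietly(r, log.Printf) // first close succeeds
	closeQuietly(r, log.Printf) // second close fails with "file already closed"; not logged

	log.Printf("raw error from a third close: %v", r.Close())
}
```

On Go 1.13 and later, `errors.Is(err, os.ErrClosed)` is a sturdier check than matching on the message text.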

preetapan · 2018-06-11T16:12:02Z

client/driver/executor/executor.go

 	// Wait up to the close tolerance before we force close
 	select {
 	case <-l.hasFinishedCopied:
 	case <-time.After(processOutputCloseTolerance):
 	}
-	err := l.processOutReader.Close()
+
+	// Closing the read side of a pipe may block on Windows if the process


We added this close of processOutReader in 4150296 to fix the failing test TestExecutor_Start_Wait. Removing it entirely doesn't affect that test (I tested with -count 50 locally). I'm wondering if we can remove the l.processOutReader.Close() call on line 893 entirely. That would simplify all of this even more.

That test was failing because, without the blocking wait on the hasFinishedCopied channel, a very short-lived command would never get its standard output read and stored in the log file. Now that we've added a wait on that channel and a grace period of 2 seconds, I don't see why we need to call Close on processOutReader again on line 893.
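A sketch of the wait described above follows, assuming the hypothetical helpers `startCopy`, `waitForCopy` and `runShortLived`. Here `processOutputCloseTolerance` reuses the name from the diff, with the 2-second grace period mentioned in this comment. The copy goroutine closes `hasFinishedCopied` when `io.Copy` returns. In this simulation that happens once the write side of the pipe is closed and the reader hits EOF, so a short-lived command's output is captured before the reader is closed.

```go
package main

import (
	"io"
	"log"
	"os"
	"strings"
	"time"
)

// processOutputCloseTolerance mirrors the name in the diff; the 2s value
// is the grace period mentioned in this review.
const processOutputCloseTolerance = 2 * time.Second

// startCopy copies src to dst in the background and returns a channel that
// is closed once io.Copy returns (normally when src reaches EOF).
func startCopy(dst io.Writer, src io.Reader) <-chan struct{} {
	hasFinishedCopied := make(chan struct{})
	go func() {
		defer close(hasFinishedCopied)
		if _, err := io.Copy(dst, src); err != nil {
			log.Printf("[WARN] executor: error copying process output: %v", err)
		}
	}()
	return hasFinishedCopied
}

// waitForCopy waits for the copy to finish or for tolerance to elapse,
// whichever comes first, and reports whether the copy finished.
func waitForCopy(done <-chan struct{}, tolerance time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(tolerance):
		return false
	}
}

// runShortLived simulates a command that prints msg and exits immediately,
// then returns what was captured and whether the copy finished in time.
func runShortLived(msg string, tolerance time.Duration) (string, bool) {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer r.Close()

	var out strings.Builder
	done := startCopy(&out, r)

	if _, err := w.Write([]byte(msg)); err != nil {
		panic(err)
	}
	w.Close() // the process exited, so the write side is closed

	if !waitForCopy(done, tolerance) {
		return "", false // copy may still be running; don't read out
	}
	return out.String(), true
}

func main() {
	out, ok := runShortLived("Hello\n", processOutputCloseTolerance)
	log.Printf("copy finished: %v, captured %q", ok, out)
}
```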

My understanding of line 893 (now 890) is that it is to propagate errors from the io.Copy destination (rotator) back to the source (processOutReader).

So if io.Copy fails to write, it needs to signal to processOutReader, by Close()ing it, that it will never be read again.

I think this will handle cases like running-out-of-disk where we can no longer write anything, so we signal that to the process by closing its output (and likely causing the process to crash).
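Next is a sketch of the error propagation described above. It assumes a hypothetical `fullDisk` writer, standing in for a rotator that can no longer write, and a hypothetical `copyAndPropagate` helper. When `io.Copy` fails, closing the read side makes the writer's next write fail (EPIPE on Unix). Without that close, the writer would eventually block on a full pipe buffer.

```go
package main

import (
	"errors"
	"io"
	"log"
	"os"
)

// errDiskFull is a hypothetical error standing in for a failed write by the
// log rotator (e.g. out of disk space).
var errDiskFull = errors.New("no space left on device")

// fullDisk is a hypothetical writer that always fails.
type fullDisk struct{}

func (fullDisk) Write(p []byte) (int, error) { return 0, errDiskFull }

// copyAndPropagate copies src to dst. If the copy fails, it closes src so
// the process writing into the pipe gets an error on its next write instead
// of eventually blocking once the pipe buffer fills.
func copyAndPropagate(dst io.Writer, src io.ReadCloser) error {
	_, err := io.Copy(dst, src)
	if err != nil {
		src.Close()
	}
	return err
}

// writeAfterFailedCopy simulates: the process writes a line, the copy fails,
// then the process writes again. It returns both errors.
func writeAfterFailedCopy() (copyErr, writeErr error) {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer w.Close()

	done := make(chan error, 1)
	go func() { done <- copyAndPropagate(fullDisk{}, r) }()

	if _, err := w.Write([]byte("first line\n")); err != nil {
		panic(err)
	}
	copyErr = <-done
	_, writeErr = w.Write([]byte("second line\n"))
	return copyErr, writeErr
}

func main() {
	copyErr, writeErr := writeAfterFailedCopy()
	log.Printf("copy error: %v", copyErr)
	log.Printf("process write after failed copy: %v", writeErr)
}
```

A real child process writing to its stdout would typically receive SIGPIPE (and likely exit) rather than an EPIPE return value. This is the "likely causing the process to crash" outcome mentioned above.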

github-actions · 2023-03-02T02:20:09Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

schmichael commented Jun 8, 2018


executor: fix log line formatting

d0bec72

preetapan reviewed Jun 8, 2018


preetapan reviewed Jun 11, 2018


schmichael merged commit c0507cb into master Jun 11, 2018

schmichael deleted the b-fix-powershell-shutdown branch June 11, 2018 18:05

github-actions bot locked as resolved and limited conversation to collaborators Mar 2, 2023

