-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix job support #168
Fix job support #168
Conversation
Previously we would rely on the exit This fixes a few nasty bugs: * System.Process.waitForProcess failed to keep the ProcessHandle's MVar alive, potentially resulting in the finalizer being run while waitForJobCompletion is executing. This would cause the process handles to be closed, resulting in waitForJobCompletion to fail. * waitForProcess failed to explicitly close the job, process, and IOCP handles. * waitForProcess failed to stop delegation of Ctrl-C in processes using jobs.
d55ca28
to
59ec311
Compare
I believe this is ready for review. To test this I booted GHC and then used the GHC to boot another GHC, this time with changes in |
Ugh, Appveyor cancelled the job as I accidentally closed the MR. I guess I'll have to push a new commit. |
Previously in order to wait on a job object we would create an IO Completion Port, configure the job object to emit notifications to it with SetInformationJobObject, and wait for JOB_OBJECT_MSG_EXIT_PROCESS notifications until all processes have died. This followed one piece of guidance from Microsoft [1] but according to Microsoft's own documentation, this cannot work reliably as delivery of job notifications is not guaranteed [2]. I have seen cases where the processes hang waiting on job objects so I can only guess that message loss is indeed possible. Instead we now take a simpler approach: look at the processes in the job, if there are none then we are done. If there are still processes, choose one and wait for it to finish. Iterate until the job is empty. Credit for this approach goes to Davean Scies. [1] https://devblogs.microsoft.com/oldnewthing/20130405-00/?p=4743 [2] https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_associate_completion_port
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @bgamari, everything I found suggest you need to handle the ERROR_MORE_DATA
case.
I've pushed an additional patch to implement support for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks. This looks good to me!
@bgamari Can you indicate whether this should be a major, minor, or patch revision, and add a ChangeLog entry? |
I have added a changelog entry. I suspect that a minor version bump is appropriate; this definitely fixes a major bug but the types and semantics of the library's exports have not changed. |
Thanks! |
But they did change. You dropped a parameter from |
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
@Mistuke, AFAIK |
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
Many toolchain tools written for POSIX systems rely on the exec system call. Unfortunately, it is not possible to implement `exec` in a POSIX-compliant manner on Windows. In particular, the semantics of the `exec` implementation provided by the widely-used `msvcrt` C runtime will cause process's waiting on the `exec`'ing process to incorrectly conclude that the process has successfully terminated when in fact it is still running in another process. For this reason, the `process` library exposes the `use_process_jobs` flag to use a more strict (although still not POSIX-compliant) mechanism for tracking process completion. This is explained in this comment [2]. Unfortunately, job support in the `process` library is currently quite broken and was only recently fixed [1]. Consequently, we only enable job object support for process releases >= 1.6.8. [1] haskell/process#168 [2] https://github.com/haskell/process/blob/master/System/Process.hs#L399
This fixes the incorrect termination behavior described in #167. The path to get here was a bit meandering.
The "Check exit code of processes using jobs in two stages" commit refactors
waitForProcess
to ensure that theProcessHandle
'sMVar
remains reachable throughout the blocking inwaitForJobCompletion
. It also updates theMVar
after job completion and closes the OS handles to allow the process's resources to be freed.However, I then realized that the Win32 documentation explicitly states that the delivery of job IOCP notifications is not guaranteed. This is strange and appears to contradict the guidance given by this blog post but I have nevertheless observed hung processes that would seem to confirm the lossy behavior.
In light of this, the "Refactor waiting on job objects" commit refactors the job waiting logic to avoid IOCPs entirely. We now rather query the job for a list of its constituent processes, choose one to block on, wait until it has terminated, and iterate until we find that there are no processes left.
Additionally, we give up on trying to choose which process to take the exit code from; the probability that this heuristic works is simply too far from 1 and it otherwise does nothing but contribute non-determinism to the operation.