Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle SIGTERM properly, on unix systems #7921

Merged
merged 5 commits into from
Mar 19, 2022
Merged

Conversation

robx
Copy link
Collaborator

@robx robx commented Jan 22, 2022

This make cabal-install reliably terminate its children before exiting when cabal-install itself is killed (fixes #7914), by:

  • replacing use of createProcess by withCreateProcess, which provides async-exception-safe subprocess cleanup
  • installing a signal handler which converts SIGTERM into an asynchronous exception that's thrown at the main thread

Some notes:


Please include the following checklist in your PR:

Please also shortly describe how you tested your change. Bonus points for added tests!

@robx robx mentioned this pull request Feb 18, 2022
3 tasks
@robx robx marked this pull request as draft February 22, 2022 19:53
@robx robx force-pushed the fix-processes branch 2 times, most recently from 8bd6208 to da1fa13 Compare February 23, 2022 00:45
@robx robx changed the title Handle SIGTERM "properly" (proof of concept) Handle SIGTERM "properly" Feb 23, 2022
@hasufell
Copy link
Member

cabal run and cabal exec should use execSpawn like stack does, see #7757 (comment) ...there's no reason to keep the parent process alive

@robx robx changed the title Handle SIGTERM "properly" Handle SIGTERM properly Feb 23, 2022
@robx robx marked this pull request as ready for review February 23, 2022 22:39
@robx
Copy link
Collaborator Author

robx commented Feb 23, 2022

I agree that it would make sense for cabal run and cabal exec to essentially exec(3) without forking. I think the change here is necessary regardless, though, because cabal shouldn't be leaving behind ghc processes or similar either when it is killed.

@robx robx force-pushed the fix-processes branch 2 times, most recently from d0c6338 to 09e7c77 Compare March 4, 2022 19:48
@jneira
Copy link
Member

jneira commented Mar 10, 2022

@robx I could give a try to this on windows, what build step could be more appropriate?

@robx
Copy link
Collaborator Author

robx commented Mar 10, 2022

@robx I could give a try to this on windows, what build step could be more appropriate?

Thanks, that'd be great!

The two three things I'd try if I had access to a Windows system would be:

  1. cabal run for some long-running executable, then kill it (but I'm not sure exactly what killing means on windows -- from process manager?), and see if that kills the child, too
  2. cabal build along the lines of what you tested for Fix concurrency/exception bugs in asyncFetchPackages #7929 (i.e. does interrupt work better now? I think it might)
  3. cabal build again, but this time "kill" the process, and see if it exits completely

Just 1 would be great already. More detail there:

$ cat Main.hs 
import Control.Concurrent (threadDelay)

main = do
  putStrLn "whoop"
  threadDelay 500000
  main
$ cat RunKill.cabal 
name: RunKill
version: 1.0
build-type: Simple
cabal-version: >= 1.10

executable exe
  default-language: Haskell2010
  build-depends: base
  main-is: Main.hs
$ cat cabal.project 
packages: .

Then on unixy systems, what I'd do is:

  • in one terminal, call cabal run RunKill
  • in another, I'd call ps | grep cabal, which gives me two processes: cabal run RunKill (pid 1000) and something like .../dist-newstyle/.../exe (pid 2000)
  • I'd call kill 1000 in the second terminal, and the first prints Terminated: 15 because cabal exits, but exe continues to run, printing whoop to the terminal, until I kill it explicitly with kill 2000

That's with cabal prior to this PR. After this PR, kill 1000 is enough to kill both processes, and output to the terminal stops. Ideally, we'd see the same behaviour on Windows.

@jneira
Copy link
Member

jneira commented Mar 11, 2022

(but I'm not sure exactly what killing means on windows -- from process manager?)

we can do some things in the shell in windows too :-P taskkill /pid in cmd or stop-process -pid in powershell

thanks for the tips, will follow them

@jneira
Copy link
Member

jneira commented Mar 11, 2022

Hi i ve got to build this pr and test it:

  1. Unfortunately killing the main cabal process did not kill the child one in my windows 7
D:\ws>tasklist | grep cabal
cabal.exe                     4244 Console                    1    61.516 KB
cabal-script-Delay.hs.exe     8112 Console                    1     4.088 KB

D:\ws>taskkill /pid 4244 /force  # it does not allow me kill it without forcing
Correcto: se terminó el proceso con PID 4244.

D:\ws>tasklist | grep cabal
cabal-script-Delay.hs.exe     8112 Console                    1     4.184 KB

In the shell running the script there is

whoop
whoop
whoop
           <------------ here the cabal kill
D:\ws\haskell\cabal>whoop  
whoop
whoop

@jneira
Copy link
Member

jneira commented Mar 11, 2022

  1. Good news, hitting ctrl+c has killed cabal immediately in all my tests in several steps of the build: downloading, building, etc

Sometimes it prints AsyncCancelled and sometims it does not, though

  1. killing the cabal process while building has worked fine in all my tests, trying to trigger it in different phases of the build (with ghc/cc/ld spawned)

@robx
Copy link
Collaborator Author

robx commented Mar 11, 2022

Thanks a lot! I've done a bit of digging, and it seems that signals as such don't exist meaningfully on Windows / mingw. Best reference I found is https://sourceforge.net/p/mingw/mailman/message/16209215/.

The signal library just appears to wrap mingw's signal.h https://github.com/jonasstrandstedt/MinGW/blob/master/MinGW/include/signal.h (not sure if that's the real MinGW source):

SIGTERM comes from what kind of termination request exactly?

I've also checked, and the "can only be terminated forcefully" thing applies to any Haskell processes, so seems like the PR doesn't break anything, either.

I think what I'll do is

  1. drop the signal dependency, and #ifdef notwindows the new signal handling code, using posix instead
  2. raise an issue with the signal package to document that it doesn't really do anything useful (or give the author a chance to clarify my misunderstanding)

Anyway, effect should be that we just get the better Ctrl-C exit on Windows, and the SIGTERM fix on Unix only.

@robx
Copy link
Collaborator Author

robx commented Mar 11, 2022

(There's probably something the ghc windows runtime could do to make it respond to taskkill without /f... Some searching indicates that what it does is send a message WM_CLOSE, though I'm unsure if that applies to a command line program.

...

This looks like the spot in the GHC runtime: https://gitlab.haskell.org/ghc/ghc/-/blob/master/rts/win32/ConsoleHandler.c#L214)

@robx robx changed the title Handle SIGTERM properly Handle SIGTERM properly, on unix systems Mar 11, 2022
@hasufell
Copy link
Member

cabal run and cabal exec should use execSpawn like stack does, see #7757 (comment) ...there's no reason to keep the parent process alive

Will this be addressed?

@robx
Copy link
Collaborator Author

robx commented Mar 16, 2022

cabal run and cabal exec should use execSpawn like stack does, see #7757 (comment) ...there's no reason to keep the parent process alive

Will this be addressed?

I didn't intend to address this, at least not in this PR, since it seems orthogonal to the change as is.

(ETA: I did address the comment in so far as replying to it above, did you see that?)

Copy link
Member

@Mikolaj Mikolaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot.

@jneira
Copy link
Member

jneira commented Mar 18, 2022

@Mergifyio rebase master

robx added 5 commits March 18, 2022 21:58
That's roughly how Ctrl-C (SIGINT) is handled by the GHC
runtime.

This way, killing cabal via SIGTERM will give it a chance to
terminate any running children. Previously, it would exit
directly, leaving children behind to exit on their own.
@mergify
Copy link
Contributor

mergify bot commented Mar 18, 2022

rebase master

✅ Branch has been successfully rebased

Copy link
Member

@jneira jneira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@hasufell
Copy link
Member

hasufell commented Jun 9, 2022

The cabal pre-release now includes this change and I'm getting a lot of this lately when cancelling builds:

cabal-3.8.0.20220526: waitForProcess: does not exist (No child process)

Pretty sure it's this patchset.

@Mikolaj
Copy link
Member

Mikolaj commented Jun 9, 2022

That's on LInux. Is the problem only the alarming message or is there a visible delay or are zombies created or whatever?

@robx
Copy link
Collaborator Author

robx commented Jun 9, 2022

Quoting the PR description:

There's a bug in process versions before 1.6.14 (haskell/process#231) which causes spurious prints of cabal: waitForProcess: does not exist (No child processes) when interrupting with Ctrl-C.

(I think process-1.6.14.0 ships with GHC 9.2.)

@hasufell
Copy link
Member

hasufell commented Jun 9, 2022

Oh no. So I was wrong.

I guess we can set a constraint in cabal.project.

@Mikolaj
Copy link
Member

Mikolaj commented Jun 9, 2022

We could conditionally exclude the borked versions of process (if cabal is compiled on new enough GHCs that the new process can be used instead) or in cabal.project somehow, I guess. PR welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge me Tell Mergify Bot to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cabal run leaves children running when killed
4 participants