Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Gentler subprocess termination, for windows #862

Closed
sdboyer opened this issue Jul 20, 2017 · 7 comments
Closed

Gentler subprocess termination, for windows #862

sdboyer opened this issue Jul 20, 2017 · 7 comments

Comments

@sdboyer
Copy link
Member

sdboyer commented Jul 20, 2017

As a follow-up to #857, we want to make subprocess termination a little more selective/gentler on Windows, too.

@mattn and @ChrisHines had a bunch of thoughts, already, but I wanted to move that discussion to an issue here, so that that PR can focus on getting the UNIX support done. Here's my attempt to summarize those thoughts:

  • Invoke taskkill:
exec.Command("taskkill", "/F", "/T", "/PID", fmt.Sprint(cmd.Process.Pid)).Run()
@mattn
Copy link
Member

mattn commented Jul 20, 2017

I suggest to use alex's ps package.

@JoshuaSjoding
Copy link

I wrote a package called graceful that will hopefully work well for you.

It's based on a solution to the same problem used in git-for-windows that issues an ExitProcess() call within the target process. My understanding is that the git-for-windows client will trap this action and execute a controlled cleanup.

I don't have any knowledge of how other version control systems will handle the intrusion, but I expect it will still be preferable to calling TerminateProcess().

@sdboyer
Copy link
Member Author

sdboyer commented Aug 1, 2017

@JoshuaSjoding iiiinteresting. yeah, all we can really hope for here is incremental improvement over the current conditions.

any interest in making a PR? 😄

@alexbrainman
Copy link
Member

I do not know what the problem is that you are trying to fix in this current issue. I looked at issue #823, and it is about dep taking long time to complete - 20 minutes long. It should not take 20 minutes for git to copy even large repo, so I suspect some state is corrupted. If it is something that git manages, how can you prevent this from happening? On the other hand @sdboyer says (here #857): "I think git is quite careful to ensure it cleans up after itself - possibly even incrementally, so that progress is saved?". So, I take it, you hope that git is OK to be killed by Windows TerminateThread. There might be undoubtedly be some child processes left, and they might interfere with new git process trying to redo all interrupted work again. But issue #823 is not even for Windows. So I am not even convinced you have Windows issue to fix. Do you have more information to add to this issue? Do you have a repro that I can run here?

It is, probably, OK to use github.com/gentlemanautomaton/graceful instead of calling Windows TerminateThread (that is what you do now). But github.com/gentlemanautomaton/graceful just calls Windows ExitProcess, and ExitProcess will run more of "my process is stopping" code than TerminateThread. You can read ExitProcess doco - https://msdn.microsoft.com/en-us/library/windows/desktop/ms682658(v=vs.85).aspx that explains what ExitProcess does, but you still cannot be sure that it will run the code you need - the code that saves state to the disk. Maybe ExitProcess will handle child processes better than TerminateThread, but maybe it won't. So, perhaps, it is good idea to implement killing of all Git children too. you can use Windows Job Objects as suggested by @ChrisHines or you could just use something like https://github.com/mattn/psutil/blob/master/psutil_windows.go#L53 as suggested by @mattn. The second approach would be more error prone, because it is racy.

I can implement whatever you decide, but it would be nice if we had some Windows repro before we try and fix this. Please, let me know if you want me to do something here.

Alex

@sdboyer
Copy link
Member Author

sdboyer commented Aug 8, 2017

hi Alex, thanks for coming by!

I do not know what the problem is that you are trying to fix in this current issue.

this issue is best understood as me, probably unwisely, extrapolating from a fix we needed to make on the UNIXish side: i had written subprocess termination code that relied on os.Process.Kill() without parameterizing per platform. this would be triggered by a Ctrl-C, and was resulting in numerous git repositories that were mid-clone becoming corrupted in unpredictable ways.

#857 was my fix for non-Windows platforms, shifting to sending os.Interrupt signals with a timer-based fallback to Process.Kill(). but Windows retained the same Process.Kill() semantics as before.

my goal with this issue is just to try to narrow the window of possibility that an attempt to terminate dep processes early on windows could lead to corrupted cache items. my criteria for evaluating whether or not a change here is worth making is whether it reliably achieves that goal, without introducing other problems.

but it's worth noting that:

it would be nice if we had some Windows repro before we try and fix this.

it does not appear that this is a problem our windows users have encountered yet - at least, judging from bug reports. so it may be best to let it sit? i don't know. this is my essential problem - i lack the experience with windows to know what the overall least-risky path is here.

@alexbrainman
Copy link
Member

#857 was my fix for non-Windows platforms, shifting to sending os.Interrupt signals with a timer-based fallback to Process.Kill(). but Windows retained the same Process.Kill() semantics as before.

Yes. You replaced os.Process.Kill() call with os.Process.Signal(os.Interrupt) in the hope that the process that gets killed will be able to run some "clean up" code from its os.Interrupt handler. Unfortunately there is no such mechanism on Windows - Windows process (in general) does not have a way to say to OS to run some code before it gets killed. So, unless Git provides some instructions about this, I don't think you can do anything useful.

it does not appear that this is a problem our windows users have encountered yet - at least, judging from bug reports. so it may be best to let it sit?

I would wait for reports. On Windows you cannot delete files or directories, if some process have them opened. So I suspect you will know quickly if you leave some external processes hanging.

I say don't change things, because you might make things worse. For example, at this moment only main Git process gets killed via TerminateProcess, all other children of his (hopefully) exit themselves for one reason or the other. But if you introduce Windows Job Objects, the way to kill whole process tree is to call TerminateJobObject. But TerminateJobObject just "as if TerminateProcess were called for each process associated with the job". So now even child processes won't have even chance to exit properly.

Alex

@sdboyer sdboyer added future and removed help wanted labels Aug 8, 2017
@sdboyer
Copy link
Member Author

sdboyer commented Aug 8, 2017

cool, i'm good with that. thanks for your input - updating labels a bit accordingly 😄

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants