runtime: make.bat hangs #36492
Comments
Thanks for the detailed report. Most of the threads look uninteresting, except, I think, these two:
These are all stuck in Thread 13 is also interesting because the "suspend count" is 2, suggesting that some other thread has suspended it and is failing to resume it. This may also be why threads 4 and 10 are stopped in obviously blocking operations, while thread 13 is stopped at a seemingly random point. Do you know what "tmmon64" and "TmUmEvt64" are? Maybe we just need to hold the suspendLock for longer (though I don't have a theory for why this would be). What happens if you move the |
This is not my computer, so I cannot properly poke at it. But I suspect this computer has some standard antivirus software installed. This kind of software often installs its own code to intercept real Win32 API calls.
I won't see her on the weekend. But I will try it Monday or Tuesday. Also, please show a diff of your suggested change, because I don't trust myself with reading your mind. Thank you. Alex |
I changed the code like this:
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index 91e147fca9..9166aeb323 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -1201,8 +1201,6 @@ func preemptM(mp *m) {
 	// GetThreadContext actually blocks until it's suspended.
 	stdcall2(_GetThreadContext, thread, uintptr(unsafe.Pointer(c)))
 
-	unlock(&suspendLock)
-
 	// Does it want a preemption and is it safe to preempt?
 	gp := gFromTLS(mp)
 	if wantAsyncPreempt(gp) && isAsyncSafePoint(gp, c.ip(), c.sp(), c.lr()) {
@@ -1231,6 +1229,8 @@ func preemptM(mp *m) {
 
 	stdcall1(_ResumeThread, thread)
 	stdcall1(_CloseHandle, thread)
+
+	unlock(&suspendLock)
 }
 
 // osPreemptExtEnter is called before entering external code that may
And I managed to run make.bat once to successful completion. But then it hung as before when I ran make.bat a second time. This time, it is compile.exe that hung. Here is the stack trace from windbg
Alex |
I just tried to verify this issue. And it is still broken on af9ab6b. It gets stuck here
This is what the process tree looks like when stuck. And this is what windbg says about the compile.exe process with pid 11956:
Alex |
Thanks for the two other dumps. I did some more searching around and I'm almost positive tmmon64 and TmUmEvt64 are related to Trend Micro anti-virus, which agrees with what you said in #36492 (comment). Unfortunately, I think its syscall interception is introducing a lock cycle that's leading to a deadlock.
In #36492 (comment), threads 6 and 8 have suspend counts > 1. Threads 0 and 6 (again) are in ResumeThread. So, thread 6 must have suspended thread 8 for preemption, and then when it was trying to resume thread 8, thread 0 suspended thread 6 for preemption. Where this gets interesting is that all three of these threads are in Windows memory allocation functions via tmmon64/TmUmEvt64. My guess is there's a cycle between threads 0 and 6: TmUmEvt64 on thread 6 locked the Windows heap inside ResumeThread, and was then suspended with that lock held. When thread 0 then tried to resume it with ResumeThread, TmUmEvt64 again tried to lock the Windows heap, but it can't get that lock, so it's stuck.
#36492 (comment) shows similar evidence: thread 6 is suspended in RtlUnlockHeap (via TmUmEvt64) and thread 4 is in GetThreadContext -> TmUmEvt64 -> RtlAllocateHeap, indicating that it has thread 6 suspended and is in a lock cycle on the Windows heap lock. Even completely serializing thread suspend/resume by moving the unlock doesn't help enough because TmUmEvt64 can wind up in Windows heap functions through other means.
So, ultimately, this is probably a bug in Trend Micro, but only because we're doing something really unusual with suspending our own threads, which, sadly, probably makes this our problem. The downsides of using SuspendThread keep piling up, but I have no idea what to replace it with. :( |
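To make the suspected cycle easier to follow, here is a minimal Go sketch that models it with ordinary goroutines. It is only an illustration under stated assumptions: `heapLock`, `hookedResume`, and `simulateLockCycle` are hypothetical names, a `sync.Mutex` stands in for the Windows heap lock, and a channel receive stands in for a thread frozen by SuspendThread. It is not runtime code and does not call any Win32 API.

```go
package main

import (
	"fmt"
	"sync"
)

// heapLock stands in for the process-wide Windows heap lock (hypothetical model).
var heapLock sync.Mutex

// hookedResume models a ResumeThread call wrapped by an interceptor that takes
// the heap lock before forwarding the call. It returns false at the point where
// the real call would block forever; TryLock is used only so the demo terminates.
func hookedResume(resume chan struct{}) bool {
	if !heapLock.TryLock() {
		return false
	}
	defer heapLock.Unlock()
	close(resume)
	return true
}

// simulateLockCycle reports whether the resume attempt got stuck on the heap lock.
func simulateLockCycle() bool {
	holding := make(chan struct{})
	resume := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // plays the role of "thread 6"
		defer wg.Done()
		heapLock.Lock() // the interceptor takes the heap lock...
		close(holding)  // ...and the thread is suspended at this point
		<-resume        // frozen until someone resumes it
		heapLock.Unlock()
	}()
	<-holding
	stuck := !hookedResume(resume) // plays the role of "thread 0"
	if stuck {
		close(resume) // break the cycle by hand so the demo can exit
	}
	wg.Wait()
	return stuck
}

func main() {
	fmt.Println("resume stuck on heap lock:", simulateLockCycle())
}
```

In this model the resumer can never make progress, because the only thread able to release the lock is the one waiting to be resumed. Holding suspendLock for longer does not change that, since the lock in the cycle belongs to the interceptor rather than to the runtime.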
Fascinating. It seems .NET uses SuspendThread for driving threads to GC safe-points. It seems they ran into similar problems with Windows heap locks in general, though not specifically related to system call interceptors. I don't see anything in threadsuspend.cpp itself that's obviously different from what we do, but there's a huge comment about OS resources and SuspendThread that indicates they're carefully synchronizing every transition into and out of managed code (presumably this includes every "system call") so they don't even attempt to suspend a thread that isn't in managed code. If I've followed the twisty passages correctly, this winds up at DisablePreemptiveGC and EnablePreemptiveGC and ultimately RareDisablePreemptiveGC and RareEnablePreemptiveGC, which look like they can block transitions into and out of managed code depending on GC preemption state. |
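The idea of refusing to suspend a thread that is outside managed code can be sketched as a small per-thread state machine. This is a rough Go illustration under my own assumptions, not the .NET implementation and not the Go runtime's code: the `thread` type, its states, and all method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical per-thread states for the sketch.
const (
	inManaged  int32 = iota // running code the runtime controls
	inExternal              // inside a system call or other external code
	suspended               // stopped by the suspender
)

type thread struct {
	state int32 // accessed only through sync/atomic
}

// enterExternal is called before leaving managed code. It fails if the thread
// has already been claimed by the suspender.
func (th *thread) enterExternal() bool {
	return atomic.CompareAndSwapInt32(&th.state, inManaged, inExternal)
}

// exitExternal is called when returning to managed code.
func (th *thread) exitExternal() {
	atomic.CompareAndSwapInt32(&th.state, inExternal, inManaged)
}

// trySuspend claims the thread only if it is in managed code, so a thread
// that may hold an OS lock inside external code is never frozen.
func (th *thread) trySuspend() bool {
	return atomic.CompareAndSwapInt32(&th.state, inManaged, suspended)
}

// resume releases a thread previously claimed by trySuspend.
func (th *thread) resume() {
	atomic.CompareAndSwapInt32(&th.state, suspended, inManaged)
}

func main() {
	th := &thread{}
	th.enterExternal()
	fmt.Println("suspend while in external code:", th.trySuspend())
	th.exitExternal()
	fmt.Println("suspend while in managed code:", th.trySuspend())
}
```

The compare-and-swap makes the transition and the suspend attempt mutually exclusive: whichever side wins, the other observes the new state and backs off. A real implementation would also have to decide what a thread does when enterExternal fails, which this sketch leaves out.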
You are, probably, correct. I use this PC at work. Our admins run whatever software they like on it.
I agree. I run a lot of different programs on that computer, and none of them hang. This makes Go build tools impossible to use on this computer. I suspect the same can be said about programs built with Go. The PC is still running Windows 7, which is rare these days. Hopefully this bug is uncommon.
Personally I don't see any benefits from preemption. I don't run any code that requires preemption on Windows. I would just disable preempt code on Windows. You will also avoid Delve problems on Windows.
It is quite possible. But I expect to see restrictions like that mentioned on Windows API descriptions. I am not aware of any such thing. Adding @zx2c4 in case he has some bright ideas. Alex |
Another attempt to verify this issue. I checked against ec51703 (tagged as go1.17). This time it was harder to break it - it took me 3 runs of make.bat before it hung. I am not sure if it is because of the new version of Go, because software on my OS changed, or because my PC is overloaded or under-loaded with other programs. Here is how my environment is configured:
set HOME=c:\users\alexb\dev\
set GOROOT=%HOME%\go
set GOROOT_BOOTSTRAP=%HOME%\go1.4
set GOPATH=%HOME%
set MINGW=%HOME%\tdm_gcc_64_5.1.0
set PATH=%PATH%;%MINGW%\bin;%GOROOT%\bin
cd %GOROOT%\src
cmd
Here is what I did:
Here is a screenshot of Go build process tree that hung in Setting Unfortunately Go still hangs if I run some tests in runtime package, because some tests silently clear Alex |
I have a Windows 7 computer - windows/amd64.
I am building Go from source using go1.4 as bootstrap.
I am using commit 56d6b87
I am running the make.bat command.
What did you expect to see?
I expected the make.bat command to finish successfully.
What did you see instead?
make.bat never finishes. It hangs, for example, like this:
I used process explorer https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer to examine process tree, and that is what I see:
I also used WinDbg to attach to go_bootstrap.exe (pid 9788) and print stacks of all its threads. And that is what I see:
I was able to use Delve to examine this bug once (see #35775 (comment)), but not anymore. Delve just fails to attach now.
I can reproduce this pretty reliably on this particular computer - make.bat never completes. Sometimes it hangs in go_bootstrap.exe and sometimes in compile.exe. Sometimes there is more than one hung compile.exe.
I can make the problem go away if I change the source code to set runtime.preemptMSupported to false.
/cc @aclements
Alex