runtime: intermittent "failed to create new OS thread" on Windows since 2022-01-24 #52572
Comments
Since most of these failures are in However the "requested the Runtime to terminate it in an unusual way" failure mode in particular looks like a Go runtime bug. Since
ping: this is a release blocker that hasn't been updated in a while.
Three out of the four also have an "out of memory" failure, which strongly suggests this is real resource exhaustion on the host. That's consistent with most of the failures being on a particular subrepo. The "requested the Runtime to terminate it in an unusual way" message is not coming from the Go runtime, and must be coming from the C runtime. That's presumably printed when CreateThread fails, but it is odd that the Go runtime still gets to print "failed to create new OS thread" after the C runtime prints its message. (Maybe that's a buffer flushing thing?)
I reproduced this 12 times over the weekend on gomotes, interestingly always in package runtime's TestCgoSignalDeadlock:
I should add that most, but not all, failures include the "This application has requested ..." message, and that none of the failures include any other resource or other test failures in the same run.
The tests did seem to take abnormally long to run. That said, the first run (which passed) on a new gomote took 12s to complete (mostly spent building the test binary), so these aren't that far off.
Ah, well the "requested the Runtime to terminate it in an unusual way" message is not surprising. The runtime message comes from https://cs.opensource.google/go/go/+/master:src/runtime/cgo/gcc_windows_amd64.c;l=31-32. Per the Microsoft docs, "By default, the abort routine prints the message: 'This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.'" I am now more confused about why I sometimes don't get the message, and also why the runtime error is printed after the message, given that the docs say "abort does not flush stream buffers or do atexit/_onexit processing."
"runtime: failed to create new OS thread (13)" is errno EACCES.
I am running with https://go.dev/cl/408216 to try to get a sense of whether the crashing process is using lots of memory, the overall system is low on memory, or this has nothing to do with memory. (Hopefully there is enough memory to run the dump function 😅)
Change https://go.dev/cl/408216 mentions this issue:
@prattmic Have you tried extending the idea of https://go.dev/cl/33894 to Windows? The exact reason for an "insufficient resources" error isn't that interesting if retrying fixes the problem.
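As a rough illustration of the retry idea, here is a minimal, portable C sketch. It is not the code from either CL: `start_with_retry`, `flaky_start`, and the callback types are hypothetical names, and treating only EACCES (errno 13, the code reported in this issue) as transient is an assumption. A real Windows version would call the C runtime's thread-creation function and `Sleep` where the callbacks are used.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types for this sketch.
   try_start returns 0 on success or an errno value on failure.
   pause waits roughly ms milliseconds (e.g. a wrapper around Sleep). */
typedef int (*try_start_fn)(void *ctx);
typedef void (*pause_fn)(unsigned ms);

/* Retry thread creation while the failure looks transient (EACCES here).
   Returns 0 on success, otherwise the last errno value seen, or EINVAL
   if max_tries is less than 1. If attempts is non-NULL, it receives the
   number of calls made to try_start. */
int start_with_retry(try_start_fn try_start, void *ctx, pause_fn pause,
                     int max_tries, int *attempts)
{
    int err = EINVAL;
    int calls = 0;

    for (int i = 0; i < max_tries; i++) {
        err = try_start(ctx);
        calls++;
        if (err != EACCES) {
            break; /* success, or an error that retrying will not fix */
        }
        /* Linear backoff between attempts: 0 ms, 1 ms, 2 ms, ... */
        if (pause != NULL && i + 1 < max_tries) {
            pause((unsigned)i);
        }
    }
    if (attempts != NULL) {
        *attempts = calls;
    }
    return err;
}

/* Stand-in for a real thread-creation call, for illustration only:
   fails with EACCES until the counter behind ctx reaches zero. */
int flaky_start(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return EACCES;
    }
    return 0;
}
```

Passing the creation call and the pause as callbacks keeps the sketch testable without a Windows host. As noted in the thread, a retry only helps if the failure is transient; it does nothing about other out-of-memory failures in the same process.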
I haven't, but it is not a bad idea, as That said, while this is the only error in my repros, all of the failures in Bryan's original comment also have other fatal OOMs in the allocator. So while a retry might make thread creation succeed, the program may die anyway due to another OOM. It seems like the ultimate problem here might end up being test/builder misconfiguration, i.e., we are running too many high-memory tests in parallel and exhausting all memory on the builder.
Since there's a decent chance this is resource exhaustion on the builders, I'm not sure this is going to impede users using the beta, so marking as okay-after-beta1.
I reproduced with https://go.dev/cl/408216:
Total memory usage on the system is only ~18% of available memory, and the failing process is only using ~25MB of memory, so this does not seem like an obvious memory pressure issue. I'll give Ian's suggestion a shot and see if retry of
Change https://go.dev/cl/410354 mentions this issue:
Change https://go.dev/cl/410355 mentions this issue:
The bodies of cgo_sys_thread_start (and x_cgo_sys_thread_create) are nearly identical on all of the windows ports. Create a single _cgo_beginthread implementation that contains the body and is used on all ports. This will reduce churn in an upcoming CL to add retry logic.

We could theoretically have a single implementation of _cgo_sys_thread_start shared by all ports, but I keep them separate for ease of searching. Right now every single port implements this function in their gcc_GOOS_GOARCH.c file, so it is nice to keep this symmetry.

_cgo_dummy_export must move out of libcgo_windows.h because it is a definition and the inclusion of libcgo_windows.h in multiple files creates duplicate definitions.

For #52572.

Change-Id: I9fa22009389349c754210274c7db2631b061f9c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/410354
Run-TryBot: Michael Pratt
Reviewed-by: Michael Knyszek
Reviewed-by: Ian Lance Taylor
TryBot-Result: Gopher Robot
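The duplicate-definition point in the commit message can be sketched in a single C file. This is only an illustration: `example_dummy_export` and `read_dummy` are hypothetical names, not the real symbols, and the comments mark which line would belong in a shared header and which in exactly one source file.

```c
/* Safe in a shared header: a declaration only.
   Any number of .c files can include this line. */
extern int example_dummy_export; /* hypothetical name */

/* Belongs in exactly one .c file: the definition.
   If this line lived in the header instead, every .c file that
   included it would define the symbol, and linking those files
   together would fail with a multiple-definition error. */
int example_dummy_export = 0;

/* Any file that sees the declaration can use the variable. */
int read_dummy(void)
{
    return example_dummy_export;
}
```

Keeping the declaration in the header and the definition in one source file is the usual way to let a header be included from several files without the duplicate definitions the commit message describes.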
I haven't been able to reproduce this issue with CL 410355, so closing for now, though of course that CL isn't a hard fix so we'll see if this comes back.
greplogs --dashboard -md -l -e '(?ms)\Awindows-.*runtime: failed to create new OS thread'
2022-04-26T02:30:39-5bb9a5e-17d7983/windows-amd64-race
2022-03-21T13:26:21-86b02b3-4aa1efe/windows-arm64-11
2022-02-04T14:02:15-25d2ab2-4afcc9f/windows-arm64-11
2022-01-24T16:55:59-f9df4ea/windows-amd64-2008
Curiously, I don't see any of these errors in the logs before 2022-01-24. 🤔