runtime: "fatal: morestack on g0" with PGO enabled on arm64 #62120
Comments
CC @cherrymui, @golang/runtime.
The actual error is If you could reproduce the failure in a debugger, that might be helpful. We might be able to find out why we call morestack there in a debugger. If you build the program with Thanks.
I'm able to reproduce the crash in a debugger when compiling this small repro case with the .pgo file from the original build. When I expose this program to our production HTTP traffic it crashes within a minute or two. This is probably not an absolutely minimal case, but small changes to the above program don't reproduce the error, including:
Since there is no longer any private information in the crash, I can share the full output here. I also recorded the build output with -gcflags=all=-d=pgodebug=1 and -gcflags=all=-d=pgodebug=2. I'm not quite sure how to determine which function is being miscompiled. I put a breakpoint on runtime.abort and let the program crash in a debugger (I'm using Delve).
@cherrymui, want to roll your "morestack on g0" emergency traceback CL for arm64? This could be a use for PGO bisection?
Thanks for the small reproducer and the information! The recursive If you could share the profile and/or a core file, I probably could try to reproduce the crash and do the manual unwinding.
@aclements sure, I'll try to work out something for the "morestack on g0" traceback. My old code has a merge conflict with the new unwinder. I'll need to rebase, and port to ARM64. PGO bisection would probably also help. I'll hack up something for that, too.
No problem, here is the executable and the core file, taken from the breakpoint on runtime.abort. I may be able to privately share the profile as well; I'm discussing it with the security team at my company. Thanks for taking a look at this, and please let me know if there's more information I can provide (other than the profile).
@dmac thanks for sharing. Unfortunately, it seems the core file you shared does not include register information (or at least the GDB I used cannot read the registers from it). So it would be hard to investigate. Could you share a new one that includes the registers?
Also, CL https://go.dev/cl/419435 attempts to print a stack trace when a "morestack on g0" error occurs. Could you patch that CL (apply to the Go runtime, and rebuild your binary) and see if it prints anything helpful? Thanks.
I generated that core file with Delve, and while I can see registers when inspecting it with Delve, I also can't see registers when inspecting it with GDB. I performed the repro with the minimal executable using GDB this time and generated a new core file. Does that work better?
I was able to repro the crash using that patch on our full program; the partial output can be viewed here. However, I haven't yet been able to reproduce the issue using the CL patch on my minimal program. I might need to find a new minimal repro when building with the patch. (I'm not sure if the new stack trace is useful without the full binary/core.)
Thanks! This looks like an actual stack overflow. The g0 stack size is 8 KB, which matches the size we allocate at https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=1941 (I assume this is a non-cgo program). An 8 KB g0 stack looks rather small to me. Maybe with PGO the stack frames are larger and that just pushes it over the limit... See also #62489. Could you check whether just increasing the g0 stack size to 16 KB fixes the issue? That is, apply the patch in #62489 (comment) and rebuild the program with the same profile. Thanks.
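To make the "larger frames push it over the limit" reasoning concrete, here is a rough Go sketch of the arithmetic. The frame sizes (256 and 512 bytes) and the helper name `framesThatFit` are assumptions chosen for illustration; they are not measurements from the runtime or from this crash. Only the 8 KiB and 16 KiB stack sizes come from the thread.

```go
package main

import "fmt"

// framesThatFit returns how many equally sized frames fit in a stack of
// stackBytes. Hypothetical helper: real frames vary in size, and the
// runtime also reserves guard space, which this sketch ignores.
func framesThatFit(stackBytes, frameBytes int) int {
	return stackBytes / frameBytes
}

func main() {
	const kib = 1024
	// Assumed frame sizes: a "normal" frame and one that doubled in size
	// because callees were inlined into it.
	for _, frame := range []int{256, 512} {
		fmt.Printf("frame=%d B: 8 KiB holds %d frames, 16 KiB holds %d frames\n",
			frame, framesThatFit(8*kib, frame), framesThatFit(16*kib, frame))
	}
}
```

Under these assumptions, doubling the frame size halves the call depth that fits in a fixed stack, and doubling the stack restores it.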
Correct, not cgo. And yes, that patch seems to fix the issue in both the minimal program and the original program.
Change https://go.dev/cl/526995 mentions this issue:
@dmac thanks for confirming!
@gopherbot please backport this to Go 1.21. This can cause programs built with PGO to fail to run. There is no workaround except disabling PGO.
Backport issue(s) opened: #62537 (for 1.21). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.
Change https://go.dev/cl/527055 mentions this issue:
Thank you!
Currently, for non-cgo programs, the g0 stack size is 8 KiB on most platforms. With PGO which could cause aggressive inlining in the runtime, the runtime stack frames are larger and could overflow the 8 KiB g0 stack. Increase it to 16 KiB. This is only one per OS thread, so it shouldn't increase memory use much.
Updates #62120.
Updates #62489.
Fixes #62537.
Change-Id: I565b154517021f1fd849424dafc3f0f26a755cac
Reviewed-on: https://go-review.googlesource.com/c/go/+/526995
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
(cherry picked from commit c6d550a)
Reviewed-on: https://go-review.googlesource.com/c/go/+/527055
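The "one per OS thread" cost can be worked out with a small Go sketch. The thread counts below and the helper name `extraG0Bytes` are assumptions for illustration; only the 8 KiB and 16 KiB sizes come from the commit message.

```go
package main

import "fmt"

const (
	oldG0Stack = 8 * 1024  // bytes, old g0 stack size per the commit message
	newG0Stack = 16 * 1024 // bytes, new g0 stack size per the commit message
)

// extraG0Bytes returns the additional g0 stack memory for a process with
// the given number of OS threads. Hypothetical helper, not a runtime API.
func extraG0Bytes(threads int) int {
	return threads * (newG0Stack - oldG0Stack)
}

func main() {
	// Assumed thread counts; a real process's count depends on its workload.
	for _, threads := range []int{10, 100, 1000} {
		fmt.Printf("%4d threads: +%d KiB\n", threads, extraG0Bytes(threads)/1024)
	}
}
```

For example, a process with 100 OS threads would pay roughly 800 KiB more, which is small next to the 50 MB binary described in the report.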
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
go1.21.0 is the latest release at the time of this writing.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I took a pprof profile from a web server running on a linux/arm64 instance, then used that profile with -pgo to cross-compile a new arm64 binary from my amd64 development computer. Running that binary on the arm64 instance crashes due to a segfault within a few seconds of starting, after beginning to run its normal code paths.
This issue reproduces easily with this program, but it does not reproduce with a simple "hello world" web server. Unfortunately, this program is proprietary, but it is around 46,000 lines of code and the size of the compiled binary is about 50 MB.
I can provide a more detailed, redacted crash log if that would be helpful. I may also be able to supply a binary through a private channel if that turns out to be necessary. Please let me know if there's any more information I can provide that would be helpful.
What did you expect to see?
I expect the program to run without crashing.
What did you see instead?
The program crashes with a segfault. The head and tail of the crash log are: