cmd/trace: "failed to parse trace: no consistent ordering of events possible" #29707
Comments
I should add that I was testing a binary linked with C code (cgo). Is it possible that tracing does not work properly in cgo builds? |
@hyangah I just ran into this and made a repo. Basically, it happens when you export a Go function to C and then call that function from a thread created in C.
|
Here's a related flake in
|
Any update? I cannot parse a trace created on arm64 from an x86 machine. |
Change https://golang.org/cl/234617 mentions this issue: |
Currently sysmon is not stopped when the world is stopped, which is in general a difficult thing to do. The result of this is that when tracing starts and the value of trace.enabled changes, it's possible for sysmon to fail to emit an event when it really should. This leads to traces which the execution trace parser deems inconsistent.

Fix this by putting all of sysmon's work behind a new lock sysmonlock. StartTrace and StopTrace both acquire this lock after stopping the world but before performing any work in order to ensure sysmon sees the required state change in tracing.

This change is expected to slow down StartTrace and StopTrace, but will help ensure consistent traces are generated.

Updates #29707.
Fixes #38794.

Change-Id: I64c58e7c3fd173cd5281ffc208d6db24ff6c0284
Reviewed-on: https://go-review.googlesource.com/c/go/+/234617
Run-TryBot: Michael Knyszek
Run-TryBot: Ian Lance Taylor
Reviewed-by: Ian Lance Taylor
Reviewed-by: Hyang-Ah Hana Kim
Reviewed-by: Michael Pratt
I tried the latest master branch and it still has this problem.
The Go code is built as a shared object (.so) and called from Java. We can reproduce it every time under normal workload, but it works when the program is idle. |
I can repro this issue on our production services consistently:
I've also tried running |
I can repro this issue on our production services every time, with Go 1.15. Our code involves cgo; I don't know whether that's related. |
For anyone who's interested in investigating this, I am attaching a trace from the program @AlexRouSg posted. This trace parsing failed because of an unexpected Lines 207 to 213 in c847589
I suspect the code path that should trigger Line 239 in c847589
I couldn't find the corresponding stack info (stack id: 0) in the trace, so don't know what goroutine this is either. :-( cc runtime people: @dvyukov @aclements @mknyszek @ianlancetaylor |
Is there any progress on this? Even a suggested workaround would be welcome. I'm using the same basic pattern as #29707 (comment) |
@michaelfig the only workaround I can find is to never call Go code from C created threads |
Thanks, I think I'll be able to manage that for our application. |
Same issue. Is there any update? |
I use curl http://addr:port/debug/pprof/trace to get the trace file, and get the same error. |
@wangfakang, @nejisama, do your applications also involve call-backs from C to Go on threads created in C? |
Same issue, waiting for an update. The program does not use cgo; the Go version is 1.14.15. |
same problem here using go1.17.1. |
@aclements yeah. |
The 2022 failure is also not one of the tests that frequently triggered this in the past. 🤔 (I've filed it separately as #51224.) |
I'm hitting the same problem! |
Change https://go.dev/cl/411034 mentions this issue: |
I have the same issue on Go 1.18.3 linux/amd64. In my case Python is the main program, calling Go via CFFI (essentially C). Python/C creates its own threads, and the problem happens under high concurrency. Unfortunately I can't control the Python side, so any idea on how to work around this would be great. |
Temporarily reverted in https://go.dev/cl/423437. |
Change https://go.dev/cl/429858 mentions this issue: |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
I ran a network intensive go program (closed source) with a fairly light workload.
What did you expect to see?
I expect to be able to obtain a trace profile and explore it with the viewer program.
What did you see instead?
After obtaining the profile with
wget -O trace.out "http://localhost:6060/debug/pprof/trace?seconds=10"
and running go tool trace trace.out, I received the error quoted in the title:
failed to parse trace: no consistent ordering of events possible