-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime/pprof: TestCPUProfileInlining failures with "got separate Location entries" #50996
Comments
(Note that most of these failures are on first-class ports.) |
Looking at the linux-amd64-nocgo failure, https://build.golang.org/log/1d40b595779b94ddca0ff6d018d5378dea23f3f6 that log shows three call stacks, all with root ends of locations 2,3,4,5,6. The kind of leaf end the test is looking for is "1", PC 0x539604, encoded as inlinedCaller calling inlinedCallee calling cpuHog0. But the test fails because some of the samples have leaf ends of location 7 (PC 0x5395e1, inlinedCaller), and of location 7 calling location 8 (PC 0x5395fc, inlinedCallee calling cpuHog0).
When I checked out the commit from that log and rebuilt the test command, I got the following disassembly. (Very very nice that the PCs match what's in the build log. Wow.)
Here's the part of the test that fails. It looks at the Location table, not the samples themselves. go/src/runtime/pprof/pprof_test.go Lines 246 to 260 in 0bef30d
|
It looks like this gets into trouble when one of the "fake" PCs for an outer frame shows up in an actual profile sample. I don't understand all of the profileBuilder code yet, but it looks like the presence of that real PC in the under-construction location map prevents the code from interpreting that same PC as a "fake" PC when it appears mid-stack in a later sample. I've reproduced the failure on linux/amd64 with with
Here are the failures I've seen with the patch:
Here's one of the success cases with the patch:
Note that the failures include PC |
Change https://go.dev/cl/384239 mentions this issue: |
"The fake PCs that the runtime creates to represent inlined calls can also appear in the outer function as real PCs." This sounds wrong to me. I think (though I'm not certain) that the compiler should insert enough NOPs as necessary to ensure that the tracepc for inlined functions will always be less than the PC of the next instruction in the calling function. @cherrymui does this sound right to you? If that is correct, then it would seem the bug is either in:
i.e., it looks like your CL would suppress the issue, but if I'm not misunderstanding it may be fixing it in the wrong place. |
If all of those mechanisms work, it seems like the result is an outer function that contains a sequence of NOP instructions leading up to the instructions that form the body of the inlined function. What stops those NOP instructions from executing, from taking CPU time to execute, and from showing up as the leaf-most PC in a CPU profile? (Edit: the example program I wrote to test the presentation of NOPs uses mid-stack inlining, so the inlined body is a CALL instruction, as reported by |
Nothing stops them from naturally appearing in a profile sample. When they do they should be recorded as part of the inlined function. |
When I ran the test with the patch to print the arguments to I think at that point we're supposed to subtract one from each address passed in to Here's objdump from the program I have running now, CL 384239 PS 1.
And here is how those addresses show up with binutils' addr2line when I ask it to describe inlining:
So it looks to me like a leaf of It looks like all of that works well. And But when offset +0x1 shows up by itself at the leaf ("Nothing stops them from naturally appearing in a profile sample"), it gets its own Location and then appears in |
That is to say, based on what I know, the behavior and the fix I proposed seem consistent with how I understand these parts. But I don't know enough about the funcdata info to consider whether this is how a bug in those parts would present itself. It looks like each NOP instruction only gets used to show a single layer of inlining—so if a SIGPROF hitting on one of those should be reported as all of the layers of inlining (rather than only those up to that particular NOP), then yes, there are more pieces missing in order to make that work. Is that what you're describing @prattmic ? |
That is what I am thinking, but I'll need to find time to think about this more deeply. I've been swamped dealing with other issues for the past few days. :) |
If I understand it correctly I think the NOPs have inlining information attached to them up to the level its line number represents, e.g. for
the NOPL is associated with inlinedCaller, the XORL is associated with inlinedCallee inlined into inlinedCaller. I'd still need to understand what went wrong... |
I agree that expansion and re-compression is weird. Maybe ideally we could find a way to avoid the expansion then re-compression. I think I understand the failure mode. The fix in the CL probably makes sense, at least for the short time if we cannot avoid the expansion and re-compression. Thanks. |
Maybe another idea is to have the compiler to insert the NOPs at the end of a function so they never get executed. |
One more on 2022-03-08T15:18:16-0b76afc/darwin-amd64-race I don't have a strong opinion about how to prioritize the fix, but if it's going to be more than a few more weeks I'd at least appreciate adding a test-skip in the meantime. |
Change https://go.dev/cl/404995 mentions this issue: |
The compiler may choose to inline multiple layers of function call, such that A calling B calling C may end up with all of the instructions for B and C written as part of A's function body. Within that function body, some PCs will represent code from function A. Some will represent code from function B, and for each of those the runtime will have an instruction attributable to A that it can report as its caller. Others will represent code from function C, and for each of those the runtime will have an instruction attributable to B and an instruction attributable to A that it can report as callers. When a profiling signal arrives at an instruction in B (as inlined in A) that the runtime also uses to describe calls to C, the profileBuilder ends up with an incorrect cache of allFrames results. That PC should lead to a location record in the profile that represents the frames B<-A, but the allFrames cache's view should expand the PC only to the B frame. Otherwise, when a profiling signal arrives at an instruction in C (as inlined in B in A), the PC stack C,B,A can get expanded to the frames C,B<-A,A as follows: The inlining deck starts empty. The first tryAdd call proposes PC C and frames C, which the deck accepts. The second tryAdd call proposes PC B and, due to the incorrect caching, frames B,A. (A fresh call to allFrames with PC B would return the frame list B.) The deck accepts that PC and frames. The third tryAdd call proposes PC A and frames A. The deck rejects those because a call from A to A cannot possibly have been inlined. This results in a new location record in the profile representing the frames C<-B<-A (good), as called by A (bad). The bug is the cached expansion of PC B to frames B<-A. That mapping is only appropriate for the resulting protobuf-format profile. The cache needs to reflect the results of a call to allFrames, which expands the PC B to the single frame B. For #50996 For #52693 Fixes #52764 Change-Id: I36d080f3c8a05650cdc13ced262189c33b0083b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/404995 Reviewed-by: Bryan Mills <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Rhys Hiltner <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
greplogs --dashboard -md -l -e 'FAIL: TestCPUProfileInlining .*(?:\n .*)* got separate Location entries' --since=2021-01-01
2022-02-03T16:58:29-f9b761a/windows-amd64-longtest
2021-11-05T17:23:06-37951d8/linux-386-longtest
2021-10-31T02:24:29-0bef30d/linux-amd64-nocgo
2021-10-28T20:39:36-834e36e/windows-amd64-longtest
2021-08-06T20:44:03-63b968f/openbsd-mips64-jsing
2021-05-08T17:03:18-b38b1b2/linux-amd64-staticlockranking
May be related to #42862.
CC @cherrymui @prattmic @rhysh
The text was updated successfully, but these errors were encountered: