runtime: rewrite gentraceback as an iterator API #54466
Keep in mind #7181, where we want to print bottom and top of deep stacks. It probably requires walking the stack twice, or keeping a buffer of 50 frames, or something like that.
Change https://go.dev/cl/424254 mentions this issue: |
Change https://go.dev/cl/424255 mentions this issue: |
Change https://go.dev/cl/424257 mentions this issue: |
I think this will actually make that dramatically easier by inverting the flow of traceback printing. With this change, the printer will drive the stack walk instead of the other way around, so I think it will be much easier for the printer to keep the buffer it needs, and to do so without adding complexity around the stack walk itself.
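To illustrate why a printer-driven walk helps with #7181, here is a minimal sketch of a printer that keeps the first few frames and a ring buffer of the most recent ones in a single pass. The frameIter type, its methods, and headTail are hypothetical stand-ins invented for this example, not the runtime's actual unwinder.

```go
package main

import "fmt"

// frameIter is a hypothetical stand-in for a stack unwinder. It yields
// frame names one at a time, innermost frame first.
type frameIter struct {
	frames []string
	i      int
}

func (it *frameIter) valid() bool   { return it.i < len(it.frames) }
func (it *frameIter) next()         { it.i++ }
func (it *frameIter) frame() string { return it.frames[it.i] }

// headTail walks the iterator exactly once. It returns the first head
// frames, the last tail frames, and how many frames were dropped between.
func headTail(it *frameIter, head, tail int) (first, last []string, elided int) {
	ring := make([]string, tail) // most recent frames seen past the head
	n := 0                       // number of frames seen past the head
	for ; it.valid(); it.next() {
		if len(first) < head {
			first = append(first, it.frame())
			continue
		}
		if tail > 0 {
			ring[n%tail] = it.frame()
		}
		n++
	}
	kept := n
	if kept > tail {
		kept = tail
	}
	// Unroll the ring buffer into oldest-to-newest order.
	for k := n - kept; k < n; k++ {
		last = append(last, ring[k%tail])
	}
	return first, last, n - kept
}

func main() {
	frames := []string{"f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9"}
	first, last, elided := headTail(&frameIter{frames: frames}, 2, 3)
	fmt.Println(first)
	fmt.Printf("...%d frames elided...\n", elided)
	fmt.Println(last)
}
```

Because the caller owns the loop, the buffering lives entirely in the printer and the walk itself needs no extra state.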
Change https://go.dev/cl/424514 mentions this issue: |
Change https://go.dev/cl/424516 mentions this issue: |
Change https://go.dev/cl/424515 mentions this issue: |
Change https://go.dev/cl/425936 mentions this issue: |
Currently, gentraceback tracks the closure context of the outermost frame. This used to be important for "unstarted" calls to reflect function stubs, where "unstarted" calls are either deferred functions or the entry-point of a goroutine that hasn't run. Because reflect function stubs have a dynamic argument map, we have to reach into their closure context to fetch the map, and how to do this differs depending on whether the function has started. This was discovered in issue #25897. However, as part of the register ABI, "go" and "defer" were made much simpler, and any "go" or "defer" of a function that takes arguments or returns results gets wrapped in a closure that provides those arguments (and/or discards the results). Hence, we'll see that closure instead of a direct call to a reflect stub, and can get its static argument map without any trouble. The one case where we may still see an unstarted reflect stub is if the function takes no arguments and has no results, in which case the compiler can optimize away the wrapper closure. But in this case we know the argument map is empty: the compiler can apply this optimization precisely because the target function has no argument frame. As a result, we no longer need to track the closure context during traceback, so this CL drops all of that mechanism. We still have to be careful about the unstarted case because we can't reach into the function's locals frame to pull out its context (because it has no locals frame). We double-check that in this case we're at the function entry. I would prefer to do this with some in-code PCDATA annotations of where to find the dynamic argument map, but that's a lot of mechanism to introduce for just this. It might make sense to consider this along with #53609. Finally, we beef up the test for this so it more reliably forces the runtime down this path. It's fundamentally probabilistic, but this tweak makes it better.
Scheduler testing hooks (#54475) would make it possible to write a reliable test for this. For #54466, but it's a nice clean-up all on its own. Change-Id: I16e4f2364ba2ea4b1fec1e27f971b06756e7b09f Reviewed-on: https://go-review.googlesource.com/c/go/+/424254 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
The f funcInfo argument is always the same as frame.fn, so we don't need to pass it. I suspect that was there to make the signatures of getArgInfoFast and getArgInfo more similar, but it's not necessary. For #54466. Change-Id: Idc717f4df09e97cad49d52c5b7edf28090908cba Reviewed-on: https://go-review.googlesource.com/c/go/+/424255 Run-TryBot: Austin Clements <[email protected]> Auto-Submit: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Cherry Mui <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Currently, when traceback jumps from the system stack to a user stack (e.g., during profiling tracebacks), it leaves gp pointing at the g0. This is currently harmless since it's only used during profiling, so the code paths in gentraceback that care about gp aren't used, but it's really confusing and would certainly break if _TraceJumpStack were ever used in a context other than profiling. Fix this by updating gp to point to the user g when we switch stacks. For #54466. Change-Id: I1541e004667a52e37671803ce45c91d8c5308830 Reviewed-on: https://go-review.googlesource.com/c/go/+/424257 Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Auto-Submit: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Austin Clements <[email protected]>
Currently, stkframe.arglen and stkframe.argmap are populated by gentraceback under a particular set of circumstances. But because they can be constructed from other fields in stkframe, they don't need to be computed eagerly at all. They're also rather misleading, as they're only part of computing the actual argument map and most callers should be using getStackMap, which does the rest of the work. This CL drops these fields from stkframe. It shifts the functions that used to compute them, getArgInfoFast and getArgInfo, into corresponding methods stkframe.argBytes and stkframe.argMapInternal. argBytes is expected to be used by callers that need to know only the argument frame size, while argMapInternal is used only by argBytes and getStackMap. We also move some of the logic from getStackMap into argMapInternal because the previous split of responsibilities didn't make much sense. This lets us return just a bitvector from argMapInternal, rather than both a bitvector, which carries a size, and an "actually use this size". The getArgInfoFast function was inlined before (and inl_test checked this). We drop that requirement from stkframe.argBytes because the uses of this have shifted and now it's only called from heap dumping (which never happens) and conservative stack frame scanning (which very, very rarely happens). There will be a few follow-up clean-up CLs. For #54466. This is a nice clean-up on its own, but it also serves to remove pointers from the traceback state that would eventually become troublesome write barriers once we stack-rip gentraceback. Change-Id: I107f98ed8e7b00185c081de425bbf24af02a4163 Reviewed-on: https://go-review.googlesource.com/c/go/+/424514 Run-TryBot: Austin Clements <[email protected]> Auto-Submit: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Cherry Mui <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
This places getStackMap alongside argBytes and argMapInternal as another method of stkframe. For #54466, albeit rather indirectly. Change-Id: I411dda3605dd7f996983706afcbefddf29a68a85 Reviewed-on: https://go-review.googlesource.com/c/go/+/424515 Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Run-TryBot: Austin Clements <[email protected]> Auto-Submit: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
The stkframe struct and its methods are strewn across different source files. Since they actually have a pretty coherent theme at this point, migrate it all into a new file, stkframe.go. There are no code changes in this CL. For #54466, albeit rather indirectly. Change-Id: Ibe53fc4b1106d131005e1c9d491be838a8f14211 Reviewed-on: https://go-review.googlesource.com/c/go/+/424516 Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Cherry Mui <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Austin Clements <[email protected]> Auto-Submit: Austin Clements <[email protected]>
Use an early return to reduce indentation and clarify flow. For #54466. Change-Id: I12ce810bea0f22b8707a175dc5ba66241c0a9a21 Reviewed-on: https://go-review.googlesource.com/c/go/+/425936 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Auto-Submit: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Note to self: an iterator-style traceback would make it much easier to eliminate the limit on CPU profile stack frames (#56029).
Change https://go.dev/cl/468299 mentions this issue: |
Change https://go.dev/cl/468301 mentions this issue: |
Change https://go.dev/cl/468297 mentions this issue: |
Change https://go.dev/cl/468296 mentions this issue: |
Change https://go.dev/cl/472956 mentions this issue: |
I've been digging more into testing tracebacks. I came up with a different approach that seems quite promising for testing the really dark corner cases: set your own single-step flag just before doing something we want to test, and in the SIGTRAP handler keep taking tracebacks and single-stepping until we get what we're looking for or return from the test situation. I have a rough prototype of this on x86 that works on Linux and may trivially work on other Unixes, and I suspect would work without much trouble on Windows. I'm pretty sure I can do it on ARM64, too, but haven't written the code. I'm not sure about other architectures. However, I also looked at the coverage of the traceback code from the |
We're about to make major changes to tracebacks. We have benchmarks of stack copying, but not of PC buffer filling, so add some that we can track through these changes. For #54466. Change-Id: I3ed61d75144ba03b61517cd9834eeb71c99d74df Reviewed-on: https://go-review.googlesource.com/c/go/+/472956 TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Currently, gentraceback keeps a copy of the stack bounds of the stack it's walking in the "stack" variable. Now that "gp" always refers to the G whose stack it's walking, we can simply use gp.stack instead of keeping a separate copy. For #54466. Change-Id: I68256e5dff6212cfcf14eda615487e66a92d4914 Reviewed-on: https://go-review.googlesource.com/c/go/+/458215 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Felix Geisendörfer <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
gentraceback also tracks the funcID of the callee, which is more general. Fix this up to happen in all cases and eliminate waspanic in favor of checking the funcID of the caller. For #54466. Change-Id: Idc98365a6f05022db18ddcd5b3ed8684a6872a88 Reviewed-on: https://go-review.googlesource.com/c/go/+/458216 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Felix Geisendörfer <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Currently, gentraceback resolves the funcInfo of the caller prior to processing the current frame (calling the callback, printing it, etc). As a result, if this lookup fails in a verbose context, it will print the failure before printing the frame that it's already resolved. To fix this, move the resolution of LR to a funcInfo to after current frame processing. This also has the advantage that we can reduce the scope of "flr" (the caller's funcInfo) to only the post-frame part of the loop, which will make it easier to stack-rip gentraceback into an iterator. For #54466. Change-Id: I8be44d4eac598a686c32936ab37018b8aa97c00b Reviewed-on: https://go-review.googlesource.com/c/go/+/458217 TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Reviewed-by: Felix Geisendörfer <[email protected]>
For #54466. Change-Id: I4d8e1953703b6c763e5bd53024da43efcc993489 Reviewed-on: https://go-review.googlesource.com/c/go/+/466095 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Run-TryBot: Austin Clements <[email protected]>
We've replicated the code to expand inlined frames in many places in the runtime at this point. This CL adds a simple iterator API that abstracts this out. We also use this to try out a new idea for structuring tests of runtime internals: rather than exporting this whole internal data type and API, we write the test in package runtime and import the few bits of std we need. The idea is that, for tests of internals, it's easier to inject public APIs from std than it is to export non-public APIs from runtime. This is discussed more in #55108. For #54466. Change-Id: Iebccc04ff59a1509694a8ac0e0d3984e49121339 Reviewed-on: https://go-review.googlesource.com/c/go/+/466096 TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Run-TryBot: Austin Clements <[email protected]>
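As a rough illustration of what an iterator over inlined frames might look like, the sketch below expands one physical PC into its chain of logical frames. The inlinedCall table layout, the inlineIter type, and logicalFrames are assumptions made up for this example; the runtime's real inlining tables and unwinder differ.

```go
package main

import "fmt"

// inlinedCall is a hypothetical inline-tree entry: the function name and
// the index of the call it was inlined into (-1 for the physical function).
type inlinedCall struct {
	name   string
	parent int
}

// inlineIter steps from the innermost inlined call outward to the
// physical function that contains it.
type inlineIter struct {
	tree []inlinedCall
	idx  int // current entry, or -1 once the walk is done
}

func (it *inlineIter) valid() bool  { return it.idx >= 0 }
func (it *inlineIter) next()        { it.idx = it.tree[it.idx].parent }
func (it *inlineIter) name() string { return it.tree[it.idx].name }

// logicalFrames expands one physical frame into its logical frames,
// innermost first, starting from the tree entry for the current PC.
func logicalFrames(tree []inlinedCall, leaf int) []string {
	var out []string
	for it := (&inlineIter{tree: tree, idx: leaf}); it.valid(); it.next() {
		out = append(out, it.name())
	}
	return out
}

func main() {
	tree := []inlinedCall{
		{name: "outer", parent: -1}, // physical function
		{name: "mid", parent: 0},    // inlined into outer
		{name: "leaf", parent: 1},   // inlined into mid
	}
	fmt.Println(logicalFrames(tree, 2))
}
```

Every call site that needs inline expansion can share the same three-method loop instead of re-implementing the tree walk.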
Since srcFunc can represent information for either a real text function or an inlined function, this means we no longer have to synthesize a fake _func just to call showframe on an inlined frame. This is cleaner and also eliminates the one case where _func values live in the heap. This will let us mark them NotInHeap, which will in turn eliminate pesky write barriers in the traceback rewrite. For #54466. Change-Id: Ibf5e24d01ee4bf384c825e1a4e2922ef444a438e Reviewed-on: https://go-review.googlesource.com/c/go/+/466097 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
We're about to rewrite this code and it has almost no test coverage right now. This test is also more complete than the existing TestTracebackInlineExcluded, so we delete that test. For #54466. Change-Id: I144154282dac5eb3798f7d332b806f44c4a0bdf6 Reviewed-on: https://go-review.googlesource.com/c/go/+/466098 TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This converts all places in the runtime that perform inline expansion to use the new inlineUnwinder abstraction. For #54466. Change-Id: I48d996fb6263ed5225bd21d30914a27ae434528d Reviewed-on: https://go-review.googlesource.com/c/go/+/466099 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Currently, gentraceback consumes the gp.cgoCtxt slice by copying the slice header and then sub-slicing it as it unwinds. The code for this is nice and clear, but we're about to lift this state into a structure and mutating it is going to introduce write barriers that are disallowed in gentraceback. This CL replaces the mutable slice header with an index into gp.cgoCtxt. For #54466. Change-Id: I6b701bb67d657290a784baaca34ed02d8247ede2 Reviewed-on: https://go-review.googlesource.com/c/go/+/466863 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
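The idea of replacing a mutable slice header with an index can be sketched as follows. The ctxtWalker type and its fields are hypothetical, and consuming from the end of the slice is an assumption of this example. The point is only that advancing the walk updates an integer, so no pointer is ever stored into the walker after it is created.

```go
package main

import "fmt"

// ctxtWalker is a hypothetical piece of unwinder state. Rather than
// holding its own shrinking copy of the slice, it holds an index into a
// slice that it never modifies.
type ctxtWalker struct {
	ctxts []uintptr // shared backing data, read-only for the walker
	idx   int       // next context to consume, or -1 when exhausted
}

func newCtxtWalker(ctxts []uintptr) ctxtWalker {
	return ctxtWalker{ctxts: ctxts, idx: len(ctxts) - 1}
}

// pop returns the next context, newest first. Only the integer index
// changes, so advancing involves no pointer write.
func (w *ctxtWalker) pop() (uintptr, bool) {
	if w.idx < 0 {
		return 0, false
	}
	c := w.ctxts[w.idx]
	w.idx--
	return c, true
}

func main() {
	w := newCtxtWalker([]uintptr{0xa, 0xb, 0xc})
	for c, ok := w.pop(); ok; c, ok = w.pop() {
		fmt.Printf("%#x\n", c)
	}
}
```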
Many compiler-generated panics are dynamically changed to a "throw" when they happen in the runtime. One effect of this is that they are allowed in nowritebarrierrec contexts. Currently, the unsafe.Slice panics don't have this treatment. We're about to expose more code that uses unsafe.Slice to the write barrier checker (it's actually already there and it just can't see through an indirect call), so give these panics the dynamic check. Very indirectly updates #54466. Change-Id: I65cb96fa17eb751041e4fa25a1c1bd03246c82ba Reviewed-on: https://go-review.googlesource.com/c/go/+/468296 TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This is a really nice simplification for all of these call sites. It also achieves a nice performance improvement for stack copying:

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
                       │   before    │               after                │
                       │   sec/op    │   sec/op     vs base               │
StackCopyPtr-48          89.25m ± 1%   79.78m ± 1%  -10.62% (p=0.000 n=20)
StackCopy-48             83.48m ± 2%   71.88m ± 1%  -13.90% (p=0.000 n=20)
StackCopyNoCache-48      2.504m ± 2%   2.195m ± 1%  -12.32% (p=0.000 n=20)
StackCopyWithStkobj-48   21.66m ± 1%   21.02m ± 2%   -2.95% (p=0.000 n=20)
geomean                  25.21m        22.68m       -10.04%

Updates #54466. Change-Id: I31715b7b6efd65726940041d3052bb1c0a1186f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/468297 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Updates #54466. Change-Id: If070cf3f484e3e02b8e586bff466e0018b1a1845 Reviewed-on: https://go-review.googlesource.com/c/go/+/468298 Run-TryBot: Austin Clements <[email protected]> Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Currently, gentraceback's loop ends with a call to tracebackCgoContext to process cgo frames. This requires spreading various parts of the printing and pcbuf logic across these two functions. Clean this up by moving cgo unwinding into unwinder and then lifting the printing and pcbuf logic from tracebackCgoContext into gentraceback along with the other printing and pcbuf logic. Updates #54466. Change-Id: Ic71afaa5ae110c0ea5be9409e267e4284e36a8c9 Reviewed-on: https://go-review.googlesource.com/c/go/+/468299 Reviewed-by: Michael Pratt <[email protected]> Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Currently, filling PC traceback buffers is one of the jobs of gentraceback. This moves it into a new function, tracebackPCs, with a simple API built around unwinder, and changes all callers to use this new API. Updates #54466. Change-Id: Id2038bded81bf533a5a4e71178a7c014904d938c Reviewed-on: https://go-review.googlesource.com/c/go/+/468300 Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Austin Clements <[email protected]>
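A PC-buffer filler layered on an iterator can be quite small. The sketch below assumes a hypothetical pcIter type and a fillPCs helper with a skip count; neither is the actual signature of tracebackPCs or unwinder.

```go
package main

import "fmt"

// pcIter is a hypothetical unwinder that yields one return PC per frame,
// innermost frame first.
type pcIter struct {
	pcs []uintptr
	i   int
}

func (it *pcIter) valid() bool { return it.i < len(it.pcs) }
func (it *pcIter) next()       { it.i++ }
func (it *pcIter) pc() uintptr { return it.pcs[it.i] }

// fillPCs skips the first skip frames, then copies PCs into buf until
// either the buffer is full or the stack is exhausted. It returns the
// number of entries written.
func fillPCs(it *pcIter, skip int, buf []uintptr) int {
	n := 0
	for ; n < len(buf) && it.valid(); it.next() {
		if skip > 0 {
			skip--
			continue
		}
		buf[n] = it.pc()
		n++
	}
	return n
}

func main() {
	it := &pcIter{pcs: []uintptr{0x10, 0x20, 0x30, 0x40, 0x50}}
	buf := make([]uintptr, 3)
	n := fillPCs(it, 1, buf)
	fmt.Println(n, buf[:n])
}
```

Since the loop stops as soon as the buffer is full, the iterator is left positioned at the next unread frame, so a caller could continue into a second buffer if it wanted to.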
Printing is the only remaining functionality of gentraceback. Move this into the traceback printing code and eliminate gentraceback. This lets us simplify the logic, which fixes at least one minor bug: previously, if inline unwinding pushed the total printed count over _TracebackMaxFrames, we would print extra frames and then fail to print "additional frames elided". The cumulative performance effect of the series of changes starting with "add a benchmark of Callers" (CL 472956) is:

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
                       │  baseline   │              unwinder              │
                       │   sec/op    │   sec/op     vs base               │
Callers/cached-48        1.464µ ± 1%   1.684µ ± 1%  +15.03% (p=0.000 n=20)
Callers/inlined-48       1.391µ ± 1%   1.536µ ± 1%  +10.42% (p=0.000 n=20)
Callers/no-cache-48      10.50µ ± 1%   11.11µ ± 0%   +5.82% (p=0.000 n=20)
StackCopyPtr-48          88.74m ± 1%   81.22m ± 2%   -8.48% (p=0.000 n=20)
StackCopy-48             80.90m ± 1%   70.56m ± 1%  -12.78% (p=0.000 n=20)
StackCopyNoCache-48      2.458m ± 1%   2.209m ± 1%  -10.15% (p=0.000 n=20)
StackCopyWithStkobj-48   26.81m ± 1%   25.66m ± 1%   -4.28% (p=0.000 n=20)
geomean                  518.8µ        512.9µ        -1.14%

The performance impact of intermediate CLs in this sequence varies a lot as we went through many refactorings. The slowdown in Callers comes primarily from the introduction of unwinder because that doesn't get inlined and results in somewhat worse code generation in code that's extremely hot in those microbenchmarks. The performance gains on stack copying come mostly from replacing callbacks with direct use of the unwinder. Updates #54466. Fixes #32383. Change-Id: I4970603b2861633eecec30545e852688bc7cc9a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/468301 Reviewed-by: Michael Pratt <[email protected]> Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Change https://go.dev/cl/475960 mentions this issue: |
This is relatively easy using the new traceback iterator. Ancestor tracebacks are now limited to 50 frames. We could keep that at 100, but the fact that it used 100 before seemed arbitrary and unnecessary. Fixes #7181 Updates #54466 Change-Id: If693045881d84848f17e568df275a5105b6f1cb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/475960 Run-TryBot: Austin Clements <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Currently, all stack walking logic is in one venerable, large, and very, very complicated function: runtime.gentraceback. This function has three distinct operating modes: printing, populating a PC buffer, or invoking a callback. And it has three different modes of unwinding: physical Go frames, inlined Go frames, and cgo frames. It also has several flags. All of this logic is very interwoven.
I would like to replace all of this with a caller-driven iterator-style interface. This is a tracking issue for that change.
An iterator API will consolidate the logic for unwinding and allow us to lift out printing and pcbuf populating into separate code, while replacing the callback mode with direct use of the new API. It will allow us to better layer the different modes of unwinding by creating separate iterator types for physical, inlined, and cgo frames, while keeping the interface ergonomic. This is also a good opportunity to generally clean up this code.
As a follow-on, I plan to dramatically simplify the defer implementation. Regabi enabled many simplifications to defer and we've implemented many of them already, but there are more aggressive simplifications we haven't tackled yet. Part of this is simplifying open-coded defers, and doing that efficiently requires being able to simultaneously walk the stack frames and the defer stack. An iterator API will make this much easier to do.
An alternative approach would be to use a callback interface rather than an iterator. This would be an improvement over the status quo and also be a simpler change, but I think it has two drawbacks:
1. It makes layering physical/inlined frame unwinding more awkward because you need multiple levels of callbacks.
2. It's poorly suited to parallel iteration like we need for the open-coded defer implementation.
Long term, an iterator API probably makes it simpler to scan a goroutine stack while the goroutine is still running (reducing per-goroutine latency for goroutines with large stacks) because we can easily pause and resume unwinding, while a callback API doesn't easily afford this opportunity.
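The parallel-iteration point can be made concrete with a small sketch that advances a frame iterator and a defer-record iterator in lockstep, matching each defer to its frame by stack pointer. All types and names here (frame, deferRec, frameIter, deferIter, pairDefers) are hypothetical and greatly simplified; this is not how the runtime represents frames or defers.

```go
package main

import "fmt"

// frame and deferRec are hypothetical, simplified records. Each defer
// remembers the stack pointer of the frame that registered it.
type frame struct {
	name string
	sp   uintptr
}

type deferRec struct {
	fn string
	sp uintptr
}

// frameIter yields frames innermost first.
type frameIter struct {
	frames []frame
	i      int
}

func (it *frameIter) valid() bool { return it.i < len(it.frames) }
func (it *frameIter) next()       { it.i++ }
func (it *frameIter) cur() frame  { return it.frames[it.i] }

// deferIter yields defer records newest first.
type deferIter struct {
	defers []deferRec
	i      int
}

func (it *deferIter) valid() bool   { return it.i < len(it.defers) }
func (it *deferIter) next()         { it.i++ }
func (it *deferIter) cur() deferRec { return it.defers[it.i] }

// pairDefers advances both iterators together. For each frame it consumes
// every pending defer registered by that frame. The caller owns both
// loops, which a single callback-driven walk cannot easily offer.
func pairDefers(fi *frameIter, di *deferIter) []string {
	var out []string
	for ; fi.valid(); fi.next() {
		f := fi.cur()
		for di.valid() && di.cur().sp == f.sp {
			out = append(out, f.name+" runs "+di.cur().fn)
			di.next()
		}
	}
	return out
}

func main() {
	fi := &frameIter{frames: []frame{{"c", 0x100}, {"b", 0x200}, {"a", 0x300}}}
	di := &deferIter{defers: []deferRec{{"c.cleanup", 0x100}, {"a.unlock", 0x300}, {"a.close", 0x300}}}
	for _, line := range pairDefers(fi, di) {
		fmt.Println(line)
	}
}
```

The same structure also shows the pause-and-resume property: either iterator can be stopped after any step and picked up again later, since all of its state is in a plain value.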