-
Notifications
You must be signed in to change notification settings - Fork 209
Commit
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -141,13 +141,13 @@ There are various use cases for stack traces in Go, but they all end up hitting | |
|
||
Each frame lookup begins with the current `pc` which is passed to [`findfunc()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L671) which looks up the meta data for the function that contains the `pc`. Historically this was done using `O(log N)` binary search, but [nowadays](https://go-review.googlesource.com/c/go/+/2097/) there is a hash-map-like index of [`findfuncbucket`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L671) structs that usually directly guides us to the right entry using an `O(1)` algorithm. So at this point the overall complexity is still the same as frame pointer unwinding, but it's worth noting that the constant overheads are already significantly higher. | ||
|
||
The [_func](https://github.com/golang/go/blob/9baddd3f21230c55f0ad2a10f5f20579dcf0a0bb/src/runtime/runtime2.go#L825) meta data that we just retrieved contains a `pcsp` offset into the `pctab` table that maps program counters to stack pointer deltas. To decode this information, we call [`funcspdelta()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L903) which does a `O(N)` linear search over all program counters of the function until it finds the (`pc`, `sp delta`) pair were are looking for. For stacks with recursive call cycles, a tiny program counter cache is used to avoids doing lots of duplicated work. | ||
The [_func](https://github.com/golang/go/blob/9baddd3f21230c55f0ad2a10f5f20579dcf0a0bb/src/runtime/runtime2.go#L825) meta data that we just retrieved contains a `pcsp` offset into the `pctab` table that maps program counters to stack pointer deltas. To decode this information, we call [`funcspdelta()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L903) which does a `O(N)` linear search over all program counters of the function until it finds the (`pc`, `sp delta`) pair were are looking for. For stacks with recursive call cycles, a tiny program counter cache is used to avoid doing lots of duplicated work. | ||
This comment has been minimized.
Sorry, something went wrong.
This comment has been minimized.
Sorry, something went wrong.
felixge
Author
Member
|
||
|
||
Now that that we have the stack pointer delta, we we are almost ready to locate the next `return address (pc)` value of the caller and do the same lookup for it until we reach the "bottom" of the stack. But before that, we need to check if the current `pc` is part of one or more inlined function calls. This is done by checking the `_FUNCDATA_InlTree` data for the current `_func` and doing another linear search over the (`pc`, `inline index`) pairs that exist. Any inlined call found this way gets its own virtual stack frame `pc` added to the list. Then we continue with `return address (pc)` as mentioned in the beginning of the paragraph. | ||
Now that that we have the stack pointer delta, we we are almost ready to locate the next `return address (pc)` value of the caller and do the same lookup for it until we reach the "bottom" of the stack. But before that, we need to check if the current `pc` is part of one or more inlined function calls. This is done by checking the `_FUNCDATA_InlTree` data for the current `_func` and doing another linear search over the (`pc`, `inline index`) pairs in that table. Any inlined call found this way gets virtual stack frame `pc` added to the list. Then we continue with `return address (pc)` as mentioned in the beginning of the paragraph. | ||
|
||
Putting it all together, for non-recursive call stacks without inlining, the complexity for `gopclntab` unwinding is `O(N*M)` where `N` is the number of frames on the stack, and `M` is the average size of the generated machine code per function. This can be validated [experimentally](https://github.com/DataDog/go-profiler-notes/tree/main/examples/stack-unwind-overhead), but in the real world I'd expect the average `N` and `M` to be fairly similar for most non-trivial Go applications, so unwinding a stack (without symbolization) will generally cost `1-10µs`. That being said, naive frame pointer unwinding appears to be [50x faster](https://github.com/felixge/gounwind), and does less cache thrashing, so high-resolution profiling and tracing use cases would likely benefit from seeing [support for it in the core](https://github.com/golang/go/issues/16638). | ||
|
||
Last but not least, it's worth noting that Go ships with two `.gopclntab` implementations. In addition to the one I've just described, there is also the [debug/gosym](https://golang.org/pkg/debug/gosym/) package that implements some of the same code as `runtime/symtab.go` and seems to be used by the linker, `go tool addr2line` and others. If you want, you can use it yourself in combination with [debug/elf](./examples/pclnttab/linux.go) or ([debug/macho](./examples/pclnttab/darwin.go)) to go on your [gopclntab adventures](./examples/pclnttab). | ||
Last but not least, it's worth noting that Go ships with two `.gopclntab` implementations. In addition to the one I've just described, there is another one in the [debug/gosym](https://golang.org/pkg/debug/gosym/) package that seems to be used by the linker, `go tool addr2line` and others. If you want, you can use it yourself in combination with [debug/elf](./examples/pclnttab/linux.go) or ([debug/macho](./examples/pclnttab/darwin.go)) as a starting point for your own [gopclntab adventures](./examples/pclnttab) for good or [evil](https://tuanlinh.gitbook.io/ctf/golang-function-name-obfuscation-how-to-fool-analysis-tools). | ||
|
||
### DWARF | ||
|
||
|
This is accurate, though I'll note that
pcvalue
maintains a small cache for repeated lookups (TBH, I don't know what the hit rate is like). Also, AIUI the table doesn't contain an entry for every PC, just one entry each time the delta changes (so we look for an entry with targetpc < pc. This is still O(n), but there aren't quite so many entries as this make it sound like.