diff --git a/stack-traces.md b/stack-traces.md
index b5297e0..ba9bd93 100644
--- a/stack-traces.md
+++ b/stack-traces.md
@@ -141,13 +141,13 @@ There are various use cases for stack traces in Go, but they all end up hitting
 
 Each frame lookup begins with the current `pc` which is passed to [`findfunc()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L671) which looks up the meta data for the function that contains the `pc`. Historically this was done using `O(log N)` binary search, but [nowadays](https://go-review.googlesource.com/c/go/+/2097/) there is a hash-map-like index of [`findfuncbucket`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L671) structs that usually directly guides us to the right entry using an `O(1)` algorithm. So at this point the overall complexity is still the same as frame pointer unwinding, but it's worth noting that the constant overheads are already significantly higher.
 
-The [_func](https://github.com/golang/go/blob/9baddd3f21230c55f0ad2a10f5f20579dcf0a0bb/src/runtime/runtime2.go#L825) meta data that we just retrieved contains a `pcsp` offset into the `pctab` table that maps program counters to stack pointer deltas. To decode this information, we call [`funcspdelta()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L903) which does a `O(N)` linear search over all program counters of the function until it finds the (`pc`, `sp delta`) pair were are looking for. For stacks with recursive call cycles, a tiny program counter cache is used to avoids doing lots of duplicated work.
+The [_func](https://github.com/golang/go/blob/9baddd3f21230c55f0ad2a10f5f20579dcf0a0bb/src/runtime/runtime2.go#L825) meta data that we just retrieved contains a `pcsp` offset into the `pctab` table that maps program counters to stack pointer deltas. To decode this information, we call [`funcspdelta()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L903) which does an `O(N)` linear search over all program counters of the function until it finds the (`pc`, `sp delta`) pair we are looking for. For stacks with recursive call cycles, a tiny program counter cache is used to avoid doing lots of duplicated work.
 
-Now that that we have the stack pointer delta, we we are almost ready to locate the next `return address (pc)` value of the caller and do the same lookup for it until we reach the "bottom" of the stack. But before that, we need to check if the current `pc` is part of one or more inlined function calls. This is done by checking the `_FUNCDATA_InlTree` data for the current `_func` and doing another linear search over the (`pc`, `inline index`) pairs that exist. Any inlined call found this way gets its own virtual stack frame `pc` added to the list. Then we continue with `return address (pc)` as mentioned in the beginning of the paragraph.
+Now that we have the stack pointer delta, we are almost ready to locate the next `return address (pc)` value of the caller and do the same lookup for it until we reach the "bottom" of the stack. But before that, we need to check if the current `pc` is part of one or more inlined function calls. This is done by checking the `_FUNCDATA_InlTree` data for the current `_func` and doing another linear search over the (`pc`, `inline index`) pairs in that table. Any inlined call found this way gets its own virtual stack frame `pc` added to the list. Then we continue with `return address (pc)` as mentioned in the beginning of the paragraph.
 
 Putting it all together, for non-recursive call stacks without inlining, the complexity for `gopclntab` unwinding is `O(N*M)` where `N` is the number of frames on the stack, and `M` is the average size of the generated machine code per function. This can be validated [experimentally](https://github.com/DataDog/go-profiler-notes/tree/main/examples/stack-unwind-overhead), but in the real world I'd expect the average `N` and `M` to be fairly similar for most non-trivial Go applications, so unwinding a stack (without symbolization) will generally cost `1-10µs`. That being said, naive frame pointer unwinding appears to be [50x faster](https://github.com/felixge/gounwind), and does less cache thrashing, so high-resolution profiling and tracing use cases would likely benefit from seeing [support for it in the core](https://github.com/golang/go/issues/16638).
 
-Last but not least, it's worth noting that Go ships with two `.gopclntab` implementations. In addition to the one I've just described, there is also the [debug/gosym](https://golang.org/pkg/debug/gosym/) package that implements some of the same code as `runtime/symtab.go` and seems to be used by the linker, `go tool addr2line` and others. If you want, you can use it yourself in combination with [debug/elf](./examples/pclnttab/linux.go) or ([debug/macho](./examples/pclnttab/darwin.go)) to go on your [gopclntab adventures](./examples/pclnttab).
+Last but not least, it's worth noting that Go ships with two `.gopclntab` implementations. In addition to the one I've just described, there is another one in the [debug/gosym](https://golang.org/pkg/debug/gosym/) package that seems to be used by the linker, `go tool addr2line` and others. If you want, you can use it yourself in combination with [debug/elf](./examples/pclnttab/linux.go) or [debug/macho](./examples/pclnttab/darwin.go) as a starting point for your own [gopclntab adventures](./examples/pclnttab), for good or [evil](https://tuanlinh.gitbook.io/ctf/golang-function-name-obfuscation-how-to-fool-analysis-tools).
 
 ### DWARF
 
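To make the "hash-map-like index of `findfuncbucket` structs" from the context paragraph more concrete, here is a minimal sketch of a bucketed pc-to-function index. It is loosely modeled on my reading of `findfunc()` in go1.16, but everything in it is an assumption for illustration: the names `bucket`, `buildIndex` and `findFunc` are hypothetical, the 4096-byte bucket and 16-subbucket sizes are what I believe the runtime uses, and a plain sorted slice of entry pcs stands in for the runtime's real function table.

```go
package main

import "fmt"

const (
	bucketSize = 4096              // bytes of text covered by one bucket (assumed)
	nsub       = 16                // subbuckets per bucket (assumed)
	subSize    = bucketSize / nsub // 256 bytes of text per subbucket
)

// bucket is a hypothetical stand-in for findfuncbucket: a base index into the
// sorted function table plus a small per-subbucket offset from that base.
type bucket struct {
	idx        uint32
	subbuckets [nsub]byte
}

// buildIndex precomputes, for each subbucket, the index of the last function
// whose entry pc is <= the subbucket's start address. It assumes entries is
// sorted, non-empty, and that entries[0] is the lowest text address (minpc).
func buildIndex(entries []uintptr, maxpc uintptr) []bucket {
	minpc := entries[0]
	nb := int((maxpc-minpc)/bucketSize) + 1
	buckets := make([]bucket, nb)
	fi := 0
	for b := range buckets {
		for s := 0; s < nsub; s++ {
			start := minpc + uintptr(b*bucketSize+s*subSize)
			for fi+1 < len(entries) && entries[fi+1] <= start {
				fi++
			}
			if s == 0 {
				buckets[b].idx = uint32(fi)
			}
			// Assumes fewer than 256 functions start within one bucket,
			// so the offset from the bucket's base index fits in a byte.
			buckets[b].subbuckets[s] = byte(fi - int(buckets[b].idx))
		}
	}
	return buckets
}

// findFunc returns the index of the function containing pc (minpc <= pc < maxpc):
// a constant-time bucket lookup followed by a short forward scan over any
// functions that start later within the same subbucket.
func findFunc(buckets []bucket, entries []uintptr, pc uintptr) int {
	x := pc - entries[0]
	b := buckets[x/bucketSize]
	idx := int(b.idx) + int(b.subbuckets[(x%bucketSize)/subSize])
	for idx+1 < len(entries) && entries[idx+1] <= pc {
		idx++
	}
	return idx
}

func main() {
	// Hypothetical sorted function entry pcs; the text section ends at 0x3100.
	entries := []uintptr{0x1000, 0x1010, 0x1100, 0x2200, 0x3000}
	buckets := buildIndex(entries, 0x3100)
	for _, pc := range []uintptr{0x1020, 0x2250, 0x30ff} {
		fmt.Printf("pc=%#x -> func #%d\n", pc, findFunc(buckets, entries, pc))
	}
	// Output:
	// pc=0x1020 -> func #1
	// pc=0x2250 -> func #3
	// pc=0x30ff -> func #4
}
```

The subbuckets are what keep the final scan short: it only has to skip over functions that start inside the same 256-byte window, which is why the lookup is effectively `O(1)` even though it is not a true hash map.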
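The `pcsp` lookup performed by `funcspdelta()` can be illustrated with a toy model. The hypothetical `encodeTable` and `lookup` functions below treat a pc-value table as a stream of (value delta, pc delta) varint pairs and decode it with a linear scan from the function entry, which is where the `O(N)` cost comes from. The zigzag/uvarint encoding via `encoding/binary`, the start value of `-1` and the zero terminator follow my reading of `runtime.step()` in go1.16 but should be treated as assumptions; the sketch also ignores the per-architecture pc quantum that the runtime multiplies pc deltas by.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// entry is a hypothetical (value, end pc) pair: val applies to all pcs from
// the previous entry's end (or the function entry) up to, but excluding, end.
type entry struct {
	val int32   // e.g. the stack pointer delta
	end uintptr // exclusive upper bound of the pc range
}

// encodeTable builds a simplified pc-value table: for each entry, the value
// delta as a zigzag varint, then the pc delta as a uvarint. A zero byte
// terminates the table. Assumes entries are sorted by end and that consecutive
// values differ, so no value delta after the first one encodes to zero.
func encodeTable(funcEntry uintptr, entries []entry) []byte {
	var buf []byte
	tmp := make([]byte, binary.MaxVarintLen64)
	val, pc := int32(-1), funcEntry
	for _, e := range entries {
		n := binary.PutVarint(tmp, int64(e.val-val))
		buf = append(buf, tmp[:n]...)
		n = binary.PutUvarint(tmp, uint64(e.end-pc))
		buf = append(buf, tmp[:n]...)
		val, pc = e.val, e.end
	}
	return append(buf, 0)
}

// lookup decodes the table from the start until it reaches the range that
// contains targetPC, so its cost grows linearly with the number of ranges.
func lookup(table []byte, funcEntry, targetPC uintptr) (int32, bool) {
	val, pc := int32(-1), funcEntry
	for first := true; len(table) > 0; first = false {
		if table[0] == 0 && !first {
			break // terminator reached without finding targetPC
		}
		dv, n := binary.Varint(table)
		if n <= 0 {
			return 0, false
		}
		table = table[n:]
		dpc, n := binary.Uvarint(table)
		if n <= 0 {
			return 0, false
		}
		table = table[n:]
		val += int32(dv)
		pc += uintptr(dpc)
		if targetPC < pc {
			return val, true
		}
	}
	return 0, false
}

func main() {
	const fn = 0x1000
	// Hypothetical function: sp delta 0 until the prologue has run,
	// 24 in the body, and 0 again in the epilogue.
	table := encodeTable(fn, []entry{{0, 0x1008}, {24, 0x1040}, {0, 0x1048}})
	for _, pc := range []uintptr{0x1000, 0x1010, 0x1044, 0x1050} {
		v, ok := lookup(table, fn, pc)
		fmt.Printf("pc=%#x spdelta=%d ok=%v\n", pc, v, ok)
	}
	// Output:
	// pc=0x1000 spdelta=0 ok=true
	// pc=0x1010 spdelta=24 ok=true
	// pc=0x1044 spdelta=0 ok=true
	// pc=0x1050 spdelta=0 ok=false
}
```

Because every lookup restarts decoding at the function entry, a pc near the end of a large function pays for scanning all ranges before it, which is roughly why the per-frame cost tracks the function size `M` in the `O(N*M)` estimate.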