diff --git a/stack-traces.md b/stack-traces.md index 1021fee..e07d28a 100644 --- a/stack-traces.md +++ b/stack-traces.md @@ -133,9 +133,9 @@ Despite the apparent simplicity, frame pointer unwinding is no panacea. Frame po ### .gopclntab -Despite frame pointers being available on 64bit platforms, Go is not leveraging them for unwinding ([this might change](https://github.com/golang/go/issues/16638)). Instead Go ships with its own idiosyncratic unwinding tables that are embedded in the `.gopclntab` section of any Go binary. `.gopclntab` stands for "go program counter line table", but this is a bit of a misnomer as it contains various tables and meta data required for unwinding and symbolization. For unwinding, the general idea is to embed a table that maps every program counter (`pc`) to the current distance (delta) of the stack pointer (`rsp`) from the nearest `return address (pc)` above it. The initially lookup uses the `pc` from the `rip` instruction pointer register and then uses the `return address (pc)` for the next lookup and so on. +Despite frame pointers being available on 64bit platforms, Go is not leveraging them for unwinding ([this might change](https://github.com/golang/go/issues/16638)). Instead Go ships with its own idiosyncratic unwinding tables that are embedded in the `.gopclntab` section of any Go binary. `.gopclntab` stands for "go program counter line table", but this is a bit of a misnomer as it contains various tables and meta data required for unwinding and symbolization. For unwinding, the general idea is to embed a table that maps every program counter (`pc`) to the current distance (delta) of the stack pointer (`rsp`) from the nearest `return address (pc)` above it. The initial lookup uses the `pc` from the `rip` instruction pointer register and then uses the `return address (pc)` for the next lookup and so on. 
-Russ Cox initially described some of the involved data structured in his [Go 1.2 Runtime Symbol Information](https://golang.org/s/go12symtab) document, but it's very outdated by now and it's probably better to look at the current implementation directly. The relevant files are [traceback.go](https://github.com/golang/go/blob/go1.16.3/src/runtime/traceback.go) and [symtab.go](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go), so let's dive in. +Russ Cox initially described some of the involved data structures in his [Go 1.2 Runtime Symbol Information](https://golang.org/s/go12symtab) document, but it's very outdated by now and it's probably better to look at the current implementation directly. The relevant files are [traceback.go](https://github.com/golang/go/blob/go1.16.3/src/runtime/traceback.go) and [symtab.go](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go), so let's dive in. There are various use cases for stack traces in Go, but they all end up hitting the [`gentraceback()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/traceback.go#L76-L86) function. If the caller is e.g. `runtime.Callers()` the function only needs to do unwinding, but e.g. `panic()` wants text output, which requires symbolization as well. Additionally the code has to deal with the difference between [link register architectures](https://en.wikipedia.org/wiki/Link_register) such as ARM that work a little different from x86. This combination of unwinding, symbolization, support for different architectures and bespoke data structures might just be a regular day in the shop for the system developers on the Go team, but it's definitely been tricky for me, so please watch out for potential inaccuracies in my description below. 
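The split between unwinding and symbolization described above is also visible in the public `runtime` API. The following is a minimal sketch, assuming only `runtime.Callers()` and `runtime.CallersFrames()`; the helper names `capture` and `symbolize` are made up for illustration and do not appear in the runtime.

```go
package main

import (
	"fmt"
	"runtime"
)

// capture only unwinds: it records raw return addresses (pcs) and does
// no symbolization. The name is hypothetical, chosen for this sketch.
func capture() []uintptr {
	pcs := make([]uintptr, 32)
	// skip=2 leaves out runtime.Callers and capture itself.
	n := runtime.Callers(2, pcs)
	return pcs[:n]
}

// symbolize resolves previously captured pcs to function names.
// The name is hypothetical, chosen for this sketch.
func symbolize(pcs []uintptr) []string {
	if len(pcs) == 0 {
		return nil
	}
	var names []string
	frames := runtime.CallersFrames(pcs)
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	pcs := capture() // cheap step: unwinding only
	for _, name := range symbolize(pcs) { // costlier step: symbolization
		fmt.Println(name)
	}
}
```

Keeping the two steps apart means a profiler can collect many raw stacks and defer the symbolization cost until the data is actually reported.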
@@ -143,11 +143,13 @@ Each frame lookup begins with the current `pc` which is passed to [`findfunc()`] The [_func](https://github.com/golang/go/blob/9baddd3f21230c55f0ad2a10f5f20579dcf0a0bb/src/runtime/runtime2.go#L825) meta data that we just retrieved contains a `pcsp` offset into the `pctab` table that maps program counters to stack pointer deltas. To decode this information, we call [`funcspdelta()`](https://github.com/golang/go/blob/go1.16.3/src/runtime/symtab.go#L903) which does an `O(N)` linear search over all program counters of the function until it finds the (`pc`, `sp delta`) pair we are looking for. For stacks with recursive call cycles, a tiny program counter cache is used to avoid doing lots of duplicated work. -Now that that we have the stack pointer delta, we can use it to locate the next `return address (pc)` value of the caller and do the same lookup for it as before until we reach the "bottom" of the stack. +Now that we have the stack pointer delta, we can use it to locate the next `return address (pc)` value of the caller and do the same lookup for it. This continues until we reach the "bottom" of the stack. -So for non-recursive call stacks, the complexity for `gopclntab` unwinding is `O(N*M)` where `N` is the number of frames on the stack, and `M` is the average size of the generated machine code per function. I was able verify the impact of both factors [experimentally](https://github.com/DataDog/go-profiler-notes/tree/main/examples/stack-unwind-overhead), but in reality I'd assume both `N` and `M` to be fairly similar for most non-trivial Go applications. That being said, naive frame pointer unwinding appears to be [50x faster](https://github.com/felixge/gounwind) so high-resolution profiling and tracing use cases would certainly benefit from seeing [support for it](https://github.com/golang/go/issues/16638). 
+For non-recursive call stacks, the complexity for `gopclntab` unwinding is `O(N*M)` where `N` is the number of frames on the stack, and `M` is the average size of the generated machine code per function. I was able to verify the impact of both factors [experimentally](https://github.com/DataDog/go-profiler-notes/tree/main/examples/stack-unwind-overhead), but in reality I'd assume both `N` and `M` to be fairly similar for most non-trivial Go applications. That being said, naive frame pointer unwinding appears to be [50x faster](https://github.com/felixge/gounwind) so high-resolution profiling and tracing use cases would certainly benefit from seeing [support for it](https://github.com/golang/go/issues/16638). -One thing that I found surprising is that Go ships with two `.gopclntab` implementations. In addition to the one I've just described, there is also the [debug/gosym](https://golang.org/pkg/debug/gosym/) package that implements largely the same code and seems to be used by the linker, `go tool addr2line` and more. If you want, you can use it in combination with [debug/elf](./examples/pclnttab/linux.go) or ([debug/macho](./examples/pclnttab/darwin.go)) to go on your [gopclntab adventures](./examples/pclnttab). +TODO: Write about inlined functions. + +One thing that I found surprising is that Go ships with two `.gopclntab` implementations. In addition to the one I've just described, there is also the [debug/gosym](https://golang.org/pkg/debug/gosym/) package that implements largely the same code and seems to be used by the linker, `go tool addr2line` and more. If you want, you can use it yourself in combination with [debug/elf](./examples/pclnttab/linux.go) or [debug/macho](./examples/pclnttab/darwin.go) to go on your [gopclntab adventures](./examples/pclnttab). ### DWARF