teach std.debug to convert addresses to ELF symbols #22077

Open: wants to merge 9 commits into master

leroycep
Contributor

@leroycep leroycep commented Nov 25, 2024

When reading std.debug.Dwarf fails, std.debug.SelfInfo will now try to load function names from symtab.

See related issue #18520
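
As a rough illustration of the fallback idea (a sketch, not the code in this PR): resolving an address against a symbol table amounts to finding the function symbol whose address range contains it. The `Sym` struct and `symbolNameForAddress` below are hypothetical names made up for this example; real entries would come from the ELF `.symtab` section (`st_value`, `st_size`, and a name from the string table).

```zig
const std = @import("std");

/// Hypothetical, simplified stand-in for one ELF symbol table entry.
const Sym = struct {
    name: []const u8,
    addr: u64, // corresponds to st_value
    size: u64, // corresponds to st_size
};

/// Returns the name of the symbol whose [addr, addr + size) range
/// contains `address`, or null if no symbol covers it.
fn symbolNameForAddress(symbols: []const Sym, address: u64) ?[]const u8 {
    for (symbols) |sym| {
        if (address >= sym.addr and address - sym.addr < sym.size) return sym.name;
    }
    return null;
}

test "address resolves to the enclosing function symbol" {
    // Made-up addresses and sizes, for illustration only.
    const symbols = [_]Sym{
        .{ .name = "main.bar", .addr = 0x1000, .size = 0x40 },
        .{ .name = "main.foo", .addr = 0x1040, .size = 0x20 },
    };
    try std.testing.expectEqualStrings("main.foo", symbolNameForAddress(&symbols, 0x1050).?);
    try std.testing.expect(symbolNameForAddress(&symbols, 0x2000) == null);
}
```

A linear scan keeps the sketch short; an implementation that cares about lookup cost could sort the symbols by address and binary search instead.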

Example

//! main.zig
const std = @import("std");

noinline fn foo(x: u32) u32 {
    return x * x;
}

noinline fn bar() u32 {
    return foo(std.math.maxInt(u32));
}

pub fn main() !void {
    std.debug.print("{}", .{bar()});
}

> zig build-exe main.zig
> objcopy --strip-debug main main_strip-debug
> ls main main_strip-debug
╭───┬──────────────────┬──────┬─────────┬──────────────╮
│ # │       name       │ type │  size   │   modified   │
├───┼──────────────────┼──────┼─────────┼──────────────┤
│ 0 │ main             │ file │ 2.4 MiB │ 13 hours ago │
│ 1 │ main_strip-debug │ file │ 1.0 MiB │ 13 hours ago │
╰───┴──────────────────┴──────┴─────────┴──────────────╯
> try { ./main }
thread 2762936 panic: integer overflow
/home/geemili/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace/main.zig:4:14: 0x103ad8a in foo (main)
    return x * x;
             ^
/home/geemili/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace/main.zig:8:15: 0x103923d in bar (main)
    return foo(std.math.maxInt(u32));
              ^
/home/geemili/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace/main.zig:12:32: 0x103920c in main (main)
    std.debug.print("{}", .{bar()});
                               ^
/home/geemili/code/zig-elf-symbol-debuginfo/lib/std/start.zig:617:37: 0x10391e1 in posixCallMainAndExit (main)
            const result = root.main() catch |err| {
                                    ^
/home/geemili/code/zig-elf-symbol-debuginfo/lib/std/start.zig:248:5: 0x1038ddf in _start (main)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
> try { ./main_strip-debug }
thread 2763662 panic: integer overflow
???:?:?: 0x103ad8a in main.foo (???)
???:?:?: 0x103923d in main.bar (???)
???:?:?: 0x103920c in main.main (???)
???:?:?: 0x10391e1 in start.posixCallMainAndExit (???)
???:?:?: 0x1038ddf in _start (???)
???:?:?: 0x0 in ??? (???)

Further work

  • add debug-format stack trace tests that verify symbol-based stack traces are correct.
  • make stack traces work when .eh_frame is included but other Dwarf debuginfo is missing.
  • test symbol based stack traces in release optimize modes
  • [ ] Make std.debug.ElfSymTab read from the .dynsym section as well? .dynsym is meant to be a subset of .symtab; for now, std.debug.ElfSymTab assumes .symtab has intentionally been left in.

These changes have been moved out of this branch and into their own fdebuginfo branch:

  • [x] change -fstrip to allow -fstrip=debuginfo, which would remove the DWARF debug info but retain the ELF symbol table
    • [x] make similar changes for the std.Build API.symtab, so I think this would fit better into an issue about graceful degradation of stack traces.
  • [x] remove -fstrip, -gdwarf32, -gdwarf64; replace with -fdebuginfo=none, -fdebuginfo=symbols, -fdebuginfo=dwarf32, etc.?
  • [ ] less noisy stack trace format? Perhaps based on #19650 (std: add std.options.debug_stacktrace_kind)? Makes more sense as a separate PR based on #19650.
  • [ ] check that -fstrip=debuginfo works on macOS and Windows? Move into the fdebuginfo PR once that exists.

This work is licensed on the same terms as the Zig project.

Copyright © 2024 TigerBeetle, Inc.

This code was written under contract for TigerBeetle. As a work made for hire, authorship and copyright goes to TigerBeetle.

Author certificate

LeRoyce Pearson <[email protected]> [TigerBeetle, Inc.]

This work is licensed on the same terms as this project (Zig).

@leroycep leroycep force-pushed the elf-symbol-debuginfo branch 2 times, most recently from df2397a to 3a34582 Compare November 25, 2024 23:24
@xdBronch
Contributor

../release/bin/zig build-exe main.zig -Doptimize=ReleaseSafe -fno-strip

-Doptimize is only used by the build system; what that command is doing is defining a C macro. You want to use -OReleaseSafe.

@leroycep
Contributor Author

Oh, good catch. I had to add -fno-omit-frame-pointer as well. Here are the updated commands:

~/code/zig/build-example-elfsymtab-backtrace> ../release/bin/zig build-exe main.zig -OReleaseSafe -fno-strip -fno-omit-frame-pointer
~/code/zig/build-example-elfsymtab-backtrace> objcopy --strip-debug main main_strip-debug
~/code/zig/build-example-elfsymtab-backtrace> ./main
thread 772168 panic: integer overflow
Unwind error at address `exe:0x101a803` (error.MissingFDE), trace may be incomplete

/home/geemili/code/zig/build-example-elfsymtab-backtrace/main.zig:4:14: 0x100be87 in foo (main)
    return x * x;
             ^
/home/geemili/code/zig/build-example-elfsymtab-backtrace/main.zig:8:15: 0x100bc98 in bar (main)
    return foo(std.math.maxInt(u32));
              ^
/home/geemili/code/zig/build-example-elfsymtab-backtrace/main.zig:12:32: 0x100bc88 in main (main)
    std.debug.print("{}", .{bar()});
                               ^
/home/geemili/code/zig/lib/std/start.zig:617:37: 0x100bbc2 in posixCallMainAndExit (main)
            const result = root.main() catch |err| {
                                    ^
/home/geemili/code/zig/lib/std/start.zig:248:5: 0x100b89d in _start (main)
    asm volatile (switch (native_arch) {
    ^
Error: nu::shell::core_dumped

  × External command core dumped
   ╭─[entry #148:1:1]
 1 │ ./main
   · ───┬──
   ·    ╰── core dumped with SIGABRT (6)
   ╰────
~/code/zig/build-example-elfsymtab-backtrace> ./main_strip-debug
thread 772213 panic: integer overflow
Unwind information for `exe:0x101a803` was not available, trace may be incomplete

???:?:?: 0x100be87 in main.foo (???)
???:?:?: 0x100bc98 in main.bar (???)
???:?:?: 0x100bc88 in main.main (???)
???:?:?: 0x100bbc2 in start.posixCallMainAndExit (???)
???:?:?: 0x100b89d in _start (???)
Error: nu::shell::core_dumped

  × External command core dumped
   ╭─[entry #149:1:1]
 1 │ ./main_strip-debug
   · ─────────┬────────
   ·          ╰── core dumped with SIGABRT (6)
   ╰────

@xdBronch
Contributor

Not sure if this would cause too much duplicated code, but thoughts on omitting those ???s when the symbol table is being used? It's quite noisy imo.

@leroycep
Contributor Author

leroycep commented Nov 26, 2024

The minimum change would be here:

return .{
    .name = symbol.name,
};

That is, setting compile_unit_name to "" and source_location to .invalid:

zig/lib/std/debug.zig

Lines 44 to 48 in 3a34582

pub const Symbol = struct {
    name: []const u8 = "???",
    compile_unit_name: []const u8 = "???",
    source_location: ?SourceLocation = null,
};

And then it would produce something like:

:0:0: 0x100b89d in _start ()

Further changes would require some more thought
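
To make the effect of those defaults concrete, here is a small sketch using local, simplified copies of the structs (not the real `std.debug` definitions). `symbolFromSymtab` and `quietSymbolFromSymtab` are hypothetical helper names for this example only. It shows why returning just `.name` produces the `???` placeholders, and how overriding one field would change that.

```zig
const std = @import("std");

// Local, simplified copies for illustration only.
const SourceLocation = struct {
    line: u64,
    column: u64,
    file_name: []const u8,
};

const Symbol = struct {
    name: []const u8 = "???",
    compile_unit_name: []const u8 = "???",
    source_location: ?SourceLocation = null,
};

/// Only the name is known, so every other field keeps its default.
fn symbolFromSymtab(name: []const u8) Symbol {
    return .{ .name = name };
}

/// Hypothetical quieter variant: blank out the compile unit name.
fn quietSymbolFromSymtab(name: []const u8) Symbol {
    return .{ .name = name, .compile_unit_name = "" };
}

test "defaults fill in what the symbol table cannot provide" {
    const noisy = symbolFromSymtab("_start");
    try std.testing.expectEqualStrings("???", noisy.compile_unit_name);
    try std.testing.expect(noisy.source_location == null);

    const quiet = quietSymbolFromSymtab("_start");
    try std.testing.expectEqualStrings("", quiet.compile_unit_name);
}
```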

@matklad
Contributor

matklad commented Nov 26, 2024

Are there any existing tests that could be extended to cover the new behavior?

@leroycep leroycep force-pushed the elf-symbol-debuginfo branch from 3a34582 to 958bf92 Compare November 26, 2024 20:32
@leroycep
Contributor Author

leroycep commented Nov 27, 2024

  • [ ] add tests to check for graceful degradation when debuginfo is missing
  • [ ] add some test that would catch the mistake I made which broke stack unwinding in release modes
  • [ ] make stack traces work when other Dwarf debuginfo is missing

On that last point, objcopy --strip-debug will leave in the eh_frame sections, but std.debug will not make use of them:

~/code/zig/build-example-elfsymtab-backtrace> try { ./main_strip-debug }
thread 98264 panic: integer overflow
Unwind information for `exe:0x101b443` was not available, trace may be incomplete

~/code/zig/build-example-elfsymtab-backtrace> readelf -S main_strip-debug | rg eh
  [ 2] .eh_frame_hdr     PROGBITS         0000000001007da4  00007da4
  [ 3] .eh_frame         PROGBITS         0000000001008448  00008448

Of course, that could be pushed off to another PR, as you can work around this by using -fno-omit-frame-pointer:

~/code/zig/build-example-elfsymtab-backtrace> ../release/bin/zig build-exe main.zig -OReleaseSafe -fno-strip -fno-omit-frame-pointer
~/code/zig/build-example-elfsymtab-backtrace> objcopy --strip-debug main main_strip-debug
~/code/zig/build-example-elfsymtab-backtrace> try { ./main_strip-debug }
thread 99117 panic: integer overflow
Unwind information for `exe:0x101a803` was not available, trace may be incomplete

???:?:?: 0x100be87 in main.foo (???)
???:?:?: 0x100bc98 in main.bar (???)
???:?:?: 0x100bc88 in main.main (???)
???:?:?: 0x100bbc2 in start.posixCallMainAndExit (???)
???:?:?: 0x100b89d in _start (???)

@leroycep
Contributor Author

leroycep commented Dec 6, 2024

I think this pull request is ready for review. Current state:

  • Adds support for falling back to the ELF symbol table when generating stack traces
  • Adds a new type of test case, DebugFormatStackTrace, which is a modified version of the StackTrace test

Some caveats:

@leroycep leroycep force-pushed the elf-symbol-debuginfo branch 3 times, most recently from f03fead to d97c614 Compare December 8, 2024 21:24
@leroycep
Contributor Author

leroycep commented Dec 8, 2024

I removed the commit that skipped testing on aarch64, as commit e62aac3 fixes it by making -fno-omit-frame-pointer the default.

@leroycep
Contributor Author

leroycep commented Dec 9, 2024

Now that CI has passed, my plan is to work on a draft PR for -fdebuginfo that builds on top of this PR while I'm waiting for a review.

@leroycep leroycep mentioned this pull request Dec 9, 2024
3 tasks
@leroycep leroycep force-pushed the elf-symbol-debuginfo branch from d97c614 to f497573 Compare December 16, 2024 20:17
@leroycep leroycep force-pushed the elf-symbol-debuginfo branch 2 times, most recently from 04a2793 to 7fee893 Compare January 22, 2025 06:38
When reading `std.debug.Dwarf` fails, `std.debug.SelfInfo` will now try to load
function names from `symtab`.
This makes it possible for executables built with `-fstrip=debuginfo` to unwind
frames. No need to specify `-fno-omit-frame-pointer`! This is only possible
because the `eh_frame` and `eh_frame_hdr` sections are not stripped when
`-fstrip=debuginfo`.
@leroycep leroycep force-pushed the elf-symbol-debuginfo branch from 7fee893 to 2b6c55e Compare January 22, 2025 18:31