Programmatic way to capture backtraces for RiscV (IDFGH-6189) #7866

Closed
ivmarkov opened this issue Nov 10, 2021 · 4 comments
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Type: Feature Request Feature request for IDF

Comments

@ivmarkov
Contributor

(This request is primarily driven by Rust, but it might be useful in the C world as well.)

The ability to capture backtraces programmatically - outside of panic situations - has benefits in the Rust world. For one, it allows Rust's error handling to produce errors enriched with backtrace information captured at the time the error is created.

The status quo in the ESP-IDF is asymmetric: there is an API for programmatic backtrace capturing, but it is only available for Xtensa and is not (yet?) ported to RiscV.

So the first request would be:

(1) A single, unified API that allows capturing backtrace information for both Xtensa and RiscV.

There are complications with RiscV of course, in that - in the absence of a frame pointer (FP) - capturing backtraces on that platform is only possible when the eh_frame functionality is enabled, which increases the size of the final executable. Yet we believe that programmatic backtrace capturing would be useful even with that restriction in mind. (When eh_frame information is not compiled into the firmware, however, calling the backtrace API should return an error or an empty backtrace, and not crash the program. See also the treatment of raw stack memory below for another - always available - option for programmatic backtraces on RiscV.)

RiscV panicking behavior in the ESP-IDF is interesting, in that - in the absence of eh_frame information - it outputs raw stack memory instead (1KB or more), which is then decoded into meaningful function names by the monitor, using GDB.

Now, I would argue that - in the absence of eh_frame information - outputting raw stack memory is actually acceptable behavior even for the programmatic backtracing API, even if it sounds weird at first glance. Why? Because even the "real" eh_frame-based (or FP-based, in the case of Xtensa) backtracing info requires decoding by specialized code in the monitor, and requires the monitor to have access to the ELF executable, as the "real" backtracing info is just a sequence of raw IPs without any symbolic information (for obvious reasons). What I'm getting at is that - from the POV of the programmer - the "real" raw IP-based backtracing info is just as unergonomic as the "stack memory dump" backtracing information, in that it requires a specialized monitor (as opposed to Linux's screen tty) to re-symbolize the call chain into something human readable.

Which brings the next request:

(2) Provide an option for the backtracing API to fall back to returning raw stack memory. Perhaps for Xtensa too (why not?)

(When I say "fall back" I don't really mean output on the console. I mean the API should return a reference to the stack memory to the calling code - somehow. It might even be sneaked into the current API contract somehow. For our "ideal" API contract (libunwind), see below.)

This option can either be controlled at runtime, or at compile time, possibly with configuration (CONFIG_*) settings. E.g.:

  • Option 1: Return IP-based backtracing (will only work on RiscV when eh_frame is enabled)
  • Option 2: Return stack memory "backtrace"
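A minimal sketch of how the two options and the fallback could fit into one call is shown here. Every name in it (bt_capture, bt_kind_t, bt_ip_walker_t and so on) is hypothetical and not an existing ESP-IDF API. The unwinder is passed in as a function pointer that is NULL when no unwind information is available, and the stack region is supplied by the caller; in real firmware both would presumably come from the build configuration and the current task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical, for illustration only. */
typedef enum {
    BT_KIND_IP_FRAMES,  /* Option 1: sequence of instruction pointers */
    BT_KIND_RAW_STACK   /* Option 2: copy of raw stack words */
} bt_kind_t;

typedef enum {
    BT_OK = 0,
    BT_ERR_UNSUPPORTED, /* no unwinder and no stack region to fall back to */
    BT_ERR_INVALID_ARG
} bt_err_t;

typedef struct {
    bt_kind_t kind;     /* what was actually written to the output buffer */
    size_t count;       /* number of words written */
} bt_info_t;

/* Fills `ips` with up to `max` addresses and returns how many were written.
 * NULL stands for "no unwinder available" (e.g. RiscV without eh_frame). */
typedef size_t (*bt_ip_walker_t)(uintptr_t *ips, size_t max);

bt_err_t bt_capture(bt_kind_t want, bt_ip_walker_t walker,
                    const uintptr_t *sp, const uintptr_t *stack_top,
                    uintptr_t *out, size_t cap, bt_info_t *info)
{
    if (out == NULL || info == NULL) {
        return BT_ERR_INVALID_ARG;
    }
    info->kind = want;
    info->count = 0;

    /* Option 1: IP-based backtrace, only when an unwinder exists. */
    if (want == BT_KIND_IP_FRAMES && walker != NULL) {
        info->count = walker(out, cap);
        return BT_OK;
    }

    /* Option 2, or fallback from Option 1: copy raw stack words. */
    if (sp == NULL || stack_top == NULL || sp > stack_top) {
        return (want == BT_KIND_IP_FRAMES) ? BT_ERR_UNSUPPORTED
                                           : BT_ERR_INVALID_ARG;
    }
    size_t n = (size_t)(stack_top - sp);
    if (n > cap) {
        n = cap;        /* truncate to the caller's buffer */
    }
    assert(n <= cap);
    memcpy(out, sp, n * sizeof *out);
    info->kind = BT_KIND_RAW_STACK;
    info->count = n;
    return BT_OK;
}
```

The caller checks info->kind to learn which kind of data it received, so an error is returned only when neither an unwinder nor a stack region is available, and the program never crashes for lack of unwind information.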

The final topic is that calling this new/extended ESP-IDF backtracing API has to be upstreamed in Rust's backtrace-rs crate, which is also used in Rust's STD library.

Our experience so far is that if your platform is not one of the 4 major ones (win/lin/mac/wasm), upstreaming is easier if the changes are minimal.

The current code in backtrace-rs which is used for unix-like platforms (where ESP-IDF actually belongs!) that captures backtraces relies on API calls to the libunwind functionality (_Unwind_Backtrace, _Unwind_GetIP, _Unwind_FindEnclosingFunction and _Unwind_GetCFA) which is part of the GCC toolchain (and I think also part of the LLVM toolchain).
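For readers unfamiliar with this interface, the following sketch shows the general shape of a capture built on _Unwind_Backtrace and _Unwind_GetIP from the toolchain's unwind.h. It is written in C rather than Rust and is not the code from backtrace-rs; the helper names ip_buf_t, on_frame and capture_ips are made up for illustration. It assumes a toolchain and target where unwind information is present, which is exactly the condition discussed in this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

/* Hypothetical helper type: a bounded buffer of raw instruction pointers. */
typedef struct {
    uintptr_t *ips;
    size_t cap;
    size_t len;
} ip_buf_t;

/* Invoked by the unwinder once per stack frame. */
static _Unwind_Reason_Code on_frame(struct _Unwind_Context *ctx, void *arg)
{
    ip_buf_t *buf = arg;
    assert(buf != NULL);
    if (buf->len == buf->cap) {
        /* Buffer full: any value other than _URC_NO_REASON stops the walk. */
        return _URC_END_OF_STACK;
    }
    buf->ips[buf->len++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;  /* continue with the next frame */
}

/* Stores up to `cap` raw IPs in `ips` and returns how many were captured. */
size_t capture_ips(uintptr_t *ips, size_t cap)
{
    ip_buf_t buf = { ips, cap, 0 };
    _Unwind_Backtrace(on_frame, &buf);
    return buf.len;
}
```

The result is only a list of raw addresses. Turning them into function names still requires the ELF file and a symbolizer, which is the ergonomics point made earlier in this issue.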

Now, to my shock and entertainment, these functions are available in ESP-IDF, and - up to version 4.3.1 inclusive - these used to work just fine on Xtensa and used to produce backtraces, even when ESP-IDF was NOT compiled with C++ exceptions enabled (the default)! (But these functions still did crash for RiscV. Always. Even with eh_frame enabled and even with C++ exception support enabled. ?!)

So in a way, for Xtensa and ESP-IDF <= 4.3.1, we did not have to "upstream" anything. It just works. :)

The situation is not so rosy since ESP-IDF 4.4. Due to some issue related to final binary code size that I can't find right now, in later releases these functions are stubbed out in the ESP-IDF with custom implementations when C++ exceptions are not enabled (the default situation), and calling those stubs leads to a panic. Not even to returning an empty backtrace (which would've been a bit more tolerable).
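To make the "empty backtrace" alternative concrete, here is a minimal sketch of a stub that reports zero frames instead of panicking. The name stub_unwind_backtrace is hypothetical and this is not the ESP-IDF stub code; it only assumes the _Unwind_Trace_Fn type and the _URC_* codes from unwind.h. It returns _URC_END_OF_STACK, which is also what GCC's implementation returns after a walk that reaches the end of the stack normally.

```c
#include <assert.h>
#include <stddef.h>
#include <unwind.h>

/* Hypothetical size-saving stand-in for _Unwind_Backtrace: it never calls
 * the trace callback, so the caller simply observes an empty backtrace. */
_Unwind_Reason_Code stub_unwind_backtrace(_Unwind_Trace_Fn trace, void *arg)
{
    (void)trace;
    (void)arg;
    return _URC_END_OF_STACK;
}

/* Example callback that counts the frames it is shown. */
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg)
{
    (void)ctx;
    assert(arg != NULL);
    ++*(size_t *)arg;
    return _URC_NO_REASON;
}
```

A caller that passes count_frame with a zero-initialized counter would see the counter unchanged, so code built on top (such as an error type that attaches a backtrace) would keep running with an empty trace.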

Which brings the 3rd topic:

(3) To make upstreaming in Rust easier, the programmatic backtrace generation should (also) work via the above 4 _Unwind_* API calls

Ideally, the 4 _Unwind_* API calls:

  • Should be capable of always generating an IP-based backtrace for Xtensa, with or without C++ exceptions being enabled
  • Should be capable of generating an IP-based backtrace for RiscV, in case C++ exceptions are enabled OR in case eh_frame support is compiled into the firmware
  • (I know I'm pushing it!) Should be capable of returning raw stack memory "masqueraded" as IP-based backtrace in case RiscV has C++ exceptions disabled and no eh_frame info, OR if the user has explicitly requested raw stack memory via a CONFIG_* setting - for either Xtensa or RiscV.

Well, that's it. Sorry, this ended up like a "mini RFC" of sorts.

I'm available for questions, comments and experiments.

@ivmarkov ivmarkov added the Type: Feature Request Feature request for IDF label Nov 10, 2021
@ivmarkov
Contributor Author

@igrr ^^^

@espressif-bot espressif-bot added the Status: Opened Issue is new label Nov 10, 2021
@github-actions github-actions bot changed the title Programmatic way to capture backtraces for RiscV Programmatic way to capture backtraces for RiscV (IDFGH-6189) Nov 10, 2021
@ivmarkov
Contributor Author

@MabezDev ^^^. Sorry for pinging. In case you spot an incorrect statement reg. the Rust side of things, pls let me know.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels May 18, 2022
@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: In Progress Work is in progress labels Nov 9, 2022
@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Reviewing Issue is being reviewed labels May 16, 2023
@ivmarkov
Contributor Author

@o-marshmallow Unlike xtensa where we always have frame pointers, for riscv we need to enable CONFIG_ESP_SYSTEM_USE_EH_FRAME, to get backtraces, right?

@o-marshmallow
Collaborator

Hello @ivmarkov ,

Indeed, for RISC-V targets you will need to enable CONFIG_ESP_SYSTEM_USE_EH_FRAME as it will let the compiler include the DWARF symbols inside the final binary, which are required to unwind backtraces.

3 participants