Implement compact unwinding info #372
I highly recommend pulling this PR and running |
Potentially important empirical observation: the conflicting info seems to come from specifically the
|
Pushed up that version for people to test out |
Thanks, @Gankra! I'll need some time to get to reviewing this in detail.
Apart from the Breakpad output, this would be well suited to be added to So far, we kept debugging functionality in symbolic, while trying to upstream file format support as much as possible. If for any reason it is not an option to upstream to
Going with my above argument around
I'll try to read up on why those differences might exist. If The rest is guesswork: It's extremely unlikely but not entirely impossible that the function prelude crashes a program. Maybe, compact_unwind is optimized for cases where the program needs to unwind, i.e. exception handling, but not hard crashes like stack overflows. |
I have two, possibly stupid, questions:
|
I believe __unwind_info is strictly compact, and the comment at the start of the file is discussing the DWARF opcode inside of compact unwinding, which is just "this is too complicated, use the DWARF" and a 24-bit pointer to the FDE in the __eh_frame that has that info. I don't think this happens much, so a first implementation can just ignore these.
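As a rough illustration of the "use the DWARF" opcode described above, here is a minimal sketch of pulling the mode and the 24-bit FDE offset out of an x86_64 opcode. The mask values follow the constants in Apple's `compact_unwind_encoding.h`; the `X86Mode` enum and the `x86_64_mode` function are hypothetical names invented for this example, not the API of this PR.

```rust
/// Mask selecting the 4-bit "mode" field of an x86_64 compact unwind opcode.
const X86_64_MODE_MASK: u32 = 0x0F00_0000;
/// Mask selecting the low 24 bits, which in DWARF mode hold the FDE offset
/// into the __eh_frame section.
const X86_64_DWARF_SECTION_OFFSET: u32 = 0x00FF_FFFF;

/// Hypothetical decoded form of the mode field (names are illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum X86Mode {
    /// Mode 0: no compact unwinding info is recorded for this range.
    Old,
    /// Mode 1: frame-pointer based frame.
    RbpFrame,
    /// Mode 2: frameless, stack size stored in the opcode.
    StackImmediate,
    /// Mode 3: frameless, stack size read from the function's code.
    StackIndirect,
    /// Mode 4: "too complicated, go look at this FDE".
    Dwarf { fde_offset: u32 },
    /// Anything else is reported rather than panicking.
    Unknown(u8),
}

/// Extracts the mode; the flag bits above the mode field are masked away.
fn x86_64_mode(opcode: u32) -> X86Mode {
    match ((opcode & X86_64_MODE_MASK) >> 24) as u8 {
        0 => X86Mode::Old,
        1 => X86Mode::RbpFrame,
        2 => X86Mode::StackImmediate,
        3 => X86Mode::StackIndirect,
        4 => X86Mode::Dwarf {
            fde_offset: opcode & X86_64_DWARF_SECTION_OFFSET,
        },
        other => X86Mode::Unknown(other),
    }
}
```

A first implementation that ignores DWARF mode could simply skip entries for which this returns `X86Mode::Dwarf`.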
I believe these are ~padding bytes in the binary which CFI carefully doesn't map, but Compact Unwinding incidentally maps because it doesn't care about having incorrect mappings for places that don't matter (the range covered by an opcode is only implicitly defined by where the next opcode starts, so there will necessarily be an entry for every address in the binary, although for our purposes there can still be holes because we don't emit
It basically just ignores support for the function prologue where callee-saved registers are being individually PUSHed to the stack. I believe @jan-auer is correct in describing the format as:
That said, in Firefox we're seeing that this is the only unwinding info for most of our x64 macos builds, so it's better than nothing. And it's still the case that most stack overflows will happen after the registers are saved (as the rest of the frame is usually bulk-allocated upfront). |
(force-push was just updating some of the comments with more details I've found) |
Thank you for the explanations! |
(more comment cleanup) |
Starting to look into what this would look like if defined in goblin but will need a while to acclimate to the structure of the library. Definitely wouldn't complain if this is something @jan-auer was willing to figure out for me, but will be proceeding on the assumption that I need to figure it out myself. |
force-push: tweaked the structure to still allow for spidering the structure in ARM64 even though we don't support decoding its opcodes, because the objdump-style "dump" operation doesn't need that. This also brings the structure closer to what it would be in its final form, properly stubbing out space for where the ARM impl would go (and properly marking all the x86-exclusive stuff as such). Also everything that should be an Err instead of a panic should be now. |
@Gankra this is awesome stuff! Just read the rustdocs and will try to decode some of these unwind entries by hand, which are not being handled by either symbolic master or your branch yet: (output from llvm-objdump -u)
I will continue reviewing this tomorrow as well :-) |
force-push:
|
@Swatinem could you elaborate on your comment? What isn't my branch handling? Edit: ah, you found an instance of "old" and "dwarf" |
Added the dwarf mode. Some inspection of the functions that are being mapped to "old" (0x00000000) mode suggests that it's basically "there is no info", as they seem to be hand written assembly routines (jsimd_rgb_ycc_convert_avx2, lucet_context_bootstrap, ...). |
Added tests |
I've added a section at the end of the documentation discussing all the corner cases that I was able to think of in the format, and how the implementation handles them. |
I've replaced the Vec for CompactCfiOps with a little adhoc ArrayVec impl to avoid allocating and avoid actually depending on ArrayVec. With this I regard the implementation no longer "WIP" and something that could land, modulo how you feel about the outstanding TODOs:
|
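For readers unfamiliar with the idea, a fixed-capacity stack-allocated vector of the kind mentioned above can be sketched roughly as follows. This is not the code from the PR: `SmallOps`, its methods, and the choice to return the rejected value on overflow are all assumptions made for illustration.

```rust
/// A tiny fixed-capacity vector that lives entirely on the stack.
/// Hypothetical stand-in for an ad hoc ArrayVec; not the PR's actual type.
#[derive(Debug, Clone, Copy)]
struct SmallOps<T: Copy + Default, const N: usize> {
    items: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> SmallOps<T, N> {
    /// Creates an empty container; unused slots hold `T::default()`.
    fn new() -> Self {
        Self {
            items: [T::default(); N],
            len: 0,
        }
    }

    /// Appends a value. When the container is full, the value is handed back
    /// as an error instead of panicking or allocating.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    /// Views only the initialized prefix.
    fn as_slice(&self) -> &[T] {
        &self.items[..self.len]
    }
}
```

Because the number of CFI ops produced per opcode is small and bounded, a capacity chosen at compile time is enough and no heap allocation is needed.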
FWIW, having access to the full binary is also useful when doing stack scanning because you can check that potential return addresses are preceded by call instructions. |
You're not doing stack scanning in the context of dumping the CFI into the breakpad format, which is the usecase here. It's not hard to get the binary (I don't think?), just an API change from what I have now (CompactUnwindInfoIter::new would take it as an (optional?) input). |
Ah right, this is the dumping not the unwinding. I was confused. |
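For context on the stack-scanning check mentioned above (which is not needed for dumping), a very rough x86-64 sketch could look like the following. It only recognizes two common call encodings and is a heuristic, since data bytes can match by coincidence; `preceded_by_call` is a hypothetical helper, not part of symbolic.

```rust
/// Returns true if the bytes just before `return_offset` in `code` look like
/// one of two common x86-64 call encodings. Purely heuristic.
fn preceded_by_call(code: &[u8], return_offset: usize) -> bool {
    if return_offset > code.len() {
        return false;
    }
    // `call rel32`: opcode E8 followed by a 4-byte displacement (5 bytes).
    let near_call = return_offset >= 5 && code[return_offset - 5] == 0xE8;
    // `call r64` for rax..rdi: FF /2 with a register operand (FF D0..=FF D7).
    let reg_call = return_offset >= 2
        && code[return_offset - 2] == 0xFF
        && (0xD0..=0xD7).contains(&code[return_offset - 1]);
    near_call || reg_call
}
```

A real scanner would also need to handle memory-operand calls and REX-prefixed forms, which are left out here.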
This looks awesome! I have been looking through the docs and the implementation. I only skimmed the tests superficially since I’m super tired right now.
In general:
- The docs could use a few intra-doc links ;-)
- The iterator implementation is a bit hard to follow. How about we actually have a two-level iterator internally (not exposed to the outside), that would make it a bit easier to follow I think. If I understood correctly, the first level pages are marked by a sentinel, and the second level pages have a fixed size.
- Instead of having separate methods to get each of the pieces from the encoding, such as `x86_frameless_stack_size` and similar with their own shift and mask inside, I wish that would be inlined into `x86_instructions`, together with a bit of ascii art to visually show what the exact pieces are that you are extracting via shift&mask. Maybe a bit along these lines: https://github.com/getsentry/sentry-native/blob/3fdaa5d5eeb306fa2d13ca66ec25e656960d19b8/src/sentry_value.c#L38-L45
- Please avoid using `unimplemented`, `unreachable` or similar panics. I vaguely remember that we actually hit one in production at some time. Rather just return an error or comment the function out.
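To make the ascii-art suggestion above concrete, here is a sketch of what an annotated decode of the x86_64 frameless (Stack-Immediate) opcode might look like. The field positions follow the masks in Apple's `compact_unwind_encoding.h`; the struct and function names are hypothetical and not taken from the PR.

```rust
// Layout of an x86_64 frameless opcode (bit 31 on the left):
//
//  31    28 27   24 23          16 15  13 12  10 9                 0
// +--------+-------+--------------+------+------+-------------------+
// | flags  | mode  | stack_size/8 | adj. | regs | reg permutation   |
// +--------+-------+--------------+------+------+-------------------+

/// Hypothetical container for the decoded pieces.
#[derive(Debug, PartialEq, Eq)]
struct FramelessFields {
    /// Stack size in bytes (the field stores it divided by 8).
    stack_size_bytes: u32,
    /// Extra adjustment; only meaningful for the Stack-Indirect mode.
    stack_adjust: u32,
    /// Number of callee-saved registers that were pushed.
    register_count: u32,
    /// Permutation-encoded list of which registers were pushed.
    register_permutation: u32,
}

/// Extracts every field with one shift and mask each, matching the diagram.
fn x86_64_frameless_fields(opcode: u32) -> FramelessFields {
    FramelessFields {
        stack_size_bytes: ((opcode >> 16) & 0xFF) * 8,
        stack_adjust: (opcode >> 13) & 0x7,
        register_count: (opcode >> 10) & 0x7,
        register_permutation: opcode & 0x3FF,
    }
}
```

Whether the shifts live in one function like this or in separate helpers is a style choice; the diagram is useful either way.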
In general I would be happy to land this rather soon, maybe even as crate-private `allow(dead_code)`. That way we can improve this in incremental PRs. I also haven’t really reviewed this for its API surface, regarding what we want to expose, what needs to be `#[non_exhaustive]`, etc. Would like to get another opinion from the team on this. Anything that would make this easier to review incrementally would be nice.
I will be looking through your goblin PR soon, though I have no authority on that repo :-D
Realized ARM64 opcodes were really simple, so I added support for them (not tested on real-world data, don't have an M1 to test against).
Spent so long without them I never think to add them now that they exist. Will do.
I removed all of them but one which is basically a ward against someone using the internal pointer_size function with an unknown architecture (currently can't happen). Most notably I have now fully commented out the theoretical API entry point and marked its example as
While I'm usually a big proponent of megafunctions inlining everything they can, in this case I found it a bit too distracting, preferring instead to keep *_instructions mostly focused on the higher-level semantics. Also a few of these do genuinely get some reuse.
Hmm, I'll think about this. This code is a bit finicky because of the requirement that we need to peek at the next item and handle the sentinel as if it were an item in a peek but never fully return it. |
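The peek-at-the-next-item requirement described above can be sketched with a plain `Peekable`. This is an illustration under simplifying assumptions (a flat, already-parsed list ending in a sentinel, collected into a `Vec`), not the PR's iterator; `RawEntry`, `RangedEntry`, `OutOfOrder` and `ranged_entries` are made-up names, and returning an error for out-of-order addresses is this sketch's choice.

```rust
/// One (address, opcode) pair; the final pair in a list is a sentinel that
/// only marks where the last real entry ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RawEntry {
    address: u32,
    opcode: u32,
}

/// An entry with its length made explicit.
#[derive(Debug, PartialEq, Eq)]
struct RangedEntry {
    address: u32,
    len: u32,
    opcode: u32,
}

/// Error for a list whose addresses go backwards.
#[derive(Debug, PartialEq, Eq)]
struct OutOfOrder;

fn ranged_entries(raw: &[RawEntry]) -> Result<Vec<RangedEntry>, OutOfOrder> {
    let mut out = Vec::new();
    let mut iter = raw.iter().peekable();
    while let Some(entry) = iter.next() {
        // The sentinel is only ever used through `peek`. Once nothing follows
        // the current item, that item is the sentinel and is not yielded.
        let next = match iter.peek() {
            Some(next) => next,
            None => break,
        };
        // The length is implied by where the next entry starts.
        let len = next.address.checked_sub(entry.address).ok_or(OutOfOrder)?;
        out.push(RangedEntry {
            address: entry.address,
            len,
            opcode: entry.opcode,
        });
    }
    Ok(out)
}
```

A two-level version would apply the same pattern once over first-level pages and once over the entries inside each page.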
I have been looking at the CFI snapshot changes. clippy is being quite pedantic though :-D |
I fixed up the clippy complaints, suppressing the one complaining about I added intradoc links, and also fixed up some old gunk in the docs I missed. glandium was gracious enough to run this on an ARM64 macos XUL and the output seems good? |
Please put impl blocks next to their types.
//!
//! Similarly, the way ranges of instructions are mapped means that Compact
//! Unwinding will generally incorrectly map the padding bytes between functions
//! (attributing them to the previous function), while DWARF CFI tends to more
There's a "more" too many here
//! There are two high-level concepts in this format that enable significant
//! compression of the tables:
//!
//! 1. Eliding duplicate function offsets
"Function offsets" should be "instruction addresses" per the earlier terminology explanation.
//!
//! Trick 2 is more novel: At the first level a global palette of up to 127 opcodes
//! is defined. Each second-level "compressed" (leaf) page can also define up to 128 local
//! opcodes. Then the entries mapping function offsets to opcodes can use 8-bit
Same here.
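The palette trick quoted above can be illustrated with a small sketch. It assumes the usual compressed-entry layout (low 24 bits: instruction offset relative to the page's base; high 8 bits: opcode index) and that indices past the end of the global palette select from the page-local one. Both function names are hypothetical.

```rust
/// Splits a 32-bit compressed second-level entry into
/// (instruction offset relative to the page's base address, opcode index).
fn split_compressed_entry(entry: u32) -> (u32, u8) {
    (entry & 0x00FF_FFFF, (entry >> 24) as u8)
}

/// Resolves an 8-bit opcode index: indices below the global palette's length
/// refer to the global palette, the rest index into the page-local palette.
/// Returns `None` for an index that neither palette covers.
fn resolve_opcode(index: u8, global: &[u32], local: &[u32]) -> Option<u32> {
    let index = usize::from(index);
    if index < global.len() {
        global.get(index).copied()
    } else {
        local.get(index - global.len()).copied()
    }
}
```

With at most 127 global and 128 local opcodes, every valid index fits in the 8 bits available.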
//!
//! (Currently Unimplemented)
//!
//! Stack-Indirect is exactly the same situation as Stack-Immediate, but the
There's one "the" too many here.
//! It will never violate memory safety but it may start yielding chaotic
//! values.
//!
//! If this implementation ever panics, that should be regarded as an
Duplicate "an"
//!
//! * A corrupt unwind_info section may have its entries out of order. Since
//! the next entry's instruction_address is always needed to compute the
//! number of bytes the current entry cover, the implementation will report
Should be "covers"
Other,
}

/// An iterator over the CompactUnwindInfoEntry's of a `.unwind_info` section.
This could be an intra-doc link.
/// The instructions can be described with simple CFI operations.
CfiOps(CompactCfiOpIter),
/// Instructions can't be encoded by Compact Unwinding, but an FDE
/// with real DWARF CFI instructions are stored in the eh_frame section.
"are" should be "is"
review addressed, thanks! |
So the style/lint issues seem to be pre-existing issues flaring up because I happened to touch the file, and I'd rather not make unrelated changes here. The python tests failing appear to need a regeneration of the cache file used by test_macos_cficache, but I can't find the way to regenerate it. |
I have updated the snapshots and merged master here: https://github.com/getsentry/symbolic/tree/Gankra-compact Otherwise I have been trying to testdrive your branch looking at some cases that we currently don’t handle well. More specifically, we are failing to correctly unwind through
I still haven’t figured out why that happens though |
Assuming this is x86/x64, the opcode decodes to:
Is there an eh_frame entry there? It's possible it's running into the limits of your dwarf decoding. |
This was written by @Swatinem
I will check this, thanks for looking this up for me! |
(tests broke because the drive-by patch removed trailing whitespace, fixed) |
This is a WIP implementation of #368. It appears to generate reasonable-looking `STACK CFI` records, though I haven't tested them yet.
A few major questions that need feedback from the symbolic folks:
`STACK CFI` records between the DWARF CFI sections and the compact unwinding sections? This is not theoretical -- this happens on e.g. libmozglue in release firefox distributions.
The conflicting records have been kind of useful for being able to see that my output does indeed seem reasonable, as it largely agrees with the DWARF CFI-based `STACK CFI` records. Although the compact unwind info entries seem to ignore properly adjusting .cfa during the function prelude, where registers are being saved. e.g. DWARF CFI produces this:
While compact unwind info produces this (note how the final STACK CFI records of the above agree with the below):
This is a bit funky, but also kinda reasonable since these preludes necessarily shouldn't show up in a backtrace (except for maybe in the top frame, but then that frame is probably not very interesting to someone debugging a crash).