[LLVM][DWARF] Change order for .debug_names abbrev print out #80229
@llvm/pr-subscribers-debuginfo
Author: Alexander Yermolovich (ayermolo)
Changes
This stems from the conversation in: #77457 (comment).
Full diff: https://github.com/llvm/llvm-project/pull/80229.diff
3 Files Affected:
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
index a427dd604ade7..1a51c2354dc29 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
@@ -847,8 +847,15 @@ void DWARFDebugNames::NameIndex::dumpForeignTUs(ScopedPrinter &W) const {
void DWARFDebugNames::NameIndex::dumpAbbreviations(ScopedPrinter &W) const {
ListScope AbbrevsScope(W, "Abbreviations");
- for (const auto &Abbr : Abbrevs)
- Abbr.dump(W);
+ std::vector<const Abbrev *> AbbrevsVect;
+ for (const llvm::DWARFDebugNames::Abbrev &Abbr : Abbrevs)
+ AbbrevsVect.push_back(&Abbr);
+ std::sort(AbbrevsVect.begin(), AbbrevsVect.end(),
+ [](const Abbrev *LHS, const Abbrev *RHS) {
+ return LHS->Code < RHS->Code;
+ });
+ for (const llvm::DWARFDebugNames::Abbrev *Abbr : AbbrevsVect)
+ Abbr->dump(W);
}
void DWARFDebugNames::NameIndex::dumpBucket(ScopedPrinter &W,
diff --git a/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll b/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
index c15e2ad1d56b0..62ab8de44f0a2 100644
--- a/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
+++ b/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
@@ -40,13 +40,13 @@
; CHECK-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-NEXT: }
-; CHECK-NEXT: Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
-; CHECK-NEXT: Tag: DW_TAG_variable
+; CHECK-NEXT: Abbreviation [[ABBREV_SP:0x[0-9a-f]*]] {
+; CHECK-NEXT: Tag: DW_TAG_subprogram
; CHECK-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-NEXT: }
-; CHECK-NEXT: Abbreviation [[ABBREV_SP:0x[0-9a-f]*]] {
-; CHECK-NEXT: Tag: DW_TAG_subprogram
+; CHECK-NEXT: Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-NEXT: Tag: DW_TAG_variable
; CHECK-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-NEXT: }
diff --git a/llvm/test/DebugInfo/X86/debug-names-types.ll b/llvm/test/DebugInfo/X86/debug-names-types.ll
index f41bb5524b9c3..ed32e56fa71b6 100644
--- a/llvm/test/DebugInfo/X86/debug-names-types.ll
+++ b/llvm/test/DebugInfo/X86/debug-names-types.ll
@@ -37,14 +37,13 @@
; CHECK-NEXT: LocalTU[0]: 0x00000000
; CHECK-NEXT: ]
; CHECK: Abbreviations [
-; CHECK-NEXT: Abbreviation [[ABBREV3:0x[0-9a-f]*]] {
+; CHECK-NEXT: Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
; CHECK-NEXT: Tag: DW_TAG_structure_type
-; CHECK-NEXT: DW_IDX_type_unit: DW_FORM_data1
; CHECK-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-NEXT: }
-; CHECK-NEXT: Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
-; CHECK-NEXT: Tag: DW_TAG_base_type
+; CHECK-NEXT: Abbreviation [[ABBREV3:0x[0-9a-f]*]] {
+; CHECK-NEXT: Tag: DW_TAG_structure_type
; CHECK-NEXT: DW_IDX_type_unit: DW_FORM_data1
; CHECK-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-NEXT: DW_IDX_parent: DW_FORM_flag_present
@@ -54,8 +53,9 @@
; CHECK-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-NEXT: }
-; CHECK-NEXT: Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
-; CHECK-NEXT: Tag: DW_TAG_structure_type
+; CHECK-NEXT: Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
+; CHECK-NEXT: Tag: DW_TAG_base_type
+; CHECK-NEXT: DW_IDX_type_unit: DW_FORM_data1
; CHECK-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-NEXT: }
@@ -140,14 +140,13 @@
; CHECK-SPLIT-NEXT: ForeignTU[0]: 0x675d23e4f33235f2
; CHECK-SPLIT-NEXT: ]
; CHECK-SPLIT-NEXT: Abbreviations [
-; CHECK-SPLIT-NEXT: Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT: Abbreviation [[ABBREV:0x[0-9a-f]*]] {
; CHECK-SPLIT-NEXT: Tag: DW_TAG_structure_type
-; CHECK-SPLIT-NEXT: DW_IDX_type_unit: DW_FORM_data1
; CHECK-SPLIT-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-SPLIT-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-SPLIT-NEXT: }
-; CHECK-SPLIT-NEXT: Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
-; CHECK-SPLIT-NEXT: Tag: DW_TAG_base_type
+; CHECK-SPLIT-NEXT: Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT: Tag: DW_TAG_structure_type
; CHECK-SPLIT-NEXT: DW_IDX_type_unit: DW_FORM_data1
; CHECK-SPLIT-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-SPLIT-NEXT: DW_IDX_parent: DW_FORM_flag_present
@@ -157,8 +156,9 @@
; CHECK-SPLIT-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-SPLIT-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-SPLIT-NEXT: }
-; CHECK-SPLIT-NEXT: Abbreviation [[ABBREV:0x[0-9a-f]*]] {
-; CHECK-SPLIT-NEXT: Tag: DW_TAG_structure_type
+; CHECK-SPLIT-NEXT: Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT: Tag: DW_TAG_base_type
+; CHECK-SPLIT-NEXT: DW_IDX_type_unit: DW_FORM_data1
; CHECK-SPLIT-NEXT: DW_IDX_die_offset: DW_FORM_ref4
; CHECK-SPLIT-NEXT: DW_IDX_parent: DW_FORM_flag_present
; CHECK-SPLIT-NEXT: }
An alternative is to keep a vector of abbrevs as they are read in. I looked into SetVector, but it doesn't look like it has a find API, which is used in NameIndex::getEntry. It also seems a tad wasteful, since the order only matters for printing.
This stems from the conversation in: llvm#77457 (comment). Right now the abbrev code is a combination of the DIE tag and other attributes. In the future it will be changed to be an index. Since DenseSet does not preserve order, this adds a sort based on abbrev code. Once the change to an index is made, abbrevs will be printed in the order they are stored.
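The approach described in this commit message can be sketched outside LLVM as follows. This is a minimal illustration, assuming a hypothetical `Abbrev` struct and using `std::unordered_map` as a stand-in for the unordered LLVM container; the name `sortedByCode` is made up for this example and is not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for DWARFDebugNames::Abbrev; only code and tag are modeled.
struct Abbrev {
  uint32_t Code;
  uint16_t Tag;
};

// Returns pointers to the abbreviations ordered by ascending code.
// The container's iteration order is unspecified, so it is never relied on.
std::vector<const Abbrev *>
sortedByCode(const std::unordered_map<uint32_t, Abbrev> &Abbrevs) {
  std::vector<const Abbrev *> Sorted;
  Sorted.reserve(Abbrevs.size());
  for (const auto &KV : Abbrevs) {
    assert(KV.first == KV.second.Code && "map key must match the abbrev code");
    Sorted.push_back(&KV.second);
  }
  std::sort(Sorted.begin(), Sorted.end(),
            [](const Abbrev *LHS, const Abbrev *RHS) {
              return LHS->Code < RHS->Code;
            });
  return Sorted;
}
```

Sorting pointers rather than copies keeps the extra cost limited to the dump path, which matches the concern that ordering only matters for printing.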
219fce8 to da9e927
When we dump things, we should dump them in the order in which they appear in the .debug_xxx section. Here we are sorting and dumping them in some order. I realize the abbrevs will eventually have an index, and that will make the order more natural, but when dumping the contents of a section we should always do it in the order in which they are defined in the file, since we want to see the contents of the section we asked to dump. There aren't usually that many abbreviation combinations, right? So maybe the dump function can parse the info again, dump each entry it finds in the right order, and throw away the temporary llvm::DWARFDebugNames::Abbrev item as it dumps each one? Or we can maintain an offset in each llvm::DWARFDebugNames::Abbrev entry and then sort by that when dumping?
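The second suggestion (record an offset per abbreviation and sort by it when dumping) might look roughly like this. It is only a sketch: `AbbrevRecord`, its `Offset` field, and `codesInSectionOrder` are hypothetical names, and the real `Abbrev` class is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: an abbreviation plus the byte offset at which it was
// read from the abbreviation table.
struct AbbrevRecord {
  uint64_t Offset;
  uint32_t Code;
};

// Returns the abbreviation codes in the order they appear in the section,
// regardless of the order in which the records were stored.
std::vector<uint32_t> codesInSectionOrder(std::vector<AbbrevRecord> Records) {
  std::sort(Records.begin(), Records.end(),
            [](const AbbrevRecord &LHS, const AbbrevRecord &RHS) {
              return LHS.Offset < RHS.Offset;
            });
  std::vector<uint32_t> Codes;
  Codes.reserve(Records.size());
  for (std::size_t I = 0; I < Records.size(); ++I) {
    // Two abbreviations cannot start at the same offset.
    assert(I == 0 || Records[I - 1].Offset < Records[I].Offset);
    Codes.push_back(Records[I].Code);
  }
  return Codes;
}
```

Because the sort key is the position in the section, the dump order no longer depends on how the codes happen to be encoded.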
I agree we should print in the same order as in the section.
A few quick things in the inline comments and this will be good to go.
How does this compare to how we handle .debug_abbrev? (Perhaps we could be sharing some parsing infrastructure, the same as I'm suggesting/hoping we share some generation infrastructure. But even without shared code, bringing two different implementations into alignment so they do/express things more similarly would be good.)
The abbrevs currently makes a DWARFAbbreviationDeclarationSet which contains a
So lookups are O(1). Otherwise it falls back to a very costly linear search. But most compilers emit the abbrevs with an index, so this works well for 99% of cases. The DWARFDebugNames::Entry is quite different and is stored in a hash map, as the abbrev codes are not indexed.
LGTM!
std::vector<const Abbrev *> AbbrevsVect;
for (const llvm::DWARFDebugNames::Abbrev &Abbr : Abbrevs)
  AbbrevsVect.push_back(&Abbr);
std::sort(AbbrevsVect.begin(), AbbrevsVect.end(),
We should never use std::sort; instead, use the range-based sort from STLExtras (which adds some shuffling magic under expensive asserts):
sort(AbbrevsVect, [] (...){});
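To illustrate the idea behind a range-based sort with optional pre-shuffling, here is a standalone sketch in standard C++. `rangeSort` and its `ShuffleFirst` flag are hypothetical names for this example; this is not LLVM's implementation, where the shuffling is tied to the expensive-checks build configuration rather than a runtime argument.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper: sorts a whole range, optionally shuffling it first.
// Shuffling before sorting exposes code that accidentally depends on the
// relative order of elements that compare equal.
template <typename Range, typename Compare>
void rangeSort(Range &&R, Compare Comp, bool ShuffleFirst = false) {
  auto First = std::begin(R);
  auto Last = std::end(R);
  if (ShuffleFirst) {
    std::mt19937 Generator(std::random_device{}());
    std::shuffle(First, Last, Generator);
  }
  std::sort(First, Last, Comp);
  assert(std::is_sorted(First, Last, Comp));
}
```

When every key is distinct the result is the same with or without the shuffle; the shuffle only changes the outcome when some elements compare equal.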
My interpretation is that this only matters if we have equal elements, which shouldn't be the case here. Abbrev offsets are monotonically increasing.
Can we revisit this after llvm switches to indices?
I am fine with taking care of switching to a more DebugAbbrev-like solution later.
I'm not sure I understand - what do you mean by "the abbrev codes are not indexed"? Because of LLVM's current output that uses the weird bit fiddling? I'd consider that a bug/suboptimality, and I'd be fine with llvm-dwarfdump devolving to a linear search through abbrevs in the case where the abbrevs are not monotonically increasing.
The abbreviation codes are not 1 based indexes like they are for .debug_abbrev:
I was mostly commenting that .debug_abbrev code lookup is O(1), whereas if we don't use indexes for the abbrev codes we must do some sort of search, and if this happens in .debug_abbrev the search is linear. And of course any code that tries to find abbrevs should work (linear or direct access) depending on what the input is. The main point is that it is much more efficient to use 1-based indexes (zero is reserved) so that we can do efficient lookups. Most of what I was pointing out was around what .debug_abbrev does right and how it differs from what we are doing for .debug_names abbrev codes.
Fair - thanks for helping me understand what we're doing today (on the LLVM emission side - we aren't emitting monotonically increasing debug_names abbreviation numbers, compared to debug_abbrev where we do use monotonically increasing abbrev numbers). I think that's a mistake in LLVM's emission code, and not one we should worry about when designing llvm-dwarfdump. Both sides (LLVM DWARF emission, and llvm-dwarfdump parsing) should be fixed (to behave similarly to - but I don't think we need to wait for the emission to get better/fixed before we make llvm-dwarfdump handle things well). I'd prefer to see the llvm-dwarfdump code not use a hash map and ideally reuse a generalized form of the debug_abbrev handling code, which is efficient for monotonically increasing abbrev numbers and falls back to a linear search otherwise. The printing should print from that list, which is stored in the same order it's read from the input, and that list can be directly indexed if it's monotonically increasing, or linearly searched if it's not.
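The storage scheme proposed here (a list kept in read order, indexed directly when codes are consecutive and searched linearly otherwise) could be sketched like this. `AbbrevTable` and its members are hypothetical names, and the sketch assumes the simplest case of codes that start at 1; it is not the DWARFAbbreviationDeclarationSet code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an abbreviation declaration.
struct Abbrev {
  uint32_t Code;
  uint16_t Tag;
};

class AbbrevTable {
  std::vector<Abbrev> Decls; // kept in the order read from the section
  bool Consecutive = true;   // true while the codes seen so far are 1, 2, 3, ...

public:
  void add(const Abbrev &A) {
    assert(A.Code != 0 && "code 0 is reserved");
    if (A.Code != Decls.size() + 1)
      Consecutive = false;
    Decls.push_back(A);
  }

  // Constant-time lookup when codes are 1-based indexes, linear search otherwise.
  const Abbrev *find(uint32_t Code) const {
    if (Code == 0)
      return nullptr;
    if (Consecutive)
      return Code <= Decls.size() ? &Decls[Code - 1] : nullptr;
    for (const Abbrev &A : Decls)
      if (A.Code == Code)
        return &A;
    return nullptr;
  }

  bool isDirectlyIndexed() const { return Consecutive; }
  const std::vector<Abbrev> &inReadOrder() const { return Decls; }
};
```

Dumping then just walks `inReadOrder()`, so no sort is needed and the output follows the section layout for either kind of input.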
OK, put up a PR for .debug_names in BOLT (no parent index support for now). Let me circle back to this and change the implementation to be sequential on the LLVM side.
Based on the discussion in llvm#80229 changed implementation to align with how .debug_abbrev is handled. So that .debug_names abbrev tag is a monotonically increasing index. This allows for tools like LLDB to access it in constant time.
Based on the discussion in #80229, changed the implementation to align with how .debug_abbrev is handled, so that the .debug_names abbrev tag is a monotonically increasing index. This allows tools like LLDB to access it in constant time using an array-like data structure.
clang-19 debug build
before change
[41] .debug_names PROGBITS 0000000000000000 8f9e0350 137fdbe0 00 0 0 4
after change
[41] .debug_names PROGBITS 0000000000000000 8f9e0350 125bfde 00 0 0 4
Reduction ~19.1MB