Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM][DWARF] Chnage order for .debug_names abbrev print out #80229

Merged
merged 6 commits into from
Feb 2, 2024

Conversation

ayermolo
Copy link
Contributor

@ayermolo ayermolo commented Feb 1, 2024

This stemps from conversatin in: #77457 (comment).
Right now Abbrev code for abbrev is combination of DIE TAG and other attributes.
In the future it will be changed to be an index. Since DenseSet does not
preserve an order, added a sort based on abbrev code. Once change to index is
made, it will print out abbrevs in the order they are stored.

@llvmbot
Copy link

llvmbot commented Feb 1, 2024

@llvm/pr-subscribers-debuginfo

Author: Alexander Yermolovich (ayermolo)

Changes

This stemps from conversatin in: #77457 (comment).
Right now Abbrev code for abbrev is combination of DIE TAG and other attributes.
In the future it will be changed to be an index. Since DenseSet does not
preserve an order, added a sort based on abbrev code. Once change to index is
made, it will print out abbrevs in the order they are stored.


Full diff: https://github.com/llvm/llvm-project/pull/80229.diff

3 Files Affected:

  • (modified) llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp (+9-2)
  • (modified) llvm/test/DebugInfo/X86/debug-names-dwarf64.ll (+4-4)
  • (modified) llvm/test/DebugInfo/X86/debug-names-types.ll (+12-12)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
index a427dd604ade7..1a51c2354dc29 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
@@ -847,8 +847,15 @@ void DWARFDebugNames::NameIndex::dumpForeignTUs(ScopedPrinter &W) const {
 
 void DWARFDebugNames::NameIndex::dumpAbbreviations(ScopedPrinter &W) const {
   ListScope AbbrevsScope(W, "Abbreviations");
-  for (const auto &Abbr : Abbrevs)
-    Abbr.dump(W);
+  std::vector<const Abbrev *> AbbrevsVect;
+  for (const llvm::DWARFDebugNames::Abbrev &Abbr : Abbrevs)
+    AbbrevsVect.push_back(&Abbr);
+  std::sort(AbbrevsVect.begin(), AbbrevsVect.end(),
+            [](const Abbrev *LHS, const Abbrev *RHS) {
+              return LHS->Code < RHS->Code;
+            });
+  for (const llvm::DWARFDebugNames::Abbrev *Abbr : AbbrevsVect)
+    Abbr->dump(W);
 }
 
 void DWARFDebugNames::NameIndex::dumpBucket(ScopedPrinter &W,
diff --git a/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll b/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
index c15e2ad1d56b0..62ab8de44f0a2 100644
--- a/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
+++ b/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
@@ -40,13 +40,13 @@
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_variable
+; CHECK-NEXT:     Abbreviation [[ABBREV_SP:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_subprogram
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV_SP:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_subprogram
+; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_variable
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
diff --git a/llvm/test/DebugInfo/X86/debug-names-types.ll b/llvm/test/DebugInfo/X86/debug-names-types.ll
index f41bb5524b9c3..ed32e56fa71b6 100644
--- a/llvm/test/DebugInfo/X86/debug-names-types.ll
+++ b/llvm/test/DebugInfo/X86/debug-names-types.ll
@@ -37,14 +37,13 @@
 ; CHECK-NEXT:        LocalTU[0]: 0x00000000
 ; CHECK-NEXT:      ]
 ; CHECK:        Abbreviations [
-; CHECK-NEXT:     Abbreviation [[ABBREV3:0x[0-9a-f]*]] {
+; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
 ; CHECK-NEXT:       Tag: DW_TAG_structure_type
-; CHECK-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_base_type
+; CHECK-NEXT:     Abbreviation [[ABBREV3:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_structure_type
 ; CHECK-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
@@ -54,8 +53,9 @@
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_structure_type
+; CHECK-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_base_type
+; CHECK-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
@@ -140,14 +140,13 @@
 ; CHECK-SPLIT-NEXT:     ForeignTU[0]: 0x675d23e4f33235f2
 ; CHECK-SPLIT-NEXT:   ]
 ; CHECK-SPLIT-NEXT:   Abbreviations [
-; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV:0x[0-9a-f]*]] {
 ; CHECK-SPLIT-NEXT:       Tag: DW_TAG_structure_type
-; CHECK-SPLIT-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-SPLIT-NEXT:     }
-; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
-; CHECK-SPLIT-NEXT:       Tag: DW_TAG_base_type
+; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT:       Tag: DW_TAG_structure_type
 ; CHECK-SPLIT-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
@@ -157,8 +156,9 @@
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-SPLIT-NEXT:     }
-; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV:0x[0-9a-f]*]] {
-; CHECK-SPLIT-NEXT:       Tag: DW_TAG_structure_type
+; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT:       Tag: DW_TAG_base_type
+; CHECK-SPLIT-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-SPLIT-NEXT:     }

@ayermolo
Copy link
Contributor Author

ayermolo commented Feb 1, 2024

Alternative is to keep a vector as abbrevs read in. I looked into SetVector but doesn't look like it has find api. Used in NameIndex::getEntry. Also seems a tad wasteful since it only matters for printing out.

This stemps from conversatin in: llvm#77457 (comment).
Right now Abbrev code for abbrev is combination of DIE TAG and other attributes.
In the future it will be changed to be an index. Since DenseSet does not
preserve an order, added a sort based on abbrev code. Once change to index is
made, it will print out abbrevs in the order they are stored.
Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we dump things we should dump them in the order in which they appear in the .debug_xxx section. Here we are sorting and dumping them in some order. I realize eventually the abbrevs will have an index and that will make the order more natural, but when dumping the contents of the section, we should do it in the order in which they are defined in the file at all times since we are wanting to see the contents of the section we asked to dump. There aren't usually that many abbreviation combinations right? So maybe the dump function can parse the info again and dump each entry it finds in the right order and throw away the temp llvm::DWARFDebugNames::Abbrev item as it dumps each one? Or we can maintain an offset in each llvm::DWARFDebugNames::Abbrev entry and then sort by that when dumping?

@ayermolo
Copy link
Contributor Author

ayermolo commented Feb 1, 2024

When we dump things we should dump them in the order in which they appear in the .debug_xxx section. Here we are sorting and dumping them in some order. I realize eventually the abbrevs will have an index and that will make the order more natural, but when dumping the contents of the section, we should do it in the order in which they are defined in the file at all times since we are wanting to see the contents of the section we asked to dump. There aren't usually that many abbreviation combinations right? So maybe the dump function can parse the info again and dump each entry it finds in the right order and throw away the temp llvm::DWARFDebugNames::Abbrev item as it dumps each one? Or we can maintain an offset in each llvm::DWARFDebugNames::Abbrev entry and then sort by that when dumping?

I agree we should print in the same order as it's in section.
Changed to sort by offset.

Copy link
Collaborator

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few quick things in the inline comments and this will be good to go.

@dwblaikie
Copy link
Collaborator

How's this compare to how we handle .debug_abbrevs? (perhaps we could be sharing some parsing infrastructure, the same as I'm suggesting/hoping we share some generation infrastructure - but even if not shared code, bringing two different implementations into alignment so they do/express things more similarly would be good)

@clayborg
Copy link
Collaborator

clayborg commented Feb 1, 2024

How's this compare to how we handle .debug_abbrevs? (perhaps we could be sharing some parsing infrastructure, the same as I'm suggesting/hoping we share some generation infrastructure - but even if not shared code, bringing two different implementations into alignment so they do/express things more similarly would be good)

The abbrevs currently makes a DWARFAbbreviationDeclarationSet which contains a std::vector<DWARFAbbreviationDeclaration> Decls; and it also maintains a value call FirstAbbrCode which is set to UINT32_MAX if the abbrevs were not indexed starting with some number. Compilers like to use 1 and the starting number. Then when we ask for an abbrev from the DWARFAbbreviationDeclarationSet, it sees is FirstAbbrCode is not set to UINT32_MAX, and if it isn't it returns a direct access using:

return &Decls[AbbrCode - FirstAbbrCode];

So O(1) lookups. Else it falls back to a very costly linear search. But most compilers emit the abbrevs with and index so this works well for 99% of the cases.

The DWARFDebugNames::Entry is quite different and stored in a hash map as the abbrev codes are not indexed.

Copy link
Contributor

@felipepiovezan felipepiovezan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp Outdated Show resolved Hide resolved
llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp Outdated Show resolved Hide resolved
std::vector<const Abbrev *> AbbrevsVect;
for (const llvm::DWARFDebugNames::Abbrev &Abbr : Abbrevs)
AbbrevsVect.push_back(&Abbr);
std::sort(AbbrevsVect.begin(), AbbrevsVect.end(),
Copy link
Contributor

@felipepiovezan felipepiovezan Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should never use std::sort, instead using the range-based (with some shuffling magic for expensive asserts) sort from stl extras:

sort(AbbrevsVect, [] (...){});

https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My interpretation that only applies if we have equal elements. Which shouldn't be the case in this case. Abbrev offsets are monotonically increasing.

@ayermolo
Copy link
Contributor Author

ayermolo commented Feb 1, 2024

Can we revisit this after llvm switches to indices?

@clayborg
Copy link
Collaborator

clayborg commented Feb 2, 2024

Can we revisit this after llvm switches to indices?

I am fine with taking care of switching to a more DebugAbbrev like solution later.

@ayermolo ayermolo merged commit 095367a into llvm:main Feb 2, 2024
3 of 4 checks passed
@ayermolo ayermolo deleted the dwarfDumpAbbrevInOrder branch February 2, 2024 20:36
@dwblaikie
Copy link
Collaborator

How's this compare to how we handle .debug_abbrevs? (perhaps we could be sharing some parsing infrastructure, the same as I'm suggesting/hoping we share some generation infrastructure - but even if not shared code, bringing two different implementations into alignment so they do/express things more similarly would be good)

The abbrevs currently makes a DWARFAbbreviationDeclarationSet which contains a std::vector<DWARFAbbreviationDeclaration> Decls; and it also maintains a value call FirstAbbrCode which is set to UINT32_MAX if the abbrevs were not indexed starting with some number. Compilers like to use 1 and the starting number. Then when we ask for an abbrev from the DWARFAbbreviationDeclarationSet, it sees is FirstAbbrCode is not set to UINT32_MAX, and if it isn't it returns a direct access using:

return &Decls[AbbrCode - FirstAbbrCode];

So O(1) lookups. Else it falls back to a very costly linear search. But most compilers emit the abbrevs with and index so this works well for 99% of the cases.

The DWARFDebugNames::Entry is quite different and stored in a hash map as the abbrev codes are not indexed.

I'm not sure I understand - what do you mean by "the abbrev codes are not indexed"? Because of LLVM's current output that uses the weird bit fiddling?

I'd consider that a bug/suboptimality, and I'd be fine with llvm-dwarfdump devolving to a linear search through abbrevs in the case where the abbrevs are not monotonically increasing.

@clayborg
Copy link
Collaborator

clayborg commented Feb 5, 2024

So O(1) lookups. Else it falls back to a very costly linear search. But most compilers emit the abbrevs with and index so this works well for 99% of the cases.
The DWARFDebugNames::Entry is quite different and stored in a hash map as the abbrev codes are not indexed.

I'm not sure I understand - what do you mean by "the abbrev codes are not indexed"? Because of LLVM's current output that uses the weird bit fiddling?

The abbreviation codes are not 1 based indexes like they are for .debug_abbrev:

  Abbreviations [
    Abbreviation 0x1718 {
      Tag: DW_TAG_subprogram
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
    Abbreviation 0xb18 {
      Tag: DW_TAG_typedef
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
    Abbreviation 0x998 {
      Tag: DW_TAG_structure_type
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
    Abbreviation 0x1218 {
      Tag: DW_TAG_base_type
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
  ]

I'd consider that a bug/suboptimality, and I'd be fine with llvm-dwarfdump devolving to a linear search through abbrevs in the case where the abbrevs are not monotonically increasing.

I was mostly commenting that .debug_abbrev code lookup is O(1) where if we don't use indexes for the abbrev codes we must do some sort of search and if this happens in .debug_abbrev the search is linear. And of course any code that tries to find abbrevs should work (linear or direct access) depending on what the input is. The main point is it is much more efficient to use 1 based indexes since zero is reserved so that we can do effecient lookups. Most of what I was pointing out was around what .debug_abbrev does right and how it is different from what we are doing for .debug_names abbrev codes.

agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
)

This stemps from conversatin in:
llvm#77457 (comment).
Right now Abbrev code for abbrev is combination of DIE TAG and other
attributes.
In the future it will be changed to be an index. Since DenseSet does not
preserve an order, added a sort based on abbrev code. Once change to
index is
made, it will print out abbrevs in the order they are stored.
@dwblaikie
Copy link
Collaborator

I was mostly commenting that .debug_abbrev code lookup is O(1) where if we don't use indexes for the abbrev codes we must do some sort of search and if this happens in .debug_abbrev the search is linear. And of course any code that tries to find abbrevs should work (linear or direct access) depending on what the input is. The main point is it is much more efficient to use 1 based indexes since zero is reserved so that we can do effecient lookups. Most of what I was pointing out was around what .debug_abbrev does right and how it is different from what we are doing for .debug_names abbrev codes.

Fair - thanks for helping me understand what we're doing today (on the LLVM emission side - we aren't emitting monotonically increasing debug_names abbrevation numbers, compared to debug_abbrev where we do use monotonically increasing abbrev numbers).

I think that's a mistake in LLVM's emission code, and not one we should worry about when designing llvm-dwarfdump. Both sides (LLVM DWARF emission, and llvm-dwarfdump parsing) should be fixed (to behave similarly to - but I don't think we need to wait for the emission to get better/fixed before we make llvm-dwarfdump handle things well)

I'd prefer to see the llvm-dwarfdump code not use a hash map, ideally reuse a generalized form of the debug_abbrev handling code - that's efficient for monotonically increasing abbrev numbers, and falls back to a linear search otherwise - and the printing should print from that list that's stored in the same order it's read from the input - and that list can be directly indexed if it's monotonically increasing, or linearly searched if it's not.

@ayermolo
Copy link
Contributor Author

ayermolo commented Feb 8, 2024

OK, put up a PR for .debug_names in BOLT (no parent index support for now). Let me circle back to this, and change implementation to be sequential on LLVM side.

ayermolo added a commit to ayermolo/llvm-project that referenced this pull request Feb 8, 2024
Based on the discussion in llvm#80229
changed implementation to align with how .debug_abbrev is handled. So that
.debug_names abbrev tag is a monotonically increasing index. This allows for
tools like LLDB to access it in constant time.
ayermolo added a commit to ayermolo/llvm-project that referenced this pull request Feb 14, 2024
Based on the discussion in llvm#80229
changed implementation to align with how .debug_abbrev is handled. So that
.debug_names abbrev tag is a monotonically increasing index. This allows for
tools like LLDB to access it in constant time.
ayermolo added a commit that referenced this pull request Feb 14, 2024
Based on the discussion in
#80229
changed implementation to align with how .debug_abbrev is handled. So
that
.debug_names abbrev tag is a monotonically increasing index. This allows
for
tools like LLDB to access it in constant time using array like data
structure.

clang-19 debug build
before change

[41] .debug_names PROGBITS 0000000000000000 8f9e0350 137fdbe0 00 0 0 4
after change
[41] .debug_names PROGBITS 0000000000000000 8f9e0350 125bfde 00 0 0 4

Reduction ~19.1MB
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants