
[GR-47106] Return to separate CUs per class to improve gdb startup time #6937

Merged · 7 commits · Jul 21, 2023

Conversation

@adinn (Collaborator) commented Jul 5, 2023

This patch restores the DWARF debug info generator to generating a compilation unit (CU) per Java class. This helps alleviate the performance problem identified in issue #6936.

oracle-contributor-agreement bot added the OCA Verified (All contributors have signed the Oracle Contributor Agreement) label, Jul 5, 2023
@adinn requested review from olpaw, fniephaus and zakkak, Jul 5, 2023 16:13
@adinn (Collaborator, Author) commented Jul 5, 2023

Thanks for the review Fabio!

@olpaw olpaw self-assigned this Jul 6, 2023
@fniephaus fniephaus changed the title Return to separate CUs per class to improve gdb startup time [GR-47106] Return to separate CUs per class to improve gdb startup time Jul 7, 2023
@fniephaus (Member) commented:

> Thanks for the review Fabio!

Of course, thanks for addressing. While we're waiting for Paul's review, I've started the integration process already and am running the internal gates on this.

@zakkak (Collaborator) left a comment:

LGTM.

As mentioned in Karm/mandrel-integration-tests#160 (comment), it still feels slow, but it's about 3x faster on my machine.

@olpaw (Member) commented Jul 11, 2023

@adinn please fixup the commits

* fd4313e26ef - (6 days ago) more style fixes - Andrew Dinn
* 2623562c669 - (6 days ago) style fixes - Andrew Dinn
* 8d5750d2fc5 - (6 days ago) style fixes - Andrew Dinn

into a single commit. Also, it would be nice if the commit messages all started with an uppercase letter.

@adinn (Collaborator, Author) commented Jul 11, 2023

> @adinn please fixup the commits

Done.

@fniephaus (Member) commented:

Hi @adinn, we failed to integrate this due to a few memory-constrained jobs that are now reliably failing with your changes, either because they're running out of memory or because they're killed by the deadlock watchdog (very likely also because the GC is very busy and slows down the creation of debuginfo). Before we increase the memory limits of these jobs, could you maybe check whether there's something that can be done to bring the memory usage back down? Here's an example stack trace from a watchdog-killed job:

=== Image generator watchdog detected no activity. This can be a sign of a deadlock during image building. Dumping all stack traces. Current time: Thu Jul 13 11:28:43 UTC 2023
"main" Id=1 in RUNNABLE
    at java.base@21/java.util.ArrayList.spliterator(ArrayList.java:1613)
    at java.base@21/java.util.Collection.stream(Collection.java:750)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.debugentry.ClassEntry.compiledEntries(ClassEntry.java:249)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfARangesSectionImpl.lambda$writeContent$2(DwarfARangesSectionImpl.java:148)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfARangesSectionImpl$$Lambda/0x00000001021af108.accept(Unknown Source)
    at java.base@21/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.base@21/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
    at java.base@21/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1707)
    at java.base@21/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base@21/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base@21/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.base@21/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.base@21/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base@21/java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:601)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfARangesSectionImpl.writeContent(DwarfARangesSectionImpl.java:144)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfSectionImpl$$Lambda/0x00000001021ae3b8.accept(Unknown Source)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.debugContext(ObjectFile.java:1833)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfSectionImpl.getOrDecideContent(DwarfSectionImpl.java:647)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.dwarf.DwarfARangesSectionImpl.getOrDecideContent(DwarfARangesSectionImpl.java:131)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.elf.ELFUserDefinedSection.getOrDecideContent(ELFUserDefinedSection.java:110)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.bake(ObjectFile.java:1625)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.write(ObjectFile.java:1282)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.lambda$write$5(ObjectFile.java:1274)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile$$Lambda/0x00000001021a8d58.run(Unknown Source)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.withDebugContext(ObjectFile.java:1814)
    at app/org.graalvm.nativeimage.objectfile/com.oracle.objectfile.ObjectFile.write(ObjectFile.java:1273)
    at app/org.graalvm.nativeimage.builder/com.oracle.svm.hosted.image.NativeImage.write(NativeImage.java:166)
    at app/org.graalvm.nativeimage.builder/com.oracle.svm.hosted.image.NativeImageViaCC.write(NativeImageViaCC.java:98)
    at app/org.graalvm.nativeimage.builder/com.oracle.svm.hosted.image.SharedLibraryImageViaCC.write(SharedLibraryImageViaCC.java:57)

@adinn (Collaborator, Author) commented Jul 17, 2023

Hi, @fniephaus. Sure, I'll create some heapdumps and analyze them to see if I can find some obvious thing that accounts for the extra memory use.

@adinn (Collaborator, Author) commented Jul 19, 2023

Hi @fniephaus. I looked at what was driving the high memory use in the debug info generator and have managed to remove a lot of the overhead.

The problem had to do with keeping track of debug info references originating from the .debug_info section that require relocations. It is not just any old relocation that matters, either: the memory problem is tightly coupled to three very specific kinds of .debug_info section reference:

  1. abstract_origin attribute that links an inlined subprogram DIE to the corresponding method_declaration DIE in the .debug_info section
  2. abstract_origin attribute that links an inlined parameter_location or local_location DIE to the corresponding parameter_declaration or local_declaration DIE in the .debug_info section
  3. location attribute that links an inlined parameter_location or local_location DIE to the corresponding location_list in the .debug_loc section

In these 3 cases the reference crosses a compile unit (CU) or section boundary and so needs a relocatable offset. By contrast, a reference within a CU can use a CU-relative offset with no relocation (if the CU moves, the reference remains valid).
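
To illustrate the distinction, here is a toy sketch (plain Python, not GraalVM's actual data model) of why a CU-relative reference survives the CU being moved while a section-absolute reference goes stale and therefore needs a relocation record:

```python
# Toy model: a CU placed at some section offset, with a DIE inside it.
cu = {"base": 0x1000, "dies": {"method_decl": 0x20}}  # DIE at base + 0x20

# CU-relative reference: store only the offset within the CU.
ref = cu["dies"]["method_decl"]

# Section-absolute reference: bake in the current absolute position.
abs_ref = cu["base"] + ref  # 0x1020

# The linker moves the CU...
cu["base"] = 0x5000

# ...the CU-relative reference still resolves to the right DIE,
assert cu["base"] + ref == 0x5020
# ...but the absolute reference is now stale and must be patched via a
# relocation record.
assert abs_ref != cu["base"] + ref
```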

There were (are still) other links that required relocations but these three categories dominate. In particular, the count of these references (and hence the resulting reloc count) is determined by the total number of nodes in the inline tree for case 1 and by the total number of parameter location nodes attached to those inline tree nodes in cases 2 and 3. So, the overhead is very dependent on how many info points/source positions there are and the depth of inlining in the inline trees.

N.b. the .debug_info_reloc content byte[] requires only 24 bytes for each reloc, but there is also an extra overhead of 40 bytes for an ElfRelocationSection$Entry to model it and another 40 bytes for a TreeMap$Entry to index it, i.e. the storage multiplier per reloc is 104 bytes.
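
That multiplier can be sanity-checked against the heap-dump instance counts reported further down in this comment (the "before" run); a minimal Python check:

```python
# Per-object sizes quoted above.
RELOC_RECORD = 24        # bytes per reloc in the .debug_info_reloc byte[]
ELF_ENTRY = 40           # bytes per ElfRelocationSection$Entry
TREEMAP_ENTRY = 40       # bytes per TreeMap$Entry indexing it

per_reloc = RELOC_RECORD + ELF_ENTRY + TREEMAP_ENTRY
print(per_reloc)                          # 104

# Cross-check: the "before" heap dump reports 28,250,058 entry objects
# occupying 1,130,002,320 bytes -- exactly 40 bytes apiece.
elf_entries_before = 28_250_058
print(elf_entries_before * ELF_ENTRY)     # 1130002320
```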

My first attempt at a fix was to ensure that all the info --> info links internal to a CU use a CU-relative offset. However, that made little difference, since it does not help with the 3 dominant cases highlighted above. I have now managed to significantly reduce the overhead for cases 1 and 2 by reviving a trick that I used in the previous version of the debug info model (and had simply forgotten to reapply). Case 3 cannot really be remedied, as each inline parameter needs its own location list and hence needs to include a (relocatable) cross-section link. However, not every parameter location entry has a loc list (it can be omitted when the param/local remains undefined across an inline region). As a result, the latest commit makes a substantial (> 50%) reduction in reloc memory use.

I'll explain what I did to improve cases 1 and 2, just for the record, and also provide some before and after numbers to give an idea of the scale of the problem.

The reloc overhead is minimized by changing the info model to employ a combination of link indirection and DIE cloning. Each inlined_subprogram DIE now refers to an abstract_inline_method DIE in the same CU and hence can use a CU-relative offset. The latter DIE includes a specification link, possibly crossing CUs, to the original declaration which has to be a relocatable section offset. This indirection collapses the reloc count since the inline tree contains many nodes referring to the same method.
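
As a sketch of why the indirection collapses the reloc count, here is a toy Python model; the node and method counts are hypothetical, chosen only to illustrate the many-nodes-per-method ratio described above:

```python
# Assumed, illustrative numbers: many inline tree nodes in one CU all
# inlining a small set of distinct methods.
inline_nodes = 1_000     # inlined_subprogram DIEs in the CU
unique_methods = 10      # distinct methods they inline

# Before: every inline node carried its own relocatable cross-CU
# abstract_origin reference to the method declaration.
relocs_before = inline_nodes

# After: inline nodes use CU-relative refs to per-method
# abstract_inline_method DIEs; only those abstract DIEs keep a
# relocatable specification link to the original declaration.
relocs_after = unique_methods

print(relocs_before, relocs_after)  # 1000 10
```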

The parameters and locals from the original declaration are now cloned under the abstract_inline_method DIE, allowing inlined parameter_location or local_location DIEs to reference the clone using a CU-relative offset. n.b. as an improvement on the previous implementation I don't clone parameter or local declarations when the inlined method belongs to the current CU's class -- we can reference the original declarations in that case.

This makes a big difference to the amount of reloc storage employed. I took some before and after heap dumps while running the test program that manifested the gdb startup problem which prompted this patch. N.b. this is the worst-case scenario, with full debug infopoint and source position generation, i.e. the command line included the following options:

`-H:GenerateDebugInfo=1 -H:+SourceLevelDebug -H:+DebugCodeInfoUseSourceMappings -H:-DeleteLocalSymbols -H:+TrackNodeSourcePosition`

The count of inline subroutine and inline parameter/local DIEs can be approximated by grepping for the keyword abstract_origin in an info dump. Obviously this did not change much between runs:

| Run | #abstract_origin |
| --- | --- |
| before | 17,830,903 |
| after | 17,843,044 |

The split between the two types of DIE can be approximated by counting formal_parameter and variable records (which overcounts case 2). This indicated that about 15-20% of the relocs were for inline subroutine DIEs (case 1) and 80-85% for inline parameter/local DIEs (cases 2 and 3).

The heap dump revealed the following summary storage costs related to info section content and reloc management:

| Run | byte[] | TreeMap$Entry | ElfRelocationSection$Entry |
| --- | --- | --- | --- |
| before | 1,240,285,224 | 1,166,034,160 | 1,130,002,320 |
| after | 908,729,856 | 568,131,400 | 532,098,240 |

The corresponding instance counts for the latter two types are:

| Run | #TreeMap$Entry | #ElfRelocationSection$Entry |
| --- | --- | --- |
| before | 29,150,854 | 28,250,058 |
| after | 14,203,285 | 13,302,456 |
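
A quick check of the reduction implied by these instance counts confirms the "> 50%" claim:

```python
# TreeMap$Entry counts from the heap dumps above.
before, after = 29_150_854, 14_203_285

reduction = (before - after) / before
print(f"{reduction:.1%}")  # 51.3%
assert reduction > 0.5     # over half of the relocs eliminated
```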

Looking at individual byte[] instances, the ranking by size is as follows:

| Run | Rank | Section | #bytes |
| --- | --- | --- | --- |
| before | 1 | debug_info_reloc | 655,000,000 |
| before | 2 | debug_info | 198,000,000 |
| before | 3 | debug_loc | 70,000,000 |
| before | 4 | text | 40,000,000 |
| after | 1 | debug_info_reloc | 298,000,000 |
| after | 2 | debug_info | 218,000,000 |
| after | 3 | debug_loc | 70,000,000 |
| after | 4 | text | 40,000,000 |

So, although the info section byte array has increased slightly in size (thanks to both the new and cloned DIEs), this is more than made up for by the reduction in the size of the associated reloc section.
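
The trade-off is easy to quantify from the byte[] sizes in the ranking above:

```python
# Section content sizes (bytes) from the before/after ranking.
info_before, info_after = 198_000_000, 218_000_000
reloc_before, reloc_after = 655_000_000, 298_000_000

growth = info_after - info_before    # new + cloned DIEs cost ~20 MB
saving = reloc_before - reloc_after  # dropped relocs save ~357 MB

print(growth, saving, saving - growth)  # 20000000 357000000 337000000
```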

Looking at individual TreeMap instances, it is possible to see how many of the TreeMap$Entry objects belong to the info section. This shows that the info section is where the real problem lies (other sections contribute little to the total) and also gives a measure of how much of an issue case 3 remains.

| Run | #TreeMap$Entry (info) | #TreeMap$Entry (total) |
| --- | --- | --- |
| before | 27,342,167 | 29,150,854 |
| after | 12,394,039 | 14,203,285 |

Note that the absolute numbers and ratios for the ElfRelocationSection$Entry instances are similar. This confirms that over half of the relocs have been eliminated, as also indicated by the reduction in the size of the .debug_info_reloc content byte[], but that case 3 can still make a sizeable contribution to memory when full debug info is switched on.

@adinn (Collaborator, Author) commented Jul 19, 2023

@fniephaus The latest (espresso) gate failure appears to be unrelated to this patch. I think this is ready to be resubmitted to the internal gate.

@fniephaus (Member) commented:

> I looked at what was driving the high memory use in the debug info generator and have managed to remove a lot of the overhead.

Fantastic, thanks a lot, @adinn! Let me run the gates again and see whether it passes our benchmarking infra. :)

And special thanks for the detailed analysis report. I always enjoy reading them!

> The latest (espresso) gate failure appears to be unrelated to this patch. I think this is ready to be resubmitted to the internal gate.

No need to worry about it. We're already working on fixing that.

@adinn (Collaborator, Author) commented Jul 19, 2023

> And special thanks for the detailed analysis report. I always enjoy reading them!

Well, indeed, much as I do when it is 6 months down the line and I have near zero recollection of what changes I made or why I made them ;-). All the same, your words of appreciation are gratefully received.

@fniephaus (Member) commented:

... and we're past the failing gates, yay! Thanks for fixing! Getting this merged now (see #6984).

Labels: native-image-debuginfo, OCA Verified (All contributors have signed the Oracle Contributor Agreement), redhat-interest

5 participants