Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler is failing to track source position for NewInstance nodes #3620

Closed
adinn opened this issue Jul 23, 2021 · 14 comments · Fixed by #3621
Closed

Compiler is failing to track source position for NewInstance nodes #3620

adinn opened this issue Jul 23, 2021 · 14 comments · Fixed by #3621
Assignees
Labels

Comments

@adinn
Copy link
Collaborator

adinn commented Jul 23, 2021

The current GraalVM head is manifesting a problem with tracking source positions for compiled Java methods that affects debugging in native images.

During compilation a NewInstance node is often replaced by a VirtualInstance node.The source position (NodeSourcePosition) associated with the NewInstance is not copied across to the VirtualInstance. As a consquence the caller information in the SourceMapping for the compiled method that relates to the inlined allocation instructions ends in a root call that has no associated file or line number.

Manifesting this problem without the latest in progress debuginfo update is complicated. It can only be done by using a debugger to inspect the SourceMapping of a CompilationResult for a method that includes a new operation. The mapping will include one or more NodeSourcePosition objects for the inlined allocation. These will have caller chains that have valid inline positions for methods like GenScavengeAllocationSnippets::readTlabTop, AllocationSnippets.allocateInstanceImpl etc but the inlined caller chains all end with a root NodeSourcePosiiton for the top level method that is marked as a placeholder and has bci = -1. This root called should actually identify the bci for the new operation in the outer compiled method.

The latest debug info code manifests the problem more immediately. At the point where a new is inlinedin the Hello debugger test the code produces this backtrace:

#0  com.oracle.svm.core.genscavenge.graal.GenScavengeAllocationSnippets::readTlabTop () at com/oracle/svm/core/genscavenge/graal/GenScavengeAllocationSnippets.java:129
#1  org.graalvm.compiler.replacements.AllocationSnippets::allocateInstanceImpl () at org/graalvm/compiler/replacements/AllocationSnippets.java:59
#2  com.oracle.svm.core.graal.snippets.SubstrateAllocationSnippets::allocateInstance () at com/oracle/svm/core/graal/snippets/SubstrateAllocationSnippets.java:124
#3  0x0000000000405087 in Hello$Greeter::greeter(java.lang.String[] *) ()
#4  0x0000000000405593 in Hello::main(java.lang.String[] *) () at Hello.java:43
#5  0x000000000041b9e5 in com.oracle.svm.core.JavaMainWrapper::runCore () at com/oracle/svm/core/JavaMainWrapper.java:146
#6  com.oracle.svm.core.JavaMainWrapper::run () at com/oracle/svm/core/JavaMainWrapper.java:182
#7  0x000000000041b9e5 in com.oracle.svm.core.code.IsolateEnterStub::JavaMainWrapper_run_5087f5482cc9a6abc971913ece43acb471d2631b(int, org.graalvm.nativeimage.c.type.CCharPointerPointer *) ()

A proposed patch that simply copies the the node source position across from the NewInstance to the VirtualInstance at the point of replacement corrects the backtrace as follows

#0  com.oracle.svm.core.genscavenge.graal.GenScavengeAllocationSnippets::readTlabTop () at com/oracle/svm/core/genscavenge/graal/GenScavengeAllocationSnippets.java:129
#1  org.graalvm.compiler.replacements.AllocationSnippets::allocateInstanceImpl () at org/graalvm/compiler/replacements/AllocationSnippets.java:59
#2  com.oracle.svm.core.graal.snippets.SubstrateAllocationSnippets::allocateInstance () at com/oracle/svm/core/graal/snippets/SubstrateAllocationSnippets.java:124
#3  Hello$Greeter::greeter(java.lang.String[] *) () at Hello.java:12
#4  0x0000000000405593 in Hello::main(java.lang.String[] *) () at Hello.java:43
#5  0x000000000041b9e5 in com.oracle.svm.core.JavaMainWrapper::runCore () at com/oracle/svm/core/JavaMainWrapper.java:146
#6  com.oracle.svm.core.JavaMainWrapper::run () at com/oracle/svm/core/JavaMainWrapper.java:182
#7  0x000000000041b9e5 in com.oracle.svm.core.code.IsolateEnterStub::JavaMainWrapper_run_5087f5482cc9a6abc971913ece43acb471d2631b(int, org.graalvm.nativeimage.c.type.CCharPointerPointer *) ()

Note that in the second backtrace the root of the inline tree for pc 0x405087 at frame #3 is showing the correct file and line number, Hello.java:12.

@adinn
Copy link
Collaborator Author

adinn commented Jul 23, 2021

@christianwimmer @peter-hofer

I have a simple patch for this issue which resolves the problem that manifests in the latest debugger code. It is clearly along the right lines. However, I am not sure if the proposed code is all that is needed or that it is being executed at the right place/in the right way.

Could one or other of you please link the patch to this issue and either review it or get someone in the compiler team to review it?

Also, I am concerned that there may be other points (now or, perhaps, in the future) where the compiler fails to carry across source position info when it is doing graph transformations and that this will impact on the quality of debugging. I'd prefer to try to find some way of automatically testing that substitutions and plugin replacements don't do that. Perhaps we can add this to our discussion agenda?

@adinn
Copy link
Collaborator Author

adinn commented Jul 23, 2021

@zakkak A fix to this issue is needed to ensure that the gdb displays the correct file and line for the root method of an inline method hiererachy when using the latest version of the DWARF inline method info generation code.

@peter-hofer
Copy link
Member

@dougxc @tkrodriguez please let us know how we can best improve node source tracking for debug information generation and have a look at #3621.

@dougxc
Copy link
Member

dougxc commented Jul 26, 2021

I'm not sure that are many strategies for improving source node position info apart from finding places where it is missing and fixing them.
@tkrodriguez maybe there's some extra verification code we can add to detect missing propagation of source position info?

@adinn
Copy link
Collaborator Author

adinn commented Jul 26, 2021

@dougxc So does that mean the patch in #3621 is ok to push? I ask because I was hoping there might be some way to ensure source position propagation happens as a side-effect of the graph node replacement (e.g. in this case the call to setSourceNodePosition would happen in the call to tool.replaceWithVirtual). If not then I guess we just have a game of whack-a-mole on our hands.

@dougxc
Copy link
Member

dougxc commented Jul 26, 2021

I like the suggestion of tool.replaceWithVirtual and other graph transforming methods in VirtualizerTool taking responsibility for source node info propagation. What do you think @tkrodriguez ?
As a short term fix, #3621 is ok to merge.

@adinn
Copy link
Collaborator Author

adinn commented Jul 26, 2021

@dougxc sounds like a good idea to me. Would you or @tkrodriguez be able to review #3621and shephered it through the internal gate. I would normally ask @olpaw or @christianwimmer but they are both on holiday at present.

@dougxc
Copy link
Member

dougxc commented Jul 26, 2021

I'll shepherd it through once @tkrodriguez signs off on it.

@adinn
Copy link
Collaborator Author

adinn commented Jul 26, 2021

@dougxc thanks very much!

@peter-hofer
Copy link
Member

@dougxc thank you!

@tkrodriguez
Copy link
Member

Yes to support more accurate information we'd need to tracked it through the PEA process. Currently it's somewhat incomplete I think. It's also a little complicated to get the right answer in all cases since PEA allows merging of allocations so we'd end up just picking one arbitrarily in that case. A more systematic change would be to require passing in a NodeSourcePosition to the VirtualObjectNode constructor and then ensuring that a valid position is correctly passed down or some other piece of code takes care of setting an initial value.

@christianwimmer
Copy link

@adinn It is a good question what stack trace we actually want to see when we are in the slow path of an allocation. Is it useful to show the stack trace where the original new bytecode was? Or do we want to show the stack trace that is "closest" to the point where escape analysis actually placed the allocation?

@tkrodriguez
Copy link
Member

I think the answer for something like profiling is that we want to properly attribute the work as a new bytecode. For debugging you might be able to argue that closest makes sense but I'm not sure it's helpful. We will happily float operations like loads, stores and arithmetic out of loops but keep the source position within the loop. I'm not sure it makes sense to treat the allocation differently. I'm not even sure how you express the notion of closest since it may be near other operations that floated to their current location.

@adinn
Copy link
Collaborator Author

adinn commented Jul 27, 2021

@christianwimmer I think for this case it makes most sense to keep the inlined allocation calls tree rooted at the source bytecode/line which led to the inline code. That's very much in line with what gcc tries to do when generating debug info for -g -O3 compiled code. With such code the debugger jumps backwards and forwards in the source to reflect the re-ordering of integral source level operations into separate, interleaved strands of generated machine code.

The utility of having debug info in optimized code is not to support debugging of operations in source order -- that's just not an option. It is to help whoever is debugging the machine code to relate that code to the original source in as coherent a manner as possible, wherever that is possible. In this case, rooting the inline tree at some other bytecode/line would just make it harder to work out why this specific machine code was generated.

As @tkrodriguez says there is not always going to be a clear mapping. But that's ok -- propagating source bytecode/line info through to machine code addresses is simply a convenience. It does not need to be completely accurate. It merely needs to reflect some more or less tenuous connection between the final code and an originating point in the source. The imprecision and incompleteness of any such mapping is no reason to give up on providing source position info. We just need to find places where we can retain some sort of link that is helpful, like this one.

Of course, this information is not going to help every developer. The target audience for debug info cannot realistically be people who have no understanding of what the compiler is going to do -- you need some basic knowledge even if you compile with the economy tier. The most important use of debuginfo is not to diagnose and fix errors in the original Java code (we have the JVM and JVM debuggers for that) but to diagnose and fix errors in the way the code has been translated to a compiled native image if/when a disparity between JVM and native execution arises. Whoever is debugging will need, and indeed need to be able, to apply both judgement and a knowledge of the compile process to whatever relation between source and generated code the debug info records.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants