Debug info frame size changes are not being recorded and processed correctly #2624

Closed
adinn opened this issue Jun 29, 2020 · 5 comments
@adinn
Collaborator

adinn commented Jun 29, 2020

The compiler adds frame-event marks to a compilation result at 6 distinct points: prologue entry, prologue stack grow, prologue exit, epilogue entry, epilogue stack shrink and epilogue exit. These events are passed on to debug info generation code as frame change events. There is an error in the generation of events and an error in the way they are converted to frame change events.

  1. The compiler currently marks epilogue end events at the point where the frame has been torn down, i.e. at the same point as the epilogue stack shrink. This is incorrect because the epilogue should include all code up to and including any ensuing return (or, in special cases, up to a block end via a branch out of the method). The range of code up to that final transfer of control at epilogue exit needs to be marked as having an empty stack. Furthermore, the epilogue exit mark is needed in order to identify that subsequent code generated in the method -- code which can only be reached via a jump past the epilogue -- should be understood as having a non-empty stack, i.e. the epilogue exit mark serves to re-establish the presence of a stack frame for that next segment of generated method code.

  2. The debug info frame size change notifications report the initial frame growth during the prologue and report a frame shrink during each epilogue. However, they do not re-notify stack growth for generated method code that follows an epilogue exit. Clearly, doing so without a remedy for the previous error is not possible, since the point at which the epilogue exit is generated cannot currently be used to decide whether the epilogue is actually at the end of a method.
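
To make the intended behaviour concrete, here is a minimal Python model of the conversion from frame-event marks to frame size change events. It is only a sketch, not the Graal code: the `Mark` enum, the `FrameSizeChange` dataclass, the `frame_size_changes` helper and the offsets in the usage note are all hypothetical names and values chosen for illustration. It assumes the epilogue exit mark is placed after the return instruction, as item 1 asks for.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mark(Enum):
    # The six points at which the compiler records a frame event.
    PROLOGUE_ENTRY = auto()
    PROLOGUE_STACK_GROW = auto()
    PROLOGUE_EXIT = auto()
    EPILOGUE_ENTRY = auto()
    EPILOGUE_STACK_SHRINK = auto()
    EPILOGUE_EXIT = auto()


@dataclass(frozen=True)
class FrameSizeChange:
    offset: int  # code offset from which the change applies
    kind: str    # "EXPAND" or "CONTRACT"


def frame_size_changes(marks, code_size):
    """Convert (offset, Mark) pairs into frame size change events."""
    changes = []
    for offset, mark in sorted(marks, key=lambda m: m[0]):
        if mark is Mark.PROLOGUE_STACK_GROW:
            changes.append(FrameSizeChange(offset, "EXPAND"))
        elif mark is Mark.EPILOGUE_STACK_SHRINK:
            changes.append(FrameSizeChange(offset, "CONTRACT"))
        elif mark is Mark.EPILOGUE_EXIT and offset < code_size:
            # More code follows this epilogue, so the frame is live again.
            changes.append(FrameSizeChange(offset, "EXPAND"))
    return changes
```

For a method whose stack grows at offset 4, shrinks at offset 16 and returns at offset 16 (so the epilogue exits at offset 17), calling `frame_size_changes` with a code size of 64 yields EXPAND at 4, CONTRACT at 16 and a second EXPAND at 17. If the epilogue exit coincides with the end of the method, no second EXPAND is produced.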

How to reproduce:

  1. Build libpolyglot.so with debug info enabled as per instructions in issue #2587
  2. Build and run a.out as per instructions in issue #2587
  3. Debug the core in gdb as per instructions in issue #2587
  4. Observe the following stack trace
(gdb) bt
#0 com.oracle.svm.core.jdk.VMErrorSubstitutions::doShutdown(java.lang.String, java.lang.Throwable)(void) () at com/oracle/svm/core/log/RealLog.java:385
#1 0x00007f78baf03879 in com.oracle.svm.core.util.VMError::shouldNotReachHere(java.lang.String, java.lang.Throwable)(void) () at com/oracle/svm/core/threadlocal/FastThreadLocalInt.java:51
#2 0x00007f78baf0380c in com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void) () at com/oracle/svm/core/util/VMError.java:88
#3 0x0000000000000000 in ?? ()
(gdb) p/x $rsp

Use objdump to dump info and frame sections so that code addresses and frame sizes can be identified

The problem is that the frame for VMError::guarantee is size 16 but is treated as size 8 at address 0x00007f78baf0380c:

(gdb) p/x $rsp
$4 = 0x7ffe698a9c88
(gdb) x/24gx 0x7ffe698a9c00
96 bytes for frame
0x7ffe698a9c00:	0x00007f78ba6a59e0	0x0000000000000040
0x7ffe698a9c10:	0x00000000021bd290	0x00007f78c4fb3cd8
0x7ffe698a9c20:	0x0000000000000000	0x00007ffe698a9e38
0x7ffe698a9c30:	0x00007ffe6d8e2240	0x0000000000001000
0x7ffe698a9c40:	0x00007ffe698a9c70	0xdca7ffe742e93000
0x7ffe698a9c50:	0x0000000000000001	0x00007f78baf03879 # ==> VMError::shouldNotReachHere
32 bytes for frame
0x7ffe698a9c60:	0x0000000000000400	0x00007ffe6d8e2240
0x7ffe698a9c70:	0x00007ffe698a9e38	0x00007f78baf0380c # ==> VMError::guarantee
16 bytes for frame but actually treated as 8
0x7ffe698a9c80:	0x0000000000000000	0x00007f78baefdb45 # ==> Safepoint::enterSlowPathTransitionFromNativeToNewStatus
32 bytes for frame
0x7ffe698a9c90:	0x0000000000000000	0x0000000000000000
0x7ffe698a9ca0:	0x0000000100000000	0x00007f78bae624fd # ==> IsolateEnterStub::PolyglotNativeAPI_poly_create_context_builder_232c64e3b5beacd8181c73f1c062cb7e8da69f3d
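
The arithmetic behind the annotations in this dump can be replayed with a short Python sketch. It assumes that the innermost frame starts at 0x7ffe698a9c00 (where the dump begins), that each frame size includes the 8-byte return address, and that the return address sits in the top word of each frame. The `STACK` dictionary holds only the slots needed here, copied from the dump; `return_addresses` is a hypothetical helper, not part of gdb.

```python
WORD = 8  # size of one stack slot on x86-64

# Selected stack slots copied from the dump above (address -> contents).
STACK = {
    0x7ffe698a9c58: 0x00007f78baf03879,  # return into VMError::shouldNotReachHere
    0x7ffe698a9c78: 0x00007f78baf0380c,  # return into VMError::guarantee
    0x7ffe698a9c80: 0x0000000000000000,  # not a return address
    0x7ffe698a9c88: 0x00007f78baefdb45,  # return into Safepoint::enterSlowPath...
}


def return_addresses(sp, frame_sizes):
    """Walk up the stack, reading the return address at the top of each frame."""
    result = []
    for size in frame_sizes:
        result.append(STACK[sp + size - WORD])
        sp += size  # the caller's frame starts where this one ends
    return result


sp = 0x7ffe698a9c00
print([hex(a) for a in return_addresses(sp, [96, 32, 16])])  # sizes as they should be
print([hex(a) for a in return_addresses(sp, [96, 32, 8])])   # sizes as the debug info reports
```

With the third frame taken as 16 bytes the walk reaches 0x7f78baefdb45 in Safepoint::enterSlowPathTransitionFromNativeToNewStatus. With it taken as 8 bytes the walk reads the zero word at 0x7ffe698a9c80 instead, which matches the `#3 0x0000000000000000 in ?? ()` line in the backtrace.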

The return address 0x00007f78baf0380c in VMError::guarantee (stored in the stack row at 0x7ffe698a9c70) lies in code generated after a retq. That code is reached by jumping around the block containing the retq:

(gdb) x/12i 0x00007f78baf037f0
   0x7f78baf037f0 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)>:	sub    $0x8,%rsp
   0x7f78baf037f4 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+4>:	test   %edi,%edi
   0x7f78baf037f6 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+6>:	je     0x7f78baf03801 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+17>
   0x7f78baf037fc <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+12>:	add    $0x8,%rsp
   0x7f78baf03800 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+16>:	retq   
   0x7f78baf03801 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+17>:	mov    %rsi,%rdi
   0x7f78baf03804 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+20>:	mov    %r14,%rsi
   0x7f78baf03807 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+23>:	callq  0x7f78baf03830 <com.oracle.svm.core.util.VMError::shouldNotReachHere(java.lang.String, java.lang.Throwable)(void)>
=> 0x7f78baf0380c <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+28>:	nop
   0x7f78baf0380d <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+29>:	cmp    %r14,%rax
   0x7f78baf03810 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+32>:	jne    0x7f78baf0381c <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+44>
   0x7f78baf03816 <com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void)+38>:	callq  0x7f78baef2aa0 <com.oracle.svm.core.snippets.ImplicitExceptions::throwCachedNullPointerException(void)>

n.b. Problem 1 manifests here with the epilogue stack shrink and epilogue exit both currently generated at 0x7f78baf03800. The epilogue exit should be generated at 0x7f78baf03801.

n.b. Problem 2 manifests here with class DebugCodeInfo posting only a CONTRACT FrameSizeChange at 0x7f78baf03800, shrinking the stack size to 8. It should also post an EXPAND FrameSizeChange event at 0x7f78baf03801, restoring the stack size to 16.
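
The effect on the faulting address can be checked with a small Python sketch that looks up the frame size in force at a given pc. The CONTRACT and second EXPAND addresses come from the notes above. The address of the prologue EXPAND (0x7f78baf037f4, just after the `sub`) is an assumption, and `frame_size_at` is a hypothetical helper, not a DebugCodeInfo method.

```python
import bisect

WORD = 8  # frame holding only the return address


def frame_size_at(pc, changes, full_size):
    """Return the frame size in force at pc.

    changes is a list of (address, kind) pairs sorted by address,
    where kind is "EXPAND" or "CONTRACT".
    """
    addresses = [address for address, _ in changes]
    i = bisect.bisect_right(addresses, pc) - 1
    if i < 0:
        return WORD  # before the prologue has grown the stack
    return full_size if changes[i][1] == "EXPAND" else WORD


recorded = [(0x7F78BAF037F4, "EXPAND"), (0x7F78BAF03800, "CONTRACT")]
expected = recorded + [(0x7F78BAF03801, "EXPAND")]

pc = 0x7F78BAF0380C  # return address inside VMError::guarantee
print(frame_size_at(pc, recorded, 16))  # size seen with the events as currently posted
print(frame_size_at(pc, expected, 16))  # size seen once the second EXPAND is posted
```

With the events as currently posted the lookup returns 8 for the return address; adding the EXPAND at 0x7f78baf03801 makes it return 16, while the `retq` at 0x7f78baf03800 still sees a frame of 8 in both cases.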

Describe GraalVM and your environment:

  • GraalVM version current head
  • JDK major version: latest graal jdku11
  • OS: Linux
  • Architecture: affects x86 and AArch64
@adinn
Collaborator Author

adinn commented Jun 29, 2020

@YaSuenag this is the problem that was causing the bad frame in the gdb crash backtrace for libpolyglot.so. I have a patch which I am currently testing.

@adinn adinn mentioned this issue Jun 29, 2020
@YaSuenag
Contributor

Thanks a lot @adinn !!
I got all backtraces as below with your #2625 patch:

#0  com.oracle.svm.core.jdk.VMErrorSubstitutions::doShutdown(java.lang.String, java.lang.Throwable)(void) () at com/oracle/svm/core/log/RealLog.java:385
#1  0x00007facf46ff0d9 in com.oracle.svm.core.util.VMError::shouldNotReachHere(java.lang.String, java.lang.Throwable)(void) () at com/oracle/svm/core/threadlocal/FastThreadLocalInt.java:51
#2  0x00007facf46ff06c in com.oracle.svm.core.util.VMError::guarantee(boolean, java.lang.String)(void) () at com/oracle/svm/core/util/VMError.java:88
#3  0x00007facf46f93a5 in com.oracle.svm.core.thread.Safepoint::enterSlowPathTransitionFromNativeToNewStatus(int)(void) () at com/oracle/svm/core/thread/Safepoint.java:467
#4  0x00007facf465d52d in com.oracle.svm.core.code.IsolateEnterStub::PolyglotNativeAPI_poly_create_context_builder_232c64e3b5beacd8181c73f1c062cb7e8da69f3d(org.graalvm.polyglot.nativeapi.types.PolyglotNativeAPITypes$PolyglotIsolateThread, org.graalvm.nativeimage.c.type.CCharPointerPointer, org.graalvm.word.UnsignedWord, org.graalvm.polyglot.nativeapi.types.PolyglotNativeAPITypes$PolyglotContextBuilderPointer)(void) () at com/oracle/svm/core/code/IsolateEnterStub.java:1
#5  0x0000000000401149 in main () at libpolyglot-crash.c:6

I wonder why frame # 4 did not point to PolyglotNativeAPI.java. I guess the cause is that it is called via IsolateEnterStub because it is a C entry point. I'd be happy if I could see the line number of PolyglotNativeAPI.java in the backtrace. Is it difficult?

@adinn
Collaborator Author

adinn commented Jun 30, 2020

Hi @YaSuenag

I wonder why frame # 4 did not point to PolyglotNativeAPI.java. I guess the cause is that it is called via IsolateEnterStub because it is a C entry point. I'd be happy if I could see the line number of PolyglotNativeAPI.java in the backtrace. Is it difficult?

The compiled method which is being listed here is attached to class IsolateEnterStub. It doesn't strictly have a source file because it is a native (C) linkage stub method that is generated by Graal. Looking at the compiled code I can see that somewhere in the middle of the method it executes this call

   <...+266>:	callq  0x7f5359b648e0 <org.graalvm.polyglot.nativeapi.PolyglotNativeAPI::withHandledErrors(org.graalvm.polyglot.nativeapi.PolyglotNativeAPI$VoidThunk)(void)>

This corresponds to offset 0x79236a in the text section. The line number info for that offset identifies file PolyglotNativeAPI.java line 260. So, if you were to break at that line or step into it, the debugger should show the correct file and line.

In other words the source code you want to point to has been inlined into this method and the debugger will find it when executing the inlined instructions. But for the entry instruction address there is not really any sensible source code to point the debugger at.
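
As a rough illustration of why the entry address and the inlined call resolve to different files, here is a Python sketch of a line table lookup. It is a simplified model, not the DWARF line program itself. It assumes that 0x79236a is the offset of the call instruction at `+266`, which puts the stub entry at 0x79236a - 266, and that a line record starts exactly at the call; `LINE_TABLE` and `source_position` are hypothetical names.

```python
import bisect

CALL_OFFSET = 0x79236A             # text section offset quoted above
STUB_START = CALL_OFFSET - 266     # assumed entry of the stub (the call is at +266)

# (start offset, file, line) records sorted by offset; each record covers
# every offset up to the start of the next one.
LINE_TABLE = [
    (STUB_START, "com/oracle/svm/core/code/IsolateEnterStub.java", 1),
    (CALL_OFFSET, "PolyglotNativeAPI.java", 260),
]


def source_position(offset):
    """Return (file, line) for a text section offset, or None if uncovered."""
    starts = [start for start, _, _ in LINE_TABLE]
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return None
    _, file, line = LINE_TABLE[i]
    return file, line


print(source_position(STUB_START))   # position reported at the stub entry
print(source_position(CALL_OFFSET))  # position reported at the inlined call
```

In this model the entry offset maps to IsolateEnterStub.java line 1, as in frame #4 of the backtrace, and only offsets at or after the call map to PolyglotNativeAPI.java line 260.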

@YaSuenag
Contributor

Thanks for clarification!

@olpaw
Member

olpaw commented Jul 7, 2020

Fixed by #2625

@olpaw olpaw closed this as completed Jul 7, 2020
@olpaw olpaw linked a pull request Jul 7, 2020 that will close this issue