Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK17 Assertion Failure at jswalk.c:538 #15569

Closed
connglli opened this issue Jul 19, 2022 · 6 comments · Fixed by #16155
Closed

JDK17 Assertion Failure at jswalk.c:538 #15569

connglli opened this issue Jul 19, 2022 · 6 comments · Fixed by #16155

Comments

@connglli
Copy link

Java -version output

openjdk version "17.0.4-internal" 2022-07-19
OpenJDK Runtime Environment (build 17.0.4-internal+0-adhoc..openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build master-79f0b73fa, JRE 17 Linux amd64-64-Bit Compressed References 20220701_000000 (JIT enabled, AOT enabled)
OpenJ9   - 79f0b73fa
OMR      - d018241d7
JCL      - c6e2f71170b based on jdk-17.0.4+7)

Summary of problem

The following triggers an assertion failure in jswalk.c (not swalk.c as in #15477).

import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

class Test {
  long instanceCount;

  void vMeth2() {
    int i3, i4 = 6, i7 = 13;
    for (i3 = 7; i3 < 227; ++i3) i4 += i3;
    for (; 238 > i7; ++i7)
      for (int ax$24 = -4717; ax$24 < 4162; ax$24 += 1)
        try {
          String[] ax$21 = {"f1"};
          int ax$18 = 38070;
          ByteBuffer ax$17 = ByteBuffer.wrap("TESTING DATA".getBytes());
          DatagramChannel ax$19 = DatagramChannel.open();
          for (int ax$20 = 0; ax$20 < ax$21.length; ax$20++) {
            ax$17.rewind();
            SocketAddress ax$22 = new InetSocketAddress(ax$21[ax$20], ax$18);
            int n = ax$19.send(ax$17, ax$22);
          }
          ax$19.close();
        } catch (Throwable ax$23) {
        } finally {
        }
  }

  void vMeth1(float f, float f1, int i1) {
    vMeth2();
  }

  void vMeth(long l) {
    vMeth1((float) 0.2515371447919026, (float) 0.11737603546530229, 857532125);
  }

  void mainTest(String[] strArr1) {
    vMeth(instanceCount);
  }

  public static void main(String[] strArr) {
    Test _instance = new Test();
    for (; ; ) _instance.mainTest(strArr);
  }
}

Diagnostic files

By issuing

$ java -Xmx1G -Xshareclasses:none Test

the following crash log is given:

JVMCDRT000E Unable to locate JIT stack map - aborting VM
JVMCDRT001E Method: java/net/InetAddress.getAllByName(Ljava/lang/String;Ljava/net/InetAddress;)[Ljava/net/InetAddress; (00000000002118D0)
JVMCDRT002E Failing PC: 00007F604A259C33 (offset 00000000000001B3), metaData = 00007F6023C26A78
16:38:04.586 0x153d00j9codertvm(j9ji.110    *   ** ASSERTION FAILED ** at /root/hostdir/openj9-openjdk-jdk17/openj9/runtime/codert_vm/jswalk.c:538: ((0 ))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/07/19 18:38:04 - please wait.
JVMDUMP032I JVM requested System dump using '/zdata/congli/test/ttt/openj9-jdk17.2/157/mutant/red/core.20220719.183804.1215736.0001.dmp' in response to an event
JVMDUMP010I System dump written to /zdata/congli/test/ttt/openj9-jdk17.2/157/mutant/red/core.20220719.183804.1215736.0001.dmp
JVMDUMP032I JVM requested Java dump using '/zdata/congli/test/ttt/openj9-jdk17.2/157/mutant/red/javacore.20220719.183804.1215736.0002.txt' in response to an event
JVMDUMP010I Java dump written to /zdata/congli/test/ttt/openj9-jdk17.2/157/mutant/red/javacore.20220719.183804.1215736.0002.txt
JVMDUMP032I JVM requested Snap dump using '/zdata/congli/test/ttt/openj9-jdk17.2/157/mutant/red/Snap.20220719.183804.1215736.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /zdata/congli/test/ttt/openj9-jdk17.2/157/mutant/red/Snap.20220719.183804.1215736.0003.trc
JVMDUMP013I Processed dump event "traceassert", detail "".

Note,

  1. The test may take ~2min to crash; so please be patient
  2. There's a little non-determinism inside the tests; so you may run it for several times

Please also check openj9-bug-109.tar.gz for all the logs (core, snap, etc.), the reduced test (Test.java, Test.class), and unreduced (Test.java.orig, FuzzerUtils.java).

@pshipton
Copy link
Member

@0xdaryl fyi

@hzongaro
Copy link
Member

Peter @pshipton, I can reproduce the failure, but no one has had a chance to investigate it yet. I don't think we will be able to get a fix in for the 0.35 release.

@hzongaro
Copy link
Member

hzongaro commented Oct 7, 2022

I spent a little time trying to see I could narrow things down. The test will crash if both java/net/InetAddress.getAllByName(Ljava/lang/String;)[Ljava/net/InetAddress; and Test.vMeth2()V are compiled:

java -Xjit:limit={java/net/InetAddress.getAllByName*\|Test.vMeth2*},optDetails,log=test.log Test

@hzongaro
Copy link
Member

I can reproduce the crash when Test.vMeth2()V is compiled at scorching, with lastOptIndex=1:

java -Xjit:limit={java/net/InetAddress.getAllByName\(*\|Test.vMeth2*},{Test.vMeth2*}{scorching}\(lastOptIndex=1,optDetails,log=test.vmeth2.log\),verbose Test

The failure is triggered by a segfault in InetAddress.getAllByName when the value field of a java/lang/String that is dereferenced is null, which is never supposed to happen, so the lack of a stack map at that point is actually reasonable:

JVMCDRT000E Unable to locate JIT stack map - aborting VM
JVMCDRT001E Method: java/net/InetAddress.getAllByName(Ljava/lang/String;)[Ljava/net/InetAddress; (0000000000207678)
JVMCDRT002E Failing PC: 00007F5DA9C1D247 (offset 00000000000000A7), metaData = 00007F5D8A13ACF8
23:51:49.464 0x158300j9codertvm(j9ji.110    *   ** ASSERTION FAILED ** at /root/hostdir/defects/issue15730/openj9-openjdk-jdk18/openj9/runtime/codert_vm/jswalk.c:538: ((0 ))
0x7f5da9c1d243 000000a3 [0x7f5d886b6980]                                    fence Relative [ 00007F5DA8B5C9D0 ] # fence BBStart <block_3> (frequency 3336) (extension of previous block)

 \\ java/net/InetAddress.getAllByName(Ljava/lang/String;)[Ljava/net/InetAddress;
 \\    5 JBinvokevirtual 85 java/lang/String.isEmpty()Z
 \\       1 JBgetfield 18 java/lang/String.value [B

0x7f5da9c1d243 000000a3 [0x7f5d886b6cc0] 8b 48 04                           mov ecx, dword ptr [rax+0x4]                # L4RegMem, SymRef  java/lang/String.value [B[#2307  final Shadow +4] [flags 0x400a0607 0x0 ]

 \\ java/net/InetAddress.getAllByName(Ljava/lang/String;)[Ljava/net/InetAddress;
 \\    5 JBinvokevirtual 85 java/lang/String.isEmpty()Z
 \\       4 JBarraylength

0x7f5da9c1d246 000000a6 [0x7f5d886b70d0] 44 8b 69 04                        mov r13d, dword ptr [rcx+0x4]               # L4RegMem, SymRef  <contiguous-array-size>[#331  Shadow +4] [flags 0x603 0x0 ]

The String in this case is the argument to InetAddress.getAllByName, so presumably something is happening in Test.vMeth2 that causes the String.value field to be null.

@hzongaro hzongaro self-assigned this Oct 13, 2022
@hzongaro
Copy link
Member

I was mistaken about the failure occurring when Test.vMeth2()V is compiled at scorching. I was basing this on verbose JIT log output in which the failure appeared to occur immediately after that method was compiled at scorching. In fact, that method runs for a very long time, and the failure occurs after the method is compiled at profiled very-hot - it just happens that entering the method then triggers a scorching compile to be queued, but the failure happens when running the very-hot compiled method.

It looks like the problem occurs because two locals are mapped to the same spot, but the value for one is loaded in a block just after a value for the second is stored in the same block. It appears to be due to an interaction between compactLocals and deadTreesElimination.

In the log file, tracecompactlocals.log we see before Compact Locals, this IL in block_266, in part:

n3941n    BBStart <block_266> (freq 13488) (extension of previous block) (cold) (in loop 8)   [0x7f8615ba5f90] bci=[35,0,224] rc=0 vc=14840 vn=- li=-1 udi=- nc=0
                                                              // Live locals: #419 #421 #434 #443 #886 #1392 #1393 #1397 #1398 #1409 #1413 #1414 #1425 #1429 #1519 #1523 #1524
...
n17992n   aRegStore r13d                                                                      [0x7f86161a8790] bci=[-1,101,21] rc=0 vc=14840 vn=- li=-1 udi=- nc=1
n6859n      aload  <temp slot 26>[#1392  Auto] [flags 0x20000007 0x0 ] (X!=0 sharedMemory )   [0x7f86161aefb0] bci=[-1,101,21] rc=7 vc=15198 vn=- li=2 udi=165 nc=0 flg=0x4
...
n6872n    astore  <temp slot 9>[#986  Auto] [flags 0x20000007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory )  [0x7f86161af3c0] bci=[-1,93,21] rc=0 vc=14840 vn=- li=-1 udi=16 nc=1 flg=0x2004
n6868n      ==>loadaddr
...
n6873n    astore  <temp slot 10>[#987  Auto] [flags 0x20000007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory )  [0x7f86161af410] bci=[-1,101,21] rc=0 vc=14840 vn=- li=-1 udi=19 nc=1 flg=0x2004
n6859n      ==>aload

Note that #1392 is loaded before the store to #986.

#986:   <temp slot 9>[ Auto] [flags 0x20000007 0x0 ] [0x7f8615cca180] (Address)
#1392:  <temp slot 26>[ Auto] [flags 0x20000007 0x0 ] [0x7f861671c360] (Address)

During Compact Locals, no interference is marked between the two variables in block_266:

Local #74 is symbol at 00007F861671C360
...
Local #146 is symbol at 00007F8615CCA180
...
Live variables for block_266: {45, 46, 69, 70, 71, 72, 73, 74, 164, 182, 184, 189, 191}

Now in block_266
---> visiting tt node 00007F86161AF5A0
BB_End for block_266: live vars = {45, 46, 68, 69, 70, 71, 72, 73, 140, 145, 146, 164, 182, 184, 189, 191}
---> visiting tt node 00007F86161AF550
---> visiting tt node 00007F86161AF460
---> visiting tt node 00007F86161A8920
---> visiting tt node 00007F86161A8560
---> visiting tt node 00007F86161A88D0
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86161A8880
---> visiting tt node 00007F86161AEFB0
---> visiting tt node 00007F86161A8830
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86161A8510
---> visiting tt node 00007F85B5CC5370
---> visiting tt node 00007F86161AF4B0
---> visiting tt node 00007F86161AF410
--- local index 145 KILLED
---> visiting tt node 00007F86161AEFB0
---> visiting tt node 00007F86161AF320
--- local index 140 KILLED
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86161AF370
Adding interference between 45 and 186
Adding interference between 68 and 186
Adding interference between 70 and 186
Adding interference between 72 and 186
Adding interference between 146 and 186
Adding interference between 164 and 186
Adding interference between 182 and 186
Adding interference between 184 and 186
--- local index 186 KILLED
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86161AF3C0
--- local index 146 KILLED
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86167839C0
--- local index 68 KILLED
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86161A87E0
---> visiting tt node 00007F86161AEFB0
---> visiting tt node 00007F86161A8790
---> visiting tt node 00007F86161AEFB0
+++ local index 74 LIVE
---> visiting tt node 00007F86161A8740
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86161A86F0
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86161A86A0
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86167837E0
---> visiting tt node 00007F8615BA5D10
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F8616783740
---> visiting tt node 00007F86167836F0
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F86167836A0
---> visiting tt node 00007F8615BA5D10
---> visiting tt node 00007F86161AF280
---> visiting tt node 00007F8615BA5E50
---> visiting tt node 00007F8615BA5E00
---> visiting tt node 00007F86161A8510
---> visiting tt node 00007F8615BA5D60
---> visiting tt node 00007F8615BA5D10
---> visiting tt node 00007F86161A8510

Later in Dead Trees Elimination, we have this:

[  9871] O^O DEAD TREES ELIMINATION: Move tree : [00007F86161A87E0]([00007F86161AEFB0] = aload) to end of block
...
n3941n    BBStart <block_266> (freq 13488) (extension of previous block) (cold) (in loop 8)   [0x7f8615ba5f90] bci=[35,0,224] rc=0 vc=7 vn=- li=-1 udi=- nc=0
                                                              // Live locals: #419 #421 #434 #443 #886 #1392 #1393 #1397 #1398 #1409 #1413 #1414 #1425 #1429 #1519 #1523 #1524
...
n6872n    astore  <temp slot 9>[#986  Auto] [flags 0x20004007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory )  [0x7f86161af3c0] bci=[-1,93,21] rc=0 vc=7 vn=- li=-1 udi=16 nc=1 flg=0x2004
n6868n      ==>loadaddr
...
n6873n    astore  <temp slot 10>[#987  Auto] [flags 0x20004007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory )  [0x7f86161af410] bci=[-1,101,21] rc=0 vc=7 vn=- li=-1 udi=19 nc=1 flg=0x2004
n6859n      aload  <temp slot 26>[#1392  Auto] [flags 0x20004007 0x0 ] (X!=0 sharedMemory )   [0x7f86161aefb0] bci=[-1,101,21] rc=7 vc=7 vn=- li=3 udi=165 nc=0 flg=0x4
...
n17992n   aRegStore r13d                                                                      [0x7f86161a8790] bci=[-1,101,21] rc=0 vc=7 vn=- li=-1 udi=- nc=1
n6859n      ==>aload
n17993n   aRegStore r9d                                                                       [0x7f86161a87e0] bci=[-1,101,21] rc=0 vc=7 vn=- li=-1 udi=- nc=1
n6859n      ==>aload

So the load of #1392 has moved after the store to #986. Then during instruction selection this becomes:

------------------------------
 n6872n   (  0)  astore  <temp slot 9>[#986  Auto] [flags 0x20004007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory )  [0x7f86161af3c0] bci=[-1,93,21] rc=0 vc=52 vn=- li=266 udi=- nc=1 flg=0x2004
 n6868n   ( 11)    ==>loadaddr (in &GPR_1665) (highWordZero Unsigned X!=0 X>=0 cannotOverflow nodePointsToNonNull cannotTrackLocalUses escapesInColdBlock sharedMemory )
------------------------------

 [0x7f85b3bd3c80]       mov     qword ptr [vfp], &GPR_1665              # S8MemReg, SymRef  <temp slot 9>[#1951  Auto] [flags 0x20004007 0x0 ]
...
 n6873n   (  0)  astore  <temp slot 10>[#987  Auto] [flags 0x20004007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory )  [0x7f86161af410] bci=[-1,101,21] rc=0 vc=52 vn=- li=266 udi=- nc=1 flg=0x2004
 n6859n   (  6)    aload  <temp slot 26>[#1392  Auto] [flags 0x20004007 0x0 ] (in &GPR_1666) (X!=0 sharedMemory )  [0x7f86161aefb0] bci=[-1,101,21] rc=6 vc=52 vn=- li=266 udi=18352 nc=0 flg=0x4
------------------------------

 [0x7f85b3bd4830]       mov     &GPR_1666, qword ptr [vfp]              # L8RegMem, SymRef  <temp slot 26>[#1954  Auto] [flags 0x20004007 0x0 ]
 [0x7f85b3bd4980]       mov     qword ptr [vfp], &GPR_1666              # S8MemReg, SymRef  <temp slot 10>[#1955  Auto] [flags 0x20004007 0x0 ]

And finally, Post Stack Map, we have this, with both #1951 and #1954 at the same offset

[ 11088] O^O COMPACT LOCALS: Sharing slot for local 00007F8615CCA180   <<< #986
...
 [0x7f85b3bd36f0]       lea     rbx, qword ptr [vfp-0xb8]               # LEA8RegMem, SymRef  <temp slot 31>[#1949  Auto] [flags 0x60000008 0x0 ]
 [0x7f85b3bd3810]       mov     qword ptr [vfp-0xf8], rbx               # S8MemReg, SymRef  <temp slot 32>[#1950  Auto] [flags 0x7 0x0 ]
 [0x7f85b3bd3c80]       mov     qword ptr [vfp-0x88], rbx               # S8MemReg, SymRef  <temp slot 9>[#1951  Auto] [flags 0x20004007 0x0 ]
 [0x7f85b3bd4020]       mov     qword ptr [vfp-0xc0], rbx               # S8MemReg, SymRef  <pending push temp 0>[#1952  Auto] [flags 0x4007 0x800 ]
 [0x7f85b3bd4390]       mov     qword ptr [vfp-0x70], rbx               # S8MemReg, SymRef  <temp slot 18>[#1953  Auto] [flags 0x4007 0x0 ]
 [0x7f85b3bd4830]       mov     r13, qword ptr [vfp-0x88]               # L8RegMem, SymRef  <temp slot 26>[#1954  Auto] [flags 0x20004007 0x0 ]
 [0x7f85b3bd4980]       mov     qword ptr [vfp-0xc0], r13               # S8MemReg, SymRef  <temp slot 10>[#1955  Auto] [flags 0x20004007 0x0 ]

Ultimately, block_266 falls through to block_572, which copies the value of r13 to r9, and then falls through to block_573, then block_576, which copies r9 to rax, and finally falls through to block_290, which calls java/net/InetAddress.getAllByName, with the wrong reference loaded from the shared slot:

 [0x7f85b3bd5c60]       fence Relative [ 00007F8616239F10 ]     # fence BBStart <block_572> (frequency 13488) (extension of previous block) (cold) (in loop 8)
...
[0x7f85b2304d40]       mov     r9, r13         # MOV8RegReg
...
[0x7f85b3bd6700]       fence Relative [ 00007F8616239F14 ]     # fence BBEnd </block_572>
...
 [0x7f85b3bd87d0]       fence Relative [ 00007F861623A430 ]     # fence BBStart <block_576> (frequency 13488) (cold) (in loop 8)
...
[0x7f85b2304960]       mov     rax, r9         # MOV8RegReg
...
 [0x7f85b3bd94e0]       fence Relative [ 00007F861623A434 ]     # fence BBEnd </block_576>
........................................
[0x7f85b3bd9600]       fence Relative [ 00007F8615C74E40 ]     # fence BBStart <block_290> (frequency 13488) (extension of previous block) (cold) (in loop 8)
[0x7f85b3bd9860]       mov     qword ptr [vfp-0x78], 0x00000000        # S8MemImm4, SymRef  addr<auto slot 3>[#1956  Auto] [flags 0x4007 0x0 ]
[0x7f85b3bdaa50]       Label L0880:                    # label # (Start of internal control flow)
[0x7f85b3bda3b0]       nop                     # Align patchable code @32 [0x0:5]
[0x7f85b3bda310]       call    java/net/InetAddress.getAllByName(Ljava/lang/String;)[Ljava/net/InetAddress;# CALLImm4 (00007F861692D285)# CALLImm4

@hzongaro
Copy link
Member

hzongaro commented Oct 19, 2022

Hmmm. It looks like Compact Locals is not supposed to happen when profiling is enabled, according to the definition of finalGlobalOpts. Continuing to investigate.

Update: It looks like that restriction only applies to JitProfile mode, not to JProfile mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants