Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic NPEs when running Clojure #12191

Closed
ghadishayban opened this issue Mar 11, 2021 · 28 comments · Fixed by #12213
Closed

Non-deterministic NPEs when running Clojure #12191

ghadishayban opened this issue Mar 11, 2021 · 28 comments · Fixed by #12213

Comments

@ghadishayban
Copy link

ghadishayban commented Mar 11, 2021

Hello kind folks,
I have an interesting issue with OpenJ9 running Clojure. Running Clojure on OpenJ9 throws non-deterministic NPEs, and this appears to be related to the "locals clearing" characteristic of Clojure's bytecode. Locals clearing is when a reference in a bytecode local variable is set to null using aconst_null; astore X before the last usage of that local variable (within a compilation unit, the function). I will explain how to repro, then I will explain what is happening in the specific bytecode.

Speculation: The compiler is applying an incorrect optimization of this locals clearing pattern across control flow / jumps.

Env

openjdk version "15.0.2" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 15.0.2+7)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.24.0, JRE 15 Linux amd64-64-Bit Compressed References 20210121_172 (JIT enabled, AOT enabled)
OpenJ9   - 345e1b09e
OMR      - 741e94ea8
JCL      - 863b523566 based on jdk-15.0.2+7)

Repro by building Clojure

git clone https://github.com/clojure/clojure.git
cd clojure
mvn -Plocal -Dmaven.test.skip=true package

then run the test until it fails:

while : ; do
java -XX:-ShowCodeDetailsInExceptionMessages -jar clojure.jar -e '(vec (range 10))'
[[ $? == '0' ]] || break
done

Alternative repro

Install the Clojure CLI, then run:

while : ; do
clojure -J-XX:-ShowCodeDetailsInExceptionMessages -M -e '(vec (range 10))'
[[ $? == '0' ]] || break
done

When it fails non-deterministically, it outputs:

Exception in thread "main" java.lang.NullPointerException
        at clojure.core$reduce1.invokeStatic(core.clj:946)
        at clojure.core$reduce1.invokeStatic(core.clj:936)
        at clojure.core$merge.invokeStatic(core.clj:3052)
        at clojure.core$Throwable__GT_map$base__7472.invoke(core_print.clj:491)
        at clojure.core$map$fn__5884.invoke(core.clj:2757)
        at clojure.lang.LazySeq.sval(LazySeq.java:42)
        at clojure.lang.LazySeq.seq(LazySeq.java:51)
        at clojure.lang.RT.seq(RT.java:535)
        at clojure.lang.LazilyPersistentVector.create(LazilyPersistentVector.java:44)
        at clojure.core$vec.invokeStatic(core.clj:379)
        at clojure.core$Throwable__GT_map.invokeStatic(core_print.clj:492)
        at clojure.main$report_error.invokeStatic(main.clj:593)
        at clojure.main$main.invokeStatic(main.clj:666)
        at clojure.main$main.doInvoke(main.clj:616)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at clojure.lang.Var.applyTo(Var.java:705)
        at clojure.main.main(main.java:40)

which points here.

This failure indicates that f is null, and it's trying to call (.invoke NULL val (first s)). f should never be null within this function.

Here is the annotated bytecode for this function:

// all clojure functions implement the IFn interface, here via AFunction
public final class clojure.core$reduce1 extends clojure.lang.AFunction {
  public clojure.core$reduce1();
    Code:
       0: aload_0
       1: invokespecial #9                  // Method clojure/lang/AFunction."<init>":()V
       4: return

// the impl for the three-arity invoke call with args [f val coll]
// arg0 = f, arg1 = val, arg2 = coll
// the two "recur" expressions within the body of the source will GOTO the head of this method (AKA bytecode 0)
  public static java.lang.Object invokeStatic(java.lang.Object, java.lang.Object, java.lang.Object);
    Code:
       0: aload_2            // load arg 2 onto stack "coll"
       1: aconst_null        // clear arg 2 "coll" because last usage within function body
       2: astore_2           // 

                             // (seq coll)
       3: invokestatic  #16                 // Method clojure/core$seq__5419.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
       6: astore_3           // bind "s" to (seq coll) in local variable 3
       7: aload_3            // load "s" (seq coll)
       8: dup
       9: ifnull        104  // clojure conditions check for null, then Boolean.FALSE, everything else is truthy
      12: getstatic     #22                 // Field java/lang/Boolean.FALSE:Ljava/lang/Boolean;
      15: if_acmpeq     105
      18: aload_3            // then clause of first conditional L941
                             // test for (chunked-seq? s)
      19: invokestatic  #25                 // Method clojure/core$chunked_seq_QMARK_.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
      22: dup
      23: ifnull        69   // another conditional L942
      26: getstatic     #22                 // Field java/lang/Boolean.FALSE:Ljava/lang/Boolean;
      29: if_acmpeq     70
                             // then clause of second conditional L943
      32: aload_0            // push "f" onto stack
      33: aload_3            // push "s" onto stack
                             // call chunk-first on "s"
      34: invokestatic  #28                 // Method clojure/core$chunk_first.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
      37: checkcast     #30                 // class clojure/lang/IChunk
      40: aload_0            // push "f" onto stack
      41: aconst_null        // now clear "f" local variable
      42: astore_0           //
      43: checkcast     #32                 // class clojure/lang/IFn
      46: aload_1            // push "val" onto stack
      47: aconst_null        // clear "val" local variable
      48: astore_1           //
                             // call (.reduce (chunk-first s) f val)
      49: invokeinterface #36,  3           // InterfaceMethod clojure/lang/IChunk.reduce:(Lclojure/lang/IFn;Ljava/lang/Object;)Ljava/lang/Object;
      54: aload_3            // push "s" onto stack
      55: aconst_null        // now can clear "s" (last usage)
      56: astore_3           // 
                             // (chunk-next s)
      57: invokestatic  #39                 // Method clojure/core$chunk_next.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
                             // recur to top of this function with the contents of the stack
                             // lv2=(chunk-next s), lv1=(.reduce (chunk-first s) f val), lv0=f
      60: astore_2
      61: astore_1
      62: astore_0
      63: goto          0    // </recur>
      66: nop
      67: nop
      68: athrow
      69: pop
      70: aload_0            // L946 else branch of second conditional; push "f" onto stack 
      71: aload_0            // push "f" again 
      72: aconst_null        // clear "f" because last usage
      73: astore_0
      74: checkcast     #32                 // class clojure/lang/IFn
      77: aload_1            // push "val" onto stack
      78: aconst_null        // clear "val" after last usage
      79: astore_1
      80: aload_3            // push "s" onto stack
                             // call (first s)
      81: invokestatic  #42                 // Method clojure/core$first__5401.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
                             // call (f val (first s))
      84: invokeinterface #46,  3           // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      89: aload_3            // push "s" onto stack
      90: aconst_null        // clear "s" after last usage
      91: astore_3
                             // call (next s)
      92: invokestatic  #49                 // Method clojure/core$next__5403.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
                             // recur to top of this function with the contents of the stack
                             // lv2 = (next s), lv1 = (f val (first s)), lv0 = f
      95: astore_2
      96: astore_1
      97: astore_0
      98: goto          0    // </recur>
     101: nop
     102: nop
     103: athrow
     104: pop
     105: aload_1
     106: aconst_null
     107: astore_1
     108: areturn

// IFn interface method, delegates directly to invokeStatic(O,O,O) above
  public java.lang.Object invoke(java.lang.Object, java.lang.Object, java.lang.Object);
    Code:
       0: aload_1
       1: aconst_null   // locals clearing for arg1
       2: astore_1      //
       3: aload_2
       4: aconst_null   // locals clearing for arg2
       5: astore_2      //
       6: aload_3
       7: aconst_null   // locals clearing for arg3
       8: astore_3      //
       9: invokestatic  #60                 // Method invokeStatic:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      12: areturn

// the impl for the three-arity invoke call with args [f val coll]
  public static java.lang.Object invokeStatic(java.lang.Object, java.lang.Object);
    Code:
       0: aload_1
       1: aconst_null
       2: astore_1
       3: invokestatic  #16                 // Method clojure/core$seq__5419.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
       6: astore_2
       7: aload_2
       8: dup
       9: ifnull        37
      12: getstatic     #22                 // Field java/lang/Boolean.FALSE:Ljava/lang/Boolean;
      15: if_acmpeq     38
      18: aload_0
      19: aconst_null
      20: astore_0
      21: aload_2
      22: invokestatic  #42                 // Method clojure/core$first__5401.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
      25: aload_2
      26: aconst_null
      27: astore_2
      28: invokestatic  #49                 // Method clojure/core$next__5403.invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
      31: invokestatic  #60                 // Method invokeStatic:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      34: goto          49
      37: pop
      38: aload_0
      39: aconst_null
      40: astore_0
      41: checkcast     #32                 // class clojure/lang/IFn
      44: invokeinterface #63,  1           // InterfaceMethod clojure/lang/IFn.invoke:()Ljava/lang/Object;
      49: areturn

// IFn interface method, delegates directly to invokeStatic(O,O) above
  public java.lang.Object invoke(java.lang.Object, java.lang.Object);
    Code:
       0: aload_1
       1: aconst_null
       2: astore_1
       3: aload_2
       4: aconst_null
       5: astore_2
       6: invokestatic  #65                 // Method invokeStatic:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
       9: areturn

  public static {};
    Code:
       0: return
}
@fjeremic
Copy link
Contributor

Given the intermittency perhaps it is worth running without the JIT. Perhaps we have a bug in the JIT compiler which can introduce such intermittency. You can run without the JIT buy adding -Xint to the command line. If you still get an NPE then it rules the JIT out of the picture. If you cannot get the NPE then the result is inconclusive (as the timings have changed).

@ghadishayban
Copy link
Author

Cannot reproduce it with -Xint, suggesting JIT.

@fjeremic
Copy link
Contributor

It is likely JIT then, but not conclusive. What is the reproducibility rate on your hardware? Does it fail 1/10 for example or 1/100, etc.? That will dictate how the investigation should proceed. I'll also try and spin up a VM and see if I can reproduce using your instructions above. That may greatly speed up the investigation.

@ghadishayban
Copy link
Author

0/1000 on -Xint, ~5/21 on JIT

@fjeremic
Copy link
Contributor

Thanks! So 1/4 roughly. That's fairly good to investigate. I'm going to try and reproduce on a VM now and see if I can get the same reproduction rate.

@ghadishayban
Copy link
Author

It's frequent enough that new Clojure users using J9 run into it nearly immediately

@pshipton
Copy link
Member

pshipton commented Mar 11, 2021

There are newer builds you could try to see if the issue is resolved. The Java 15 support cycle is done, replaced with Java 16 shortly.

Java 16 on 0.25.0 m2 https://github.com/AdoptOpenJDK/openjdk16-binaries/releases/tag/jdk-16%2B36_openj9-0.25.0-m2

  • Java 16 0.25.0 releases next Tues March 16

Java 16 on 0.26.0 m1 https://github.com/AdoptOpenJDK/openjdk16-binaries/releases/tag/jdk-16%2B36_openj9-0.26.0-m1

  • will ship as 16.0.1 in April
  • this is branched from the head stream just last weekend

@fjeremic
Copy link
Contributor

I used Peter's second link above and can reproduce an NPE which happens much earlier in the run if I force JIT compilation on one specific method at no optimization:

clojure/core$filter_key.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;

It is deterministically reproducible. If I exclude this method we progress but get an NPE in a similar method. Here is the command line dump for reference:

root@kulaks1:~/clojure# /root/jdk-16+36/bin/java -version
openjdk version "16" 2021-03-16
OpenJDK Runtime Environment AdoptOpenJDK (build 16+36)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.26.0-m1, JRE 16 Linux amd64-64-Bit Compressed References 20210309_67 (JIT enabled, AOT enabled)
OpenJ9   - b227feba2
OMR      - 4665e2f72
JCL      - 7d78cfa0dc2 based on jdk-16+36)

root@kulaks1:~/clojure# /root/jdk-16+36/bin/java -XX:-ShowCodeDetailsInExceptionMessages '-Xjit:count=1,disableAsyncCompilation,limit={clojure/core$filter_key.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;}(traceFull,log=invokeStatic.trace,lastOptIndex=0,lastOptTransformationIndex=0)' '-Xdump:java+system:events=throw,filter=java/lang/NullPointerException,range=1..1' -jar clojure.jar -e '(vec (range 10))'

JVMDUMP039I Processing dump event "throw", detail "java/lang/NullPointerException" at 2021/03/11 09:06:15 - please wait.
JVMDUMP032I JVM requested System dump using '/root/clojure/core.20210311.090615.8288.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.8310.

JVMDUMP010I System dump written to /root/clojure/core.20210311.090615.8288.0001.dmp
JVMDUMP032I JVM requested Java dump using '/root/clojure/javacore.20210311.090615.8288.0002.txt' in response to an event
JVMDUMP010I Java dump written to /root/clojure/javacore.20210311.090615.8288.0002.txt
JVMDUMP013I Processed dump event "throw", detail "java/lang/NullPointerException".
Exception in thread "main" java.lang.ExceptionInInitializerError
	at java.base/java.lang.J9VMInternals.ensureError(J9VMInternals.java:184)
	at java.base/java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:173)
	at java.base/java.lang.Class.forNameImpl(Native Method)
	at java.base/java.lang.Class.forName(Class.java:420)
	at clojure.lang.RT.classForName(RT.java:2212)
	at clojure.lang.RT.classForName(RT.java:2221)
	at clojure.lang.RT.loadClassForName(RT.java:2240)
	at clojure.lang.RT.load(RT.java:449)
	at clojure.lang.RT.load(RT.java:424)
	at clojure.core$load$fn__6856.invoke(core.clj:6115)
	at clojure.core$load.invokeStatic(core.clj:6114)
	at clojure.core$load.doInvoke(core.clj:6098)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invokeStatic(core.clj:5897)
	at clojure.core$load_one.invoke(core.clj:5892)
	at clojure.core$load_lib$fn__6796.invoke(core.clj:5937)
	at clojure.core$load_lib.invokeStatic(core.clj:5936)
	at clojure.core$load_lib.doInvoke(core.clj:5917)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invokeStatic(core.clj:669)
	at clojure.core$load_libs.invokeStatic(core.clj:5974)
	at clojure.core$load_libs.doInvoke(core.clj:5958)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invokeStatic(core.clj:669)
	at clojure.core$require.invokeStatic(core.clj:5996)
	at clojure.core$require.doInvoke(core.clj:5996)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$serialized_require.invokeStatic(core.clj:6068)
	at clojure.core$requiring_resolve.invokeStatic(core.clj:6077)
	at clojure.main$report_error$fn__9184$fn__9185.invoke(main.clj:602)
	at clojure.main$report_error$fn__9184.invoke(main.clj:602)
	at clojure.main$report_error.invokeStatic(main.clj:601)
	at clojure.main$main.invokeStatic(main.clj:666)
	at clojure.main$main.doInvoke(main.clj:616)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.lang.Var.applyTo(Var.java:705)
	at clojure.main.main(main.java:40)
Caused by: java.lang.NullPointerException
	at clojure.core$refer.invokeStatic(core.clj:4234)
	at clojure.core$refer.doInvoke(core.clj:4205)
	at clojure.lang.RestFn.applyTo(RestFn.java:139)
	at clojure.core$apply.invokeStatic(core.clj:669)
	at clojure.core$load_lib.invokeStatic(core.clj:5955)
	at clojure.core$load_lib.doInvoke(core.clj:5917)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invokeStatic(core.clj:669)
	at clojure.core$load_libs.invokeStatic(core.clj:5974)
	at clojure.core$load_libs.doInvoke(core.clj:5958)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invokeStatic(core.clj:671)
	at clojure.core$use.invokeStatic(core.clj:6082)
	at clojure.pprint$loading__6737__auto____9921.invoke(pprint.clj:14)
	at clojure.pprint__init.load(Unknown Source)
	at clojure.pprint__init.<clinit>(Unknown Source)
	... 36 more

I've confirmed when we run with -Xint we do not throw any NPEs in the entire run, so the first NPE thrown is caused by compiling that particular JIT method. Unfortunately the JIT method is nowhere on the backtrace of the exception, so I'm not really sure how it plays part into producing the error observed.

@fjeremic
Copy link
Contributor

Here are the first couple of stack slots which are all interpreted from the above core dump:

<16f00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x0000000000016F00 ***
<16f00>         ITERATE_O_SLOTS
<16f00>         RECORD_BYTECODE_PC_OFFSET
<16f00> Initial values: walkSP = 0x000000000011C680, PC = 0x0000000000000001, literals = 0x0000000000000000, A0 = 0x000000000011C698, j2iFrame = 0x0000000000000000, ELS = 0x00007F543DADD390, decomp = 0x0000000000000000
<16f00> Generic special frame: bp = 0x000000000011C698, sp = 0x000000000011C680, pc = 0x0000000000000001, cp = 0x0000000000000000, arg0EA = 0x000000000011C698, flags = 0x0000000000000000
<16f00> Bytecode frame: bp = 0x000000000011C6B0, sp = 0x000000000011C6A0, pc = 0x00007F543849F05A, cp = 0x000000000038FEB0, arg0EA = 0x000000000011C748, flags = 0x0000000000000000
<16f00>         Method: clojure/core$refer.invokeStatic(Ljava/lang/Object;Lclojure/lang/ISeq;)Ljava/lang/Object; !j9method 0x0000000000390AB0
<16f00>         Bytecode index = 542
<16f00>         Using local mapper
<16f00>         Locals starting at 0x000000000011C748 for 0x0000000000000013 slots
<16f00>                 I-Slot: a0[0x000000000011C748] = 0x00000000FFEEAAD0
<16f00>                 I-Slot: a1[0x000000000011C740] = 0x0000000000000000
<16f00>                 O-Slot: t2[0x000000000011C738] = 0x00000000FFF8F888
<16f00>                 I-Slot: t3[0x000000000011C730] = 0x0000000000000000
<16f00>                 O-Slot: t4[0x000000000011C728] = 0x0000000000000000
<16f00>                 O-Slot: t5[0x000000000011C720] = 0x0000000083AC0DC8
<16f00>                 O-Slot: t6[0x000000000011C718] = 0x00000000FFF20308
<16f00>                 I-Slot: t7[0x000000000011C710] = 0x0000000000000000
<16f00>                 O-Slot: t8[0x000000000011C708] = 0x0000000083AC0680
<16f00>                 O-Slot: t9[0x000000000011C700] = 0x00000000FFF20370
<16f00>                 I-Slot: t10[0x000000000011C6F8] = 0x000000000021B540
<16f00>                 I-Slot: t11[0x000000000011C6F0] = 0x0000000000000001
<16f00>                 I-Slot: t12[0x000000000011C6E8] = 0x00007F54385260E4
<16f00>                 I-Slot: t13[0x000000000011C6E0] = 0x0000000000000000
<16f00>                 O-Slot: t14[0x000000000011C6D8] = 0x00000000FFEEAB10
<16f00>                 I-Slot: t15[0x000000000011C6D0] = 0x0000000000000000
<16f00>                 I-Slot: t16[0x000000000011C6C8] = 0x0000000000000000
<16f00>                 I-Slot: t17[0x000000000011C6C0] = 0x0000000000451080
<16f00>                 I-Slot: t18[0x000000000011C6B8] = 0x00000000FFF1D6C0
<16f00> Bytecode frame: bp = 0x000000000011C760, sp = 0x000000000011C750, pc = 0x00007F543849F949, cp = 0x000000000038FEB0, arg0EA = 0x000000000011C778, flags = 0x0000000000000000
<16f00>         Method: clojure/core$refer.doInvoke(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; !j9method 0x0000000000390AD0
<16f00>         Bytecode index = 9
<16f00>         Using local mapper
<16f00>         Locals starting at 0x000000000011C778 for 0x0000000000000003 slots
<16f00>                 I-Slot: a0[0x000000000011C778] = 0x0000000083C3BF90
<16f00>                 I-Slot: a1[0x000000000011C770] = 0x0000000000000000
<16f00>                 I-Slot: a2[0x000000000011C768] = 0x0000000000000000
<16f00> Bytecode frame: bp = 0x000000000011C790, sp = 0x000000000011C780, pc = 0x00007F5438346B33, cp = 0x0000000000200890, arg0EA = 0x000000000011C7A0, flags = 0x0000000000000000
<16f00>         Method: clojure/lang/RestFn.applyTo(Lclojure/lang/ISeq;)Ljava/lang/Object; !j9method 0x00000000001FFF98
<16f00>         Bytecode index = 159
<16f00>         Using local mapper
<16f00>         Locals starting at 0x000000000011C7A0 for 0x0000000000000002 slots
<16f00>                 I-Slot: a0[0x000000000011C7A0] = 0x0000000083C3BF90
<16f00>                 I-Slot: a1[0x000000000011C798] = 0x0000000000000000

I guess we would have to figure out what bytecode index 542 in this top method is doing and how it possibly relates to the JIT compiled method. @ghadishayban any ideas on this?

Subscribing @0xdaryl as this could very well be an x86 codegen issue given it reproduces at lastOptIndex=0,lastOptTransformationIndex=0 so the optimizer is definitely out of the picture. It can also be an ILGen problem as well.

@ghadishayban
Copy link
Author

ghadishayban commented Mar 11, 2021

Bytecode 542 in clojure/core$refer::invokeStatic is this expression, which invokes IFn::invoke(O,O) with the arguments nspublics, a local variable (number 4) bound to an instance of a persistent hash map, and sym which is apparently stored in local variable number 14.

       535: aload         4
       537: checkcast     #161                // class clojure/lang/IFn
       540: aload         14
       542: invokeinterface #164,  2          // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
       547: astore        15
       549: aload         15
       551: dup
       552: ifnull        566

@ghadishayban
Copy link
Author

tried it on

openjdk version "16" 2021-03-16
OpenJDK Runtime Environment AdoptOpenJDK (build 16+36)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.26.0-m1, JRE 16 Linux amd64-64-Bit Compressed References 20210309_67 (JIT enabled, AOT enabled)
OpenJ9   - b227feba2
OMR      - 4665e2f72
JCL      - 7d78cfa0dc2 based on jdk-16+36)

same issues.

@fjeremic
Copy link
Contributor

Bytecode 542 in clojure/core$refer::invokeStatic is this expression, which invokes IFn::invoke(O,O) with the arguments nspublics, a local variable (number 4) bound to an instance of a persistent hash map, and sym which is apparently stored in local variable number 14.

Do you might have an idea how this relates to clojure/core$filter_key.invokeStatic method?

@ghadishayban
Copy link
Author

@fjeremic
clojure/core$filter_key.invokeStatic is called on this line via the function ns-publics (note the dash). The result of that call is bound to nspublics (a local variable, note no dash). When it's bound to the local variable, filter_key.invokeStatic should no longer be in the stack frame.

@ghadishayban
Copy link
Author

If I set jit count=0 and compile only reduce1::invokeStatic, it's 100% reproducible

java \
  -XX:-ShowCodeDetailsInExceptionMessages \
  '-Xjit:count=0,disableAsyncCompilation,limit={clojure/core$reduce1.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;}(traceFull,log=invokeStatic.trace,lastOptIndex=0,lastOptTransformationIndex=0)' \
  '-Xdump:java+system:events=throw,filter=java/lang/NullPointerException,range=1..1' \
  -jar clojure.jar \
  -e nil

@ghadishayban
Copy link
Author

@fjeremic
Copy link
Contributor

I used @ghadishayban's command above to confirm:

/root/jdk-16+36/bin/java -XX:-ShowCodeDetailsInExceptionMessages '-Xjit:disableAsyncCompilation,limit={clojure/core$reduce1.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;}(traceFull,log=invokeStatic.trace,lastOptIndex=0,lastOptTransformationIndex=0,count=0),disableGuardedCountingRecompilation,verbose' -Xdump:java+system:events=throw,filter=java/lang/NullPointerException#clojure/lang/LazySeq.sval#5,range=1..1 -jar clojure.jar -e '(vec (range 10))'

This generates a javacore, JIT trace log, and system core dump when the exception occurs. Looking at the stackslots of the failed thread we see the following:

<16f00> Initial values: walkSP = 0x000000000011D0D8, PC = 0x0000000000000005, literals = 0x0000000000000000, A0 = 0x000000000011D0F8, j2iFrame = 0x0000000000000000, ELS = 0x00007FDD199EBA90, decomp = 0x0000000000000000
<16f00> JIT resolve frame: bp = 0x000000000011D0F8, sp = 0x000000000011D0D8, pc = 0x0000000000000005, cp = 0x0000000000000000, arg0EA = 0x000000000011D0F8, flags = 0x0000000000080000
<16f00>         Object push (savedJITException)
<16f00>                 O-Slot[0x000000000011D0D8] = 0x0000000000000000
<16f00>         Generic resolve
<16f00>         unwindSP initialized to 0x000000000011D100
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB940] = UDATA(0x0000000000000000) (jit_rax)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB948] = UDATA(0x00000000FFD92358) (jit_rbx)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB950] = UDATA(0x00007FDCF62EE255) (jit_rcx)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB958] = UDATA(0x0000000000000000) (jit_rdx)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB960] = UDATA(0x00000000604AA430) (jit_rdi)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB968] = UDATA(0x00000000FFD92358) (jit_rsi)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB970] = UDATA(0x0000000000016F00) (jit_rbp)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB978] = UDATA(0x0000000000000034) (jit_rsp)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB980] = UDATA(0x00000000FFD92300) (jit_r8)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB988] = UDATA(0x0000000000000000) (jit_r9)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB990] = UDATA(0x00000000604AA430) (jit_r10)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB998] = UDATA(0x000000001895C9CF) (jit_r11)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB9A0] = UDATA(0x00007FDD1405E518) (jit_r12)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB9A8] = UDATA(0x00007FDD191E8360) (jit_r13)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB9B0] = UDATA(0x00007FDD140573F0) (jit_r14)
<16f00>         JIT-Resolve-RegisterMap[0x00007FDD199EB9B8] = UDATA(0x00007FDD18400C9E) (jit_r15)
<16f00> JIT frame: bp = 0x000000000011D138, pc = 0x00007FDCF62EE25F, unwindSP = 0x000000000011D100, cp = 0x00000000002FABE0, arg0EA = 0x000000000011D150, jitInfo = 0x00007FDCBC64E038
<16f00>         Method: clojure/core$reduce1.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; !j9method 0x00000000002FADB8
<16f00>         Bytecode index = 86, inlineDepth = 0, PC offset = 0x00000000000001F7
<16f00>         stackMap=0x00007FDCBC64E188, slots=I16(0x0003) parmBaseOffset=I16(0x0008), parmSlots=U16(0x0003), localBaseOffset=I16(0xFFF0)
<16f00>         Described JIT args starting at 0x000000000011D140 for U16(0x0003) slots
<16f00>                 O-Slot: : a2[0x000000000011D140] = 0x0000000000000000
<16f00>                 O-Slot: : a1[0x000000000011D148] = 0x0000000000000000
<16f00>                 O-Slot: : a0[0x000000000011D150] = 0x0000000000000000
<16f00>         Described JIT temps starting at 0x000000000011D128 for IDATA(0x0000000000000002) slots
<16f00>                 O-Slot: : t1[0x000000000011D128] = 0x00000000839D93F0
<16f00>                 O-Slot: : t0[0x000000000011D130] = 0x00000000FFD92C90

Then mapping the PC to the JIT log we can see:

0x7fdcf62ee255 000001ed [0x7fdcf5329c50]                                    Label L0156:                        # LABEL # (End of internal control flow)
0x7fdcf62ee255 000001ed [0x7fdcf53361a0] 48 8b f3                           mov rsi, rbx                # MOV8RegReg
0x7fdcf62ee258 000001f0 [0x7fdcf5336110] 48 8b d0                           mov rdx, rax                # MOV8RegReg
0x7fdcf62ee25b 000001f3 [0x7fdcf532ad60] 49 8b c1                           mov rax, r9         # MOV8RegReg
0x7fdcf62ee25e 000001f6 [0x7fdcf532af80] 41 8b 39                           mov edi, dword ptr [r9]             # L4RegMem, SymRef  <vft-symbol>[#288  Shadow] [flags 0x18607 0x0 ]
0x7fdcf62ee261 000001f9 [0x7fdcf532b010] 48 81 e7 00 ff ff ff               and rdi, 0xffffffffffffff00 # AND8RegImm4

The SIGSEGV occurred at offset 1F6 when we dereferenced r9 which was generated form the following tree:

------------------------------
 n96n     (  0)  ResolveAndNULLCHK on n80n [#32]                                                      [0x7fdcf526ddc0] bci=[-1,86,946] rc=0 vc=7 vn=- li=8 udi=- nc=1
 n95n     (  1)    acalli  clojure/lang/IFn.invoke(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;[#372  unresolved interface Method] (Interface class) [flags 0x400 0x0 ] (
 n94n     (  0)      aloadi  <vft-symbol>[#288  Shadow] [flags 0x18607 0x0 ] (in GPR_0163)            [0x7fdcf526dd20] bci=[-1,86,946] rc=0 vc=7 vn=- li=8 udi=44800 nc=1
 n80n     (  0)        ==>aload (in &GPR_0145) (X!=0 sharedMemory )
 n80n     (  0)      ==>aload (in &GPR_0145) (X!=0 sharedMemory )
 n87n     (  0)      ==>aload (in &GPR_0147)
 n92n     (  0)      ==>acall (in &GPR_0160)
------------------------------

 [0x7fdcf532ad60]       mov     &GPR_0161, &GPR_0145            # MOV8RegReg
 [0x7fdcf532af80]       mov     GPR_0163, dword ptr [&GPR_0145]         # L4RegMem, SymRef  <vft-symbol>[#288  Shadow] [flags 0x18607 0x0 ]
 [0x7fdcf532b010]       and     GPR_0163, 0xffffffffffffff00    # AND8RegImm4

It looks like the receiver object was NULL here. This node n80n only seems to be referenced a few times, and it comes from an input parameter to this method:

n7n       BBStart <block_8>                                                                   [0x7fdcf51f1b00] bci=[-1,70,944] rc=0 vc=7 vn=- li=8 udi=- nc=0
n105n     asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [0x7fdcf526e090] bci=[-1,98,946] rc=0 vc=7 vn=- li=8 udi=- nc=1
n112n       lcmpeq                                                                            [0x7fdcf526e2c0] bci=[-1,98,946] rc=1 vc=7 vn=- li=- udi=- nc=2
n110n         lload  stackOverflowMark[#376  MethodMeta +80] [flags 0x200 0x0 ]               [0x7fdcf526e220] bci=[-1,98,946] rc=1 vc=7 vn=- li=- udi=- nc=0
n111n         lconst -1                                                                       [0x7fdcf526e270] bci=[-1,98,946] rc=1 vc=7 vn=- li=- udi=- nc=0
n83n      treetop                                                                             [0x7fdcf526d9b0] bci=[-1,73,944] rc=0 vc=7 vn=- li=8 udi=- nc=1
n79n        aload  f<parm 0 Ljava/lang/Object;>[#353  Parm] [flags 0xc0000107 0x0 ]           [0x7fdcf526d870] bci=[-1,70,944] rc=1 vc=7 vn=- li=8 udi=- nc=0
n84n      treetop                                                                             [0x7fdcf526da00] bci=[-1,73,944] rc=0 vc=5 vn=- li=8 udi=- nc=1
n80n        aload  f<parm 0 Ljava/lang/Object;>[#353  Parm] [flags 0xc0000107 0x0 ]           [0x7fdcf526d8c0] bci=[-1,71,944] rc=4 vc=5 vn=- li=8 udi=- nc=0
n82n      astore  f<parm 0 Ljava/lang/Object;>[#353  Parm] [flags 0xc0000107 0x0 ]            [0x7fdcf526d960] bci=[-1,73,944] rc=0 vc=5 vn=- li=8 udi=- nc=1
n81n        aconst NULL                                                                       [0x7fdcf526d910] bci=[-1,72,944] rc=1 vc=5 vn=- li=8 udi=- nc=0
n86n      checkcast [#86]                                                                     [0x7fdcf526daa0] bci=[-1,74,946] rc=0 vc=5 vn=- li=8 udi=- nc=2
n80n        ==>aload
n85n        loadaddr  clojure/lang/IFn[#366  Static] [flags 0x18307 0x0 ]                     [0x7fdcf526da50] bci=[-1,74,946] rc=1 vc=5 vn=- li=8 udi=- nc=0
n90n      treetop                                                                             [0x7fdcf526dbe0] bci=[-1,79,946] rc=0 vc=5 vn=- li=8 udi=- nc=1
n87n        aload  val<parm 1 Ljava/lang/Object;>[#354  Parm] [flags 0xc0000107 0x0 ]         [0x7fdcf526daf0] bci=[-1,77,946] rc=2 vc=5 vn=- li=8 udi=- nc=0
n89n      astore  val<parm 1 Ljava/lang/Object;>[#354  Parm] [flags 0xc0000107 0x0 ]          [0x7fdcf526db90] bci=[-1,79,946] rc=0 vc=5 vn=- li=8 udi=- nc=1
n88n        aconst NULL                                                                       [0x7fdcf526db40] bci=[-1,78,946] rc=1 vc=5 vn=- li=8 udi=- nc=0
n93n      treetop                                                                             [0x7fdcf526dcd0] bci=[-1,81,946] rc=0 vc=5 vn=- li=8 udi=- nc=1
n92n        acall  clojure/core$first__5401.invokeStatic(Ljava/lang/Object;)Ljava/lang/Object;[#371  final static Method] [flags 0x20500 0x0 ]  [0x7fdcf526dc80] bci=[-1,81,946] rc=
n91n          aload  s<auto slot 3>[#358  Auto] [flags 0x7 0x0 ]                              [0x7fdcf526dc30] bci=[-1,80,946] rc=1 vc=5 vn=- li=8 udi=- nc=0
n96n      ResolveAndNULLCHK on n80n [#32]                                                     [0x7fdcf526ddc0] bci=[-1,86,946] rc=0 vc=5 vn=- li=8 udi=- nc=1
n95n        acalli  clojure/lang/IFn.invoke(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;[#372  unresolved interface Method] (Interface class) [flags 0x400 0x0 ] (sharedM
n94n          aloadi  <vft-symbol>[#288  Shadow] [flags 0x18607 0x0 ]                         [0x7fdcf526dd20] bci=[-1,86,946] rc=1 vc=5 vn=- li=8 udi=- nc=1
n80n            ==>aload
n80n          ==>aload
n87n          ==>aload
n92n          ==>acall

It's a bit odd because the bytecode seems to zero out symref #353 after the load, see node n82n. There is also another block which zeroes out this slot as well:

n46n      BBStart <block_6>                                                                   [0x7fdcf51f2730] bci=[-1,32,942] rc=0 vc=7 vn=- li=6 udi=- nc=0
n77n      asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [0x7fdcf526d7d0] bci=[-1,63,944] rc=0 vc=7 vn=- li=6 udi=- nc=1
n109n       lcmpeq                                                                            [0x7fdcf526e1d0] bci=[-1,63,944] rc=1 vc=7 vn=- li=- udi=- nc=2
n107n         lload  stackOverflowMark[#375  MethodMeta +80] [flags 0x200 0x0 ]               [0x7fdcf526e130] bci=[-1,63,944] rc=1 vc=7 vn=- li=- udi=- nc=0
n108n         lconst -1                                                                       [0x7fdcf526e180] bci=[-1,63,944] rc=1 vc=7 vn=- li=- udi=- nc=0
n52n      treetop                                                                             [0x7fdcf526d000] bci=[-1,34,942] rc=0 vc=7 vn=- li=6 udi=- nc=1
n51n        acall  clojure/core$chunk_first.invokeStatic(Ljava/lang/Object;)Ljava/lang/Object;[#364  final static Method] [flags 0x20500 0x0 ]  [0x7fdcf51f28c0] bci=[-1,34,942] rc=
n50n          aload  s<auto slot 3>[#358  Auto] [flags 0x7 0x0 ]                              [0x7fdcf51f2870] bci=[-1,33,942] rc=1 vc=7 vn=- li=6 udi=- nc=0
n54n      checkcast [#86]                                                                     [0x7fdcf526d0a0] bci=[-1,37,942] rc=0 vc=7 vn=- li=6 udi=- nc=2
n51n        ==>acall
n53n        loadaddr  clojure/lang/IChunk[#365  Static] [flags 0x18307 0x0 ]                  [0x7fdcf526d050] bci=[-1,37,942] rc=1 vc=7 vn=- li=6 udi=- nc=0
n58n      treetop                                                                             [0x7fdcf526d1e0] bci=[-1,42,942] rc=0 vc=7 vn=- li=6 udi=- nc=1
n49n        aload  f<parm 0 Ljava/lang/Object;>[#353  Parm] [flags 0xc0000107 0x0 ]           [0x7fdcf51f2820] bci=[-1,32,942] rc=1 vc=7 vn=- li=6 udi=- 

nc=0
n59n      treetop                                                                             [0x7fdcf526d230] bci=[-1,42,942] rc=0 vc=7 vn=- li=6 udi=- nc=1
n55n        aload  f<parm 0 Ljava/lang/Object;>[#353  Parm] [flags 0xc0000107 0x0 ]           [0x7fdcf526d0f0] bci=[-1,40,942] rc=3 vc=7 vn=- li=6 udi=- nc=0
n57n      astore  f<parm 0 Ljava/lang/Object;>[#353  Parm] [flags 0xc0000107 0x0 ]            [0x7fdcf526d190] bci=[-1,42,942] rc=0 vc=7 vn=- li=6 udi=- nc=1
n56n        aconst NULL                                                                       [0x7fdcf526d140] bci=[-1,41,942] rc=1 vc=7 vn=- li=6 udi=- nc=0

Looking at the CFG of the method it looks like this:
image

There is a backedge in block_2 to block_3 and block_3 dominates both block_6 and block_8. This means if we had looped here we will have zeroed out that parameter slot and we will end up with the NPE observed.

Attaching the log for reference:
invokeStatic.trace.11278.4430.20210311.151350.11278.txt

@fjeremic
Copy link
Contributor

block_2 is the entry block. I'm wondering if it supposed to be impossible to execute block_8 multiple times. That seems to be the only place where we dereference this particular symref.

n9n       BBStart <block_2>                                                                   [0x7fdcf51f1ba0] bci=[-1,0,932] rc=0 vc=7 vn=- li=2 udi=- nc=0
n14n      treetop                                                                             [0x7fdcf51f1d30] bci=[-1,2,932] rc=0 vc=7 vn=- li=2 udi=- nc=1
n11n        aload  coll<parm 2 Ljava/lang/Object;>[#355  Parm] [flags 0xc0000107 0x0 ]        [0x7fdcf51f1c40] bci=[-1,0,932] rc=2 vc=7 vn=- li=2 udi=- nc=0
n13n      astore  coll<parm 2 Ljava/lang/Object;>[#355  Parm] [flags 0xc0000107 0x0 ]         [0x7fdcf51f1ce0] bci=[-1,2,932] rc=0 vc=7 vn=- li=2 udi=- nc=1
n12n        aconst NULL                                                                       [0x7fdcf51f1c90] bci=[-1,1,932] rc=1 vc=7 vn=- li=2 udi=- nc=0
n16n      treetop                                                                             [0x7fdcf51f1dd0] bci=[-1,3,932] rc=0 vc=7 vn=- li=2 udi=- nc=1
n15n        acall  clojure/core$seq__5419.invokeStatic(Ljava/lang/Object;)Ljava/lang/Object;[#357  final static Method] [flags 0x20500 0x0 ]  [0x7fdcf51f1d80] bci=[-1,3,932] rc=2 vn11n          ==>aload
n17n      astore  s<auto slot 3>[#358  Auto] [flags 0x7 0x0 ]                                 [0x7fdcf51f1e20] bci=[-1,6,932] rc=0 vc=7 vn=- li=2 udi=- nc=1
n15n        ==>acall
n22n      astore  <pending push temp 0>[#359  Auto] [flags 0x7 0x800 ]                        [0x7fdcf51f1fb0] bci=[-1,9,941] rc=0 vc=7 vn=- li=2 udi=- nc=1
n18n        aload  s<auto slot 3>[#358  Auto] [flags 0x7 0x0 ]                                [0x7fdcf51f1e70] bci=[-1,7,941] rc=2 vc=7 vn=- li=2 udi=- nc=0
n25n      ifacmpeq --> block_9 BBStart at n1n ()                                              [0x7fdcf51f20a0] bci=[-1,9,941] rc=0 vc=7 vn=- li=2 udi=- nc=2 flg=0x20
n18n        ==>aload
n19n        aconst NULL                                                                       [0x7fdcf51f1ec0] bci=[-1,9,941] rc=1 vc=7 vn=- li=2 udi=- nc=0
n10n      BBEnd </block_2> =====                                                              [0x7fdcf51f1bf0] bci=[-1,9,941] rc=0 vc=7 vn=- li=2 udi=- nc=0

@fjeremic
Copy link
Contributor

Still fails even with disabling typical suspect things:

/root/jdk-16+36/bin/java -XX:-ShowCodeDetailsInExceptionMessages '-Xjit:disableAsyncCompilation,limit={clojure/core$reduce1.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;}(traceFull,log=invokeStatic.trace,lastOptIndex=0,lastOptTransformationIndex=0,count=10),disableGuardedCountingRecompilation,verbose,disableVirtualGuardNOPing,disableNextGenHCR,disableOSR,disableMergeStackMaps' -XX:-EnableHCR -Xdump:java+system:events=throw,filter=java/lang/NullPointerException#clojure/lang/LazySeq.sval#5,range=1..1 -jar clojure.jar -e '(vec (range 10))'

I'm not really sure how to proceed here. Given the above analysis there exists a path through which we can get the observed NPE, but the fact that it only happens when we JIT compile this one method is troublesome. There must be something wrong happening.

@0xdaryl / @liqunl any ideas on how to proceed for this one? We basically have a test which deterministically fails with NPE at noOpt when one single JIT method is compiled. There exists a path through the CFG which can exhibit the NPE that gets thrown because a certain slot is cleared right after it is loaded, so if we were to execute the same path twice we get an NPE. However I have no idea whether we should be executing such a path twice. The fact that it doesn't happen with interpreter makes me say no, or something else is happening. Not sure how to proceed on this one. Thought you may have ideas. It is super easy to reproduce. I have a Fyre image setup with the environment.

@liqunl
Copy link
Contributor

liqunl commented Mar 12, 2021

Will look at the trace log and reproduce on my vm

@ghadishayban
Copy link
Author

@liqunl @fjeremic

Hi, I have created a far simpler reproduction case that takes Clojure out of the picture -- just a class generated with ASM.

Should I add the class to the issue, or the output of the trace, or the ASM code that generated the class?

@ghadishayban
Copy link
Author

Here is the trace of calling this synthesized bad method directly

java -XX:-ShowCodeDetailsInExceptionMessages '-Xjit:count=0,disableAsyncCompilation,limit={BadJump.badMethod(Ljava/util/function/BiFunction;Ljava/lang/Object;Ljava/util/Iterator;)Ljava/lang/Object;}(traceFull,log=badness.trace,lastOptIndex=0,lastOptTransformationIndex=0)' -Xdump:java+system:events=throw,filter=java/lang/NullPointerException -cp . Runner

results in an NPE:

Exception in thread "main" java.lang.NullPointerException
        at BadJump.badMethod(Unknown Source)
        at Runner.main(Runner.java:13)

Bytecode for the offending method:

public class BadJump {
  public static java.lang.Object badMethod(java.util.function.BiFunction, java.lang.Object, java.util.Iterator);
    Code:
       0: aload_2
       1: invokeinterface #12,  1           // InterfaceMethod java/util/Iterator.hasNext:()Z
       6: ifeq          36
       9: aload_0
      10: aload_0
      11: aconst_null
      12: astore_0
      13: aload_1
      14: aconst_null
      15: astore_1
      16: aload_2
      17: invokeinterface #16,  1           // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
      22: invokeinterface #22,  3           // InterfaceMethod java/util/function/BiFunction.apply:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      27: aload_2
      28: aconst_null
      29: astore_2
      30: astore_2
      31: astore_1
      32: astore_0
      33: goto          0
      36: aload_1
      37: aconst_null
      38: astore_1
      39: areturn
}

@liqunl
Copy link
Contributor

liqunl commented Mar 12, 2021

@ghadishayban Thanks. Could you upload the class file?

@ghadishayban
Copy link
Author

Github didn't accept the class file, delivered them in slack.

The script to create the BadJump.class needs dependencies:

        <dependency>
            <groupId>org.ow2.asm</groupId>
            <artifactId>asm</artifactId>
            <version>9.1</version>
        </dependency>
        <dependency>
            <groupId>org.ow2.asm</groupId>
            <artifactId>asm-commons</artifactId>
            <version>9.1</version>
        </dependency>

Here is the script:

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.Type;
import org.objectweb.asm.commons.GeneratorAdapter;
import org.objectweb.asm.commons.Method;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.function.BiFunction;

public class GenClass {
    static void pushAndClear(GeneratorAdapter gen, int lvar) {
        gen.loadArg(lvar);
        gen.visitInsn(Opcodes.ACONST_NULL);
        gen.storeArg(lvar);
    }

    static void genMethod(GeneratorAdapter gen) {
        var loopHeader = gen.mark();
        var loopEnd = gen.newLabel();

        // Iterator.hasNext()
        gen.loadArg(2);
        gen.invokeInterface(Type.getType(Iterator.class), Method.getMethod("boolean hasNext()"));

        // if false, jump to end of loop
        gen.ifZCmp(GeneratorAdapter.EQ, loopEnd);

        // BiFunction
        gen.loadArg(0);

        // BiFunction.apply(x, y)
        pushAndClear(gen,0);
        pushAndClear(gen,1);
        gen.loadArg(2);
        gen.invokeInterface(Type.getType(Iterator.class), Method.getMethod("Object next()"));
        gen.invokeInterface(Type.getType(BiFunction.class), Method.getMethod("Object apply(Object, Object)"));

        pushAndClear(gen,2);
        gen.storeArg(2);
        gen.storeArg(1);
        gen.storeArg(0);
        gen.goTo(loopHeader);

        gen.mark(loopEnd);
        pushAndClear(gen, 1);
        gen.returnValue();
        gen.endMethod();
    }

    static public void main(String[] argv) {
        var cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS);
        cw.visit(Opcodes.V11, Opcodes.ACC_PUBLIC, "BadJump", null, "java/lang/Object", null);
        var m = Method.getMethod("Object badMethod(java.util.function.BiFunction, Object, java.util.Iterator)");
        genMethod(new GeneratorAdapter(Opcodes.ACC_PUBLIC + Opcodes.ACC_STATIC, m, null, null, cw         ));
        cw.visitEnd();
        try(var os = new FileOutputStream("BadJump.class")) {
            os.write(cw.toByteArray());
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}

compile and run it and it creates BadJump.class

...to which you can link this Runner class:

import java.util.ArrayList;
import java.util.List;

public class Runner {

   static Object conj(Object list, Object item) {
      var l = (java.util.List) list;
      l.add(item);
      return l;
   }

public static void main(String[] argv) {
   var ret = BadJump.badMethod(Runner::conj, new ArrayList(), List.of(1, 2, 3, 4, 5).iterator());
   System.out.println(ret);
   System.exit(0);
}

}

...which if you compile and run with

java -XX:-ShowCodeDetailsInExceptionMessages '-Xjit:count=0,disableAsyncCompilation,limit={BadJump.badMethod(Ljava/util/function/BiFunction;Ljava/lang/Object;Ljava/util/Iterator;)Ljava/lang/Object;}(traceFull,log=badness.trace,lastOptIndex=0,lastOptTransformationIndex=0)' -Xdump:java+system:events=throw,filter=java/lang/NullPointerException -cp . Runner

...results in an NPE and a trace.

@liqunl
Copy link
Contributor

liqunl commented Mar 13, 2021

Sorry it turned out to be my code. I've opened #12213 to revert the problematic change. @pshipton The bug was introduced last September, do I need to do double delivery?

@pshipton
Copy link
Member

@liqunl pls create a PR for the v0.26.0-release branch as well. We may want to update the IBM 21_02 release branch as well if this affects Java 8.

@ghadishayban
Copy link
Author

I tested the proposed patch with OpenJ9 16 + Clojure: seems to be resolved.

Thanks everyone for diagnosing this tricky bug!

@ghadishayban
Copy link
Author

This bug also manifests on 8 & 11, versions below @liqunl @pshipton

openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-b08)
Eclipse OpenJ9 VM (build openj9-0.24.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20210120_930 (JIT enabled, AOT enabled)
OpenJ9   - 345e1b09e
OMR      - 741e94ea8
JCL      - ab07c6a8fd based on jdk8u282-b08)


openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.24.0, JRE 11 Linux amd64-64-Bit Compressed References 20210120_910 (JIT enabled, AOT enabled)
OpenJ9   - 345e1b09e
OMR      - 741e94ea8
JCL      - 0a86953833 based on jdk-11.0.10+9)

@fjeremic
Copy link
Contributor

Thanks @ghadishayban for the unit test. That helped a lot! The fix delivered #12221 backports this to the 0.26 release which will be generated for Java 8, 11, and 16. We use the same JIT compiler for all of those JVM levels so the fix will be there in 0.26 releases.

This release is scheduled for April 23, 2021:
https://projects.eclipse.org/projects/technology.openj9/releases/0.26.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants