cmdLineTester_jvmtitests_hcr_OSRG_nongold_1 Test-extended.functional-JDK10-linux_x86-64 rc004 gpf #2472

Closed
pshipton opened this issue Jul 25, 2018 · 13 comments · Fixed by #2569

@pshipton
Member

https://ci.eclipse.org/openj9/job/Test-extended.functional-JDK10-linux_x86-64/26

Testing: rc004
Test start time: 2018/07/25 07:30:50 Coordinated Universal Time
Running command: "/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdkbinary/j2sdk-image/bin/java"  -Xnocompressedrefs -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation -Xgcpolicy:gencon  -Xdump    -agentlib:jvmtitest=test:rc004 -cp "/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitest.jar:/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/lib/asm-all.jar" com.ibm.jvmti.tests.util.TestRunner
Time spent starting: 42 milliseconds
Time spent executing: 7667 milliseconds
Test result: FAILED
Output from test:
 [OUT] *** Testing [1/2]:	testReOrderingStaticFields
 [OUT] *** Test took 871 milliseconds
 [OUT] OK
 [OUT] 
 [OUT] *** Testing [2/2]:	testStaticFieldIDsAfterRedefine
 [ERR] Unhandled exception
 [ERR] Type=Segmentation error vmState=0x00000000
 [ERR] J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
 [ERR] Handler1=00007FCE9F7ECC10 Handler2=00007FCE9EBE05C0 InaccessibleAddress=0000000000000000
 [ERR] RDI=00007FCDF9D1EB10 RSI=0000000000000020 RAX=0000000000000000 RBX=00007FCE9853AA48
 [ERR] RCX=00007FCE983CD300 RDX=0000000000000000 R8=00007FCE985F4FC0 R9=0000000000000001
 [ERR] R10=0000000000000000 R11=0000000000000246 R12=00007FCE983CD300 R13=00007FCE97CDC998
 [ERR] R14=00007FCE9854F9A8 R15=00007FCDF9D1EB10
 [ERR] RIP=00007FCE9F7995BE GS=0000 FS=0000 RSP=00007FCDF9D1E5D0
 [ERR] EFlags=0000000000010202 CS=0033 RBP=00007FCDF9D1ECD0 ERR=0000000000000004
 [ERR] TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000000
 [ERR] xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] xmm2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] xmm3 726574616c666e49 (f: 1818652288.000000, d: 1.144477e+243)
 [ERR] xmm4 dddddddddd00646e (f: 3707790336.000000, d: -1.456816e+144)
 [ERR] xmm5 6176616a5f617661 (f: 1600222848.000000, d: 3.146502e+161)
 [ERR] xmm6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] xmm7 00007fce98492770 (f: 2554930944.000000, d: 6.942872e-310)
 [ERR] xmm8 3bbcc86800000000 (f: 0.000000, d: 6.095003e-21)
 [ERR] xmm9 3fd466cb8aff6253 (f: 2331992576.000000, d: 3.187741e-01)
 [ERR] xmm10 3d75440d7bd58731 (f: 2077591296.000000, d: 1.208823e-12)
 [ERR] xmm11 402f22d27af00813 (f: 2062551040.000000, d: 1.556801e+01)
 [ERR] xmm12 bcca000000000000 (f: 0.000000, d: -7.216450e-16)
 [ERR] xmm13 bfd466cb8aff6260 (f: 2331992576.000000, d: -3.187741e-01)
 [ERR] xmm14 402e7f9c1e980d00 (f: 513281280.000000, d: 1.524924e+01)
 [ERR] xmm15 3c7f463c704039b0 (f: 1883257216.000000, d: 2.712618e-17)
 [ERR] Module=/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdkbinary/j2sdk-image/lib/default/libj9vm29.so
 [ERR] Module_base_address=00007FCE9F755000
 [ERR] Target=2_90_20180725_227 (Linux 4.4.0-130-generic)
 [ERR] CPU=amd64 (4 logical CPUs) (0x1f2ae6000 RAM)
 [ERR] ----------- Stack Backtrace -----------
 [ERR] (0x00007FCE9F7995BE [libj9vm29.so+0x445be])
 [ERR] (0x00007FCE9F790944 [libj9vm29.so+0x3b944])
 [ERR] (0x00007FCE9F827D72 [libj9vm29.so+0xd2d72])
 [ERR] ---------------------------------------
 [ERR] JVMDUMP039I Processing dump event "gpf", detail "" at 2018/07/25 07:30:57 - please wait.
 [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/core.20180725.073057.16888.0001.dmp' in response to an event
 [ERR] JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.16917.
 [ERR] 
 [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/core.20180725.073057.16888.0001.dmp
 [ERR] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/javacore.20180725.073057.16888.0002.txt' in response to an event
 [ERR] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/javacore.20180725.073057.16888.0002.txt
 [ERR] JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/Snap.20180725.073057.16888.0003.trc' in response to an event
 [ERR] JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/Snap.20180725.073057.16888.0003.trc
 [ERR] JVMDUMP007I JVM Requesting JIT dump using '/home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/jitdump.20180725.073057.16888.0004.dmp'
 [ERR] JVMDUMP010I JIT dump written to /home/jenkins/workspace/Test-extended.functional-JDK10-linux_x86-64/openjdk-tests/functional/cmdLineTests/jvmtitests/jitdump.20180725.073057.16888.0004.dmp
 [ERR] JVMDUMP013I Processed dump event "gpf", detail "".
>> Success condition was not found: [Return code: 0]
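
The module-relative offsets in the stack backtrace above can be re-derived from the raw addresses by subtracting `Module_base_address`, which is handy when matching frames against symbols from a separately downloaded build. The following is a minimal sketch of that arithmetic using only values from this log; the helper name `module_offset` is made up for illustration.

```python
# Re-derive the "libj9vm29.so+0x..." offsets shown in the backtrace.
# All addresses are copied from the rc004 log; module_offset is a
# hypothetical helper, not part of any OpenJ9 tooling.

MODULE_BASE = 0x00007FCE9F755000  # Module_base_address from the log

BACKTRACE = [
    0x00007FCE9F7995BE,  # also the RIP at the time of the crash
    0x00007FCE9F790944,
    0x00007FCE9F827D72,
]


def module_offset(address: int, module_base: int) -> int:
    """Return the offset of an absolute address within its module."""
    return address - module_base


for frame in BACKTRACE:
    print(f"libj9vm29.so+{module_offset(frame, MODULE_BASE):#x}")
```

The offsets only map to meaningful symbols when resolved against the exact `libj9vm29.so` from the failing build.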
@pshipton pshipton changed the title cmdLineTester_jvmtitests_hcr_OSRG_nongold_1 Test-extended.functional-JDK10-linux_x86-64 gpf cmdLineTester_jvmtitests_hcr_OSRG_nongold_1 Test-extended.functional-JDK10-linux_x86-64 rc004 gpf Jul 25, 2018
@pshipton
Member Author

@gacholio

@gacholio
Contributor

I've run this hundreds of times on the latest nightly from adopt (which is surprisingly old) with no failures. Not spending any more time.

@pshipton
Member Author

It only failed yesterday. The build from last night is available at https://ci.eclipse.org/openj9/job/Build-JDK10-linux_x86-64/231/ for a short time.

@gacholio
Contributor

Not useful to me - I want a binary (and really, a source drop which doesn't exist) to test against.

@pshipton
Member Author

pshipton commented Jul 26, 2018

Note this failure is on a non-compressedrefs build, in case you grabbed the wrong one from Adopt.

There is a binary there. The direct link is https://ci.eclipse.org/openj9/job/Build-JDK10-linux_x86-64/231/artifact/OpenJ9-JDK10-linux_x86-64-201826070443.tar.gz

The matching source is easily accessible from the -version shas.

@pdbain-ibm
Contributor

I see about a 1-3% failure rate on 393062 [Grinder] in our internal farm.

[linux_x86-64] SE80_GIT cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_0

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 8.0.6.0 - pxa6480sr6-20180711_03(SR6))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20180726_393023 (JIT enabled, AOT enabled)
OpenJ9   - cefb8a6
OMR      - 26e24bf
IBM      - 98805ca)
JCL - 20180704_01 based on Oracle jdk8u181-b12

@pdbain-ibm
Contributor

pdbain-ibm commented Jul 27, 2018

Though it may be a different issue:

Testing: rc018
Test start time: 2018/07/27 11:31:24 Eastern Standard Time
Running command: "/bluebird/builds/bld_393023/sdk/xa6480/jre/bin/java"  -Xnocompressedrefs -Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation  -Xdump    -agentlib:jvmtitest=test:rc018 -cp "/bluebird/builds/bld_393023/jvmtest/test/SE80/functional/cmdLineTests/jvmtitests/jvmtitest.jar:/bluebird/builds/bld_393023/jvmtest/test/SE80/TestConfig/lib/asm-all.jar" com.ibm.jvmti.tests.util.TestRunner
Time spent starting: 0 milliseconds
Time spent executing: 11348 milliseconds
Test result: FAILED
Output from test:
 [OUT] *** Testing [1/1]:	testReflectRedefineAtSameTime
 [OUT] starting reflect worker threads
 [OUT] starting to populate java heap
 [ERR] Unhandled exception
 [ERR] Type=Segmentation error vmState=0x00000000
 [ERR] J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000080
 [ERR] Handler1=00007F8B36FDF880 Handler2=00007F8B368DE580 InaccessibleAddress=0000000000000000
 [ERR] RDI=00007F8B30010A20 RSI=00007F8B370912D7 RAX=275AF44316A30DFF RBX=00007F8AFCA7D7C0
 [ERR] RCX=D8A58B4820658948 RDX=00007F8B358D6FF0 R8=0000000000030000 R9=00007F8B30414C00
 [ERR] R10=0000000000000001 R11=0000000000000000 R12=00007F8B30468E00 R13=0000000066240000
 [ERR] R14=0000000000000000 R15=00007F8B305E9200
 [ERR] RIP=00007F8B3700EF2F GS=0000 FS=0000 RSP=00007F8AFCA7D710
 [ERR] EFlags=0000000000010217 CS=0033 RBP=0000000000000000 ERR=0000000000000000
 [ERR] TRAPNO=000000000000000D OLDMASK=0000000000000000 CR2=0000000000000000
 [ERR] xmm0 404eac8f50956d91 (f: 1351970176.000000, d: 6.134812e+01)
 [ERR] xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] xmm2 43e0000000000000 (f: 0.000000, d: 9.223372e+18)
 [ERR] xmm3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
 [ERR] xmm4 3d6ef00de3afd8ff (f: 3819952384.000000, d: 8.793027e-13)
 [ERR] xmm5 3ff196ff60000000 (f: 1610612736.000000, d: 1.099365e+00)
 [ERR] xmm6 3fed1b93a727c143 (f: 2804400384.000000, d: 9.096163e-01)
 [ERR] xmm7 3bbcc86800000000 (f: 0.000000, d: 6.095003e-21)
 [ERR] xmm8 40265ec3c33bf67d (f: 3275486720.000000, d: 1.118509e+01)
 [ERR] xmm9 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
 [ERR] xmm10 3ea4000000000000 (f: 0.000000, d: 5.960464e-07)
 [ERR] xmm11 3d6ef35793c76730 (f: 2479318784.000000, d: 8.796677e-13)
 [ERR] xmm12 3ea2313c4878d8ca (f: 1215879424.000000, d: 5.421736e-07)
 [ERR] xmm13 bbfc53d3a23f359c (f: 2722051584.000000, d: -9.597713e-20)
 [ERR] xmm14 40262e42fefa3800 (f: 4277811200.000000, d: 1.109035e+01)
 [ERR] xmm15 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
 [ERR] Module=/bluebird/builds/bld_393023/sdk/xa6480/jre/lib/amd64/default/libj9vm29.so
 [ERR] Module_base_address=00007F8B36F53000
 [ERR] Target=2_90_20180726_393023 (Linux 2.6.32-279.el6.x86_64)
 [ERR] CPU=amd64 (8 logical CPUs) (0x3e4d1c000 RAM)
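
One quick way to compare this rc018 crash with the rc004 one is to decode the `TRAPNO`, `ERR` and `Signal_Code` fields from both register dumps. The sketch below assumes the standard x86 page-fault error-code bits and the Linux `si_code` values for SIGSEGV; the function name `describe_fault` is made up for illustration, and the result is only a hint, not a diagnosis.

```python
# Decode the fault fields printed in the two register dumps.
# describe_fault is a hypothetical helper; the tables cover only the
# values that occur in this issue.

TRAP_NAMES = {0x0D: "general protection fault", 0x0E: "page fault"}
SI_CODES = {0x01: "SEGV_MAPERR", 0x02: "SEGV_ACCERR", 0x80: "SI_KERNEL"}


def describe_fault(trapno: int, err: int, si_code: int) -> str:
    """Summarise TRAPNO / ERR / Signal_Code from a J9 crash header."""
    parts = [
        TRAP_NAMES.get(trapno, f"trap {trapno:#x}"),
        SI_CODES.get(si_code, f"si_code {si_code:#x}"),
    ]
    if trapno == 0x0E:
        # The error-code bits below are only defined for page faults.
        parts.append("protection violation" if err & 0x1 else "page not present")
        parts.append("write" if err & 0x2 else "read")
        parts.append("user mode" if err & 0x4 else "kernel mode")
    return ", ".join(parts)


# rc004: TRAPNO=0E, ERR=4, Signal_Code=1, CR2=0
print("rc004:", describe_fault(0x0E, 0x4, 0x01))
# rc018: TRAPNO=0D, ERR=0, Signal_Code=80
print("rc018:", describe_fault(0x0D, 0x0, 0x80))
```

Read this way, rc004 is a user-mode read of address 0, while rc018 is a general protection fault with no faulting address reported. The two crashes therefore differ at the instruction level, although that alone does not show whether they share a root cause.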

@pshipton
Member Author

Please point @gacholio at a core file.

@gacholio
Contributor

Yes, core file with matching SDK would be very helpful. Even better if you could reproduce with a debug-built one.

@gacholio
Contributor

#13
#14 walkBytecodeFrame (currentThread=0x7f8b30468e00, walkState=0x7f8afca7d7c0) at swalk.c:963
#15 walkStackFrames (currentThread=0x7f8b30468e00, walkState=0x7f8afca7d7c0) at swalk.c:325
#16 0x00007f8b358c7067 in jitDecompileMethod () from /team/gac/pr/jre/lib/amd64/default/libj9jit29.so
#17 0x00007f8b358c7345 in c_jitDecompileAtCurrentPC () from /team/gac/pr/jre/lib/amd64/default/libj9jit29.so
#18 0x00007f8b358d700c in jitDecompileAtCurrentPC () from /team/gac/pr/jre/lib/amd64/default/libj9jit29.so

!stackslots 0x7f8b30468e00
<7f8b30468e00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x00007F8B30468E00 ***
<7f8b30468e00> ITERATE_O_SLOTS
<7f8b30468e00> RECORD_BYTECODE_PC_OFFSET
<7f8b30468e00> Initial values: walkSP = 0x00007F8AB4061E50, PC = 0x00007F8B37089747, literals = 0x00007F8B358D6FF0, A0 = 0x00007F8AB4061E50, j2iFrame = 0x0000000000000000, ELS = 0x00007F8AFCA7DC50, decomp = 0x00007F8B30605DC0
Jul 30, 2018 4:44:43 PM com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptDataException thrown walking stack. walkThread = 0x00007F8B30468E00

@gacholio
Contributor

gacholio commented Aug 1, 2018

I have an inkling of what's going on: we've somehow reached jitDecompileAtCurrentPC incorrectly. That entry point assumes the resolve frame is already built, which it clearly is not in these cores, making the stack unwalkable.

@gacholio gacholio self-assigned this Aug 8, 2018
@gacholio
Contributor

gacholio commented Aug 8, 2018

Found the problem. The scenario on the java stack is:

JIT recompilation resolve frame
Call-in frame
Native method frame
JIT frame
...

In every resolve case other than recompilation, there is guaranteed to be a compiled frame next on the stack.

When the stack walk begins, resolveFrameFlags in the walk state is initialized to 0. While walking the resolve frame, resolveFrameFlags is set to the flags of that resolve frame. The issue arises because resolveFrameFlags is reset to 0 only after walking the JIT frame that is expected to follow immediately. Because that frame isn't there in this case, the walk continues down to the lower JIT frame with resolveFrameFlags still set to the recompile frame flags.

The fix will be to zero resolveFrameFlags either when the JIT walker returns to the interpreter walker, or when transitioning to the JIT walker for a reason other than a resolve frame.
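
The stale-flag scenario described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the code in swalk.c: the `Frame` class, the `walk` function, the `clear_in_interpreter` switch and the flag value `0x1` are all made up for illustration, and only the name resolveFrameFlags comes from the VM.

```python
# Toy model of the walk state's resolveFrameFlags during a stack walk.
from dataclasses import dataclass

RESOLVE, INTERPRETED, JIT = "resolve", "interpreted", "jit"
RECOMPILE_FLAGS = 0x1  # hypothetical flag value for a recompilation resolve frame


@dataclass
class Frame:
    kind: str
    flags: int = 0  # only meaningful for resolve frames


def walk(frames, clear_in_interpreter):
    """Return the resolveFrameFlags value seen on entry to each JIT frame."""
    resolve_frame_flags = 0  # initialized to 0 when the walk begins
    seen_by_jit = []
    for frame in frames:
        if frame.kind == RESOLVE:
            # Walking a resolve frame records its flags in the walk state.
            resolve_frame_flags = frame.flags
        elif frame.kind == JIT:
            # The JIT walker consumes the flags, then clears them.
            seen_by_jit.append(resolve_frame_flags)
            resolve_frame_flags = 0
        elif clear_in_interpreter:
            # Modelled fix: interpreter-side frames also clear the flags.
            resolve_frame_flags = 0
    return seen_by_jit


# Recompilation resolve frame, call-in frame, native frame, JIT frame.
stack = [Frame(RESOLVE, RECOMPILE_FLAGS), Frame(INTERPRETED),
         Frame(INTERPRETED), Frame(JIT)]

print(walk(stack, clear_in_interpreter=False))  # JIT frame sees stale flags
print(walk(stack, clear_in_interpreter=True))   # JIT frame sees 0
```

In the model, the usual layout (a resolve frame directly followed by a compiled frame) behaves the same with or without the extra clearing, which is consistent with the failure only showing up in the rare recompilation layout.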

@gacholio
Contributor

gacholio commented Aug 8, 2018

Note that the native and call-in frames are not necessary - any interpreted frame between the recompilation frame and the JIT frame would have the same issue.

gacholio added a commit to gacholio/openj9 that referenced this issue Aug 9, 2018
Clear resolveFrameFlags in the interpreter stack walker after reporting
the frame. This is consistent with the JIT stack walker and avoids
leaving invalid flags set on entry to the JIT walker in very rare cases.

Also fix a small error in the resumeable stack walker, which has no
effect on the only current consumer of the feature.

Fixes: eclipse-openj9#2472

Signed-off-by: Graham Chapman <[email protected]>