OpenJ9 JVM crash when loading native library in Linux #13269
Comments
@a0304 Could you provide the diagnosis file? In addition, a standalone testcase will be helpful if it is available. |
Thanks for looking into this issue @JasonFengJ9. The .so files were compiled with the Intel® Fortran Compiler 19.0.5. The compilers were recently updated from Intel Fortran Compiler 16, and the .so files compiled with version 16 work successfully with the OpenJ9 JVMs. Is there a known supported Intel compiler version for OpenJ9? I have been trying to put together a standalone test case, but I am unable to replicate the crash when only the 2 attached libraries (libsvml.so, libintlc.so.5) are loaded in a simple Java main program. So I printed all the libs that get loaded using LD_DEBUG=libs and tried to load them all in the same sequence from my program. I noticed that calling System.loadLibrary on /jre/lib/libnet.so also causes a similar segmentation fault. |
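For anyone trying the same approach, here is a minimal sketch of loading a list of libraries in a fixed order from a plain Java program. The class name `SequentialLibraryLoader` and its `Result` type are hypothetical and not part of the attached test case; the paths are expected to come from the LD_DEBUG=libs output. Each path is printed before the load because a segmentation fault inside dlopen cannot be caught from Java, so the last line printed identifies the library being loaded when the JVM went down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Loads native libraries one by one, in the order given, and records each outcome. */
final class SequentialLibraryLoader {

    /** Outcome of a single System.load attempt. */
    static final class Result {
        final String path;
        final boolean loaded;
        final String error; // null when the load succeeded

        Result(String path, boolean loaded, String error) {
            this.path = path;
            this.loaded = loaded;
            this.error = error;
        }
    }

    static List<Result> loadInOrder(List<String> absolutePaths) {
        List<Result> results = new ArrayList<>();
        for (String path : absolutePaths) {
            // Print first: a crash in the native loader kills the JVM before
            // any Java-level error handling can run.
            System.out.println("loading " + path);
            System.out.flush();
            try {
                System.load(path); // System.load expects an absolute path
                results.add(new Result(path, true, null));
            } catch (UnsatisfiedLinkError e) {
                results.add(new Result(path, false, e.getMessage()));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Usage: java SequentialLibraryLoader /abs/path/liba.so /abs/path/libb.so ...
        for (Result r : loadInOrder(Arrays.asList(args))) {
            System.out.println((r.loaded ? "OK   " : "FAIL ") + r.path
                    + (r.error == null ? "" : " : " + r.error));
        }
    }
}
```

Only link errors are reported by this sketch; it does not reproduce whatever else the real application does between loads.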
@a0304 I can confirm that the box folder is well received.
OpenJ9 build environments are documented at [1]. For JDK11 Linux x86 64 bit, it is Did you have a chance to run the application with Edit: Please run [1] https://www.eclipse.org/openj9/docs/openj9_support/ |
Yes, we tried this on JDK 8 (version details below) and we see similar segmentation faults. We have not tried JDK 16 yet, but will do so and provide the results. openjdk version "1.8.0_265" |
I tested the application with the latest JDK16 and I can still reproduce the crash. The crash dump files and the outputs from jpackcore have been shared in the same Box folder. You should have received an email with the file URL. The JDK 16 version used was openjdk version "16.0.2" 2021-07-20. The native code was built with gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3). Is gcc 7.5 the minimum required version, or is it the only gcc version supported? My understanding is that it is the minimum version. Can you please confirm? |
Yes, I did receive the dump files. |
With the JDK16 core files collected via
@a0304 is this what you see in local system? From JDK11 snaptrace
It appears the segmentation error occurred at Couple of things to try at the local system:
[1] https://github.com/ibmruntimes/semeru11-binaries/releases/download/jdk-11.0.12%2B7_openj9-0.27.0/ibm-semeru-open-jdk_x64_linux_11.0.12_7_openj9-0.27.0.tar.gz |
@JasonFengJ9, I have downloaded the tars from [1] and [2], but I am unable to use them. I untarred the files and tried calling debuginfo-install on them using, I see the following error, Loaded plugins: fastestmirror, langpacks Do I need to change something? |
@a0304 Usually I just overlay the JDK with debug files like |
@JasonFengJ9 I have shared with you the Box folder containing the crash dump logs, gdb traces, jpackcore zip output, and the stdout trace from the time of the crash. This was generated using the OpenJDK11.0.12_debugimage that you provided. Can you please check the GDB_Trace_Full.txt in the shared .zip archive?
The gdb traces seem to show that the .so file is being loaded from the Java installation instead of the application lib folders. I am not sure if I am running gdb correctly.
|
@JasonFengJ9, one workaround that seems to work is to LD_PRELOAD the dependencies of our .so before starting the OpenJ9 JVM. In this particular case the dependencies were the Intel libs libsvml.so and libintlc.so.5. I made sure that these dependency lib paths are available in the LD_LIBRARY_PATH of the JVM and that the .so has these dependencies listed in its makefile before the build. |
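If the JVM is started from another Java process rather than from a shell script, the same workaround can be sketched with `ProcessBuilder`. This is only an illustration under assumptions: the class name `PreloadLauncher` is hypothetical, and the Java binary, library paths, and JVM arguments must be supplied by the caller. It relies on ld.so accepting a colon-separated list in LD_PRELOAD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds a child JVM command line with LD_PRELOAD set in its environment. */
final class PreloadLauncher {

    /** Builds, but does not start, the child process. */
    static ProcessBuilder build(String javaBinary, List<String> preloadLibs, List<String> javaArgs) {
        List<String> command = new ArrayList<>();
        command.add(javaBinary);
        command.addAll(javaArgs);
        ProcessBuilder pb = new ProcessBuilder(command);
        // The dynamic loader reads LD_PRELOAD at process start, so it has to be
        // set in the child's environment before the JVM is launched.
        pb.environment().put("LD_PRELOAD", String.join(":", preloadLibs));
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: PreloadLauncher <java-binary> <lib1:lib2:...> [jvm args...]");
            return;
        }
        List<String> libs = Arrays.asList(args[1].split(":"));
        List<String> rest = Arrays.asList(args).subList(2, args.length);
        System.exit(build(args[0], libs, rest).start().waitFor());
    }
}
```

In the case described above, the second argument would be the absolute paths of libintlc.so.5 and libsvml.so joined with a colon.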
This is a problem across Java 8/11/16 levels; a workaround has been identified as per #13269 (comment). |
Hi @JasonFengJ9, what dlopen flag does the OpenJ9 JDK use in System.loadLibrary? Is it RTLD_GLOBAL? That would help explain why LD_PRELOAD of the libs fixes the issue. |
The actual library loading code snippet for Linux is [1]
Neither From the snaptrace log
Is [1] https://github.com/eclipse-openj9/openj9-omr/blob/1d8fb435675f022855948c08f08f6db66cbe38d8/port/unix/omrsl.c#L163 |
No, libSMAOPTnlpql.so is not available in the /scratch/SMAEXE_g6h/templib56973 dir. The lib is not meant to be available there; its actual location is on the LD_LIBRARY_PATH. |
@a0304 Sure, is it ok to attach the text file in this issue? |
Thanks @JasonFengJ9. I have sent you an invite (as editor) to a Box dir. Can you please add the file there ? |
@a0304 Just uploaded, please check. |
Thanks |
Hi @JasonFengJ9, I have shared a .zip file in a Box dir containing the necessary .so file and the Java code files to reproduce this crash in a standalone program.
Steps to reproduce,
Please let me know if you need more details. |
@a0304 I have a couple of questions:
Does this resemble the native stack in your environment?
Are these errors expected? |
@JasonFengJ9, yes, I did see a similar native stack trace. And yes, the version that I supplied earlier does fail in my Hotspot setup too. That is because I had missed including the libintlc.so.5 file in the test libs dir (apologies for the mix-up). This is a necessary reference file, and it is also delivered by the Fortran compiler. I have shared a new Box dir (Stand Alone Crash Test V1.0) with the right set of refs, test classes, and the stack trace and dumps I see when running OptLoadInThread.class in an OpenJ9 VM. Please use this. The native stack I see with this version is `Unhandled exception
|
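The source of OptLoadInThread is not shown in this thread. Going only by its name, the test presumably performs the load on a thread other than main. A minimal sketch of that pattern follows; the class `LoadInThread`, the method `loadOnWorker`, and the thread name are all hypothetical, and the library name is taken from the command line rather than from the shared test case.

```java
/** Calls System.loadLibrary on a worker thread and reports the outcome to the caller. */
final class LoadInThread {

    /** Returns the Throwable raised by the load, or null if the library loaded. */
    static Throwable loadOnWorker(final String libraryName) {
        final Throwable[] failure = new Throwable[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    System.loadLibrary(libraryName);
                } catch (Throwable t) {
                    failure[0] = t;
                }
            }
        }, "native-loader");
        worker.start();
        try {
            // join() establishes happens-before, so failure[0] is visible here.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the load", e);
        }
        return failure[0];
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: LoadInThread <library-name>");
            return;
        }
        Throwable t = loadOnWorker(args[0]);
        System.out.println(t == null ? "loaded " + args[0] : "failed: " + t);
    }
}
```

A missing library surfaces as an UnsatisfiedLinkError in the returned value, whereas a segmentation fault in the native loader ends the JVM before this method can return.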
Hi @JasonFengJ9, were you able to replicate the crash with the new set of libs and code that I added to the Box dir with my last comment? Do we know why this crash happens? Thanks. |
Yeah, I am able to reproduce the segmentation error with the testcases supplied; the investigation is in progress. On the other hand, JDK17 [1][2] works in both testcases. Can you verify if it works in your application as well? [1] https://github.com/ibmruntimes/semeru17-binaries/releases/download/jdk-17%2B35_openj9-0.28.0-m1/ibm-semeru-open-jdk_x64_linux_17_35_openj9-0.28.0-m1.tar.gz |
Summary of investigation so far: Testcase passed:
within Testcase failed
The segmentation error occurred at dl-load.c:1970(open_verify)
Set a breakpoint at
It is not clear why segmentation fault occurred at |
Thanks @JasonFengJ9. I will test with the new JDK17 and let you know the results. |
Update: The testcase caused JDK11 [1] segmentation error at @a0304 can you check if your application works in the OS versions specified above? |
Hi,
We have been seeing a JVM crash when loading native code using System.loadLibrary on Linux.
This crash is only visible on OpenJDK 11 OpenJ9 JVMs; the load works as expected on OpenJDK 11 Hotspot JVMs.
The JVMs where we can reproduce the crash are:
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9+11)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.23.0, JRE 11 Linux amd64-64-Bit Compressed References 20201022_810 (JIT enabled, AOT enabled)
OpenJ9 - 0394ef7
OMR - 582366ae5
JCL - 3b09cfd7e9 based on jdk-11.0.9+11)
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.24.0, JRE 11 Linux amd64-64-Bit Compressed References 20210120_910 (JIT enabled, AOT enabled)
OpenJ9 - 345e1b0
OMR - 741e94ea8
JCL - 0a86953833 based on jdk-11.0.10+9)
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Eclipse OpenJ9 VM AdoptOpenJDK-11.0.11+9 (build openj9-0.26.0, JRE 11 Linux amd64-64-Bit Compressed References 20210421_975 (JIT enabled, AOT enabled)
OpenJ9 - b4cc246
OMR - 162e6f729
JCL - 7796c80419 based on jdk-11.0.11+9)
The Hotspot JVM where the load works successfully is
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
Summary of the problem:
The OpenJ9 JVM running on Linux crashed with a segmentation fault while trying to load a .so. The application does not log any exceptions to stdout or stderr and simply crashes. The JVM being tested was openjdk version "11.0.11" 2021-04-20. This crash also happens consistently with the OpenJ9 JVMs listed above.
At the time of the crash the application was trying to load one of our native libraries, which depends on the proprietary Intel library libsvml.so, which in turn refers to libintlc.so.5. We tracked this by setting "LD_DEBUG=libs".
We suspect that this is a bug in the OpenJ9 implementation, because the same application works and the crash could not be reproduced when using an OpenJDK 11 Hotspot JVM. The Linux environment, LD_LIBRARY_PATH, application code, native libraries, and Intel compilers were the same between the 2 tests.
Linux Environment:
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server
Opening the crash core dump file in GDB showed the following trace:
#0 0x00007f19fb7ac54f in renameDump () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so
#1 0x00007f19fb796bd1 in omrdump_create () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so
#2 0x00007f19f49cf212 in doSystemDump () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so
#3 0x00007f19f49cb005 in protectedDumpFunction () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so
#4 0x00007f19fb798773 in omrsig_protect () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so
#5 0x00007f19f49ce67b in runDumpFunction () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so
#6 0x00007f19f49ce80f in runDumpAgent () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so
#7 0x00007f19f49e6eab in triggerDumpAgents () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so
#8 0x00007f19fba36f22 in generateDiagnosticFiles () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#9 0x00007f19fb798773 in omrsig_protect () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so
#10 0x00007f19fba37135 in vmSignalHandler () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#11 0x00007f19fb797c3a in mainSynchSignalHandler () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so
#12 <signal handler called>
#13 0x00007f1a024a5357 in open_verify () from /lib64/ld-linux-x86-64.so.2
#14 0x00007f1a024a5892 in open_path () from /lib64/ld-linux-x86-64.so.2
#15 0x00007f1a024a8689 in _dl_map_object () from /lib64/ld-linux-x86-64.so.2
#16 0x00007f1a024acb92 in openaux () from /lib64/ld-linux-x86-64.so.2
#17 0x00007f1a024af714 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#18 0x00007f1a024ad39d in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2
#19 0x00007f1a024b423b in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#20 0x00007f1a024af714 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#21 0x00007f1a024b3acb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#22 0x00007f1a01c59eeb in dlopen_doit () from /lib64/libdl.so.2
#23 0x00007f1a024af714 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#24 0x00007f1a01c5a4ed in _dlerror_run () from /lib64/libdl.so.2
#25 0x00007f1a01c59f81 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#26 0x00007f19fb79c178 in omrsl_open_shared_library () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so
#27 0x00007f19fba7f727 in classLoaderRegisterLibrary () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#28 0x00007f19fba7fd1d in openNativeLibrary.constprop.3 () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#29 0x00007f19fba7ff59 in registerNativeLibrary () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#30 0x00007f19fba8b578 in VM_BytecodeInterpreterCompressed::run(J9VMThread*) () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#31 0x00007f19fba882f5 in bytecodeLoopCompressed () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#32 0x00007f19fbb336b2 in c_cInterpreter () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#33 0x00007f19fba1509a in runJavaThread () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#34 0x00007f19fba87af1 in javaProtectedThreadProc () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#35 0x00007f19fb798773 in omrsig_protect () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so
#36 0x00007f19fba83c6a in javaThreadProc () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so
#37 0x00007f19fb5614f6 in thread_wrapper () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9thr29.so
#38 0x00007f1a02075dd5 in start_thread () from /lib64/libpthread.so.0
#39 0x00007f1a01989ead in clone () from /lib64/libc.so.6
The stdout message at the time of the crash was:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=00007F6346193380 Handler2=00007F6345EF3A10 InaccessibleAddress=00007F631BE7A5E8
RDI=00007F631BE435F0 RSI=0000000001795E60 RAX=00000000000720F0 RBX=00007F631BEB5898
RCX=0000000000000008 RDX=00007F631BE435F0 R8=00007F631BEB5AA0 R9=00007F6348C0EC60
R10=00007F631BEB54A0 R11=0000001400000004 R12=00000000000720E0 R13=00007F631BEB5AA0
R14=00007F631BEB58E0 R15=00007F631BEB59C0
RIP=00007F6348BF5357 GS=0000 FS=0000 RSP=00007F631BE795F0
EFlags=0000000000010206 CS=0033 RBP=00007F631BEB5640 ERR=0000000000000006
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=00007F631BE7A5E8
xmm0 6370682d67786d2f (f: 1735945472.000000, d: 9.907070e+170)
xmm1 732f6d63732f6836 (f: 1932486656.000000, d: 6.866786e+246)
xmm2 6e696c2f6836675f (f: 1748395904.000000, d: 7.351682e+223)
xmm3 2f6e69622f65646f (f: 795174016.000000, d: 3.206057e-80)
xmm4 71706c6e006c7170 (f: 7106928.000000, d: 2.673645e+238)
xmm5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 000000003e17cee7 (f: 1041747712.000000, d: 5.146917e-315)
xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11 ca62c1d6ca62c1d6 (f: 3395469824.000000, d: -2.193092e+50)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/lib64/ld-linux-x86-64.so.2
Module_base_address=00007F6348BF0000
Target=2_90_20210421_975 (Linux 3.10.0-957.el7.x86_64)
CPU=amd64 (20 logical CPUs) (0xfb302b000 RAM)
----------- Stack Backtrace -----------
(0x00007F6348BF5357 [ld-linux-x86-64.so.2+0x5357])
(0x00007F6348BF5892 [ld-linux-x86-64.so.2+0x5892])
(0x00007F6348BF8689 [ld-linux-x86-64.so.2+0x8689])
(0x00007F6348BFCB92 [ld-linux-x86-64.so.2+0xcb92])
(0x00007F6348BFF714 [ld-linux-x86-64.so.2+0xf714])
(0x00007F6348BFD39D [ld-linux-x86-64.so.2+0xd39d])
(0x00007F6348C0423B [ld-linux-x86-64.so.2+0x1423b])
(0x00007F6348BFF714 [ld-linux-x86-64.so.2+0xf714])
(0x00007F6348C03ACB [ld-linux-x86-64.so.2+0x13acb])
(0x00007F63483A9EEB [libdl.so.2+0xeeb])
(0x00007F6348BFF714 [ld-linux-x86-64.so.2+0xf714])
(0x00007F63483AA4ED [libdl.so.2+0x14ed])
dlopen+0x31 (0x00007F63483A9F81 [libdl.so.2+0xf81])
(0x00007F6345EF8178 [libj9prt29.so+0x2e178])
(0x00007F63461DB727 [libj9vm29.so+0x86727])
(0x00007F63461DBD1D [libj9vm29.so+0x86d1d])
(0x00007F63461DBF59 [libj9vm29.so+0x86f59])
(0x00007F63461E7578 [libj9vm29.so+0x92578])
(0x00007F63461E42F5 [libj9vm29.so+0x8f2f5])
(0x00007F634628F6B2 [libj9vm29.so+0x13a6b2])
---------------------------------------
We have made sure that the path and the LD_LIBRARY_PATH are valid and that all native libraries and their references listed by ldd are available at runtime.
We followed the MustGather guide at https://www.ibm.com/support/pages/node/344411 and got the javacore dump, snap trace, etc. during further tests. We can provide these files and part of the stdout and stderr of our application to the assignee of this issue.
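As a rough illustration of that kind of check, the sketch below looks for each named file in the directories of a path list such as LD_LIBRARY_PATH. The class `LibraryPathCheck` and its `find` method are hypothetical. It is deliberately narrower than ldd: it ignores rpath entries, the ld.so cache, and the default system directories, so a "not found" here does not by itself mean the loader would fail.

```java
import java.io.File;

/** Looks for library files in the directories of a path list such as LD_LIBRARY_PATH. */
final class LibraryPathCheck {

    /** Returns the first match for fileName in pathList, or null if there is none. */
    static File find(String pathList, String fileName) {
        if (pathList == null) {
            return null;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            // ld.so treats an empty entry as the current directory;
            // this sketch simply skips empty entries.
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Usage: java LibraryPathCheck libsvml.so libintlc.so.5
        String ldPath = System.getenv("LD_LIBRARY_PATH");
        for (String name : args) {
            File hit = find(ldPath, name);
            System.out.println(name + " -> " + (hit == null ? "not found on LD_LIBRARY_PATH" : hit));
        }
    }
}
```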
We appreciate help from anyone who works in this area and/or who has encountered a similar issue before.
Thanks!
Ashwin