-
Notifications
You must be signed in to change notification settings - Fork 873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
java: try do dlopen libmpi with the full path #1227
java: try do dlopen libmpi with the full path #1227
Conversation
@@ -14,7 +16,7 @@ | |||
if OMPI_WANT_JAVA_BINDINGS | |||
|
|||
# Get the include files that were generated from the .java source files | |||
AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\" | |||
AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\" -DOMPI_LIBDIR=\"$(OMPI_WRAPPER_LIBDIR)\" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this is the right way. You should be using the OPAL installdirs framework to get the libdir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started with this approach ...
until I figured out opal is not available nor initialized before libmpi is dlopen'ed (!)
that being said, we could use an other environment variable instead of hard-coding the path.
any thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the JNI_onLoad()
fires before MPI_INIT
? Hmm. Yes, that does create a quandary.
Hard-coding the answer at compile time seems like it will someday lead to sadness when someone tries to relocate their Open MPI installation, setenv OPAL_PREFIX
(or one of the others), and is confused as to why the Java bindings don't work.
I'm not sure what the Right answer is.
BTW, mention @shurickdaryin just so that he is aware of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
me too...
btw, is there any reason why libmpijava is not linked with libmpi ?
that should make things trivial (thought not relocatable), at least when ompi is configure'd with --enable-mpirun-prefix-by-default.
I can also try to dlopen $OPAL_PREFIX/lib64/libmpi, then $OPAL_PREFIX/lib/libmpi before trying to dlopen the hard coded path. that would make things a bit better on el capitan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had an other idea...
libmpi is dlopen'ed from libmpi_java, and both libraries are in the same directory.
so if we can find a way to retrieve libmpi_java directory, we can build the path for libmpi.
I ll try to achieve this with dlsym and friends, otherwise I will try to see what the jvm can do for us
(api and/or environment and/or command line)
any thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, I can't think of a reason why libmpijava is not linked against libmpi (!). That may actually solve the problem described in the comment (that java will dlopen libmpi in a private namespace) -- but I think you'll need to make at least one code path from libmpijava to call into libmpi to make that linkage work.
If that works, when libmpijava.so
is loaded, it'll pull libmpi.so
with it, and we won't need the dlopen()
.
As for the other ideas (if linking against libmpi doesn't work out):
- trying to dlopen
$OPAL_PREFIX...
is pretty much what installdirs does (you'd also need to check$OPAL_LIBDIR
, IIRC). I'm a bit hesitant to basically repeat (some of) that logic up here. Hmm. - I think you're right that we should be able to assume that libmpijava is in the same dir as libmpi. So if you can find some way to discover that directory (even if it's an OS X-specific way), that might be a good way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will try tomorrow.
dladdr can be used to find the path of libmpijava, this is a gnu extension on linux, but "standard" in OS X
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool -- can easily add a configure test for dladdr()
.
libmpi_java is linked against libmpi |
there was an explanation around the dlopen code explaining the logic behind the dlopen. Had to do with the scope of the symbols as imported by the jvm vs what Open MPI likes. I would want any solution to have a method for linux users to override a hardwired path for libmpi.so/dylib. |
@nrgraham23 you may want to look this PR over |
@hppritcha Yeah, I know why the |
@hppritcha The one difference I was advocating was to create a code path from libmpijava into libmpi -- so that not only does it link against it, but it also calls something in libmpi. This might make a difference in what the linker actually loads into the global namespace...? I.e., if we link libmpi to libmpijava today, if there's no libmpi symbols used by libmpijava, the linker might realize it's a no-op. |
for the record, is here the first attempt that is using hard coded path if the initial dlopen fails commit 1630de64d5760d38f7f315d11ca7bc2f5e1717a1
Author: Gilles Gouaillardet <[email protected]>
Date: Wed Dec 16 15:39:39 2015 +0900
java: try do dlopen libmpi with the full path
Since OS X 10.11 (aka El Capitan) DYLD_LIBRARY_PATH is no more
propagated to children, so try to dlopen libmpi with the full path
(set at configure time)
Fixes open-mpi/ompi#1220
Thanks Alexander Daryin for reporting this
diff --git a/ompi/mpi/java/c/Makefile.am b/ompi/mpi/java/c/Makefile.am
index 95615ea..306c5c9 100644
--- a/ompi/mpi/java/c/Makefile.am
+++ b/ompi/mpi/java/c/Makefile.am
@@ -1,9 +1,11 @@
# -*- makefile -*-
#
# Copyright (c) 2011-2013 Cisco Systems, Inc. All rights reserved.
-# Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved.
+# Copyright (c) 2012 Oracle and/or its affiliates. All rights reserved.
# Copyright (c) 2015 Los Alamos National Security, LLC. All rights
# reserved.
+# Copyright (c) 2015 Research Organization for Information Science
+# and Technology (RIST). All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
@@ -14,7 +16,7 @@
if OMPI_WANT_JAVA_BINDINGS
# Get the include files that were generated from the .java source files
-AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\"
+AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\" -DOMPI_LIBDIR=\"$(OMPI_WRAPPER_LIBDIR)\"
headers = \
mpiJava.h
diff --git a/ompi/mpi/java/c/mpi_MPI.c b/ompi/mpi/java/c/mpi_MPI.c
index 5b3a39e..55941ab 100644
--- a/ompi/mpi/java/c/mpi_MPI.c
+++ b/ompi/mpi/java/c/mpi_MPI.c
@@ -14,6 +14,8 @@
* reserved.
* Copyright (c) 2015 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2015 Intel, Inc. All rights reserved.
+ * Copyright (c) 2015 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -124,9 +126,17 @@ OBJ_CLASS_INSTANCE(ompi_java_buffer_t,
*/
jint JNI_OnLoad(JavaVM *vm, void *reserved)
{
- libmpi = dlopen("libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);
+ libmpi = dlopen("/libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);
+ if (NULL == libmpi) {
+ /*
+ * OS X El Capitan does not propagate DYLD_LIBRARY_PATH to children any more
+ * so if the previous dlopen failed, try again with the full path
+ * see https://github.com/open-mpi/ompi/issues/1220
+ */
+ libmpi = dlopen(OMPI_LIBDIR "/libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);
+ }
- if(libmpi == NULL)
+ if(NULL == libmpi)
{
fprintf(stderr, "Java bindings failed to load libmpi: %s\n",dlerror());
exit(1); |
1630de6
to
99a29b3
Compare
@jsquyres @hppritcha @nrgraham23 @shurickdaryin |
#[[Dl_info info; dladdr((void *)0, &info);]])], | ||
#[OMPI_HAVE_DLADDR=1], | ||
#[OMPI_HAVE_DLADDR=0]) | ||
AC_CHECK_TYPE([Dl_info], [OMPI_HAVE_DLADDR=1], [OMPI_HAVE_DLADDR=0], [[#include <dlfcn.h>]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you replace most of the changes in this file with AC_CHECK_TYPES
? We do this kind of thing in a bunch of places (even though it will use the GNU convention of #define
vs. #undef
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do,
btw, I pushed my stuff before I removed the debug stuff
99a29b3
to
c2ff2ff
Compare
Test FAILed. |
:bot:retest |
👍 |
c2ff2ff
to
0417a4c
Compare
@hppritcha i just noticed i mixed |
Since OS X 10.11 (aka El Capitan) DYLD_LIBRARY_PATH is no more propagated to children, so try to dlopen libmpi with the full path using the directory of libmpi_java Fixes open-mpi#1220 Thanks Alexander Daryin for reporting this
0417a4c
to
e918d75
Compare
Ensure singletons register their progress thread
Since OS X 10.11 (aka El Capitan) DYLD_LIBRARY_PATH is no more
propagated to children, so try to dlopen libmpi with the full path
(set at configure time)
Fixes #1220
Thanks Alexander Daryin for reporting this