Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java: try do dlopen libmpi with the full path #1227

Merged

Conversation

ggouaillardet
Copy link
Contributor

Since OS X 10.11 (aka El Capitan) DYLD_LIBRARY_PATH is no more
propagated to children, so try to dlopen libmpi with the full path
(set at configure time)

Fixes #1220

Thanks Alexander Daryin for reporting this

@@ -14,7 +16,7 @@
if OMPI_WANT_JAVA_BINDINGS

# Get the include files that were generated from the .java source files
AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\"
AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\" -DOMPI_LIBDIR=\"$(OMPI_WRAPPER_LIBDIR)\"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is the right way. You should be using the OPAL installdirs framework to get the libdir.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with this approach ...
until I figured out opal is not available nor initialized before libmpi is dlopen'ed (!)

that being said, we could use an other environment variable instead of hard-coding the path.

any thoughts ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the JNI_onLoad() fires before MPI_INIT? Hmm. Yes, that does create a quandary.

Hard-coding the answer at compile time seems like it will someday lead to sadness when someone tries to relocate their Open MPI installation, setenv OPAL_PREFIX (or one of the others), and is confused as to why the Java bindings don't work.

I'm not sure what the Right answer is.

BTW, mention @shurickdaryin just so that he is aware of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

me too...

btw, is there any reason why libmpijava is not linked with libmpi ?
that should make things trivial (thought not relocatable), at least when ompi is configure'd with --enable-mpirun-prefix-by-default.
I can also try to dlopen $OPAL_PREFIX/lib64/libmpi, then $OPAL_PREFIX/lib/libmpi before trying to dlopen the hard coded path. that would make things a bit better on el capitan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had an other idea...
libmpi is dlopen'ed from libmpi_java, and both libraries are in the same directory.
so if we can find a way to retrieve libmpi_java directory, we can build the path for libmpi.
I ll try to achieve this with dlsym and friends, otherwise I will try to see what the jvm can do for us
(api and/or environment and/or command line)
any thoughts ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, I can't think of a reason why libmpijava is not linked against libmpi (!). That may actually solve the problem described in the comment (that java will dlopen libmpi in a private namespace) -- but I think you'll need to make at least one code path from libmpijava to call into libmpi to make that linkage work.

If that works, when libmpijava.so is loaded, it'll pull libmpi.so with it, and we won't need the dlopen().

As for the other ideas (if linking against libmpi doesn't work out):

  • trying to dlopen $OPAL_PREFIX... is pretty much what installdirs does (you'd also need to check $OPAL_LIBDIR, IIRC). I'm a bit hesitant to basically repeat (some of) that logic up here. Hmm.
  • I think you're right that we should be able to assume that libmpijava is in the same dir as libmpi. So if you can find some way to discover that directory (even if it's an OS X-specific way), that might be a good way to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will try tomorrow.
dladdr can be used to find the path of libmpijava, this is a gnu extension on linux, but "standard" in OS X

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool -- can easily add a configure test for dladdr().

@hppritcha
Copy link
Member

libmpi_java is linked against libmpi

@hppritcha
Copy link
Member

there was an explanation around the dlopen code explaining the logic behind the dlopen. Had to do with the scope of the symbols as imported by the jvm vs what Open MPI likes.

I would want any solution to have a method for linux users to override a hardwired path for libmpi.so/dylib.

@hppritcha
Copy link
Member

@nrgraham23 you may want to look this PR over

@jsquyres
Copy link
Member

@hppritcha Yeah, I know why the dlopen() is there -- I was just hoping that linking it in manually would have the same effect (i.e., bring in the relevant symbols into the global scope as opposed to a private scope).

@jsquyres
Copy link
Member

@hppritcha The one difference I was advocating was to create a code path from libmpijava into libmpi -- so that not only does it link against it, but it also calls something in libmpi. This might make a difference in what the linker actually loads into the global namespace...? I.e., if we link libmpi to libmpijava today, if there's no libmpi symbols used by libmpijava, the linker might realize it's a no-op.

@ggouaillardet
Copy link
Contributor Author

for the record, is here the first attempt that is using hard coded path if the initial dlopen fails

commit 1630de64d5760d38f7f315d11ca7bc2f5e1717a1
Author: Gilles Gouaillardet <[email protected]>
Date:   Wed Dec 16 15:39:39 2015 +0900

    java: try do dlopen libmpi with the full path

    Since OS X 10.11 (aka El Capitan) DYLD_LIBRARY_PATH is no more
    propagated to children, so try to dlopen libmpi with the full path
    (set at configure time)

    Fixes open-mpi/ompi#1220

    Thanks Alexander Daryin for reporting this

diff --git a/ompi/mpi/java/c/Makefile.am b/ompi/mpi/java/c/Makefile.am
index 95615ea..306c5c9 100644
--- a/ompi/mpi/java/c/Makefile.am
+++ b/ompi/mpi/java/c/Makefile.am
@@ -1,9 +1,11 @@
 # -*- makefile -*-
 #
 # Copyright (c) 2011-2013 Cisco Systems, Inc.  All rights reserved.
-# Copyright (c) 2012   Oracle and/or its affiliates.  All rights reserved.
+# Copyright (c) 2012     Oracle and/or its affiliates.  All rights reserved.
 # Copyright (c) 2015      Los Alamos National Security, LLC. All rights
 #                         reserved.
+# Copyright (c) 2015      Research Organization for Information Science
+#                         and Technology (RIST). All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -14,7 +16,7 @@
 if OMPI_WANT_JAVA_BINDINGS

 # Get the include files that were generated from the .java source files
-AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\"
+AM_CPPFLAGS = -I$(top_builddir)/ompi/mpi/java/java $(OPAL_JDK_CPPFLAGS) -DOPAL_DYN_LIB_SUFFIX=\"$(OPAL_DYN_LIB_SUFFIX)\" -DOMPI_LIBDIR=\"$(OMPI_WRAPPER_LIBDIR)\"

 headers = \
         mpiJava.h
diff --git a/ompi/mpi/java/c/mpi_MPI.c b/ompi/mpi/java/c/mpi_MPI.c
index 5b3a39e..55941ab 100644
--- a/ompi/mpi/java/c/mpi_MPI.c
+++ b/ompi/mpi/java/c/mpi_MPI.c
@@ -14,6 +14,8 @@
  *                         reserved.
  * Copyright (c) 2015      Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2015      Intel, Inc. All rights reserved.
+ * Copyright (c) 2015      Research Organization for Information Science
+ *                         and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -124,9 +126,17 @@ OBJ_CLASS_INSTANCE(ompi_java_buffer_t,
  */
 jint JNI_OnLoad(JavaVM *vm, void *reserved)
 {
-    libmpi = dlopen("libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);
+    libmpi = dlopen("/libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);
+    if (NULL == libmpi) {
+        /*
+         * OS X El Capitan does not propagate DYLD_LIBRARY_PATH to children any more
+         * so if the previous dlopen failed, try again with the full path
+         * see https://github.com/open-mpi/ompi/issues/1220
+         */
+        libmpi = dlopen(OMPI_LIBDIR "/libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);
+    }

-    if(libmpi == NULL)
+    if(NULL == libmpi)
     {
         fprintf(stderr, "Java bindings failed to load libmpi: %s\n",dlerror());
         exit(1);

@ggouaillardet ggouaillardet force-pushed the poc/java_full_libmpi_path branch from 1630de6 to 99a29b3 Compare December 17, 2015 03:54
@ggouaillardet
Copy link
Contributor Author

@jsquyres @hppritcha @nrgraham23 @shurickdaryin
here is a second attempt.
if the initial dlopen("libmpi." ...) fails, then try to open libmpi in the same directory than libmpi_java

#[[Dl_info info; dladdr((void *)0, &info);]])],
#[OMPI_HAVE_DLADDR=1],
#[OMPI_HAVE_DLADDR=0])
AC_CHECK_TYPE([Dl_info], [OMPI_HAVE_DLADDR=1], [OMPI_HAVE_DLADDR=0], [[#include <dlfcn.h>]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace most of the changes in this file with AC_CHECK_TYPES? We do this kind of thing in a bunch of places (even though it will use the GNU convention of #define vs. #undef).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do,
btw, I pushed my stuff before I removed the debug stuff

@ggouaillardet ggouaillardet force-pushed the poc/java_full_libmpi_path branch from 99a29b3 to c2ff2ff Compare December 18, 2015 01:31
@lanl-ompi
Copy link
Contributor

Test FAILed.

@ggouaillardet
Copy link
Contributor Author

:bot:retest

@hppritcha
Copy link
Member

👍
@ggouaillardet I think this is good enough. Please open a PR against v2.x and v1.10.2 when you get a chance.

@ggouaillardet ggouaillardet force-pushed the poc/java_full_libmpi_path branch from c2ff2ff to 0417a4c Compare December 22, 2015 01:31
@ggouaillardet
Copy link
Contributor Author

@hppritcha i just noticed i mixed #ifdef HAVE_xxx_H and #if HAVE_xxx_H so i updated(and rebased) this PR
once it passes the tests, i will merge it and make the PRs for v2.x and v1.10

Since OS X 10.11 (aka El Capitan) DYLD_LIBRARY_PATH is no more
propagated to children, so try to dlopen libmpi with the full path
using the directory of libmpi_java

Fixes open-mpi#1220

Thanks Alexander Daryin for reporting this
@ggouaillardet ggouaillardet force-pushed the poc/java_full_libmpi_path branch from 0417a4c to e918d75 Compare December 22, 2015 02:10
@ggouaillardet ggouaillardet merged commit e918d75 into open-mpi:master Dec 22, 2015
jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Aug 23, 2016
Ensure singletons register their progress thread
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants