RFC: Embed libltdl in opal/libltdl #390

Closed
wants to merge 3 commits into from

jsquyres
Member

This PR fully embeds libltdl in the OMPI source tree -- i.e., it's committed in the git tree (as opposed to the GNU Autotools copying it into the OMPI source tree at autogen.pl time).

This preserves the "build with dlopen / DSO support by default" behavior that Open MPI has had for a long time. It also works with Libtool 2.4.4 (and presumably beyond).

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/229/

Build Log
last 50 lines

[...truncated 1676 lines...]
checking whether PF_UNSPEC is declared... yes
checking whether AF_INET6 is declared... yes
checking whether PF_INET6 is declared... yes
checking if SA_RESTART defined in signal.h... yes
checking for struct sockaddr.sa_len... no
checking for struct dirent.d_type... yes
checking for siginfo_t.si_fd... yes
checking for siginfo_t.si_band... yes
checking for struct statfs.f_type... yes
checking for struct statfs.f_fstypename... no
checking for struct statvfs.f_basetype... no
checking for struct statvfs.f_fstypename... no
checking for pointer diff type... ptrdiff_t (size: 8)
checking for type of MPI_Aint... ptrdiff_t (size: 8)
checking for type of MPI_Count... long long (size: 8)
checking for type of MPI_Offset... long long (size: 8)
checking for an MPI datatype for MPI_Offset... MPI_LONG_LONG
checking for MPI_INTEGER_KIND... 4
checking for MPI_ADDRESS_KIND... 8
checking for MPI_COUNT_KIND... 8
checking for MPI_OFFSET_KIND... 8

============================================================================
== Library and Function tests
============================================================================
checking for library containing openpty... -lutil
checking for library containing gethostbyname... none required
checking for library containing socket... none required
checking for library containing sched_yield... none required
checking for library containing dirname... none required
checking for library containing ceil... -lm
checking for library containing clock_gettime... -lrt

*** GNU libltdl setup
checking location of libltdl... internal copy
configure: OPAL configuring in opal/libltdl
configure: running /bin/sh './configure'  --enable-ltdl-convenience --disable-ltdl-install --enable-shared --disable-static --cache-file=/dev/null --srcdir=. --disable-option-checking
/bin/sh: ./configure: No such file or directory
configure: /bin/sh './configure' *failed* for opal/libltdl
configure: WARNING: Failed to build GNU libltdl.  This usually means that something
configure: WARNING: is incorrectly setup with your environment.  There may be useful information in
configure: WARNING: opal/libltdl/config.log.  You can also disable GNU libltdl, which will disable
configure: WARNING: dynamic shared object loading, by configuring with --disable-dlopen.
configure: error: Cannot continue
+ exit 10
Build step 'Execute shell' marked build as failure
[BFA] Scanning build for known causes...

[BFA] Done. 0s

Test FAILed.

@jsquyres jsquyres force-pushed the topic/embed-libltdl branch from 6ee6208 to 4c9b449 Compare February 11, 2015 23:18
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/231/
Test PASSed.

@jsquyres jsquyres force-pushed the topic/embed-libltdl branch from 4c9b449 to ec563f6 Compare February 17, 2015 00:00
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/242/

Build Log
last 50 lines

[...truncated 5902 lines...]
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_am.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_mr.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_wait.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_poll.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_util.lo
  CCLD     libmca_common_libfabric.la
make[3]: Entering directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal/mca/common/libfabric'
 /bin/mkdir -p '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib'
 /bin/sh ../../../../libtool   --mode=install /usr/bin/install -c   libmca_common_libfabric.la '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib'
 /bin/mkdir -p '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/include/openmpi/opal/mca/common/libfabric'
 /usr/bin/install -c -m 644 libfabric/include/fi.h libfabric/include/fi_enosys.h libfabric/include/fi_indexer.h libfabric/include/fi_list.h libfabric/include/fi_log.h libfabric/include/fi_rbuf.h libfabric/include/prov.h libfabric/include/linux/osd.h libfabric/include/osx/osd.h libfabric/include/rdma/fabric.h libfabric/include/rdma/fi_atomic.h libfabric/include/rdma/fi_cm.h libfabric/include/rdma/fi_domain.h libfabric/include/rdma/fi_endpoint.h libfabric/include/rdma/fi_eq.h libfabric/include/rdma/fi_errno.h libfabric/include/rdma/fi_prov.h libfabric/include/rdma/fi_rma.h libfabric/include/rdma/fi_tagged.h libfabric/include/rdma/fi_trigger.h libfabric/prov/usnic/src/fi_ext_usnic.h libfabric/prov/usnic/src/usdf.h libfabric/prov/usnic/src/usdf_av.h libfabric/prov/usnic/src/usdf_cm.h libfabric/prov/usnic/src/usdf_cq.h libfabric/prov/usnic/src/usdf_dgram.h libfabric/prov/usnic/src/usdf_endpoint.h libfabric/prov/usnic/src/usdf_msg.h libfabric/prov/usnic/src/usdf_progress.h libfabric/prov/usnic/src/usdf_rdm.h libfabric/prov/usnic/src/usdf_rudp.h libfabric/prov/usnic/src/usdf_timer.h libfabric/prov/usnic/src/usnic_direct/cq_desc.h libfabric/prov/usnic/src/usnic_direct/cq_enet_desc.h libfabric/prov/usnic/src/usnic_direct/kcompat.h libfabric/prov/usnic/src/usnic_direct/kcompat_priv.h libfabric/prov/usnic/src/usnic_direct/libnl1_utils.h libfabric/prov/usnic/src/usnic_direct/libnl3_utils.h libfabric/prov/usnic/src/usnic_direct/libnl_utils.h libfabric/prov/usnic/src/usnic_direct/linux/delay.h '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/include/openmpi/opal/mca/common/libfabric'
/usr/bin/install: will not overwrite just-created `/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/include/openmpi/opal/mca/common/libfabric/osd.h' with `libfabric/include/osx/osd.h'
make[3]: *** [install-opalHEADERS] Error 1
make[3]: *** Waiting for unfinished jobs....
libtool: install: /usr/bin/install -c .libs/libmca_common_libfabric.so.0.0.0 /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmca_common_libfabric.so.0.0.0
libtool: install: (cd /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib && { ln -s -f libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so.0 || { rm -f libmca_common_libfabric.so.0 && ln -s libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so.0; }; })
libtool: install: (cd /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib && { ln -s -f libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so || { rm -f libmca_common_libfabric.so && ln -s libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so; }; })
libtool: install: /usr/bin/install -c .libs/libmca_common_libfabric.lai /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmca_common_libfabric.la
libtool: finish: PATH="/hpc/local/bin::/usr/local/bin:/bin:/usr/bin:/usr/sbin:/hpc/local/bin:/hpc/local/bin/:/hpc/local/bin/:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/ibutils/bin:/sbin" ldconfig -n /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib
----------------------------------------------------------------------
Libraries have been installed in:
   /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
make[3]: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal/mca/common/libfabric'
make[2]: *** [install-am] Error 2
make[2]: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal/mca/common/libfabric'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal'
make: *** [install-recursive] Error 1
+ exit 10
Build step 'Execute shell' marked build as failure
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Setting status of ec563f6126ceb427a7f9d1a81366a0514254200d to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/242/ and message: Merged build finished.

Test FAILed.
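
Reading the log above, the install step appears to fail because two headers with the same basename (`libfabric/include/linux/osd.h` and `libfabric/include/osx/osd.h`) are passed to a single `install` call that targets one flat directory. A minimal sketch of how such collisions could be spotted ahead of time; the function name `find_basename_collisions` is hypothetical and is not part of the Open MPI build system:

```shell
#!/bin/sh
# Print every basename that occurs more than once among the given paths.
# Installing such files into one flat directory makes a later file
# collide with an earlier one.
find_basename_collisions() {
    for path in "$@"; do
        basename "$path"
    done | sort | uniq -d
}

# Three of the headers from the failing install line in the log.
find_basename_collisions \
    libfabric/include/fi.h \
    libfabric/include/linux/osd.h \
    libfabric/include/osx/osd.h
# prints: osd.h
```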

@jsquyres
Member Author

bot:retest

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/243/

Build Log
last 50 lines

[...truncated 5902 lines...]
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_am.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_mr.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_wait.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_poll.lo
  CC       libfabric/prov/psm/src/libmca_common_libfabric_la-psmx_util.lo
  CCLD     libmca_common_libfabric.la
make[3]: Entering directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal/mca/common/libfabric'
 /bin/mkdir -p '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib'
 /bin/sh ../../../../libtool   --mode=install /usr/bin/install -c   libmca_common_libfabric.la '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib'
 /bin/mkdir -p '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/include/openmpi/opal/mca/common/libfabric'
 /usr/bin/install -c -m 644 libfabric/include/fi.h libfabric/include/fi_enosys.h libfabric/include/fi_indexer.h libfabric/include/fi_list.h libfabric/include/fi_log.h libfabric/include/fi_rbuf.h libfabric/include/prov.h libfabric/include/linux/osd.h libfabric/include/osx/osd.h libfabric/include/rdma/fabric.h libfabric/include/rdma/fi_atomic.h libfabric/include/rdma/fi_cm.h libfabric/include/rdma/fi_domain.h libfabric/include/rdma/fi_endpoint.h libfabric/include/rdma/fi_eq.h libfabric/include/rdma/fi_errno.h libfabric/include/rdma/fi_prov.h libfabric/include/rdma/fi_rma.h libfabric/include/rdma/fi_tagged.h libfabric/include/rdma/fi_trigger.h libfabric/prov/usnic/src/fi_ext_usnic.h libfabric/prov/usnic/src/usdf.h libfabric/prov/usnic/src/usdf_av.h libfabric/prov/usnic/src/usdf_cm.h libfabric/prov/usnic/src/usdf_cq.h libfabric/prov/usnic/src/usdf_dgram.h libfabric/prov/usnic/src/usdf_endpoint.h libfabric/prov/usnic/src/usdf_msg.h libfabric/prov/usnic/src/usdf_progress.h libfabric/prov/usnic/src/usdf_rdm.h libfabric/prov/usnic/src/usdf_rudp.h libfabric/prov/usnic/src/usdf_timer.h libfabric/prov/usnic/src/usnic_direct/cq_desc.h libfabric/prov/usnic/src/usnic_direct/cq_enet_desc.h libfabric/prov/usnic/src/usnic_direct/kcompat.h libfabric/prov/usnic/src/usnic_direct/kcompat_priv.h libfabric/prov/usnic/src/usnic_direct/libnl1_utils.h libfabric/prov/usnic/src/usnic_direct/libnl3_utils.h libfabric/prov/usnic/src/usnic_direct/libnl_utils.h libfabric/prov/usnic/src/usnic_direct/linux/delay.h '/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/include/openmpi/opal/mca/common/libfabric'
/usr/bin/install: will not overwrite just-created `/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/include/openmpi/opal/mca/common/libfabric/osd.h' with `libfabric/include/osx/osd.h'
make[3]: *** [install-opalHEADERS] Error 1
make[3]: *** Waiting for unfinished jobs....
libtool: install: /usr/bin/install -c .libs/libmca_common_libfabric.so.0.0.0 /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmca_common_libfabric.so.0.0.0
libtool: install: (cd /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib && { ln -s -f libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so.0 || { rm -f libmca_common_libfabric.so.0 && ln -s libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so.0; }; })
libtool: install: (cd /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib && { ln -s -f libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so || { rm -f libmca_common_libfabric.so && ln -s libmca_common_libfabric.so.0.0.0 libmca_common_libfabric.so; }; })
libtool: install: /usr/bin/install -c .libs/libmca_common_libfabric.lai /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmca_common_libfabric.la
libtool: finish: PATH="/hpc/local/bin::/usr/local/bin:/bin:/usr/bin:/usr/sbin:/hpc/local/bin:/hpc/local/bin/:/hpc/local/bin/:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/ibutils/bin:/sbin" ldconfig -n /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib
----------------------------------------------------------------------
Libraries have been installed in:
   /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
make[3]: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal/mca/common/libfabric'
make[2]: *** [install-am] Error 2
make[2]: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal/mca/common/libfabric'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/opal'
make: *** [install-recursive] Error 1
+ exit 10
Build step 'Execute shell' marked build as failure
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Setting status of ec563f6126ceb427a7f9d1a81366a0514254200d to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/243/ and message: Merged build finished.

Test FAILed.

Commit a copy of libltdl from Libtool 2.4.2 in the Open MPI source
tree because later versions of Libtool broke the embed-libltdl
functionality.

To be clear, this is what I did: in another Open MPI tree (master
open-mpi/ompi@22f1d29), with AC 2.69, AM 1.13.3, LT 2.4.2:

```sh
$ cd $other-ompi-tree
$ autogen.pl
$ cd $my-git-embed-libltdl-tree
$ mv .gitignore TEMP-IGNORE-gitignore
$ cd opal
$ rm -rf libltdl
$ cp -r $other-ompi-tree/opal/libltdl .
$ git add libltdl
$ cd ..
$ ./autogen.pl && ./configure
$ cd opal/libltdl
$ git rm --force (...files that changed...)
$ git commit -m "libltdl from LT 2.4.2"
```

Also, I hand-edited two files:

1. opal/libltdl/configure.ac: change `AC_CONFIG_MACRO_DIR([m4])` to `AC_CONFIG_MACRO_DIR([config])`
2. opal/libltdl/Makefile.am: change `ACLOCAL_AMFLAGS = -I m4` to `-I config`
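
The two hand edits could also be scripted. This is only a sketch, assuming the files contain the macro directory in exactly the forms quoted above; the helper name `retarget_macro_dir` is hypothetical, and the commit itself made the edits by hand:

```shell
#!/bin/sh
# Hypothetical helper: switch the Autoconf/aclocal macro directory from
# "m4" to "config" in the configure.ac and Makefile.am found in $1.
retarget_macro_dir() {
    dir=$1

    # configure.ac: AC_CONFIG_MACRO_DIR([m4]) -> AC_CONFIG_MACRO_DIR([config])
    sed 's/AC_CONFIG_MACRO_DIR(\[m4\])/AC_CONFIG_MACRO_DIR([config])/' \
        "$dir/configure.ac" > "$dir/configure.ac.new" || return 1
    mv "$dir/configure.ac.new" "$dir/configure.ac" || return 1

    # Makefile.am: ACLOCAL_AMFLAGS = -I m4 -> ACLOCAL_AMFLAGS = -I config
    sed 's/^\(ACLOCAL_AMFLAGS[[:space:]]*=[[:space:]]*-I[[:space:]]*\)m4/\1config/' \
        "$dir/Makefile.am" > "$dir/Makefile.am.new" || return 1
    mv "$dir/Makefile.am.new" "$dir/Makefile.am"
}

# Usage (from the top of the tree): retarget_macro_dir opal/libltdl
```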

This was all necessary because there's a complicated bootstrapping
procedure in Libtool/libltdl.  We cannot use the LTDL_* m4 macros to
setup libltdl because they assume all the bootstrapping that the
Libtool package does to embed itself.

Instead, we simply invoke autogen/configure down in the opal/libltdl
directory.  More details coming in the next commit (which adjusts the
rest of the tree to use this embedded libltdl).

This gives Open MPI better error messages when a DSO fails to open for
some reason.  See the lengthy comment in the commit explaining why.

Adjust .gitignore for files that we should and should not ignore in
the embedded opal/libltdl (and friends).

Also make changes to autogen.pl:

* Remove some old cruft about the preopen error diff patch and FreeBSD
  shenanigans
* Update some configure ordering of CUDA libltdl checking to ensure
  that it happens after libltdl is setup
* Set all the LTDL macros that our build system expects (i.e., that
  LTDL_INIT/libtoolize used to do for us, such as LIBLTDL, LTDLDEPS,
  LTDLINCL).
* Adjust all inclusions of `ltdl.h` to now use `#include <ltdl.h>` and
  simply set CPPFLAGS if we need it to point to the one embedded in
  the OMPI source tree.
* Adjust autogen.pl to run "autoreconf" in the opal/libltdl tree.
  Critically, we need to NOT run `libtoolize`, though (there's
  complicated sequencing in the bootstrapping of the Libtool/libltdl
  packages that makes that magically work somehow -- but now we don't
  want that magic: we want libltdl to be just like any other embedded
  package that just needs a configure script and some Automake-ified
  Makefiles).
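
For illustration, setting the LTDL variables by hand might look roughly like the following. This is a sketch under assumptions, not the code from this PR: the values mirror what Libtool's convenience-library mode is understood to set, and the `ltdl_dir` variable and the `libltdlc.la` library name are assumed rather than taken from the commit.

```shell
#!/bin/sh
# Assumed layout: libltdl lives in opal/libltdl and is built as the
# convenience library libltdlc.la. The single quotes keep ${top_...}
# unexpanded so that make, not the shell, substitutes them later.
ltdl_dir=opal/libltdl
LIBLTDL='${top_build_prefix}'"$ltdl_dir/libltdlc.la"
LTDLDEPS=$LIBLTDL
LTDLINCL='-I${top_srcdir}/'"$ltdl_dir"

# Show the resulting values as they would be substituted into Makefiles.
printf 'LIBLTDL=%s\nLTDLDEPS=%s\nLTDLINCL=%s\n' \
    "$LIBLTDL" "$LTDLDEPS" "$LTDLINCL"
```

With `LTDLINCL` added to `CPPFLAGS`, a plain `#include <ltdl.h>` would resolve to the embedded copy.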
@jsquyres jsquyres force-pushed the topic/embed-libltdl branch from ec563f6 to 874333d Compare February 17, 2015 15:10
@jsquyres
Member Author

bot:retest

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/244/
Test PASSed.

@jsquyres
Member Author

Per 17 Feb 2015 Tuesday webex, we decided to go with this approach over #366.

However, further testing after the call showed that this path... basically leads to madness. There are many more corner cases that have been exposed with more testing.

In short: libltdl is not a normal Autotools-based package that we can embed in our source tree (e.g., like hwloc and libevent). Libltdl has some sophisticated bootstrapping that is inherently tied to the GNU Libtool package and the running of libtoolize during autogen.pl. It would be very, very difficult to extract libltdl from this bootstrapping handshake, and doing so would probably result in fragile packaging that could/will break in the future.

I have another approach that I'm trying. More details will be posted on #311 when I'm ready.

@jsquyres jsquyres closed this Feb 19, 2015
@jsquyres jsquyres deleted the topic/embed-libltdl branch July 9, 2015 17:56
jsquyres pushed a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015
Bring cuda mpiext over to 2.x. They are ready.
Development

Successfully merging this pull request may close these issues.

OMPI cannot build with Libtool 2.4.3 and above