
RFC: Replace libltdl with OPAL "dl" framework #410

Merged: 9 commits merged into open-mpi:master from jsquyres:topic/libltdl-must-die on Mar 9, 2015

Conversation

@jsquyres (Member) commented Feb 21, 2015

Per #311, we've tried two approaches to getting rid of the embedded libltdl from OMPI. Neither worked. ☹️

Here's a new approach: make dynamic library functionality (i.e., dlopen/dlsym-like functionality) an OPAL framework, with (at least) two components:

  1. A simple dlopen-based component that works on any dlopen-lovin' platform.
  2. A libltdl-based component that uses a system-provided libltdl (assuming ltdl.h and libltdl are available).

This idea is based on the premise that Open MPI's two main platforms are (modern) Linux and OS X, both of which support dlopen(2). Combined with the fact that dlfcn.h and libdl are typically available by default, this means the dlopen-based component can (usually) be built by default. For non-dlopen-lovin' platforms, libltdl support is still available and functions the same as ever -- just not embedded in the Open MPI tree (and therefore you must have libltdl development support installed).

Additionally, plugins can be written for other platforms to support their native dlopen/dlsym-like functionality, if desired (e.g., if libltdl doesn't support that platform and/or if a developer doesn't want to force a user to have libltdl+devel support installed).
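
To make the premise concrete, here is a minimal, self-contained sketch (not code from this PR) of the dlopen/dlsym/dlclose sequence that the dlopen-based component wraps. It assumes a Linux system with libdl (compile with -ldl) and uses libm/cos purely as an example target:

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Open a DSO (libm is just a convenient example). */
    void *handle = dlopen("libm.so.6", RTLD_NOW | RTLD_GLOBAL);
    if (NULL == handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up a symbol in the previously-opened DSO. */
    double (*cosine)(double) = (double (*)(double)) dlsym(handle, "cos");
    if (NULL == cosine) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }
    printf("cos(0.0) = %f\n", cosine(0.0));

    /* Close the previously-opened DSO. */
    dlclose(handle);
    return 0;
}

The libltdl-based component would provide the same operations through lt_dlopenext()/lt_dlsym()/lt_dlclose() on platforms where this plain dlopen path is not available.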

This PR contains a series of commits that incorporate the entirety of this functionality in logical steps (a sketch of the libltdl calls being replaced follows the list):

  1. add the dl framework
  2. add the dlopen dl component
  3. add the libltdl dl component
  4. convert the MCA base to use the opal_dl interface
  5. convert the debuggers code to use the opal_dl interface
  6. convert the CUDA code to use the opal_dl interface
  7. remove the lt_interface code (an OPAL interface to libltdl)
  8. convert (orte|ompi|oshmem)*info to use the opal_dl interface
  9. remove libltdl from the tree
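
For reference for steps 3, 7, and 9, here is the corresponding minimal sketch (again, not code from this PR) of the same open/lookup/close sequence through a system-installed libltdl -- i.e., the kind of calls that the lt_interface code wraps today and that the libltdl dl component would keep wrapping. It assumes ltdl.h and -lltdl are available; the library name and path resolution are illustrative only:

#include <ltdl.h>
#include <stdio.h>

int main(void)
{
    /* libltdl requires explicit initialization. */
    if (0 != lt_dlinit()) {
        fprintf(stderr, "lt_dlinit failed: %s\n", lt_dlerror());
        return 1;
    }

    /* Open a DSO; lt_dlopenext() tries platform-specific extensions. */
    lt_dlhandle handle = lt_dlopenext("libm");
    if (NULL == handle) {
        fprintf(stderr, "lt_dlopenext failed: %s\n", lt_dlerror());
        lt_dlexit();
        return 1;
    }

    /* Look up a symbol in the previously-opened DSO. */
    double (*cosine)(double) = (double (*)(double)) lt_dlsym(handle, "cos");
    if (NULL != cosine) {
        printf("cos(0.0) = %f\n", cosine(0.0));
    }

    /* Close the DSO and shut libltdl down. */
    lt_dlclose(handle);
    lt_dlexit();
    return 0;
}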

(NOTE: there is currently a "zero" commit at the head of this patch set that is a bug fix for the MCA framework; this is getting reviewed independently by @hjelmn right now, and will likely be committed separately. It is included here because the fix is required to get this DL framework to function properly)

@jsquyres force-pushed the topic/libltdl-must-die branch from 805cf3a to abfa7a6 on February 21, 2015 14:04
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/251/

Build Log
last 50 lines

[...truncated 14776 lines...]
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[1424527549.693987] [jenkins01:27451:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424527549.694011] [jenkins01:27454:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424527549.693962] [jenkins01:27448:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424527549.693950] [jenkins01:27453:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1c38b00)
[1424527549.693939] [jenkins01:27456:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424527549.693989] [jenkins01:27447:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424527549.693961] [jenkins01:27445:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424527549.693987] [jenkins01:27444:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[jenkins01:27442] 7 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[jenkins01:27442] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Build step 'Execute shell' marked build as failure
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Setting status of 805cf3a17d20dccc8b19f179b6e18ce8bb30e78a to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/251/ and message: Merged build finished.

Test FAILed.

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/252/

Build Log
last 50 lines

[...truncated 14776 lines...]
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[1424528069.712760] [jenkins01:27181:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424528069.712755] [jenkins01:27186:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424528069.712748] [jenkins01:27189:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1c38b00)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[1424528069.712801] [jenkins01:27180:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424528069.712770] [jenkins01:27184:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424528069.712779] [jenkins01:27177:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424528069.712785] [jenkins01:27187:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[1424528069.712790] [jenkins01:27178:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a88b00)
[jenkins01:27175] 7 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[jenkins01:27175] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Build step 'Execute shell' marked build as failure
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Setting status of abfa7a69f567fb12a5560dfbcd1f16f70fe247cb to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/252/ and message: Merged build finished.

Test FAILed.

@jsquyres (Member, Author)

@miked-mellanox There's something happening here with MXM that I don't understand. Is this a false failure?

@mike-dubman (Member)

I suspect a direct commit to master broke the modex for MXM; yet another reason to avoid direct commits and use PRs.

The Jenkins setup has not changed in a long time. This one, 48eae25, looks suspicious to me.

@rhc54 (Contributor) commented Feb 21, 2015

I'm not aware of any "direct commit" that affected the modex. I did see some mxm- and mtl-related changes go by, but it's hard for me to keep track of everything.

@rhc54 (Contributor) commented Feb 21, 2015

Hmmm...scanning back, I see that Elena did commit a modex change the other day. I don't know why that would have affected mxm particularly, but it might be a place to look.

@jsquyres force-pushed the topic/libltdl-must-die branch from abfa7a6 to d037da3 on February 22, 2015 11:12
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/253/

Build Log
last 50 lines

[...truncated 14775 lines...]
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[1424604358.030332] [jenkins01:30619:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a24b00)
[1424604358.030331] [jenkins01:30624:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a24b00)
[1424604358.030335] [jenkins01:30628:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1c14b00)
[1424604358.030332] [jenkins01:30618:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a24b00)
[1424604358.030332] [jenkins01:30621:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a24b00)
[1424604358.030332] [jenkins01:30626:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a24b00)
[1424604358.030331] [jenkins01:30630:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a24b00)
[1424604358.030334] [jenkins01:30617:0]  proto_conn.c:846  MXM  ERROR already connected to � (uuid 0x7ffff1a24b00)
[jenkins01:30615] 7 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[jenkins01:30615] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Build step 'Execute shell' marked build as failure
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Setting status of d037da3e27844b2c3289a4ad348136716bfeb7e5 to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/253/ and message: Merged build finished.

Test FAILed.

@jsquyres (Member, Author)

@miked-mellanox I made a minor tweak and checked Jenkins again, and am still getting the MXM failure. Can someone have a look at master and/or this PR to see what is going on? I do not have the ability to test MXM myself.

@mike-dubman (Member)

@yosefe - could you please comment on the root cause of the failure?
Thanks

@yosefe (Contributor) commented Feb 22, 2015

It seems like data corruption in modex send / modex recv. The value being printed (the uuid) is passed over the modex; it should be a random value, but it looks like a pointer.

@jsquyres (Member, Author)

@yosefe So is this happening on master, and unrelated to this PR?

@mike-dubman (Member)

We will know shortly; submitting a PR to master now with Coverity fixes in yalla.

@mike-dubman (Member)

fails as well :(

@rhc54 (Contributor) commented Feb 22, 2015

FWIW: looks like Igor touched mxm when he committed a bunch of Coverity fixes - see 010dce3

@mike-dubman (Member)

Yep, you are right - thanks.

010dce3 (http://bgate.mellanox.com/jenkins/job/gh-ompi-master-merge/9) breaks MXM. @igor-ivanov - please review.

426d1ce (http://bgate.mellanox.com/jenkins/job/gh-ompi-master-merge/8) is fine.

BTW, Jenkins now runs for "merge commits" as well, and a "coverity html report" for each specific PR is attached on the left.

Here is a report example:

http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-merge//ws/cov_build/all_9/c/output/errors/index.html

@mike-dubman (Member)

bot:retest

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/260/
Test PASSed.

@jsquyres (Member, Author)

bot:retest

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/263/
Test PASSed.

@jsquyres force-pushed the topic/libltdl-must-die branch from d037da3 to 0915ae2 on February 23, 2015 14:32
@PHHargrove (Member)

I have tested a tarball provided by Jeff, based on his commit 5a3fcf2.
It passes my testing on a variety of BSD and Solaris platforms, plus some older Mac OS X versions, and Linux with all sorts of compilers.

My tests include running ring_c (usually with 2 ranks on a single host).
However, I have not seen the failure that Jenkins reports above on the same commit.

@jsquyres force-pushed the topic/libltdl-must-die branch from 5a3fcf2 to 33987d5 on February 27, 2015 11:44
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/304/

Build Log
last 50 lines

[...truncated 19560 lines...]
[jenkins01:27032] [14] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400739]
[jenkins01:27032] *** End of error message ***
/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmpi.so.0(MPI_Init+0x1b0)[0x7ffff7d8c740]
[jenkins01:27026] [12] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400826]
[jenkins01:27026] [13] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3d6901ed1d]
[jenkins01:27026] [14] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400739]
[jenkins01:27026] *** End of error message ***
/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmpi.so.0(MPI_Init+0x1b0)[0x7ffff7d8c740]
[jenkins01:27028] [12] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400826]
[jenkins01:27028] [13] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3d6901ed1d]
[jenkins01:27028] [14] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400739]
[jenkins01:27028] *** End of error message ***
/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmpi.so.0(MPI_Init+0x1b0)[0x7ffff7d8c740]
[jenkins01:27022] [12] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400826]
[jenkins01:27022] [13] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3d6901ed1d]
[jenkins01:27022] [14] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400739]
[jenkins01:27022] *** End of error message ***
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3d6901ed1d]
[jenkins01:27020] [14] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400739]
[jenkins01:27020] *** End of error message ***
/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libmpi.so.0(MPI_Init+0x1b0)[0x7ffff7d8c740]
[jenkins01:27024] [12] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400826]
[jenkins01:27024] [13] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3d6901ed1d]
[jenkins01:27024] [14] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello_c[0x400739]
[jenkins01:27024] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node jenkins01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Build step 'Execute shell' marked build as failure
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: **/*.tap
Saving reports...
Processing '/var/lib/jenkins/jobs/gh-ompi-master-pr/builds/304/tap-master-files/cov_stat.tap'
Parsing TAP test result [/var/lib/jenkins/jobs/gh-ompi-master-pr/builds/304/tap-master-files/cov_stat.tap].
ok - coverity found no issues for all_304
ok - coverity found no issues for oshmem_304
ok - coverity found no issues for yalla_304
ok - coverity found no issues for mxm_304
ok - coverity found no issues for fca_304
ok - coverity found no issues for hcoll_304

TAP Reports Processing: FINISH
Anchor chain: could not read file with links: /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/jenkins_sidelinks.txt (No such file or directory)
[copy-to-slave] The build is taking place on the master node, no copy back to the master will take place.
Setting commit status on GitHub for https://github.com/open-mpi/ompi/commit/65e14067ea0676188d8e2485884a3dfed635e806
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Setting status of 33987d5b6068337734ca9d0e3380198060b9c3d0 to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/304/ and message: Merged build finished.

Test FAILed.

@jsquyres (Member, Author)

bot:retest

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/308/
Test PASSed.

@jsquyres (Member, Author) commented Mar 3, 2015

There's one more known issue on this branch that is being exposed by the change to OPAL_CHECK_PACKAGE: @PHHargrove noticed that when he runs this:

/home/phhargrove/OMPI/openmpi-pr410-v4-linux-x86_64-psm/openmpi-gitclone/configure --prefix=/home/phhargrove/OMPI/openmpi-pr410-v4-linux-x86_64-psm/INST --enable-debug --with-openib --with-psm=/usr/local/Infinipath

During the configure logic for mtl_psm, it does not find his infinipath library and concludes that PSM is unavailable. The problem seems to be this:

--- MCA component mtl:psm (m4 configuration macro)
checking for MCA component mtl:psm compile mode... dso
checking --with-psm value... sanity check ok (/usr/local/Infinipath)
checking --with-psm-libdir value... simple ok (unspecified)
checking psm.h usability... yes
checking psm.h presence... yes
checking for psm.h... yes
looking for library in lib
checking for library containing psm_finalize... no
looking for library in lib64
checking for library containing psm_finalize... (cached) no
configure: error: PSM support requested but not found.  Aborting

My educated guess at why this is happening: the switch from AC_CHECK_LIB to AC_SEARCH_LIBS means that the Autoconf cache variable we reset under the covers -- to allow AC_SEARCH_LIBS to run twice (once with -L.../lib and again with -L.../lib64) -- is now the wrong one. I.e., it was the correct variable for AC_CHECK_LIB, but AC_SEARCH_LIBS uses a different one.

@PHHargrove (Member)

"Use the source, Luke"

A quick look at /usr/share/autoconf/autoconf/libs.m4 reveals

# AC_SEARCH_LIBS(FUNCTION, SEARCH-LIBS,
#                [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND],
#                [OTHER-LIBRARIES])
# --------------------------------------------------------
# Search for a library defining FUNC, if it's not already available.
AC_DEFUN([AC_SEARCH_LIBS],
[AS_VAR_PUSHDEF([ac_Search], [ac_cv_search_$1])dnl
AC_CACHE_CHECK([for library containing $1], [ac_Search],
....

So it looks like the cache variable is ac_cv_search_[function], which is indeed different from ac_cv_lib_[library]_[function].

@jsquyres (Member, Author) commented Mar 3, 2015

Thanks!

I am busy preparing for voting on MPI-3.1 this week; I just hadn't gotten to look in the autoconf source yet...

@jsquyres force-pushed the topic/libltdl-must-die branch from 33987d5 to ad39b69 on March 4, 2015 13:29
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/319/
Test PASSed.

jsquyres added 9 commits March 9, 2015 08:16
Embedding libltdl without the use of Libtool bootstrapping has
proven... difficult.  Instead, create a new simple "dl" framework.  It
only provides 4 functions:

- open a DSO (very similar to lt_dlopenadvise())
- lookup a symbol in a previously-opened DSO (very similar to lt_dlsym())
- close a previously-opened DSO (very similar to lt_dlclose())
- iterate over all files in a directory (very similar to lt_dlforeachfile())

There will be follow-on commits with a simple dlopen-based component
(nowhere near as complete/functional as libltdl, but good enough for
Linux and OS X), and a libltdl-based component for all other
platforms.

The intent is that the dlopen-based component can be built by default
in almost all cases.  But if libltdl is available, that component will
be built.  End result: we still get DSO-based functionality by default
in (almost?) all cases.  Without embedding libltdl.  Which is what we
want.
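
As a rough illustration only (the example_dl_* names below are hypothetical placeholders, not the actual OPAL declarations), those four functions map naturally onto a module of function pointers that each component -- dlopen-based, libltdl-based, or platform-native -- fills in:

#include <stdbool.h>

/* Opaque handle to a previously-opened DSO. */
typedef struct example_dl_handle_t example_dl_handle_t;

typedef int (*example_dl_open_fn_t)(const char *fname, bool use_ext,
                                    example_dl_handle_t **handle);
typedef int (*example_dl_lookup_fn_t)(example_dl_handle_t *handle,
                                      const char *symbol, void **ptr);
typedef int (*example_dl_close_fn_t)(example_dl_handle_t *handle);
typedef int (*example_dl_foreachfile_fn_t)(const char *search_path,
                                           int (*cb)(const char *filename,
                                                     void *data),
                                           void *data);

/* Each dl component exports one of these modules. */
typedef struct {
    example_dl_open_fn_t        open;
    example_dl_lookup_fn_t      lookup;
    example_dl_close_fn_t       close;
    example_dl_foreachfile_fn_t foreachfile;
} example_dl_module_t;
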
Works on systems with dlopen (e.g., Linux and OS X).  It requires
dlfcn.h and libdl, which many systems have installed by default.

Works on any system that libltdl supports and has ltdl.h and libltdl
available.

Note that this commit removes option:lt_dladvise from the various
"info" tools' output.  This technically breaks our CLI "ABI" because
we're not deprecating it / replacing it with an alias to some other
"info" tool output.

Although the dl/libltdl component contains a "have_lt_dladvise" MCA
var that contains the same information, the "option:lt_dladvise"
output from the various "info" tools is *not* an MCA var, and
therefore we can't alias it.  So it just has to die.

The libltdl interface has been completely replaced by the OPAL DL
framework (i.e., the opal_dl interface).

Fixes open-mpi#311
@jsquyres force-pushed the topic/libltdl-must-die branch from ad39b69 to 914880a on March 9, 2015 15:28
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/333/
Test PASSed.

jsquyres added a commit that referenced this pull request Mar 9, 2015
RFC: Replace libltdl with OPAL "dl" framework
@jsquyres merged commit b958daa into open-mpi:master Mar 9, 2015
@jsquyres deleted the topic/libltdl-must-die branch March 9, 2015 15:59
jsquyres added a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015