No protection for opening plugins from wrong OMPI version #475

Open
opoplawski opened this issue Mar 16, 2015 · 19 comments

@opoplawski
Contributor

I'm trying to build openmpi master from openmpi-dev-1330-g7640507 as the Fedora package for testing. I'm getting:

$ ./dlopen_test 
[barry:08971] *** Process received signal ***
[barry:08971] Signal: Segmentation fault (11)
[barry:08971] Signal code: Address not mapped (1)
[barry:08971] Failing at address: 0x1
[barry:08971] [ 0] /lib64/libpthread.so.0(+0x100d0)[0x7f6a37c2c0d0]
[barry:08971] [ 1] /lib64/libc.so.6(strlen+0x2a)[0x7f6a378e9c8a]
[barry:08971] [ 2] /lib64/libc.so.6(__strdup+0xe)[0x7f6a378e99ae]
[barry:08971] [ 3] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/opal/.libs/libopen-pal.so.0(+0x46204)[0x7f6a38e15204]
[barry:08971] [ 4] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/opal/.libs/libopen-pal.so.0(mca_base_component_var_register+0x3c)[0x7f6a38e1549c]
[barry:08971] [ 5] /usr/lib64/openmpi/lib/openmpi/mca_shmem_mmap.so(+0x1264)[0x7f6a36e40264]
[barry:08971] [ 6] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/opal/.libs/libopen-pal.so.0(mca_base_framework_components_register+0x159)[0x7f6a38e196e9]
[barry:08971] [ 7] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/opal/.libs/libopen-pal.so.0(mca_base_framework_register+0x166)[0x7f6a38e19a46]
[barry:08971] [ 8] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/opal/.libs/libopen-pal.so.0(mca_base_framework_open+0x31)[0x7f6a38e19ac1]
[barry:08971] [ 9] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/opal/.libs/libopen-pal.so.0(opal_init+0x18c)[0x7f6a38df690c]
[barry:08971] [10] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/ompi/debuggers/.libs/lt-dlopen_test(+0x121f)[0x7f6a3981021f]
[barry:08971] [11] /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f6a3787efe0]
[barry:08971] [12] /export/home/orion/fedora/openmpi/openmpi-dev-1330-g7640507/ompi/debuggers/.libs/lt-dlopen_test(+0xd59)[0x7f6a3980fd59]
[barry:08971] *** End of error message ***

(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007fe55ad4c9ae in __GI___strdup (s=0x1 <error: Cannot access memory at address 0x1>)
    at strdup.c:41
#2  0x00007fe55c278204 in register_variable (framework_name=<optimized out>,
    component_name=0x7fe55a4a5178 <mca_shmem_mmap_component+56> "mmap",
    variable_name=<optimized out>, description=<optimized out>,
    type=MCA_BASE_VAR_TYPE_VERSION_STRING, enumerator=<optimized out>, bind=0,
    flags=(MCA_BASE_VAR_FLAG_SETTABLE | MCA_BASE_VAR_FLAG_DWG), info_lvl=OPAL_INFO_LVL_9,
    scope=MCA_BASE_VAR_SCOPE_LOCAL, synonym_for=-1,
    storage=0x7fe55a4a5100 <opal_shmem_mmap_nfs_warning>, project_name=0x0)
    at mca_base_var.c:1417
#3  0x00007fe55c278457 in mca_base_var_register (project_name=project_name@entry=0x0,
    framework_name=<optimized out>, component_name=<optimized out>,
    variable_name=<optimized out>, description=<optimized out>, type=<optimized out>,
    enumerator=<optimized out>, bind=<optimized out>, flags=<optimized out>,
    info_lvl=<optimized out>, scope=<optimized out>,
    storage=0x7fe55a4a5100 <opal_shmem_mmap_nfs_warning>) at mca_base_var.c:1444
#4  0x00007fe55c27849c in mca_base_component_var_register (component=<optimized out>,
    variable_name=<optimized out>, description=<optimized out>, type=<optimized out>,
    enumerator=<optimized out>, bind=<optimized out>, flags=MCA_BASE_VAR_FLAG_SETTABLE,
    info_lvl=OPAL_INFO_LVL_9, scope=MCA_BASE_VAR_SCOPE_LOCAL,
    storage=0x7fe55a4a5100 <opal_shmem_mmap_nfs_warning>) at mca_base_var.c:1457
#5  0x00007fe55a2a3264 in mmap_register ()
   from /usr/lib64/openmpi/lib/openmpi/mca_shmem_mmap.so
#6  0x00007fe55c27c6e9 in register_components (project_name=<optimized out>,
    dest=0x7fe55c4c8550 <opal_shmem_base_framework+80>, src=0x7fff2a23da20, output_id=-1,
    type_name=<optimized out>) at mca_base_components_register.c:116
#7  mca_base_framework_components_register (
    framework=framework@entry=0x7fe55c4c8500 <opal_shmem_base_framework>,
    flags=flags@entry=MCA_BASE_REGISTER_DEFAULT) at mca_base_components_register.c:67
#8  0x00007fe55c27ca46 in mca_base_framework_register (
    framework=0x7fe55c4c8500 <opal_shmem_base_framework>,
    flags=flags@entry=MCA_BASE_REGISTER_DEFAULT) at mca_base_framework.c:112
#9  0x00007fe55c27cac1 in mca_base_framework_open (
    framework=0x7fe55c4c8500 <opal_shmem_base_framework>,
    flags=flags@entry=MCA_BASE_OPEN_DEFAULT) at mca_base_framework.c:136
#10 0x00007fe55c25990c in opal_init (pargc=<optimized out>, pargv=<optimized out>)
    at runtime/opal_init.c:471
#11 0x00007fe55cc7321f in main (argc=1, argv=0x7fff2a23dc28) at dlopen_test.c:133

(gdb) up
#2  0x00007fe55c278204 in register_variable (framework_name=<optimized out>,
    component_name=0x7fe55a4a5178 <mca_shmem_mmap_component+56> "mmap",
    variable_name=<optimized out>, description=<optimized out>,
    type=MCA_BASE_VAR_TYPE_VERSION_STRING, enumerator=<optimized out>, bind=0,
    flags=(MCA_BASE_VAR_FLAG_SETTABLE | MCA_BASE_VAR_FLAG_DWG), info_lvl=OPAL_INFO_LVL_9,
    scope=MCA_BASE_VAR_SCOPE_LOCAL, synonym_for=-1,
    storage=0x7fe55a4a5100 <opal_shmem_mmap_nfs_warning>, project_name=0x0)
    at mca_base_var.c:1417
1417                ((char **)storage)[0] = strdup (((char **)storage)[0]);
(gdb) print storage
$1 = (void *) 0x7fe55a4a5100 <opal_shmem_mmap_nfs_warning>
(gdb) print ((char **)storage)[0]
$2 = 0x1 <error: Cannot access memory at address 0x1>

./configure --prefix=/usr/lib64/openmpi --mandir=/usr/share/man/openmpi-x86_64 \
    --includedir=/usr/include/openmpi-x86_64 --sysconfdir=/etc/openmpi-x86_64 \
    --disable-silent-rules --enable-mpi-java --with-libevent=/usr --with-verbs=/usr \
    --with-sge --with-valgrind --enable-memchecker --with-hwloc=/usr --with-libltdl=/usr \
    CC=gcc CXX=g++ \
    'LDFLAGS=-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' \
    'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' \
    'CXXFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' \
    FC=gfortran \
    'FCFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic'

@opoplawski
Contributor Author

Okay, this turned out to be caused by having an already installed openmpi on the system. Apparently the dlopen() code is picking that up in preference to the build libraries. This seems like a bug as well, but not sure if this is new or not.
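
One plausible reading of the backtrace above, offered as a hypothesis rather than a confirmed diagnosis: the installed plugin passed an integer type code that meant one thing when it was compiled and means another in the newer libopen-pal. gdb decodes the code as MCA_BASE_VAR_TYPE_VERSION_STRING, yet the first word of the storage reads 0x1, which is consistent with a flag rather than a string pointer. The enum names and values in this sketch are invented for illustration and are not copied from either release.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* ASSUMPTION: type codes as an older release might have compiled them in. */
enum old_var_type { OLD_TYPE_INT, OLD_TYPE_STRING, OLD_TYPE_BOOL };

/* ASSUMPTION: a newer release inserted an entry, shifting later values. */
enum new_var_type {
    NEW_TYPE_INT,
    NEW_TYPE_STRING,
    NEW_TYPE_VERSION_STRING,
    NEW_TYPE_BOOL
};

/* How the newer library would name a raw type code it receives. */
static const char *new_type_name(int code)
{
    assert(code >= 0);
    switch (code) {
    case NEW_TYPE_INT:            return "int";
    case NEW_TYPE_STRING:         return "string";
    case NEW_TYPE_VERSION_STRING: return "version_string";
    case NEW_TYPE_BOOL:           return "bool";
    default:                      return "unknown";
    }
}

/* True when the newer library would treat the storage as char ** and
 * strdup() its first element, which faults if the storage holds a flag. */
static bool new_library_would_strdup(int code)
{
    return code == NEW_TYPE_STRING || code == NEW_TYPE_VERSION_STRING;
}
```

Under these assumed numberings, an old plugin registering a boolean with code OLD_TYPE_BOOL is seen by the newer library as a version string, so the library would call strdup() on whatever the flag's storage happens to contain.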

@hjelmn
Member

hjelmn commented Mar 16, 2015

That is a long outstanding bug in Open MPI. We are discussing ways to correct this for 1.9.

@opoplawski
Contributor Author

Okay, I'll leave this to you to close/reassign/etc. as you see fit then.

@jsquyres jsquyres changed the title from "Segmentation fault in dlopen_test" to "No protection for opening plugins from wrong OMPI version" Mar 16, 2015
@hppritcha hppritcha added this to the Open MPI 1.9.0 milestone Mar 18, 2015
@hjelmn
Member

hjelmn commented Mar 18, 2015

Thanks Howard.

To begin the discussion we could probably use the following scheme for plugin paths:

<prefix>/lib[64]/openmpi/<project>/<PROJECT_VERSION>/<framework>/<ABI_VERSION>/

This is the most flexible naming scheme. So for example a 1.9/2.0 btl 3.0 plugin might be found in:

<prefix>/lib/openmpi/opal/2.0/btl/3.0

@jsquyres
Member

I mostly agree, but will be slightly more pedantic:

<libdir>/openmpi/<project>/<PROJECT_VERSION>/<framework>/<ABI_VERSION>/

That being said, this is probably a little overkill -- do we need both PROJECT_VERSION and ABI_VERSION? I.e., won't those 2 be chained together?

@jsquyres
Member

Actually, I wasn't pedantic enough. :-)

This is more correct:

<pkglibdir>/<project>/<PROJECT_VERSION>/<framework>/<ABI_VERSION>/
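
As a minimal sketch of the proposed layout, the helper below assembles a plugin directory from its five parts. The function name plugin_dir and the example values are hypothetical; nothing like this is claimed to exist in Open MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<pkglibdir>/<project>/<project_version>/<framework>/<abi_version>/"
 * into buf. Returns 0 on success, -1 if the result did not fit. */
static int plugin_dir(char *buf, size_t len,
                      const char *pkglibdir, const char *project,
                      const char *project_version, const char *framework,
                      const char *abi_version)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%s/%s/%s/%s/%s/",
                     pkglibdir, project, project_version,
                     framework, abi_version);
    /* snprintf reports the length it wanted; >= len means truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, calling it with "/usr/lib/openmpi", "opal", "2.0", "btl" and "3.0" yields /usr/lib/openmpi/opal/2.0/btl/3.0/, matching the btl example given earlier in the thread.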

@bosilca
Member

bosilca commented Mar 19, 2015

Our modules have a version number. Why is simply discarding, right after dlopen, all modules with the wrong version number not an adequate solution?

@hjelmn
Member

hjelmn commented Mar 19, 2015

Hmm, that may be the way to go. We now have the mca version, project version, and type version in the mca component.
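
A rough sketch of the "discard after dlopen" idea follows. The struct is a simplified, hypothetical stand-in for the version fields mentioned above, not the real mca_base_component_t layout, and a real loader would also need to dlclose() whatever it rejects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: simplified view of the version data a component carries. */
struct component_info {
    const char *name;       /* component name, e.g. "mmap" */
    int mca_major;          /* version of the MCA base interface */
    const char *type_name;  /* framework the component belongs to */
    int type_major;         /* version of that framework's interface */
};

/* True when the component targets the same interfaces as 'want'. */
static int matches(const struct component_info *c,
                   const struct component_info *want)
{
    return c->mca_major == want->mca_major &&
           strcmp(c->type_name, want->type_name) == 0 &&
           c->type_major == want->type_major;
}

/* Compacts 'list' in place so only matching components remain.
 * Returns how many were kept. */
static size_t discard_incompatible(struct component_info *list, size_t n,
                                   const struct component_info *want)
{
    assert(list != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (matches(&list[i], want)) {
            list[kept++] = list[i];
        }
        /* A real loader would release the rejected plugin here. */
    }
    return kept;
}
```

The check has to run before any component function is called, since the crash reported at the top of this issue happened inside the component's register function.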

@hjelmn
Member

hjelmn commented Mar 19, 2015

To make this work well I should probably version the frameworks themselves. Will investigate.

@hppritcha
Member

George, do you mean the shared library version number?

Couldn't we use some versioned symbols similar to the way libfabric does it, and then check for the presence of a particular versioned symbol in a *.so using dlvsym? Are there any other projects that need all this type of subdirectory structure for shared libraries they use internally? It seems a little weird.

Is the goal to be able to install multiple versions of open mpi in the same location, or just make sure that an incompatible *.so in openmpi dir is not dlopen'd with subsequent badness as reported above?

@hjelmn
Member

hjelmn commented Mar 19, 2015

Howard, the primary goal is to not open incompatible .so's, but it would be a useful feature to be able to have multiple versions of a project (opal for example) installed in the same tree.

@jsquyres
Member

Don't forget that there are other reasons why we can't install two versions of OMPI into the same tree, such as:

  • mpi.h is installed into $installdir
  • libmpi.* (and all the other top-level libraries) are installed into $libdir

The goal is to prevent a scenario like this:

  1. User installs version A.B.C into $prefix
  2. User later installs version D.E.F into the same $prefix
  3. User runs new D.E.F OMPI and Bad Things happen because some old A.B.C components were still in $prefix/lib/openmpi

That being said, perhaps just checking the version numbers in the .so is good enough -- perhaps a new directory structure is not worth it (since, even if you do that, you can't install multiple versions of OMPI into the same tree).

@hppritcha I think the symbol versioning stuff is a slightly different use case than what we're trying to protect against here...?

@opoplawski
Contributor Author

Just to be explicit - the problem I was running into was having openmpi 1.8.2 installed in /usr, then building newer versions in my home directories and running in-tree tests.

@hppritcha
Member

I thought the .so's in the openmpi directory lack version numbers, hence the -avoid-version in the la_LDFLAGS in all the mca//Makefile.am's. I guess we'd have to pay attention to VERSION then and not just fill in 0.0.0? I'd be fine with that. As long as we kept true to the current/rev/age formula and have C-A really mean something, this would take care of the problem, including Opoplawski's problem.
It would get complicated if the version numbers for the different *.so's could vary.
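
For readers unfamiliar with the current/rev/age formula, here is a small sketch of what C-A means under libtool's -version-info convention: a library built with current C and age A implements interface numbers C-A through C. The struct and function names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The three fields of libtool's -version-info current:revision:age. */
struct lt_version {
    int current;   /* newest interface number implemented */
    int revision;  /* implementation revision of 'current' */
    int age;       /* how many older interfaces are still supported */
};

/* True when the library still implements the interface number that a
 * client was built against. */
static bool lt_supports(struct lt_version lib, int needed_interface)
{
    assert(lib.age >= 0 && lib.age <= lib.current);
    return needed_interface >= lib.current - lib.age &&
           needed_interface <= lib.current;
}

/* On Linux, libtool uses current - age as the major number in the
 * installed file name and soname. */
static int lt_linux_major(struct lt_version lib)
{
    return lib.current - lib.age;
}
```

With -version-info 5:0:2, for instance, the library supports interfaces 3, 4 and 5, and the Linux major number is 3.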

@hjelmn
Member

hjelmn commented Mar 19, 2015

Yes. The .so files have no version number in the filename. What George is referring to is the mca_base_component_t inside the .so. That contains version information for the plugin.

The only problem with using that structure is we may change it from release to release. We just did this by adding the project name and version.

@hppritcha
Member

sounds like a problem of introducing standard shared library versioning.
just say no to -avoid-version and really use so versioning.

@hppritcha
Member

@hjelmn is #449 good enough to close this bug?

@jsquyres jsquyres modified the milestones: Open MPI 2.X, Open MPI v2.0.0 Jun 25, 2015
@jsquyres
Member

We talked about this in person at the dev meeting in June 2015. We concluded:

  1. It is not sufficient to pass the framework (and/or framework version) to the framework open function, and only open components that match that version, because you could run into a scenario like this:
    • OMPI vA.B.C is installed
    • OMPI vA.B.(C+1) is installed
    • Component X in both of these has the same framework version.
    • But component X for vA.B.(C+1) uses a symbol in the framework base that exists in A.B.(C+1), but not in A.B.C.
  2. Hence, we really need to check the project version of components to decide whether we should open them or not.
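
The scenario in point 1 can be sketched as two predicates over a hypothetical component record. The field names and the exact-match policy are assumptions made for illustration; the thread does not specify how the eventual check should compare project versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct release { int major, minor, patch; };

/* ASSUMPTION: what a loader could read from a component after dlopen. */
struct component {
    const char *framework;     /* framework name */
    int framework_major;       /* framework interface version */
    struct release built_for;  /* project release it was compiled against */
};

/* The weaker check: framework name and interface version only. */
static bool framework_check(const struct component *c,
                            const char *framework, int framework_major)
{
    assert(c != NULL);
    return strcmp(c->framework, framework) == 0 &&
           c->framework_major == framework_major;
}

/* The stronger check: the component must come from the running release. */
static bool project_check(const struct component *c, struct release running)
{
    assert(c != NULL);
    return c->built_for.major == running.major &&
           c->built_for.minor == running.minor &&
           c->built_for.patch == running.patch;
}
```

A component built for A.B.(C+1) passes framework_check when loaded by A.B.C, because the framework version did not change, but fails project_check, which is the case the dev meeting wanted to catch.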

@hjelmn says that he will get to this some time in the v2.x series.

@hppritcha
Member

Moving to 3.x

@hppritcha hppritcha modified the milestones: v3.1.0, v3.0.0 Mar 14, 2017
@bwbarrett bwbarrett removed this from the v3.1.0 milestone Mar 2, 2018