
failure during MPI_Win_detach #7384

Closed
naughtont3 opened this issue Feb 11, 2020 · 44 comments

@naughtont3
Contributor

Background information

Application failure with Open MPI with one sided communication (OSC).

Reporting on behalf of user to help track problem.

The test works fine with MPICH/3.x, Spectrum MPI, and Intel MPI.

What version of Open MPI are you using?

  • Fails with latest master (c6831c5)
  • Need to check status with v4.0.x and v3.0.x

Describe how Open MPI was installed

A standard tarball build reproduces the failure, using the gcc/8.x compiler suite (gcc-8, g++-8, gfortran-8). gcc/8.x is needed to avoid gfortran bugs in other releases.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Linux
  • Computer hardware:
  • Network type:

Reproducible on any Linux workstation (gcc/8.x is needed to avoid gfortran bugs in other releases).


Details of the problem

You should be able to reproduce this failure on any Linux workstation. The only
thing you need to make sure is to use the gcc/8.x compiler suite (gcc-8, g++-8,
gfortran-8) since other versions are buggy in their gfortran part.

    1. git clone https://gitlab.com/DmitryLyakh/ExaTensor.git
    2. git checkout openmpi_fail
    3. export PATH_OPENMPI=PATH_TO_OPENMPI_ROOT_DIR
    4. make
    5. Copy the produced binary Qforce.x into some directory and place both attached scripts there as well
    6. Run run.exatensor.sh (it runs 4 MPI processes, which is the minimal configuration; each process runs up to 8 threads, which is also mandatory, but all of this can be run on a single node)

I added a .txt extension to the scripts so they could be attached to the GitHub ticket.

Normally run.exatensor.sh invokes mpiexec with the binary directly, but for some reason the mpiexec from the latest git master branch fails to load some dynamic libraries (libgfortran), so I introduced a workaround where run.exatensor.sh invokes mpiexec with exec.sh, which in turn executes the binary Qforce.x. Previous Open MPI versions did not have this issue, by the way, but all of them fail in MPI_Win_detach, as you can see below:

Destroying tensor dtens ... [exadesktop:32108] *** An error occurred in MPI_Win_detach
[exadesktop:32108] *** reported by process [3156279297,1]
[exadesktop:32108] *** on win rdma window 5
[exadesktop:32108] *** MPI_ERR_OTHER: known error not in list
[exadesktop:32108] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[exadesktop:32108] ***    and potentially your MPI job)
[exadesktop:32108] [0] func:/usr/local/mpi/openmpi/git/lib/libopen-pal.so.0(opal_backtrace_buffer+0x35) [0x149adfa0726f]
[exadesktop:32108] [1] func:/usr/local/mpi/openmpi/git/lib/libmpi.so.0(ompi_mpi_abort+0x9a) [0x149ae0574db1]
[exadesktop:32108] [2] func:/usr/local/mpi/openmpi/git/lib/libmpi.so.0(+0x48d6e) [0x149ae055ad6e]
[exadesktop:32108] [3] func:/usr/local/mpi/openmpi/git/lib/libmpi.so.0(ompi_mpi_errors_are_fatal_win_handler+0xed) [0x149ae055a3d2]
[exadesktop:32108] [4] func:/usr/local/mpi/openmpi/git/lib/libmpi.so.0(ompi_errhandler_invoke+0x155) [0x149ae0559c11]
[exadesktop:32108] [5] func:/usr/local/mpi/openmpi/git/lib/libmpi.so.0(PMPI_Win_detach+0x197) [0x149ae05f2417]
[exadesktop:32108] [6] func:/usr/local/mpi/openmpi/git/lib/libmpi_mpifh.so.0(mpi_win_detach__+0x38) [0x149ae0946d86]
[exadesktop:32108] [7] func:./Qforce.x() [0x564a82]
[exadesktop:32108] [8] func:./Qforce.x() [0x564b42]
[exadesktop:32108] [9] func:./Qforce.x() [0x56df9e]
[exadesktop:32108] [10] func:./Qforce.x() [0x4319fa]
[exadesktop:32108] [11] func:./Qforce.x() [0x42a326]
[exadesktop:32108] [12] func:./Qforce.x() [0x42e2cc]
[exadesktop:32108] [13] func:./Qforce.x() [0x4de039]
[exadesktop:32108] [14] func:/usr/local/gcc/8.2.0/lib64/libgomp.so.1(+0x1743e) [0x149ae841343e]
[exadesktop:32108] [15] func:/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x149ae1d416db]
[exadesktop:32108] [16] func:/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x149adfdea88f]
[exadesktop:32103] PMIX ERROR: UNREACHABLE in file ../../../../../../../opal/mca/pmix/pmix4x/openpmix/src/server/pmix_server.c at line 2188
[exadesktop:32103] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[exadesktop:32103] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
@DmitryLyakh

All OpenMPI versions I tried, including 3.x, 4.x, and the latest GitHub master, fail, although I am not sure the problem is exactly the same in all cases. I only debugged the latest master and 4.0.2, where I found that the error occurs in /ompi/mca/osc/rdma/osc_rdma_dynamic.c, in the function ompi_osc_rdma_detach, because ompi_osc_rdma_find_region_containing() does not find the dynamic memory region, even though it should still exist (I checked the application code to the best of my ability).

@jsquyres
Member

Can you make a smaller reproducer, perchance?

@hjelmn
Member

hjelmn commented Feb 11, 2020

I will use the one provided. Will quickly delete gfortran once I am done :p. Keep in mind that in some implementations attach/detach are essentially no-ops so success with another implementation does not necessarily mean there is a bug in Open MPI.

But given that attach/detach test coverage is incomplete it would not be surprising if there is a bug.

@DmitryLyakh

Making a smaller reproducer would be extremely hard due to the code architecture, but I can assist with code navigation and debugging. In particular, there is only one source of MPI_Win_attach() in DDSS/distributed.F90 (subroutine DataWinAttach) and there is only one source of MPI_Win_detach() in DDSS/distributed.F90 (subroutine DataWinDetach). Similarly, all other MPI-3 RMA one-sided functionality is located in DDSS/distributed.F90. MPI dynamic windows are used with MPI_Rget() and MPI_Raccumulate() performing communications from within a PARALLEL OpenMP region, synchronized via MPI_Test() (inside the MPI_Win_lock and MPI_Win_unlock epoch).
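
For readers unfamiliar with this pattern, below is a minimal C sketch of the access pattern described above (the real code is Fortran in DDSS/distributed.F90); the buffer size, the MPI_DOUBLE datatype, and the right-neighbour target are illustrative assumptions, not taken from ExaTensor:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nprocs;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        MPI_Win win;
        MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        /* Each rank attaches one local buffer (cf. DataWinAttach). */
        const int count = 1024;
        double *buf = calloc(count, sizeof(double));
        MPI_Win_attach(win, buf, count * sizeof(double));

        /* Dynamic windows address remote memory by absolute displacement,
         * so the attached base addresses must be exchanged explicitly. */
        MPI_Aint mydisp, *disp = malloc(nprocs * sizeof(MPI_Aint));
        MPI_Get_address(buf, &mydisp);
        MPI_Allgather(&mydisp, 1, MPI_AINT, disp, 1, MPI_AINT, MPI_COMM_WORLD);

        /* Request-based get from the right neighbour, completed with MPI_Test
         * inside a lock/unlock epoch (as described above). */
        int target = (rank + 1) % nprocs, done = 0;
        double *tmp = malloc(count * sizeof(double));
        MPI_Request req;

        MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
        MPI_Rget(tmp, count, MPI_DOUBLE, target, disp[target], count,
                 MPI_DOUBLE, win, &req);
        while (!done)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        MPI_Win_unlock(target, win);

        MPI_Barrier(MPI_COMM_WORLD);   /* no rank targets buf past this point */
        MPI_Win_detach(win, buf);      /* cf. DataWinDetach */
        MPI_Win_free(&win);
        free(tmp); free(buf); free(disp);
        MPI_Finalize();
        return 0;
    }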

@DmitryLyakh

I am also trying to double check that my use of MPI-3 is valid, which I believe it is, but there is still a chance I have missed something.

@hjelmn
Member

hjelmn commented Feb 11, 2020

f951: internal compiler error: in generate_finalization_wrapper, at fortran/class.c:1993
Please submit a full bug report,
with preprocessed source if appropriate.

@DmitryLyakh

You are likely not using gcc/8.x.

@DmitryLyakh

Only gfortran/8.x works, other versions have compiler bugs, including gfortran/9.x.

@DmitryLyakh

I have just checked again and can confirm that the application is trying to detach a valid (previously attached) region which has not been detached before. Moreover, this is neither the first Detach call nor the last, if that helps. Also, no error code is returned from MPI_Win_detach() because of the crash; I would assume Open MPI would have returned an error code instead of crashing if this issue were on the application side, right?
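
On the last point: with the default MPI_ERRORS_ARE_FATAL handler on the window, Open MPI aborts rather than returning. A minimal sketch of how one could get the error code back for inspection; the helper name detach_checked is made up for illustration and is not part of the application or of Open MPI:

    #include <stdio.h>
    #include <mpi.h>

    /* Detach 'base' from a dynamic window and report the error class instead
     * of aborting. Assumes 'base' was previously attached to 'win'. */
    static int detach_checked(MPI_Win win, void *base)
    {
        /* The default handler is MPI_ERRORS_ARE_FATAL, which produces the
         * abort and backtrace shown in the original report. */
        MPI_Win_set_errhandler(win, MPI_ERRORS_RETURN);

        int err = MPI_Win_detach(win, base);
        if (err != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len, errclass;
            MPI_Error_class(err, &errclass);
            MPI_Error_string(err, msg, &len);
            fprintf(stderr, "MPI_Win_detach failed (class %d): %s\n",
                    errclass, msg);
        }
        return err;
    }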

@hjelmn
Member

hjelmn commented Feb 11, 2020

The problem occurs because the program attaches multiple regions that overlap in at least one page (the minimum hardware registration unit). Taking a look at one crash:

  • MPI_Attach 8 byte region starting at 0x151090000c20.
  • MPI_Attach 30752 byte region starting at 0x151090000f60.

Both of these regions contain the 4k page at 0x151090000000. Ideally osc/rdma should have returned an error, as the implementation treats page overlap as region overlap. Overlapping regions are not allowed by the standard. The standard does not, however, give guidance on what overlapping means, so it is fair to assume page-level overlap is allowed. This may be an error in the standard, as the implementation is usually free to set restrictions based on hardware characteristics. I would very strongly recommend against 1) page-overlapped regions, and 2) small regions (an 8-byte attach is incredibly wasteful). I will see about either allowing this kind of overlap or at least returning the proper error from MPI_Win_attach.

Both of these are trivial to implement, just want to make sure we follow the intended behavior (standard may be wrong here).
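
To make the page-granularity point concrete, here is a small stand-alone sketch (not Open MPI code; the 4 KiB PAGE_SIZE and the pages_overlap helper are illustrative assumptions) that checks the two attachments from the crash for overlap at page granularity:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096UL   /* assumed minimum registration unit */

    /* Two regions "overlap" at registration granularity if their page-aligned
     * extents share at least one page. */
    static int pages_overlap(uintptr_t base_a, size_t len_a,
                             uintptr_t base_b, size_t len_b)
    {
        uintptr_t first_a = base_a & ~(PAGE_SIZE - 1);
        uintptr_t last_a  = (base_a + len_a - 1) & ~(PAGE_SIZE - 1);
        uintptr_t first_b = base_b & ~(PAGE_SIZE - 1);
        uintptr_t last_b  = (base_b + len_b - 1) & ~(PAGE_SIZE - 1);
        return first_a <= last_b && first_b <= last_a;
    }

    int main(void)
    {
        /* The two attachments from the crash: 8 bytes at 0x151090000c20 and
         * 30752 bytes at 0x151090000f60. Their byte ranges are disjoint, but
         * both touch the 4 KiB page at 0x151090000000. */
        uintptr_t a = (uintptr_t)0x151090000c20ULL;
        uintptr_t b = (uintptr_t)0x151090000f60ULL;
        printf("page ranges overlap: %s\n",
               pages_overlap(a, 8, b, 30752) ? "yes" : "no");  /* prints "yes" */
        return 0;
    }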

@DmitryLyakh

Ha, this is subtle, but it makes total sense from the implementation point of view. Thanks for such a quick investigation! I have always interpreted the standard as requiring non-overlapping virtual address ranges. On the other hand, can we safely assume that a 4K-byte page is always the minimum hardware registration unit on all systems? Otherwise the MPI standard would introduce a non-portable restriction. In any case, the error code and message from MPI_Win_attach would definitely help here. Thanks.
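
On the portability question: the base page size is not 4 KiB everywhere (it is commonly 64 KiB on ppc64le Linux systems, for example), and NIC-specific registration granularity can differ again. A tiny sketch for checking the system page size on a given machine, treating it only as a lower bound:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* sysconf reports the base system page size; huge pages or
         * NIC registration granularity may be larger. */
        long page = sysconf(_SC_PAGESIZE);
        printf("system page size: %ld bytes\n", page);
        return 0;
    }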

hjelmn added a commit to hjelmn/ompi that referenced this issue Feb 12, 2020
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookkeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References open-mpi#7384

Signed-off-by: Nathan Hjelm <[email protected]>
naughtont3 pushed a commit to naughtont3/ompi that referenced this issue Feb 12, 2020
@DmitryLyakh

I have just built the branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap from https://github.com/hjelmn/ompi, but it results in exactly the same problem as before, even if I specify --mca osc_rdma_max_attach 128 in mpirun. Am I testing the wrong branch/commit? Commit I have is ec331c7
Author: Nathan Hjelm [email protected]
Date: Tue Feb 11 21:57:24 2020 -0800

naughtont3 pushed a commit to naughtont3/ompi that referenced this issue Feb 12, 2020
@naughtont3
Contributor Author

@hjelmn @DmitryLyakh I did a test with a cherry-pick of #7383 and #7387 onto a point just prior to the current PRRTE (no-ORTE) changes, due to unrelated problems with those changes. For clarity, I pushed the branch I tested here: https://github.com/naughtont3/ompi/tree/pre-NoRTE-plus-oscrdma-fix

Things fail later now, but there is still a failure during the detach: a SEGV inside ompi_osc_rdma_remove_attachment(). A debug log with osc_base_verbose and osc_rdma_verbose enabled is attached.

@naughtont3
Contributor Author

I noticed that I had OMP threads=8, so I re-ran with OMP_NUM_THREADS=1 for a slightly simpler case. It still fails to find a memory attachment and throws the error during detach.

@naughtont3
Contributor Author

It seems like the rdma_region_handle in ompi_osc_rdma_detach() is NULL, and I am not sure why. When that happens and you see "could not find dynamic memory attachment", it returns OMPI_ERR_BASE, which triggers the MPI win error handler for errors-are-fatal. So the question is why/where the rdma_region_handle goes wrong.

hjelmn added a commit to hjelmn/ompi that referenced this issue Feb 14, 2020
@hjelmn
Member

hjelmn commented Feb 14, 2020

Think I found the issue. Please try it again.

naughtont3 pushed a commit to naughtont3/ompi that referenced this issue Feb 14, 2020
@naughtont3
Contributor Author

This seems better for my test. But I will have to check with Dmitry to ensure it behaves as expected for him.

@DmitryLyakh

I am unable to test this as I cannot find which branch/commit I am supposed to try. The last commit I see in Nathan's repository on branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap is dated Feb 11. Where is the latest commit with today's fix?

@DmitryLyakh

This is what I see as the latest commit (Feb 11):
commit 96ed630
Author: Nathan Hjelm [email protected]
Date: Tue Feb 11 21:57:24 2020 -0800

osc/rdma: modify attach to check for region overlap

This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookkeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References #7384

Signed-off-by: Nathan Hjelm <[email protected]>

@hjelmn
Member

hjelmn commented Feb 14, 2020

Force-pushed the branch. Just re-clone it. You will probably need to set the max attach to 256 or higher for your app.

@DmitryLyakh

I re-cloned https://github.com/hjelmn/ompi.git, checked out branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap, commit 96ed630.
And it produces exactly the same crash in MPI_Win_detach as before (below) on my Ubuntu 16.04 laptop. Any ideas?
Printing scalar etens ...
etens()[]
0.10668566847503D+15
Ok: 0.1063 sec
Retrieving directly scalar etens ... Ok: Value = ( 0.10668566847503D+15 0.00000000000000D+00): 0.0560 sec
Retrieving directly tensor dtens ... Ok: Norm = 0.10668566847502D+15: 0.3408 sec
Destroying tensor rtens ... [Incredible:00000] *** An error occurred in MPI_Win_detach
[Incredible:00000] *** reported by process [3374317570,1]
[Incredible:00000] *** on win rdma window 5
[Incredible:00000] *** MPI_ERR_UNKNOWN: unknown error
[Incredible:00000] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[Incredible:00000] *** and potentially your MPI job)
[Incredible:00000] *** An error occurred in MPI_Win_detach
[Incredible:00000] *** reported by process [3374317570,0]
[Incredible:00000] *** on win rdma window 5
[Incredible:00000] *** MPI_ERR_UNKNOWN: unknown error
[Incredible:00000] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[Incredible:00000] *** and potentially your MPI job)
[Incredible:23436] PRUN: EVHANDLER WITH STATUS PMIX_ERR_JOB_TERMINATED(-145)
[Incredible:23436] JOB [51488,2] COMPLETED WITH STATUS -1
[Incredible:23436] PRUN: INFOCB
[Incredible:23436] PRUN: EVHANDLER WITH STATUS LOST_CONNECTION_TO_SERVER(-101)

@DmitryLyakh

I tried both the 128 and 256 max limits, no difference... However, Thomas somehow made it work on his desktop. I will try on Summit as well later, but on my laptop I observe no difference; it always crashes the same way as originally reported...

@DmitryLyakh

DmitryLyakh commented Feb 14, 2020

This is the mpiexec command I used on my laptop:
/usr/local/mpi/openmpi/openmpi-hjelmn/bin/mpiexec -np 4 -npernode 4 --hostfile hostfile --verbose --mca mpi_abort_print_stack 1 --mca osc_rdma_max_attach 256 ./exec.sh

hostfile:
localhost slots=4

@naughtont3
Contributor Author

Yes, I think that is the same scenario because I see MPI_ERR_UNKNOWN, whereas on my desktop, if I exclude --mca osc_rdma_max_attach 128, it fails during MPI_Win_attach with error MPI_ERR_RMA_ATTACH.

hjelmn added a commit to hjelmn/ompi that referenced this issue Feb 17, 2020
hjelmn added a commit to hjelmn/ompi that referenced this issue Feb 17, 2020 (cherry picked from commit 6649aef)
@hjelmn
Member

hjelmn commented Feb 17, 2020

Well, I can no longer reproduce the issue. I committed the changes to master so you can go ahead and try that and see what you get.

@DmitryLyakh

Do you mean the test code I provided runs to completion without a crash in MPI_Win_detach() in your case? I built and tested the latest commit (below) from github.com/hjelmn/ompi and am still getting the same crash on my Ubuntu 16.04 machine ...

commit 54c8233
Author: Nathan Hjelm [email protected]
Date: Sun Feb 16 17:09:20 2020 -0800

osc/rdma: bump the default max dynamic attachments to 64

This commit increases the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachment value can be set by the user on other
systems.

Signed-off-by: Nathan Hjelm <[email protected]>

@DmitryLyakh

Thomas, does the latest commit pass the test on your desktop?

@naughtont3
Contributor Author

Yes on my desktop (not tested on Summit yet).

I pulled OMPI master with Nathan's changes merged and rebuilt on my desktop.

     beaker:$ gcc --version | head -1
     gcc (Spack GCC) 8.1.0

     beaker:$  ../configure \
            --enable-mpirun-prefix-by-default \
            --enable-debug \
            --prefix=$PWD/_install \
         && make \
         && make install
  • Here's ExaTensor info and launch command-line (on desktop)...
    beaker:$ git remote -v
    origin  https://gitlab.com/DmitryLyakh/ExaTensor.git (fetch)
    origin  https://gitlab.com/DmitryLyakh/ExaTensor.git (push)
    beaker:$ git br
    master
    * openmpi_fail
    beaker:$ git log --oneline | head -2
    bf8a46e Prepared the Makefile for reproducing the OpenMPI crash in MPI_Win_detach().
    68e2a37 Added ddss_flush_all() in DDSS.

    mpirun \
        --np 4 \
        --mca osc rdma \
        --mca mpi_abort_print_stack 1 \
        --mca osc_rdma_max_attach 128 \
        -x OMP_NUM_THREADS=8 \
        -x QF_NUM_PROCS=4 \
        -x QF_PROCS_PER_NODE=4 \
        -x QF_CORES_PER_PROCESS=1 \
        -x QF_MEM_PER_PROCESS=1024 \
        -x QF_NVMEM_PER_PROCESS=0 \
        -x QF_HOST_BUFFER_SIZE=1024 \
        -x QF_GPUS_PER_PROCESS=0 \
        -x QF_MICS_PER_PROCESS=0 \
        -x QF_AMDS_PER_PROCESS=0 \
        -x QF_NUM_THREADS=1 \
        ./Qforce.x

@hjelmn
Member

hjelmn commented Feb 18, 2020

Looks to me like it runs to completion. I am running with Open MPI master with only setting the osc_rdma_max_attach MCA variable to 1024 (just to be safe).

@DmitryLyakh

Just in case, did you build OpenMPI in Debug or Release mode?

@hjelmn
Member

hjelmn commented Feb 18, 2020

Debug mode. It shouldn't make a difference, but I can try again in optimized mode.

@hjelmn
Member

hjelmn commented Feb 18, 2020

It shouldn't, but it does. Huh.

@hjelmn
Member

hjelmn commented Feb 18, 2020

OK, I see the issue. I had the call inside an assert, and asserts are optimized out in non-debug builds. Fixing.

@naughtont3
Contributor Author

See also PR #7421

@naughtont3
Contributor Author

@DmitryLyakh I think I worked through most of the issues I was hitting on Summit. I am now able to run your reproducer without error on Summit using the "DEVELOP" openmpi/master build. This is using the gcc/8.1.1 module and ompi master at 960c5f7. Please give this a shot and see how things work for you.

@DmitryLyakh

Confirmed on my desktop: The OpenMPI master branch works fine in the Release mode after PR #7421

@DmitryLyakh

Thanks for fixing this! I guess this issue can be closed now.

@hppritcha
Member

@naughtont3 could you boil this down to a small reproducer we can put into the ibm test suite?

@hjelmn
Member

hjelmn commented Feb 24, 2020

I can add one. Just need to attach and detach a bunch.
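
A minimal sketch of what such a test could look like (this is not the actual ibm-suite test; the region count and sizes are arbitrary assumptions, and osc_rdma_max_attach may need to be raised if the count is increased):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NREGIONS     32      /* keep at or below osc_rdma_max_attach, or raise that MCA parameter */
    #define REGION_BYTES 4096

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Win win;
        MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        /* Return error codes instead of aborting so the test can report them. */
        MPI_Win_set_errhandler(win, MPI_ERRORS_RETURN);

        /* Attach a bunch of small heap buffers; adjacent malloc'd chunks may
         * share pages, which exercises the page-overlap path discussed above. */
        char *regions[NREGIONS];
        for (int i = 0; i < NREGIONS; ++i) {
            regions[i] = malloc(REGION_BYTES);
            int rc = MPI_Win_attach(win, regions[i], REGION_BYTES);
            if (rc != MPI_SUCCESS) {
                fprintf(stderr, "attach %d failed (rc=%d)\n", i, rc);
                MPI_Abort(MPI_COMM_WORLD, rc);
            }
        }

        /* Detach everything again; each region was attached exactly once. */
        for (int i = NREGIONS - 1; i >= 0; --i) {
            int rc = MPI_Win_detach(win, regions[i]);
            if (rc != MPI_SUCCESS) {
                fprintf(stderr, "detach %d failed (rc=%d)\n", i, rc);
                MPI_Abort(MPI_COMM_WORLD, rc);
            }
            free(regions[i]);
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }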

@naughtont3
Contributor Author

@hppritcha OK, I'll sync with @DmitryLyakh and get it into a test batch.

@naughtont3
Contributor Author

@hjelmn if you can write a simple unit test, that would be easier than having the full application case. I'll try to get a version of @DmitryLyakh's code into a test somewhere, but your unit test would be an easier case for most folks to quickly test. Thanks.

cniethammer pushed a commit to cniethammer/ompi that referenced this issue May 10, 2020 (cherry picked from commit 6649aef)
@jsquyres
Member

@naughtont3 @hjelmn Where are we on this issue? Did @cniethammer's cherry pick fix the issue on v4.0.x and/or v4.1.x?

@hppritcha
Member

@naughtont3 could you check if this is still a problem on master and the release branches?

@hppritcha
Member

Closing. Reopen if this is still a problem.
