failure during MPI_Win_detach #7384
All Open MPI versions I tried, including 3.x, 4.x, and the latest GitHub master, fail, although I am not sure the problem is exactly the same in all cases. I only debugged the latest master and 4.0.2, where I found that the error occurs in /ompi/mca/osc/rdma/osc_rdma_dynamic.c in the function ompi_osc_rdma_detach, because ompi_osc_rdma_find_region_containing() does not find the dynamic memory region even though it should still exist (I checked the application code as best I could).
Can you make a smaller reproducer, perchance?
I will use the one provided. Will quickly delete gfortran once I am done :p. Keep in mind that in some implementations attach/detach are essentially no-ops, so success with another implementation does not necessarily mean there is a bug in Open MPI. But given that attach/detach test coverage is incomplete, it would not be surprising if there is a bug.
Making a smaller reproducer would be extremely hard due to the code architecture, but I can assist with code navigation and debugging. In particular, there is only one call site of MPI_Win_attach() in DDSS/distributed.F90 (subroutine DataWinAttach) and only one call site of MPI_Win_detach() in DDSS/distributed.F90 (subroutine DataWinDetach). Similarly, all other MPI-3 RMA one-sided functionality is located in DDSS/distributed.F90. MPI dynamic windows are used with MPI_Rget() and MPI_Raccumulate() performing communication from within an OpenMP PARALLEL region, synchronized via MPI_Test() (inside an MPI_Win_lock/MPI_Win_unlock epoch).
I am also trying to double-check that my use of MPI-3 is valid, which I believe it is, but there is still a chance I have missed something.
You are likely not using gcc/8.x.
Only gfortran/8.x works; other versions have compiler bugs, including gfortran/9.x.
I have just checked again and can confirm that the application is trying to detach a valid (previously attached) region which has not been detached before. Moreover, this is neither the first Detach call nor the last, if that helps. Also, no error code is returned from MPI_Win_detach() because of the crash, so I would assume Open MPI would have returned an error code instead of crashing if this issue were on the application side, right?
The problem occurs because the program attaches multiple regions that overlap by at least one page (the minimum hardware registration unit). Taking a look at one crash:
Both of these regions contain the 4k page at 0x151090000000. Ideally osc/rdma should have returned an error, as the implementation treats page overlap as region overlap. Overlapping regions are not allowed by the standard. The standard does not, however, give guidance on what overlapping means, so it is fair to assume page-level overlap is allowed. This may be an error in the standard, as the implementation is usually free to set restrictions based on hardware characteristics. I would very strongly recommend against 1) page-overlapped regions, and 2) small regions (an 8-byte attach is incredibly wasteful). I will see about either allowing this kind of overlap or at least returning the proper error from MPI_Win_attach. Both are trivial to implement; I just want to make sure we follow the intended behavior (the standard may be wrong here).
Ha, this is subtle, but it makes total sense from the implementation point of view. Thanks for such a quick investigation! I have always interpreted the standard as requiring non-overlapping virtual address ranges. On the other hand, can we safely assume that a 4 KB page is always the minimum hardware registration unit on all systems? Otherwise the MPI standard would introduce a non-portable restriction. In any case, an error code and message from MPI_Win_attach would definitely help here. Thanks.
This commit addresses two issues in osc/rdma: 1) It is erroneous to attach regions that overlap. This was being allowed but the standard does not allow overlapping attachments. 2) Overlapping registration regions (4k alignment of attachments) appear to be allowed. Add attachment bases to the bookkeeping structure so we can keep better track of what can be detached. It is possible that the standard did not intend to allow #2. If that is the case then #2 should fail in the same way as #1. There should be no technical reason to disallow #2 at this time. References open-mpi#7384 Signed-off-by: Nathan Hjelm <[email protected]>
I have just built the branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap from https://github.com/hjelmn/ompi, but it results in exactly the same problem as before, even when I specify --mca osc_rdma_max_attach 128 in mpirun. Am I testing the wrong branch/commit? The commit I have is ec331c7.
@hjelmn @DmitryLyakh I did a test with a cherry-pick of #7383 and #7387 onto a point just prior to the current PRRTE (no-orte) changes, due to unrelated problems. For clarity, I pushed the branch I tested here: https://github.com/naughtont3/ompi/tree/pre-NoRTE-plus-oscrdma-fix Things fail later, but there is still a failure during the detach, with a SEGV inside ompi_osc_rdma_remove_attachment(). A debug log with osc_base_verbose and osc_rdma_verbose enabled is attached.
I noticed that I had omp threads=8, so I re-ran with
It seems like the
Think I found the issue. Please try it again.
This seems better for my test. But I will have to check with Dmitry to ensure it behaves as expected for him.
I am unable to test this as I cannot find which branch/commit I am supposed to try. The last commit I see in Nathan's repository on branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap is dated Feb 11. Where is the latest commit with today's fix?
This is what I see as the latest commit (Feb 11):
Force-pushed the branch. Just re-clone it. You will probably need to set the max attach to 256 or higher for your app.
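For example, a hypothetical invocation (the binary name Qforce.x follows the thread; process count and paths are placeholders):

```shell
# Raise osc/rdma's dynamic-attachment limit before launching the app.
mpirun --mca osc_rdma_max_attach 256 -np 4 ./Qforce.x
```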
I re-cloned https://github.com/hjelmn/ompi.git, checked out branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap, commit 96ed630.
I tried both the 128 and 256 max limits, with no difference. However, Thomas somehow made it work on his desktop. I will try on Summit later as well, but on my laptop I observe no difference; it always crashes the same way as originally reported.
This is the mpiexec command I used on my laptop: hostfile:
Yes, I think this is the same scenario b/c I see
This commit addresses two issues in osc/rdma: 1) It is erroneous to attach regions that overlap. This was being allowed but the standard does not allow overlapping attachments. 2) Overlapping registration regions (4k alignment of attachments) appear to be allowed. Add attachment bases to the bookkeeping structure so we can keep better track of what can be detached. It is possible that the standard did not intend to allow #2. If that is the case then #2 should fail in the same way as #1. There should be no technical reason to disallow #2 at this time. References open-mpi#7384 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit 6649aef) Signed-off-by: Nathan Hjelm <[email protected]>
Well, I can no longer reproduce the issue. I committed the changes to master so you can go ahead and try that and see what you get.
Do you mean the test code I provided runs to completion without a crash in MPI_Win_detach() in your case? I built and tested the latest commit (below) from github.com/hjelmn/ompi and am still getting the same crash on my Ubuntu 16.04 machine ... commit 54c8233
Thomas, does the latest commit pass the test on your desktop?
Yes, on my desktop (not tested on Summit yet). I pulled OMPI master with Nathan's changes merged and rebuilt on my desktop.
Looks to me like it runs to completion. I am running with Open MPI master with only setting the osc_rdma_max_attach MCA variable to 1024 (just to be safe).
Just in case, did you build OpenMPI in Debug or Release mode?
debug mode. shouldn't make a difference but I can try again in optimized mode.
shouldn't but is. huh
ok, I see the issue. had the call inside an assert and we optimize asserts out in non-debug builds. fixing.
See also PR #7421
@DmitryLyakh I think I worked through most of the issues I was hitting on Summit. I am now able to run your reproducer w/o error on Summit using the "DEVELOP"
Confirmed on my desktop: the OpenMPI master branch works fine in Release mode after PR #7421.
Thanks for fixing this! I guess this issue can be closed now.
@naughtont3 could you boil this down to a small reproducer we can put into the ibm test suite?
I can add one. Just need to attach and detach a bunch.
@hppritcha OK, I'll sync with @DmitryLyakh and get it into a test batch.
@hjelmn if you can write a simple unit test, that would be easier than having the full application case. i'll try to get a version of @DmitryLyakh's code into a test somewhere, but your unit test would be an easier case for most folks to quickly test. Thx
@naughtont3 @hjelmn Where are we on this issue? Did @cniethammer's cherry pick fix the issue on v4.0.x and/or v4.1.x?
@naughtont3 could you check if this is still a problem on master and the release branches?
closing. reopen if this is still a problem.
Background information
Application failure with Open MPI one-sided communication (OSC).
Reporting on behalf of user to help track problem.
The test works fine with MPICH/3.x, Spectrum MPI, and Intel MPI.
What version of Open MPI are you using?
Describe how Open MPI was installed
A standard tarball build can reproduce it, using the gcc/8.x compiler suite (gcc-8, g++-8, gfortran-8). gcc/8.x is needed to avoid gfortran bugs in other versions.
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Reproducible on any Linux workstation (gcc/8.x is needed to avoid gfortran bugs in other versions).
Details of the problem
You should be able to reproduce this failure on any Linux workstation. The only thing you need to ensure is that you use the gcc/8.x compiler suite (gcc-8, g++-8, gfortran-8), since other versions have bugs in their gfortran part.
I added a .txt extension to attach the files to the GitHub ticket. Normally run.exatensor.sh invokes mpiexec with the binary directly, but for some reason the mpiexec from the latest git master branch fails to load some dynamic libraries (libgfortran), so I introduced a workaround where run.exatensor.sh invokes mpiexec with exec.sh, which in turn executes the binary Qforce.x. Previous OpenMPI versions did not have this issue, by the way. But all of them fail in MPI_Win_detach as you can see below: