Segmentation fault in simple program using MPI_Comm_accept()/connect() #4153
Comments
It's strange that a very similar test case is passing with MTT:
The stack looks like a data corruption. How did you configure Open MPI?
The only configure option I used for these tests was --with-ucx=no.
I can't seem to get it to fail with --enable-debug and --use-memchecker. Valgrind runs on these show no apparent issues.
IBM is enabling MTT testing on optimized builds tonight of the v3.0.x branch. I'll send email to devel-core.
I still cannot get this to fail on a debug build.
Updated 9/1 to reflect new info. The failure doesn't seem to be intercommunicator related. I attached a new simple reproducer.
@awlauria thanks for the report, I'll take a crack at it. This is a very odd memory corruption in
Unfortunately, the error does not always occur. I wrote it is strange because at first, I was able to trace at line 355 (iirc)
but then, at line 453 (iirc)
with the same index returns the previous
I will keep digging tomorrow.
correctly balance some parenthesis ...
Fixes open-mpi#4153
Thanks Austen Lauria for the report
This is a one-off commit for the v3.0.x branch, master was fixed as part of a larger commit, and the v2 branches are unaffected.
Signed-off-by: Gilles Gouaillardet <[email protected]>
Great, thank you @ggouaillardet for looking at it.
I confirmed it does not happen on master nor the new v3.0.0rc5 build. Fixed by #4167.
Updated, new info 9/1/17
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
3.0.0rc4
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
Please describe the system on which you are running
Red Hat 7.3
Power and X86
Details of the problem
I have a simple test that intermittently hits a segmentation fault in all v3.0 release candidates.
The test seems to pass on the master branch, though it is not apparent where the bug was fixed. I attached a sample test case that fails when run on a single node with two tasks. It may not fail every time, but if you run it in a loop you will hit the segmentation fault after some runs. The location of the segmentation fault changes between runs, so below is an example stack trace.
I can only get it to crash using an optimized build.
Running with Valgrind doesn't show any heap corruption, even in the failing case. So it seems to be stack related, unless Valgrind is missing something.
sample run:
`mpirun -np 2 ./simple_test`
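Since the crash is intermittent, one way to reproduce it is to rerun the command until it exits with a nonzero status. The following is a minimal sketch, not part of the original report: `run_until_failure` is a hypothetical helper name, and it assumes that `mpirun` returns a nonzero exit status when a rank dies from a segmentation fault.

```shell
#!/bin/sh
# Sketch only: run_until_failure is a hypothetical helper, not part of Open MPI.
# Usage: run_until_failure MAX_RUNS COMMAND [ARGS...]
# Reruns COMMAND until it exits with a nonzero status or MAX_RUNS is reached.
run_until_failure() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        status=0
        # Capture the exit status without aborting under `set -e`.
        "$@" || status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $i failed with exit status $status"
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}
```

For example, `run_until_failure 1000 mpirun -np 2 ./simple_test` would stop at the first failing run and report which iteration it was; the limit of 1000 is an arbitrary choice. The command's own output is left untouched so that any stack trace from the failing run stays visible.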
simple_test.zip