error launching/attaching LaunchMON debugger with OpenMPI 2.1.1 #3660
FWIW, I can launch/attach via LaunchMON with OpenMPI 2.0.3.
I won't be able to get to this right away - I'm surprised this is still using the usock component, though. I'll have to check what version of PMIx is being used. So far as I can tell, all the PRs were taken. However, it is possible that I forgot to file one for the 2.1 series. I'll check that too. Be advised that I plan to gradually move to supporting only PMIx attach methods as we go forward. Not sure if someone else will pick up the non-PMIx methods. Won't be until after 3.0, though, so nothing imminent - just something to plan for the future.
Hmmm...well, the code supporting the debuggers has clearly not been updated. Not entirely sure why - probably my fault (most likely I forgot to file one, I suppose). Anyway, I'll create a patch.
See the referenced PR for the fix - please let us know if it resolves the problem. I tested it with both launch_1 and attach_1.
This looks good to me, thanks!
This apparently isn't fully fixed yet, according to folks at Allinea:
I don't have time to deal with this one, so I'm assigning it to others.
One further piece of info:
@dirk-schubert-arm (Dirk Schubert, DDT developer) Just to confirm: the changes we introduced to the Open MPI master branch in PR #3709 did not fix the problem, and you guys are seeing the problem @rhc54 cited above with the v2.1.2rc tarball (i.e., #3660 (comment) -- based on an email exchange with Xavier). Can you guys test the Open MPI development master to see if the problem occurs there? If so, it may be something we neglected to back-port to our v2.1.x release series. You can find nightly snapshot tarballs from master here: https://www.open-mpi.org/nightly/master/
Just to highlight a line from the original report:
Correct.
@jsquyres: How urgent is that? Can this wait until Xavier is back next week? If not, please let me know and I will try to have a look myself.
From Xavier in our ticket:
But not sure how much "a lot" is.
Yeah, as @dirk-schubert-arm and @rhc54 pointed out -- I missed that the original report says that it works fine on v3.0.x. It probably also works on master, but it would be good to verify (because we're going to fork from master again soon for v3.1.x). I guess we'll need some help tracking this down in the v2.1.x release series. This can certainly wait until next week.
Hello,
This is the backtrace from the mpirun core file:
I also tested the v3.0.x branch and it works fine:
The error that I see on master is different from the error reported for OpenMPI 2.1.x.
I finally found some time to make a reproducer just with GDB (attached in gdb-only.zip). The good news is that only version 2.1.x seems to be affected by this issue. Version 3.0.0 and master (using the nightly build from the 8th of October) are fine (I have not tested 3.1.x). The gist of the test: the script "run.sh" runs mpirun in a loop (configurable, default is 100) under GDB (driven by the commands defined in "mpirun.gdb"), counting the number of successful/bad runs and keeping a GDB log of all the runs.
Here is how you can run "run.sh":
Afterwards you can inspect the individual GDB logs with:
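The attached scripts themselves are not reproduced in this copy of the thread. A minimal sketch of the approach described above (loop mpirun under GDB, count good/bad runs, keep one GDB log per run) could look like the following; the file contents, the ./a.out test binary, and the "exited normally" success check are assumptions, not the actual attachment:

```shell
#!/bin/sh
# Sketch of the reproducer described above. The loop-under-GDB structure comes
# from the comment; the file contents, the ./a.out test binary, and the
# "exited normally" success check are assumptions.
RUNS=100                 # default iteration count, 100 as in the comment
LOGDIR=gdb-logs
mkdir -p "$LOGDIR"

# GDB command file that drives each mpirun invocation.
cat > mpirun.gdb <<'EOF'
set pagination off
run
quit
EOF

good=0; bad=0
i=1
while [ "$i" -le "$RUNS" ]; do
    log="$LOGDIR/run-$i.log"
    gdb --batch -x mpirun.gdb --args mpirun -n 2 ./a.out > "$log" 2>&1
    if grep -q 'exited normally' "$log"; then
        good=$((good + 1))
    else
        bad=$((bad + 1))
    fi
    i=$((i + 1))
done
echo "good=$good bad=$bad (one GDB log per run in $LOGDIR/)"
```

Inspecting an individual failed run is then just a matter of opening the corresponding gdb-logs/run-N.log.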
NB: I am not sure if the mpirun issue is the problem or just the symptom of a problem in the individual MPI processes. If I run the reproducer enough, the individual MPI processes can crash. FWIW:
I hope this helps. Let me know if we can assist further. Regards,
Can we close this issue as fixed in the v3.x series?
Are there any plans to fix this in a 2.x version?
@lee218llnl We talked about this today on the weekly teleconf. @hjelmn and @hppritcha are going to investigate and scope the issue. Just for our information: is it possible to upgrade to 3.0.x?
While it would be nice to see a fix in a 2.x version, I suppose a 3.0.x version will suffice. Do you think it will make it in time for x=1? If so, when is that going to be released?
v3.0.1 is anticipated by the end of the month. But the testing above shows that this is working with v3.0.0, v3.1.x, and master.
Agreeing with @lee218llnl that a fix would be nice in 2.x, but 3.0.x will do. My only worry is users of OpenMPI 2.1.x who are not able to upgrade (immediately) to 3.0.x for whatever reason - in the end they are not able to debug/profile their code or (more generally) make use of any tool relying on the MPIR interface.
My understanding is that it's not the entire MPIR interface that's broken, but just the part of MPIR that LaunchMON uses to attach to MPI processes. Other tools that launch via MPIR seem to be working. It may be easier to look at what might have regressed since 2.0.x and fix it that way, if anyone has time to look into this.
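For context, the launch side of MPIR that reportedly still works can be exercised directly from GDB: the tool starts mpirun under the debugger, breaks in MPIR_Breakpoint, and reads MPIR_proctable once mpirun has published it. A sketch follows; the file name and mpirun command line are assumptions, while MPIR_Breakpoint and MPIR_proctable come from the standard MPIR Process Acquisition Interface:

```shell
# Sketch of an MPIR-style launch, written as a GDB command file so the steps
# are explicit. The file name and mpirun arguments are assumptions;
# MPIR_Breakpoint and MPIR_proctable are the standard MPIR symbols.
cat > mpir-launch.gdb <<'EOF'
# Stop when mpirun signals that the process table is ready.
break MPIR_Breakpoint
run
# At this point a tool reads the table of launched processes.
print MPIR_proctable_size
print MPIR_proctable[0]
EOF
# To try it (not run here):
#   gdb --batch -x mpir-launch.gdb --args mpirun -n 2 ./a.out
```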
It's not just LaunchMON, because our tools - Arm DDT/MAP (formerly known as Allinea DDT/MAP) - have the same problem (based on my understanding) with Open MPI 2.1.x.
Okay. Was the GDB standalone reproducer that I provided on Oct 20th of any help?
I'm always intrigued by the different ways people approach a break in code. My approach would be to simply add some print statements to find where the current code is broken, and fix it. IMO, that is much easier than playing all these "how did the code change" games.

Hopefully, folks are now beginning to understand better the community decision to move away from MPIR. I'm disturbed that six months into the transition of RTE responsibilities, we still can't get someone to address problems such as this one...but that's a community problem that shouldn't get reflected into the broader user base.

FWIW: I recently committed a PMIx-based RTE component into OMPI that eliminates the need for a runtime (except, of course, where no managed environment exists and the PMIx reference server isn't being used). This will significantly reduce the RTE support requirement for most installations. @npe9 and I need to do some testing/debugging, but it should be ready soon.

I'll take a crack at fixing this today as my LANL friends tell me that they would appreciate support for the 2.1 series, and I hate seeing an entire OMPI series that doesn't support a debugger. I suspect it isn't a big issue. However, it truly is the last time I can/will do it.

@lee218llnl If you would like, I'm happy to help integrate LaunchMON with the PMIx debugger support.
I don't work directly with LaunchMON; I'm more of a client of it with STAT. @dongahn may want to chime in on this.
@rhc54: this would be an interesting effort. The key would be to maintain both MPIR and PMIx ports. Probably doable, but it would need a redesign effort, which I don't have time for at this point. Maybe I can get some help from Matt Legendre's team (I don't have his GitHub ID).
adding @mplegendre
@lee218llnl I just ran test.launch_1 using the current head of the v2.x branch and it worked fine, so perhaps this is already fixed. I don't see a 2.1.3 release candidate out there, but can you clone the OMPI v2.x branch and give it a try?
@rhc54 I just tested the v2.x branch and it does work for LaunchMON/STAT. FWIW, I looked back at this issue thread and I had previously confirmed your patch. I believe the Allinea folks said they were the ones still seeing problems, per your comment on Sept 5.
Ah, ok - I'll run their test and see what happens.
Okay, I have confirmed the following:
The problem is that the daemon is attempting to send to the rank=0 app proc before that proc is up and running. This causes a SIGPIPE which GDB is trapping, which subsequently causes the daemon to abruptly exit and results in the proc segfaulting. I'm tracking down why the daemon feels a need to communicate, as that is the root cause. Meantime, I am closing this issue as it has nothing to do with the originally reported problem, and will open another issue to track this specific problem.
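For anyone who needs to reproduce or sidestep that behaviour while debugging mpirun under GDB, the usual knob is GDB's signal handling: tell it to pass SIGPIPE through to the inferior without stopping. A sketch, where the file name and mpirun command line are assumptions:

```shell
# Sketch: let SIGPIPE pass straight through to mpirun instead of being
# trapped by GDB, so the daemon's early send does not abort the session.
# The file name and mpirun arguments are assumptions.
cat > sigpipe.gdb <<'EOF'
handle SIGPIPE nostop noprint pass
run
EOF
# To try it (not run here):
#   gdb --batch -x sigpipe.gdb --args mpirun -n 2 ./a.out
```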
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v2.1.1
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
from source tarball
Please describe the system on which you are running
Details of the problem
I am having trouble attaching LaunchMON when using OpenMPI 2.1.1.
Here's how you can reproduce (modify your PATH and the path to mpirun):
In addition, the LaunchMON "test.attach_1" test hangs when trying to attach. @rhc54 had previously helped me with various debugger attach issues and we had a working commit. I don't know if that made it into the release or if this is a new issue. It would be nice if LaunchMON tests could be integrated as part of the release testing.