OpenMPI MTL PSM2 hangs under flux-core 0.11.1 #2173
Comments
BTW, this would be a good test case to try our upcoming debugging tools support.
I just tried PSM2 on multiple nodes, and it worked. I believe this is related to this OpenMPI bug: open-mpi/ompi#1559
@SteVwonder: could this be related to the problems @damora is having at all?
Sigh. Good to know. Glad that you discovered this before others do.
(Separate discussion with @garlick led me here...) I see what I believe to be a similar issue running on DOE CTS-1 with OpenMPI 4.x applications, notably 4.1.1. Summary:
This is a blocker as I'm attempting to use [...]. Also, in case it's helpful, here is some wacky neutering of PSM2 via disabling the cm PML: [...]. Are there any other known remedies or workarounds? Should I expand on my environment and reproduction steps in this issue, in another issue/discussion, or is that not needed to help debug?
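For reference, disabling the cm PML is normally done through OpenMPI's MCA environment variables. The exact incantation used above isn't preserved, but a minimal sketch looks like this (the test binary name is a placeholder):

```sh
# Sketch (not the exact settings used above): exclude the cm PML, which
# hosts the psm2 MTL, so OpenMPI falls back to the ob1 PML and its BTLs.
export OMPI_MCA_pml=^cm
# Equivalently, exclude only the psm2 MTL:
# export OMPI_MCA_mtl=^psm2
flux mini run -N 1 -n 4 ./mpi_hello   # ./mpi_hello is a placeholder binary
```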
Thanks @briadam! Yeah, let's track the details in this issue. Here are a couple of details you sent in email that I think are helpful. When a 1n4p task fails, all four tasks are still running, with three of the four having entered the PMI1 barrier and the fourth stuck in PSM2:
The OMPI_ environment for the failing case is:
The ompi_mtl_psm2_add_procs function is defined here. If you have any other insights or data, please append to this issue. Thanks!
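Backtraces like the ones summarized above can be pulled from the hung ranks with gdb; a minimal sketch, assuming the test binary is named mpi_hello (a placeholder):

```sh
# Sketch: dump all thread backtraces from every hung rank of the test
# program. 'mpi_hello' is a placeholder for the actual binary name.
for pid in $(pgrep -f mpi_hello); do
    echo "=== pid $pid ==="
    gdb -batch -p "$pid" -ex 'thread apply all bt'
done
```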
When you run the 1n4p case under slurm, and it works, what options do you need to use? Would it be possible to get an environment dump of [...]?
Nothing special seems to be needed to run. Here are some abbreviated notes from a clean salloc/direct launch, where the mpirun variant defaulted to running the tests on one node, and both that and the srun launch ran fine:
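In sketch form (the binary name and task counts are placeholders, not the exact commands), the working Slurm baseline was along these lines:

```sh
# Sketch of the clean salloc/direct-launch baseline that completes normally.
salloc -N 1 -n 4
mpirun ./mpi_hello        # no -np needed; defaults to the allocated slots
srun -n 4 ./mpi_hello     # direct srun launch also runs fine
```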
Here's the environment after I first start the flux daemons, where the only difference is the addition of the PMI library path:
From within that Flux pty session, I did this to see what's set when running a job:
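The exact command and output aren't preserved here; one way to capture the job-visible environment, as a sketch with illustrative filter patterns, is:

```sh
# Sketch: dump the environment a job's tasks actually see, filtered to
# MPI/PMI-related variables, from inside the flux instance ...
flux mini run -n 1 /usr/bin/env | grep -E '^(OMPI_|PMI|PMIX_)' | sort
# ... and the same under Slurm for comparison:
srun -n 1 /usr/bin/env | grep -E '^(OMPI_|PMI|PMIX_|SLURM_)' | sort
```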
Thanks - could you run [...]?
Good call. Nothing special or additional shows up with srun:
I also did the same with raw mpirun in case it's useful, with apologies for the verbose output. I replaced some names and numbers in this with <...>:
The team at LLNL kindly installed openmpi-4.1.2 on our opal CTS-1 system for us, and it seems to work out of the box with flux. However, it appears to be using the openib BTL rather than psm2.
The only environment vars we have set for OMPI are the ones set by flux (captured by running [...]):
I think we tried that in your environment and it failed, correct? Back to the drawing board - maybe we need to build 4.1.1.
Sorry, tried what in our environment? I haven't tried wiping out the system module-set OMPI_* variables in our environment, but if that's what you mean, I can give it a try. Depending on that, I can also see if it works for me with openmpi-4.1.2 if I build it (though I probably don't have a prayer of building it exactly the way the sys admins did...). In case it's relevant to your experiments, the only interesting parts of the system-installed openmpi-4.1.1 configuration (from "Configure command line" in ompi_info) seem to be:
I'm happy to provide any other configuration info that's relevant.
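A sketch of how to pull that information out of an installed OpenMPI (the grep patterns are just illustrative):

```sh
# Sketch: show how the installed OpenMPI was configured and which
# PML/MTL/BTL components were built.
ompi_info | grep -i 'configure command line'
ompi_info | grep -E 'MCA (pml|mtl|btl):'
```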
Sorry, that wasn't clear. Right, that's what I meant. Just wondering if openib just works for you like it appears to do for us. Though we still probably want to know why psm2 works for slurm and not flux. I forgot that [...]
It didn't change any behavior; one rank still hangs when run with -N 1. (I unset any OMPI_* variables both before flux start and again before flux mini run, as they were re-created in the sub-shell.) FWIW for your debugging, the target application is using openmpi-4.0.5, but I'm happy to test whatever version/options are helpful. Whatever the difference in our environments is, it seems tricky to chase down.
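A sketch of that environment scrubbing, assuming a bash-like shell (the job command is a placeholder):

```sh
# Sketch: unset module-provided OMPI_* overrides before starting flux,
# then repeat inside the instance, since modules may re-export them in
# sub-shells.
for v in $(env | sed -n 's/^\(OMPI_[^=]*\)=.*/\1/p'); do unset "$v"; done
flux start
# ...inside the instance, run the same loop again, then:
# flux mini run -N 1 -n 4 ./mpi_hello
```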
Thanks. To clarify: 4.0.5 just now, 4.1.1 before? By any chance do you have 4.1.2 available? That is the only version I have that works on this system right now. (Self-built ompi thus far is not going great...)
Now I've caused confusion... All my experiments reported in this issue are with 4.1.1. My team member who asked me to demo this is ultimately aiming to use Dakota + Flux with an application built with intel-20.x and openmpi-4.0.5. I realize they may need to rebuild the application depending on what we find is the cause of this hang issue, and whether it's related to the OpenMPI version or to the runtime environment, hardware, etc.
Progress! It occurred to me that in my previous experiment clearing the OMPI_* env vars, the runtime could still fall back to the default best transports and maybe pick cm/psm2. If I explicitly select openib (no idea if this is a valid way to do this, as I'm in way over my head here):
yielding
the -N 1 test works. It also works if I clear my environment and only set these two variables:
So this seems to further support that the issue relates to PSM2.
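The two variables themselves aren't shown above; one plausible combination, assuming standard OpenMPI 4.x component names rather than the exact settings used, is:

```sh
# Sketch: pin OpenMPI to the ob1 PML with the openib BTL so the psm2
# MTL is never initialized. Component names are standard OpenMPI 4.x
# ones; adjust (e.g. vader/self) for your build.
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=self,vader,openib
flux mini run -N 1 -n 4 ./mpi_hello   # placeholder test binary
```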
Excellent. Well, we should root out the psm2 issue, but it's good to know something works with a high-speed interconnect!
Not making a lot of progress here, although just to add some data points: I got 4.1.1 working on our system with psm2 built, and it seems to work out of the box with flux. I note that when I run with [...]. I found I was able to force only psm2 to be used (over shared memory, tcp, etc.) with the following:
but still no joy recreating the problem. Just for the record, I was able to see what modules are actually being used by setting the following debug variables:
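Neither the forcing settings nor the debug variables are reproduced above; a typical combination, using standard MCA parameter names as an assumption, looks like:

```sh
# Sketch: force the cm PML / psm2 MTL and turn on component-selection
# debugging so the chosen PML, MTL, and BTLs are printed at startup.
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=psm2
export OMPI_MCA_pml_base_verbose=100
export OMPI_MCA_mtl_base_verbose=100
export OMPI_MCA_btl_base_verbose=100
flux mini run -N 1 -n 4 ./mpi_hello   # placeholder test binary
```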
Just to verify, I ran my -N1 test case with that increased verbosity, both in the default openmpi-4.1.1 environment and in a clean one where only those verbosity controls were set. I believe the attached logs confirm that psm2 is selected, as we thought.
A multi-task mpi-hello-world program run under a single-node flux instance with the openmpi 3.0.1 installed in /usr/tce hangs until the psm2 initialization times out.
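A rough sketch of that reproducer (the binary name is a placeholder; newer flux-core spells the launcher flux mini run, while the 0.11-era launcher differed):

```sh
# Sketch: start a single-node flux instance and, inside it, launch a
# 4-task MPI hello-world built against the /usr/tce OpenMPI 3.0.1.
# With the bug present, the job hangs until PSM2 init times out.
flux start --size=1            # drops into a shell inside the instance
flux mini run -n 4 ./mpi_hello
```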
Backtrace from pid 17986{2,4}:
Backtrace from pid 179863:
Output after timeout:
It works if you set OMPI_MCA_mtl=psm or OMPI_MCA_mtl=^psm before running. Maybe we should add an openmpi mpi "personality" just like we have for Spectrum.