flux PMI will not init spectrum MPI #1382
Comments
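Not sure if this will help but running with `-o trace-pmi-server` might give more information about what the flux PMI server is seeing (if anything). |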
I think what’s going on is that spectrum is using an older OpenMPI
than the one that we got flux support into. Not 100% sure, but I get
the impression it just doesn’t even try to talk to us right now. =(
…On 23 Mar 2018, at 9:40, Mark Grondona wrote:
Not sure if this will help but running with `-o trace-pmi-server`
might give more information about what the flux PMI server is seeing
(if anything).
|
Just a side note: the state-transition handling needs to be beefed up for abnormal transitions like this. |
@trws: if you have specific questions, I can talk to the Spectrum MPI guys at IBM. I guess the main question is: Does Spectrum MPI use PMI (or PMIX)? And do they have a recipe to make Spectrum MPI talk to another bootstrapper like flux that implements normal PMI? |
It uses PMIX, which should work with us if they’re using a recent
enough version IIRC. We’ll have to figure out exactly what to ask,
but it may be as simple as asking them to compile in the flux support
module for their MPI’s internal PMI implementation.
…On 23 Mar 2018, at 10:06, Dong H. Ahn wrote:
@trws: if you have specific questions, I can talk to the Spectrum MPI
guys at IBM. I guess the main question is:
Does Spectrum MPI use PMI (or PMIX)? And do they have a recipe to make
Spectrum MPI talk to another bootstrapper like flux that implements
normal PMI?
|
Also in theory, we should be able to use OpenMPI. We can use a version of OMPI for which we tested flux's support. I know we installed those on EA systems but I'm not sure if we have on Sierra systems. Let me ask. |
Talked to Adam Moody; will start a separate email discussion thread. |
What does work is that Flux can be launched with PMIX's backwards compatibility support for PMI-1. If the MPI wants only PMIX, then Flux can't launch it. I think what we added to OMPI was support for the PMI-1 wire protocol, which we offer and which can be used without having to relink MPI against our PMI library. If this is a variant of OMPI, maybe that could be backported? ompi_info output might be helpful, e.g.
|
See #923 for more detailed info on OMPI's "flux support". I made pretty detailed commit comments in OMPI when this was added, in case anyone needs to dig into this. Uh, looks like ralph squashed my whole PR down to one commit in the merge: open-mpi/ompi@215d629 and concatenated all my comments, so it reads a bit like a run-on. |
Maybe there is a runtime MCA option to turn on flux support... If not, building OpenMPI ourselves would be the path of least resistance. I don't believe we build Spectrum MPI on our own. |
FYI -- From Roy Mussleman:
|
Not sure if this is pertinent, but we did run into this problem with openmpi built on TOSS 3: |
Here's a new fun detail, setting FLUX_JOB_SIZE and FLUX_JOB_NNODES kills the spectrum mpi mpirun... >< |
Looks like if FLUX_JOB_ID is set, it tries to do something that causes it to segfault. |
heh: >< - good tomoticon! |
@trws: What happens if you select flux component for pmix type?
I know ultimately you want to use |
I'll try that, it would be a less nasty solution. |
@trws: I know some of us will be busy with SC18 submissions and spring break for the next two weeks, so I think it will be good to summarize the issues we need to unblock and the "good to haves" for the splash effort. I think I may be able to fit in emitting trimmed `R` for affinity and optimizing rdesc fetching using @grondo's experimental wreck. Anything else? |
That and the combined cancellation/kill are the only things that would
be really good for splash right now. The rest is more documentation of
issues to address “at some point” and things that will be blockers
for ATS.
…On 25 Mar 2018, at 10:47, Dong H. Ahn wrote:
@trws: I know some of us will be busy with SC18 submissions and spring
break for the next two weeks, so I think it will be good to summarize the
issues we need to unblock and the "good to haves" for the splash effort.
I think I may be able to fit in emitting trimmed `R` for affinity and
optimizing rdesc fetching using @grondo's experimental wreck.
Anything else?
|
Some notes on how ompi under flux is supposed to work, and a quick test on my desktop to ensure we haven't regressed anything. First ensure that a hello world mpi program can be compiled with ompi and run under flux (yup):
Exercise PMI client side debug (prove that ompi flux support opened flux PMI library)
Exercise PMI server side debug (prove that flux PMI library connected to PMI_FD provided by wrexed):
The two ompi flux modules are:
mca_pmix_flux.so
The module dlopens the flux pmi library using the above environment variable, then translates ompi generic pmi-ish calls to PMI-1 API calls supplied by our PMI library.
lib/flux/libpmi.so
The flux PMI library tries the following, in descending priority:
Regardless of what the flux PMI library chooses to do here, FLUX_PMI_DEBUG should tell you that the flux PMI library was called, and if it is dlopening another PMI library, what it passed to dlopen.
mca_schizo_flux.so
I can't make heads or tails of the I seem to recall it needs to be there to ensure the other module runs, but no idea how. |
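For anyone unfamiliar with what "connected to PMI_FD" means in practice, here is a rough sketch of the opening handshake as I understand the PMI-1 simple wire protocol: newline-terminated lines of space-separated `key=value` pairs, starting with an `init` request. This is not flux or ompi code. The helper names (`format_pmi`, `parse_pmi`, `serve_init`, `demo`) are made up for illustration, and a socketpair stands in for the descriptor the launcher would normally hand to the client.

```python
import socket


def format_pmi(cmd, **fields):
    """Encode one wire message: 'cmd=<name> k=v ...' ending in a newline."""
    parts = ["cmd=" + cmd] + [f"{k}={v}" for k, v in fields.items()]
    return (" ".join(parts) + "\n").encode("ascii")


def parse_pmi(line):
    """Decode one wire message into a dict of its key=value pairs."""
    return dict(tok.split("=", 1) for tok in line.decode("ascii").split())


def read_line(sock):
    """Read bytes up to and including the terminating newline."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed before newline")
        buf += chunk
    return buf


def serve_init(sock):
    """Toy server side: answer a single init request."""
    req = parse_pmi(read_line(sock))
    ok = req.get("cmd") == "init" and req.get("pmi_version") == "1"
    sock.sendall(format_pmi("response_to_init", pmi_version=1,
                            pmi_subversion=1, rc=0 if ok else -1))


def demo():
    """Run one init exchange and return the parsed server reply."""
    # Assumption: the socketpair plays the role of the PMI_FD descriptor.
    server, client = socket.socketpair()
    try:
        client.sendall(format_pmi("init", pmi_version=1, pmi_subversion=1))
        serve_init(server)
        return parse_pmi(read_line(client))
    finally:
        server.close()
        client.close()
```

Calling `demo()` returns the parsed reply as a dict. A real client would continue with further requests after init; those are left out here.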
@trws: I said I would find the runes for getting an ompi-linked MPI program to emit some debug.
|
A new bit of info here, it's also true the other way around. The spectrum MPI mpirun, orterun and jsrun will not bootstrap flux either. I'm not sure if this is an issue with something that changed in a newer PMIX or something particular to IBM's implementation of PMIX, but it may be worth looking into at some point, or at least a good reason to write something that can launch flux under LSF that doesn't require mpich... |
When we do look at this, a good thing to try would be to set FLUX_PMI_DEBUG=1 in the environment and try to run a small job with the native launch tool(s). This will cause trace information from our PMI client (used by the broker) to go to stderr (see the example in the earlier comment). So can we launch flux as a batch job directly under LSF, or are there no native options for launching flux on that machine? |
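If it helps whoever picks this up, the FLUX_PMI_DEBUG suggestion can be wrapped so the trace is captured rather than lost in a terminal. A minimal sketch, assuming only that the variable is read from the environment as described above. The helper names are hypothetical, and the launcher command line is deliberately left to the caller since it is site-specific.

```python
import os
import subprocess


def pmi_debug_env(base=None):
    """Return a copy of the environment with FLUX_PMI_DEBUG=1 set."""
    env = dict(os.environ if base is None else base)
    env["FLUX_PMI_DEBUG"] = "1"
    return env


def run_with_pmi_debug(cmd, base_env=None):
    """Run cmd with PMI client tracing on; return (exit code, stderr text)."""
    proc = subprocess.run(cmd, env=pmi_debug_env(base_env),
                          stderr=subprocess.PIPE, text=True)
    return proc.returncode, proc.stderr
```

`cmd` would be whatever native launch line is being tested, passed as an argument list. The returned stderr text can then be pasted into the issue.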
I'm not sure what blaunch would do on manta/ray, but on sierra jsrun is the official LSF-sanctioned launcher, so there is currently no native option for launching flux there. I would say that I think we could launch it with a config file with the native tools, but no PMI wireup can be expected at the moment. |
Just requested butte/sierra access and will try to debug issues with Flux launching spectrum MPI apps directly. |
I'm on, and flux-core master builds fine. I did hit these make check failures in t2000-wreck.t:
In t0001-basic.t:
Heading out, just wanted to document where I was in this investigation. |
Spectrum mpirun and jsrun, which is IBM's launcher to go with LSF on these things. I managed to get flux to successfully launch a multi-node MPI job with spectrum just now, but only by turning pami off. It looks like we'll have to enlist IBM to actually get a fix for this:
|
Great! We need to involve Roy Mussleman to get this fixed by IBM ASAP. Do you want to come up to the 4th floor to quickly strategize with Roy? I will give him a quick heads-up as well. |
I would like to, but I’m in Santa Clara at the moment. Will you be
around tomorrow?
…On 10 Apr 2018, at 14:16, Dong H. Ahn wrote:
> Spectrum mpirun and jsrun, which is IBM's launcher to go with LSF on
these things. I managed to get flux to successfully launch a
multi-node MPI job with spectrum just now, but only by turning pami
off. It looks like we'll have to enlist IBM to actually get a fix for
this:
Great!
We need to involve Roy Mussleman to get this fixed by IBM ASAP.
Do you want to come up to the 4th floor to quickly strategize with Roy?
I will give him a quick heads-up as well.
|
The potentially deeper worry here is that flux doesn't seem to work for OpenMPI builds that rely on these kinds of paths in the environment. I'm not sure if there's anything we can, or should, do about that from our end though. @rhc54 is there anything we can tie into that would make the flux-end handling for OpenMPI environment (prefix/libdir/mpi_root) requirements a little more robust? |
I will be. I just talked to Mussleman. He said the fastest route to get IBM's response would be to describe the problem in an email and send it to the MPI/PAMI developers directly, and he has a couple of names. If there is a way to work around this in time, they are the ones who can provide the info or who can forward our inquiry. @trws: can you send an email to Mussleman and copy me? |
@garlick Sorry the squash caused confusion. There has been some argument in the OMPI world about having a lot of "in-between" commits. Schizo just checks for markers of a particular environment (flux, in your case) and sets things up to ensure the right components get selected (in your case, the flux PMI one). @trws I'm not sure there is a great solution for the problem. OMPI by itself seems to be okay in that regard, but Spectrum does some nasty things with the environment - the timing of the "schizo" framework's development didn't dovetail into their initial efforts, and so mpirun is now a wrapper that fiddles with things before calling the real mpirun. This is what causes the fragility, so far as we've heard from folks. Jim's flux work should be just fine - I confess we don't track/test it, but nothing has changed in those areas of the code. I can try to advise as you run into things, if that would help. |
Thanks for that clarification @rhc54! Poking around in |
@garlick: if you have specific questions, feel free to involve me and Mussleman. We have contact info for some Spectrum MPI developers. |
Thanks @rhc54, it looks like I had a bad build of openmpi that was making me think we needed a more general fix. Should we warn people to build with anything to make sure they get the right prefix by default, or is that all handled in schizo? |
Assuming IBM doesn't interfere, you can configure OMPI with `--enable-orterun-prefix-by-default` and that should ensure things are always set. |
Ok. It may be worth putting a reference to that in our PMI docs, not
that it’s required to work with us necessarily, but we don’t have a
good way to work around those paths being missing.
…On 11 Apr 2018, at 12:04, Ralph Castain wrote:
Assuming IBM doesn't interfere, you can configure OMPI with
```--enable-orterun-prefix-by-default``` and that should ensure things
are always set.
|
OK 2 things came out from the concall with IBM folks.
|
Related issue #1555. |
It's interesting that this PAMI layer (library?) is, I guess, independently bootstrapping itself through PMIX, as opposed to being implemented as a plugin to OpenMPI where it would have access to OpenMPI's internal PMIish interfaces that work with multiple resource managers including Flux. Probably we're not going to be able to change that though. I wonder if we can offer PMIX support in Flux by simply exporting a libpmix.so that implements the API, or if we'll have to implement PMIX's wire protocol, security, etc? I guess it depends on how libpami uses PMIX? |
my current thinking is:
We probably don't want to rely on the PMIX server running on the node (which was used to launch flux) in launching MPI jobs within the flux instance, though. |
There are certainly PAMI bits implemented as OpenMPI plugins, and none of them are using
Not sure if that is the extent of the pami code though. Maybe another library is lurking somewhere else? |
Sent an email directly to Josh Hursey @IBM and cc'ed you. |
Josh can help you better than I given his direct knowledge of the PAMI code. My understanding is that PAMI pulls all the PMIx data out of the local JSM daemon that hosts the PMIx server library, but I don't know what interfaces they use to do it. They might dlopen it, which is why it wouldn't show in a dependency listing. One clarification just to ensure we are on the same page: there is no separate PMIx server running on the node. JSM's daemon acts as the PMIx server on each node (i.e., it calls PMIx server_init). However, I do agree that if you launch the flux instance, you would certainly want flux to handle the MPI wireup. If there are concerns blocking your direct use of PMIx, we'd love to understand them and see if we can't resolve them. Ideally, we'd like to see flux hosting a PMIx server as there are increasingly more things being provided thru the PMIx library (e.g., comm cost matrix for scheduling, fabric topology, and storage directives). |
This was one hangup that made integrating the "reference server" code difficult for us: openpmix/openpmix#102 If the wire protocol is now nailed down and documented, we could maybe implement our own server. |
Dropping the "in progress" label since I am not actively working on this. Is there anything we should add to 0.10.0 to make this easier? We do have these lua scripts that provide some environment settings needed by various MPIs, but they are all loaded unconditionally. Would it make sense to provide a way to conditionally set them, e.g. so you could launch --with-mpi=spectrum or similar? |
Actually, that sounds extremely useful. I hadn't thought about it for a long time, but that's something I remember wishing for any number of times when working on the MPI end of the equation.
…On 11 Jul 2018, at 7:07, Jim Garlick wrote:
Dropping the "in progress" label since I am not actively working on this.
Is there anything we should add to 0.10.0 to make this easier? We do have these lua scripts that provide some environment settings needed by various MPI's, but they are all loaded unconditionally. Would it make sense to provide a way to conditionally set them, e.g. so you could launch --with-mpi=spectrum or similar?
|
Add mpi "personality" for IBM spectrum MPI, enabled by user with wreckrun -o mpi-spectrum. Note that this plugin assumes MPI is installed in /opt/ibm/spectrum_mpi. It also disables PAMI, the spectrum enhanced collectives due to their dependency on the RM providing a PMIx server. See flux-framework#1382 for further details. It also sets the soft stack limit to a value the MPI runtime seems to require. See flux-framework#1382 for more details. Fixes flux-framework#1584
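To make the "personality" idea in the commit message concrete: settings are applied only when the user asks for them with an `mpi-<name>` option, instead of being loaded unconditionally. The sketch below models that selection in Python. The real plugins are lua scripts, so this is only an illustration, and the registry contents are assumptions: `MPI_ROOT` is a guess based on the install path named in the commit message, and `EXAMPLE_DISABLE_PAMI` is a placeholder for whatever the plugin actually sets.

```python
# Hypothetical registry: personality name -> environment overrides.
# Keys and values are placeholders, not the settings of the real plugin.
PERSONALITIES = {
    "spectrum": {
        "MPI_ROOT": "/opt/ibm/spectrum_mpi",  # install path from the commit message
        "EXAMPLE_DISABLE_PAMI": "1",          # stand-in for the PAMI-off setting
    },
}


def parse_options(opts):
    """Collect personality names from options of the form 'mpi-<name>'."""
    return [o[len("mpi-"):] for o in opts if o.startswith("mpi-")]


def apply_personalities(env, opts):
    """Return a new env with overrides for only the requested personalities."""
    result = dict(env)
    for name in parse_options(opts):
        if name not in PERSONALITIES:
            raise KeyError(f"unknown MPI personality: {name}")
        result.update(PERSONALITIES[name])
    return result
```

With no `mpi-*` option the environment comes back unchanged, which is the point: nothing MPI-specific is set unless requested.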
See below: