add OpenMPI flux orte module #923
I know nothing about orte/schizo, but it appears what they are attempting to support is the difference between launching an ompi app in a Flux session with a flux parallel launcher (e.g. […]). Sorry if that was no help. Happy to discuss further in the office.
That's helpful. Here's a summary of how I read the code and how I think it needs to change: […]
Does that make sense?
What if we're running flux under flux-wreckrun, isn't FLUX_JOB_ID set in that case?
FLUX_JOB_ID is only set if running under flux-wreckrun.
Oh I just realized what you are saying. Should we be purging that from the environment in a new flux instance?
Yeah, I was thinking the same thing, but I'm having trouble remembering the strategy for a child instance to connect to its parent, know whether it is a child of Flux or something else, etc. I think it is probably OK, but it does get a little tricky to think about.
The broker both opens the enclosing instance and caches its URI in a broker attribute, so I think that case is covered?
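For reference, a minimal sketch of how a process might read the enclosing instance's URI back out of that broker attribute. This assumes the current flux-core API (a two-argument `flux_attr_get()` and a `parent-uri` attribute); the 2016-era interface may have differed:

```c
#include <stdio.h>
#include <flux/core.h>

int main (void)
{
    flux_t *h = flux_open (NULL, 0);   /* connect to the local broker */
    if (!h)
        return 1;
    /* "parent-uri" holds the URI of the enclosing instance; the
     * attribute name is an assumption based on current flux-core. */
    const char *uri = flux_attr_get (h, "parent-uri");
    printf ("enclosing instance: %s\n", uri ? uri : "(none)");
    flux_close (h);
    return 0;
}
```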
I'd say clear that variable for now from flux-start/flux-broker, then.
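A minimal sketch of what that scrubbing could look like early in flux-start/flux-broker startup; the function name, call site, and variable list here are illustrative assumptions, not the actual flux-core change:

```c
#include <stdlib.h>

/* Scrub job-related variables inherited from an enclosing instance so
 * a child broker does not mistake itself for a task of its parent.
 * Only FLUX_JOB_ID is mentioned in this thread; a real implementation
 * might clear others. */
static void scrub_inherited_environment (void)
{
    unsetenv ("FLUX_JOB_ID");
}

int main (void)
{
    scrub_inherited_environment ();
    /* ... continue with normal broker startup ... */
    return 0;
}
```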
This was upstreamed in open-mpi/ompi#2597. Minor flux-core changes went in with #926 and #921. @trws if you have a chance to poke at this, please use flux-core master and ompi master from the repo referenced above.
Will do, thanks @garlick!
I've yet to try the static variants, but using pkg-config and flux pmi with shared libraries, everything works as expected both on and across nodes. This is some really nice work @garlick.
Static also works as expected. I toyed with getting the system MPIs to load the component, but that doesn't fly; it looks like we'll need new builds of OpenMPI to make use of this. Regardless, it works great.
Thanks! BTW, it should work without pkg-config or any other configure options; that's only necessary in conjunction with the static build. By default it dlopens our libpmi.so, following the FLUX_PMI_LIBRARY_PATH environment variable (set by Flux) at runtime.
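A rough sketch of that runtime dlopen pattern, with the caveat that the error handling and symbol lookup here are illustrative rather than copied from the OpenMPI component (build with `-ldl` on glibc systems):

```c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main (void)
{
    /* FLUX_PMI_LIBRARY_PATH is set by Flux in the job environment;
     * assuming here that it holds the full path to Flux's libpmi.so. */
    const char *path = getenv ("FLUX_PMI_LIBRARY_PATH");
    if (!path) {
        fprintf (stderr, "FLUX_PMI_LIBRARY_PATH not set; not running under Flux?\n");
        return 1;
    }
    void *dso = dlopen (path, RTLD_NOW | RTLD_LOCAL);
    if (!dso) {
        fprintf (stderr, "dlopen: %s\n", dlerror ());
        return 1;
    }
    /* PMI_Init is part of the standard PMI-1 interface; the remaining
     * PMI symbols would be resolved the same way. */
    int (*pmi_init) (int *) = (int (*)(int *)) dlsym (dso, "PMI_Init");
    int spawned = 0;
    if (pmi_init && pmi_init (&spawned) == 0)
        printf ("PMI initialized (spawned=%d)\n", spawned);
    dlclose (dso);
    return 0;
}
```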
Let's call this done.
OpenMPI requires modules for each launcher it supports. We've added one for Flux's PMI in rhc54/ompi#1. In that PR it was also noted that ORTE (the client side) needs one to support direct launch, as opposed to launch via mpirun.
A skeletal implementation (based on the SLURM one, I think) was added in rhc54/ompi@12bef7f.
@grondo I was hoping you could have a look at this so we could discuss it this morning before I attempt to rework it for Flux. There are a few different use cases evident there that I'm not sure we need in Flux, but I may be missing something; for example, "running in a job step" versus not (but still running in an allocation).
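To make the use-case question concrete: the component essentially has to classify how the process was launched by inspecting its environment. A toy sketch of that decision for Flux, where the variables checked are assumptions drawn from this thread rather than the final component logic:

```c
#include <stdio.h>
#include <stdlib.h>

typedef enum { LAUNCH_MPIRUN, LAUNCH_FLUX_DIRECT, LAUNCH_UNKNOWN } launch_t;

/* Toy classifier for the cases discussed above.  A real ORTE schizo/ess
 * component does this inside the MCA framework; checking FLUX_JOB_ID
 * for direct launch is an assumption based on this thread. */
static launch_t classify_launch (void)
{
    if (getenv ("OMPI_COMM_WORLD_RANK"))   /* set by mpirun/ORTE */
        return LAUNCH_MPIRUN;
    if (getenv ("FLUX_JOB_ID"))            /* set for tasks of a Flux job */
        return LAUNCH_FLUX_DIRECT;
    return LAUNCH_UNKNOWN;
}

int main (void)
{
    static const char *names[] = { "mpirun", "flux direct launch", "unknown" };
    printf ("launched via: %s\n", names[classify_launch ()]);
    return 0;
}
```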