Use MPIPreferences to automatically initialise MPI? #627
Comments
I like the idea. However, how would this work, and what would happen if multiple dependencies have different threading requests? Would the first one to load MPI.jl just "win" and determine the setting for everyone?
Wouldn't that be a problem also without auto-initialisation? The idea is that you could choose what to do with the preference. I'm just floating the idea of this feature since I found it in mpi4py (and on many occasions I'd have preferred MPI to auto-initialise, instead of killing the session with the first MPI call).
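For context, this is the failure mode I mean, as a minimal sketch (the exact behaviour and abort message depend on the MPI implementation):

```julia
using MPI

# Without an explicit MPI.Init(), the first MPI call is not a catchable Julia
# error; the underlying MPI library aborts the whole session:
MPI.Comm_rank(MPI.COMM_WORLD)   # kills the session here

# With explicit initialisation it behaves as expected:
# MPI.Init()
# MPI.Comm_rank(MPI.COMM_WORLD)  # returns the rank
```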
Absolutely. But right now it is kind of accepted that it is somewhat undefined behavior, depending on which packages (and dependencies) you are loading and in which order. However, once we introduce something like a preferences-based approach, users would (rightfully, imho) assume that everything now happens deterministically - or at least that there are no silent errors anymore. I was just wondering what would happen in case multiple conflicting settings are requested. Again, a central and well-documented mechanism for auto-initialisation would be welcome.
A couple of considerations:
Could add a Requires hook to your
Okay, I've thought a bit more about this: how about the following:
Thoughts?
Oh, I'm not sure that we can load packages until after MPIPreferences is loaded?
Wow, after trying to debug an issue with this, I have to say I am much more in favor of this. The fact that it just kills the job without giving a stacktrace is painful.
One example where this will be a problem: on our Slurm cluster, if I get an interactive session via
Using the default JLL binaries gives even worse errors:
I don't understand this: what's the difference between a Julia process started inside an interactive shell session launched by
I don't understand why the PMIx issue should be dependent on MPI.jl. On one of our test clusters, I ran into these PMIx issues (with MPI.jl v0.19.2). The workaround was to

As a general thought after playing with MPI.jl v0.20 on various machines: it would be nice not to over-engineer the setup machinery, as things usually get much more complicated on clusters and supercomputers than on local machines (no internet on compute nodes, missing libs/env on login nodes, etc.). So in general, the more basic and robust workflows should be preferred, ideally workflows that need as few "interactive setup" steps as possible.
What "interactive setup" steps you're referring to? I don't think there is anything strictly interactive? |
Apologies, I realise this is slightly OT. Previously, an ENV var was needed and then only a
I must have a different concept of "interactive". Nothing of what you described is interactive. Until v0.19, to use system MPI you had to do
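something along these lines (a sketch of the usual pre-v0.20 recipe; the exact environment variables, e.g. a path override, may vary by setup):

```julia
# Pre-v0.20: point MPI.jl at the system MPI via an environment variable, then rebuild.
ENV["JULIA_MPI_BINARY"] = "system"

using Pkg
Pkg.build("MPI"; verbose = true)
```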
with v0.20 you have to do
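roughly this (the call records the choice in LocalPreferences.toml, so it only needs to run once per project):

```julia
# v0.20+: select the system MPI via MPIPreferences instead of rebuilding MPI.jl.
using MPIPreferences
MPIPreferences.use_system_binary()
```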
which doesn't look much different to me, and it requires the same number of non-interactive commands to be run.
I guess it sets some environment variables that change the behaviour of MPI? Honestly I have no idea. I don't see it in
That does help, at least once:
If you already have the LocalPreferences.toml file (or you have it in your global env), it shouldn't be required at all.
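If it helps, the recorded values can also be inspected from Julia (a quick sketch; `MPIPreferences.binary` and `MPIPreferences.abi` are the entries that end up in LocalPreferences.toml):

```julia
# Check which MPI binary and ABI MPIPreferences has recorded for this project.
using MPIPreferences
@show MPIPreferences.binary   # "system" when the system MPI is selected, otherwise a JLL name
@show MPIPreferences.abi      # detected ABI, e.g. "MPICH" or "OpenMPI"
```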
I had a similar issue on that machine. The solution was to always start julia in a separate srun, instead of starting it directly inside the interactive allocation. Maybe it's the same for you?
While playing with mpi4py earlier this week I realised it automatically initialises MPI at loading time. This can be controlled with the `mpi4py.rc` object, including the threading setup. I think we can do something similar with an option in `MPIPreferences.jl`. How does that sound?
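To make the idea concrete: today the thread level has to be set in an explicit call, and the preference would simply record that choice up front (the preference-setting function sketched in the comment below is purely hypothetical, just to illustrate the shape of the idea):

```julia
using MPI

# Today: MPI must be initialised explicitly, once, before the first MPI call.
# The threadlevel keyword is the kind of setting different packages might disagree on.
MPI.Init(; threadlevel = :funneled)   # :single, :funneled, :serialized or :multiple

# Sketch of the proposal (hypothetical API, not part of MPIPreferences today):
#   MPIPreferences.set_auto_init(true; threadlevel = "funneled")
# after which `using MPI` would call MPI.Init() with the stored thread level.
```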