
test all env query functions for compliance with MPI-3.1 thread safety requirements #493

Closed
hppritcha opened this issue Mar 24, 2015 · 17 comments
@hppritcha

The MPI-3.1 standard states that

MPI_Initialized
MPI_Finalized
MPI_Query_thread
MPI_Is_thread_main
MPI_Get_version
MPI_Get_library_version

are callable from threads without restrictions irrespective of the actual level of thread support provided.
It appears that the sun/threads/mt_env.c test covers most, but not all, of these. That test could either be enhanced, or a similar test could be added to the ibm tests in ompi-tests.

@hppritcha hppritcha added this to the Open MPI 1.9.0 milestone Mar 24, 2015
@hppritcha hppritcha self-assigned this Mar 24, 2015

bosilca commented Mar 24, 2015

No, the mt_env.c test bails out if the MPI_THREAD_MULTIPLE mode is not supported.

@jsquyres

GET_VERSION and GET_LIBRARY_VERSION are thread safe.

For INITIALIZED, FINALIZED, QUERY_THREAD, and IS_THREAD_MAIN, I'm thinking that all we need is to make the following globals volatile -- right? (these are all declared in ompi/runtime/mpiruntime.h and defined in ompi/runtime/ompi_mpi_init.c)

OMPI_DECLSPEC extern bool ompi_mpi_initialized;
OMPI_DECLSPEC extern bool ompi_mpi_finalized;
OMPI_DECLSPEC extern int ompi_mpi_thread_provided;
OMPI_DECLSPEC extern struct opal_thread_t *ompi_mpi_main_thread;

bosilca commented Jul 25, 2015

I don't think they should be volatile. However, I do think they should be accessed via atomic operations. Moreover, we need to redesign our init/finalize process in order to remove all possible race conditions between the init and finalized states (such as once finalize is started, init does not return success).
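Accessing the state through atomic operations, rather than marking the globals volatile, might look something like the following sketch (C11 atomics; the `example_` names are hypothetical illustrations, not Open MPI internals):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch (not the actual Open MPI code): guard a state
 * variable with C11 atomics instead of 'volatile'.  Atomics give both
 * cross-thread visibility and ordering guarantees; 'volatile' only
 * prevents the compiler from caching the value in a register. */
static _Atomic bool example_mpi_initialized = false;

void example_set_initialized(void)
{
    atomic_store_explicit(&example_mpi_initialized, true,
                          memory_order_release);
}

bool example_is_initialized(void)
{
    return atomic_load_explicit(&example_mpi_initialized,
                                memory_order_acquire);
}
```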

@jsquyres

Forgive my ignorance: why access via atomics vs. volatile?

As for the behavior of MPI_INITIALIZED, MPI-3.1 (and 3.0) is quite clear:

This routine may be used to determine whether MPI_INIT has been called. MPI_INITIALIZED returns true if the calling process has called MPI_INIT. Whether MPI_FINALIZE has been called does not affect the behavior of MPI_INITIALIZED.

So I think our current implementation of that aspect is correct.

bosilca commented Jul 25, 2015

Thanks for pointing out the text about MPI_FINALIZE, I was not aware that it has no impact on the behavior of MPI_INITIALIZED.

Using volatile will prevent the compiler from caching the variables, so this might look like a good idea. However, such behavior is only maintained in the context of a single function. As long as MPI_INITIALIZED is not a macro used in a busy-loop, there will be no benefit in using volatile.

Imagine a pretty standard usage of MPI_INITIALIZED and MPI_INIT

MPI_INITIALIZED(&flag);
if( ! flag ) {
    MPI_INIT(NULL, NULL);
}

If we call this code from multiple threads at once, the current Open MPI will not behave correctly: there might be a case where both threads call ompi_mpi_init and start initializing the internals. Thus, we need to protect the different stages of the initialization/finalization process.
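One standard way to make a library-internal initialization run exactly once, even if several threads race through the pattern above and both reach the init call, is pthread_once. A minimal sketch with hypothetical `ex_` names (not Open MPI code):

```c
#include <pthread.h>

/* Hypothetical sketch: even if two threads race through the
 * INITIALIZED/INIT pattern and both end up calling ex_init(),
 * pthread_once() guarantees the initialization body runs exactly
 * once, and the second caller blocks until it has finished. */
static pthread_once_t ex_once = PTHREAD_ONCE_INIT;
static int ex_init_count = 0;   /* counts how often the body ran */

static void ex_do_init(void)
{
    /* ... one-time library initialization would go here ... */
    ex_init_count++;
}

int ex_init(void)
{
    pthread_once(&ex_once, ex_do_init);
    return 0;
}

int ex_init_calls_executed(void)
{
    return ex_init_count;
}
```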

@jsquyres

With the MPI API specified as it is, I don't think there's a way to guarantee that the code snippet you proposed will always be safe when executed by multiple threads. I.e., there's no atomic "test for INITIALIZED, and if not INITIALIZED, then INIT."

There was much argument about this in the Forum (i.e., make a new INIT function that does stuff like this), but after much round-and-round debate, the minimum distance compromise was made to just guarantee that INITIALIZED (and the others) are always thread safe -- which at least solved a few problems. That was (literally) the smallest step that could be made.

Making a new INIT was punted further down the road...

Are you proposing to lock the internals of ompi_mpi_init(), and if a 2nd thread calls MPI_INIT (e.g., per your example), return a non-fatal error? That would be above and beyond what the MPI standard calls for, but a) I think it's sane, and b) it might not violate the standard, since erroneous behavior is simply undefined...?

bosilca commented Jul 26, 2015

I am not sure about returning a non-fatal MPI error; I don't think the standard supports that type of exception. What I was proposing is to protect the internals of the MPI initialization and force all subsequent MPI_Init calls to wait until the first MPI_Init completes. Upon completion of the first MPI_Init, all MPI_Init calls return with the same error code (this goes together with MPI_INITIALIZED setting the flag to true only after a successful, complete initialization). This approach seems to be valid from the MPI standard's perspective, while offering a logical and safe behavior for our users.
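This "wait mode" idea could be sketched roughly as follows (illustrative `mi_` names, not Open MPI internals): the first caller performs the initialization, later callers block on a condition variable until it completes, and every caller returns the same result code.

```c
#include <pthread.h>

/* Rough sketch of the "wait mode" proposal (hypothetical names, not
 * Open MPI code): the first thread to call mi_init() runs the real
 * initialization; any thread arriving meanwhile blocks until it is
 * done, and every caller gets the same return code. */
typedef enum { MI_NOT_INITED, MI_IN_PROGRESS, MI_INITED } mi_state_t;

static pthread_mutex_t mi_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mi_cond  = PTHREAD_COND_INITIALIZER;
static mi_state_t      mi_state = MI_NOT_INITED;
static int             mi_result;   /* set by the first initializer */

int mi_init(void)
{
    pthread_mutex_lock(&mi_lock);
    if (mi_state == MI_NOT_INITED) {
        mi_state = MI_IN_PROGRESS;
        pthread_mutex_unlock(&mi_lock);

        /* ... the real, expensive initialization runs here ... */
        int first_rc = 0;

        pthread_mutex_lock(&mi_lock);
        mi_result = first_rc;
        mi_state  = MI_INITED;
        pthread_cond_broadcast(&mi_cond);
    } else {
        while (mi_state != MI_INITED)    /* wait out the first init */
            pthread_cond_wait(&mi_cond, &mi_lock);
    }
    int rc = mi_result;
    pthread_mutex_unlock(&mi_lock);
    return rc;
}
```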

That being said, I remember you were involved in the new MPI standard design regarding the thread safety of some of the init/fini functions. So please enlighten me on how the Forum regards the thread safety of the init/fini tuple. So far, every time I get more information about this I just get more confused about the lack of coherent logic on this topic (that's nothing new, just unsettling). It reads to me like "I cast on you thy few thread-safe functions that cannot be used together in a thread-coherent way". Or are we assuming a sequential initialization of any MPI-based application (despite Amdahl's law)?

jsquyres commented Oct 5, 2015

@bosilca You're right -- there is no definitive solution for this issue in an MPI-3.1-compliant way (see my comment above). What is really needed is an MPI_INIT_IF_NOT_ALREADY_INIT() kind of function. ...or something entirely different (e.g., ideas like I presented in Bordeaux at EuroMPI 2015).

@hppritcha and I looked at this today. We agree that all we can do is narrow the race condition window if multiple threads call MPI_INITIALIZED simultaneously. And we should do that (e.g., strengthen the checking in MPI_INIT and MPI_FINALIZE to check a variable that is set immediately upon starting, and possibly protect it with a lock). But this ticket is about ensuring that all of these functions can be called safely from multiple threads.

Right?

bosilca commented Oct 5, 2015

I am not really sure that we agree on the problem. Let's take a simple example where two threads are calling the sequence of code from above:

MPI_INITIALIZED(&flag);
if( ! flag ) {
    MPI_INIT(NULL, NULL);
}

Now, let's assume that we fix all the issues about the use of our internal variables holding the current state of the library.

What should MPI_INITIALIZED return, and starting from when? We update the internals upon entry into MPI_INIT, so technically MPI_INITIALIZED would return true once another thread has entered MPI_INIT, but well before the MPI stack is in a usable form. If instead we move the update of the internal variables to the end of MPI_INIT, then MPI_INITIALIZED will return false, and the second thread will also call MPI_INIT. So now we have to block this thread until the first one completes MPI_INIT ... Thus we need a condition variable.

jsquyres commented Oct 5, 2015

Mmmm... true. If we move the "allow MPI_INITIALIZED to return TRUE" point earlier in MPI_INIT, that's probably dangerous for a different reason -- e.g., two threads calling this:

MPI_Initialized(&flag);
if (!flag) {
    MPI_Init(NULL, NULL);
}
MPI_Send(...);

If two threads concurrently execute this code, two kinds of race conditions can occur:

  1. Both threads call MPI_INIT
  2. Only one thread calls MPI_INIT, but the other thread calls MPI_SEND long before the MPI library is fully initialized

Are you advocating that MPI_INITIALIZED check to see if MPI_INIT has started, and if so, block until MPI_INIT completes, and then return flag=true? If so, that narrows the race condition window, but it doesn't eliminate it. I'm not sure there is a way to eliminate the race condition in MPI-3.1.
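The "MPI_INITIALIZED blocks while MPI_INIT is in progress" variant could look something like this sketch (hypothetical `q_` names, not Open MPI internals): the query waits out any in-progress initialization and only then reports the settled state.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sketch: an MPI_Initialized-style query that blocks
 * while an initialization is in progress, then reports the settled
 * state.  Names are illustrative, not Open MPI code. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static bool q_in_progress = false;
static bool q_initialized = false;

void q_begin_init(void)          /* called on entry to init */
{
    pthread_mutex_lock(&q_lock);
    q_in_progress = true;
    pthread_mutex_unlock(&q_lock);
}

void q_end_init(void)            /* called when init completes */
{
    pthread_mutex_lock(&q_lock);
    q_in_progress = false;
    q_initialized = true;
    pthread_cond_broadcast(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

bool q_query_initialized(void)
{
    pthread_mutex_lock(&q_lock);
    while (q_in_progress)        /* wait out a racing init */
        pthread_cond_wait(&q_cond, &q_lock);
    bool flag = q_initialized;
    pthread_mutex_unlock(&q_lock);
    return flag;
}
```

As noted above, this narrows the race window but cannot eliminate it: a thread can still observe flag=false and call init redundantly.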

bosilca commented Oct 5, 2015

I was advocating that MPI_INITIALIZED return false until MPI_INIT completes, to prevent exactly what you described.

But your proposal might be simpler to implement: block all initialization functions (including the accessors that check the state) as soon as one initialization is in progress.

@hppritcha

see related discussion in open-mpi/ompi-release#636

jsquyres commented Oct 8, 2015

@bosilca @hppritcha and I discussed this on the phone today. We decided:

  1. that the MPI-3.1 spec does not fix this problem
  2. reducing the size of the race condition window isn't worth it (because you still can't guarantee that someone's code will never trip the race condition)
  3. writing an MPI extension probably also isn't worth it (at least, not at this time)
  4. but what we can do, and fairly easily, is (once ompi-release#636, "Add in-finalize indicator, fca fall back to prev barrier if in-finalize", is done):
    1. if MPI is initialized more than once, update the show_help message to say "sorry, you called MPI_INIT[_THREAD] more than once. I'm now going to abort. But if you don't want me to abort, set MCA param ABC to X and I'll silently allow you to invoke MPI_INIT[_THREAD] as many times as you want without an error."
    2. Hence, this is basically a (simple) extension that is OMPI specific, but gives users a trivial way to find out how to use it. I.e., if they try to do the simple code pattern that may end up invoking MPI_INIT multiple times, they'll get a help message saying how to enable the OMPI extension (via MCA param) to allow that behavior.

Then we have to wait for MPI-4 to fix this issue for real.

bosilca commented Oct 8, 2015

Minor correction to the i) case. The end of the sentence should read "as many times as you want without an error before the first call to MPI_FINALIZE."

jsquyres commented Nov 2, 2015

To close the loop on this issue: after I made the comment above (#493 (comment)), I decided not to support the "let it be ok if MPI is initialized more than once" functionality. See #1007 for details.

@hppritcha Still wants to do some testing, though -- so this issue is still open.

jsquyres added a commit to jsquyres/ompi that referenced this issue Nov 10, 2015
@hppritcha hppritcha modified the milestones: Future, v2.0.0 Dec 2, 2015
@hppritcha

I'm removing the blocker label and changing the milestone to Future. This issue was really just serving as a placeholder for adding some thread safety tests.

@hppritcha

I think we've added enough tests over time that this issue can be closed.
