'Pending communication' not defined in definition of MPI_Comm_disconnect #710

Closed
RolfRabenseifner opened this issue May 4, 2023 · 40 comments
Labels
  • chap-contexts: Groups, Contexts, Communicators, Caching Chapter Committee
  • chap-dynamic: Process Initialization, Creation, and Management Chapter Committee
  • chap-p2p: Point to Point Communication Chapter Committee
  • errata: Errata items for the previous MPI Standard
  • had reading: Completed the formal proposal reading
  • mpi-4.1: For inclusion in the MPI 4.1 standard
  • passed final vote: Passed the final formal vote

Comments

@RolfRabenseifner

RolfRabenseifner commented May 4, 2023

Problem

The text uses the term "pending communication". Nowhere is it defined. Here, "decoupled MPI activities" may be the correct wording, see new MPI-4.1 Section 2.9 Progress.

§11.10.4

MPI_COMM_DISCONNECT has the same action as MPI_COMM_FREE, except that it waits for pending communication to finish internally and enables the guarantee about the behavior of disconnected processes.

would be

MPI_COMM_DISCONNECT has the same action as MPI_COMM_FREE, except that it waits for decoupled MPI activities to finish internally and enables the guarantee about the behavior of disconnected processes.

Another problem is that the phrase "has the same action ... except ..." is wrong, because the

Furthermore, the text in MPI-4.0

MPI_COMM_DISCONNECT may be called only if all communication is complete and matched, so that buffered data can be delivered to its destination. This requirement is the same as for MPI_FINALIZE.

seems not to be clear enough, as the discussion at the MPI Forum meeting on May 3, 2023 has shown.

Especially in the definition of MPI_Finalize

Before an MPI process invokes MPI_FINALIZE, the process must perform all MPI calls needed to complete its involvement in MPI communications associated with the World Model. It must locally complete all MPI operations that it initiated and must execute matching calls needed to complete MPI communications initiated by other processes. For example, if the process executed a nonblocking send, it must eventually call MPI_WAIT, MPI_TEST, MPI_REQUEST_FREE, or any derived function; if the process is the target of a send, then it must post the matching receive; if it is part of a group executing a collective operation, then it must have completed its participation in the operation.

it is clear that the calls to MPI_WAIT, MPI_TEST, or MPI_REQUEST_FREE (to free request handles of nonblocking operations or to complete or free persistent requests) must be made before MPI_Finalize.
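
For illustration, a minimal fragment of the quoted rule (snd_buf and the ranks are placeholders; rank 1 is assumed to post the matching receive before it finalizes):

MPI_Request req;
MPI_Isend(&snd_buf, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD, &req);
// ... other work ...
MPI_Wait(&req, MPI_STATUS_IGNORE);  // or MPI_Test in a loop, or MPI_Request_free
MPI_Finalize();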

Open questions have been:

  • whether "eventually" only means that these calls must be made between the starting of the request and MPI_Finalize, or whether it has another or additional meaning;
  • what, in the case of MPI_COMM_DISCONNECT, is allowed after MPI_COMM_DISCONNECT has returned for inactive persistent request handles;
  • whether, in the future, this list may be extended by adding MPI_REQUEST_GET_STATUS (and its derived functions),
    and which special rules would then apply to the still existing request handle.

Consensus was that neither MPI_Finalize nor the sections about point-to-point and collective persistent operations require that an inactive persistent request handle be freed with MPI_REQUEST_FREE before calling MPI_Finalize.
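
To make this consensus concrete, a minimal fragment that is, under this reading, a correct program (the destination rank is a placeholder; the persistent request is never started, so it stays inactive):

int buf = 0;
MPI_Request req;
MPI_Init(NULL, NULL);
MPI_Send_init(&buf, 1, MPI_INT, /*dest=*/0, /*tag=*/0, MPI_COMM_WORLD, &req);
// req is inactive; per the consensus above, MPI_Request_free(&req) is not
// required before finalizing.
MPI_Finalize();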

See also discussions in issues #543 and #676

Proposal

The proposal in the pull request tries only to resolve the inconsistencies; because this is an erratum, it does not add any future extensions for MPI_REQUEST_GET_STATUS.
In addition, it makes the same clarification for MPI_Finalize.

Changes to the Text

See above. This should be part of the "cleanup" for MPI 4.1.

Alternatively, the term "pending communication" could be properly defined in the Terms section.
(Risky, because its meaning may differ between contexts.)

Impact on Implementations

None (unless it renders some implementations non-compliant due to confusion stemming from the use of the term "pending").
Exception: if an MPI implementation requires that related persistent requests be freed (instead of only being inactive) before calling MPI_COMM_DISCONNECT (i.e., requiring more than for MPI_FINALIZE), then this bug should be fixed.

Impact on Users

Clear use of terms helps users better understand the text without having to reason about what a "pending operation" is.

References and Pull Requests

PR 823

@RolfRabenseifner RolfRabenseifner added errata Errata items for the previous MPI Standard chap-dynamic Process Initialization, Creation, and Management Chapter Committee chap-contexts Groups, Contexts, Communicators, Caching Chapter Committee mpi-4.1 For inclusion in the MPI 4.1 standard labels May 4, 2023
@github-project-automation github-project-automation bot moved this to To Do in MPI 4.1 May 4, 2023
@RolfRabenseifner
Author

@hppritcha @raffenet @Wee-Free-Scot @schulzm @naughtont3 / @GuillaumeMercier / @devreal @bkmgit
I assigned you all as members of the process-start chapter committee, as groups chapter chair, or as active participants in yesterday's discussion chat.

I tried to summarize the problems in the abstract of this issue.
I tried to produce a minimal, consistent solution that I would propose.
I explicitly refrained from adding anything about MPI_REQUEST_GET_STATUS,
because the corner case of a positive test with MPI_REQUEST_GET_STATUS plus no call to MPI_WAIT/TEST/COMM_FREE
is not yet handled in the rules for MPI_FINALIZE and is therefore not part of this errata ticket.

In my opinion, it is entirely new territory to promote dangling "request handles" that are never released.

I hope you like the content of the proposal. I feel the rules must be clear for the users - that should be an important goal.

Issue: #710
PR: https://github.com/mpi-forum/mpi-standard/pull/823

@RolfRabenseifner
Author

The changes are marked with PR823 and also in yellow color on pages 475, 525, 1025, 1054 in
mpi41-report_Issue710_PR823.pdf

@bosilca
Member

bosilca commented May 4, 2023

If we only require persistent communications to be inactive, we are forcing a high-quality (aka memory-leak free) MPI implementation to track all persistent requests to be able to release them (on behalf of the user) during finalization.

Moreover, having different requirements for nonblocking requests (which are required to be freed) and persistent requests (which are required to be inactive or freed) does not help with consistency. From an implementor's perspective it makes more sense to require that all persistent requests be inactive and freed, or simply that all requests be freed (and then the part about decoupled MPI activities will take care of the rest).

@RolfRabenseifner
Author

For MPI_Finalize, no MPI library has required persistent handles to be freed. We also do not require datatype handles or ... to be freed before finalize. Would it be helpful to write an AtoU (advice to users)?

@bosilca
Member

bosilca commented May 4, 2023

Open MPI does. Our objects refcount each other (datatype to requests, requests to communicators, ...) such that at the end we know if there are dangling objects. In debug mode we even print a list of dangling objects.

@raffenet

raffenet commented May 4, 2023

MPICH will also print a list of objects still allocated during MPI_FINALIZE in a debug build, though I am not sure the user is required to do anything with that information.

@Wee-Free-Scot

A possible approach to dealing with dangling/stale handles/objects is to permit MPI_Request_free (and MPI_Type_free, ...) to be called after MPI_Finalize. I'm not saying it's a good approach, but it points out that "valgrind cleanliness" is not just the responsibility of the MPI library, but is a cooperation between MPI and the user.

Open MPI keeps a list internally of active requests, so it can tell the difference between "there are no active requests right now" and "there is at least one active request right now". It can partition the list of active requests by the session they are derived from, so it can tell the difference between "there are no active requests for session X (which might be the World model)" and "there is at least one active request for session X (which might be the World model)".

During MPI_Finalize/MPI_Session_finalize, even one active request in the World model/particular session means an erroneous program (go mad, do whatever you want).

During MPI_Comm_disconnect for a communicator derived from the World/Session model, even one active request in the World model/particular session means an erroneous program (go mad, do whatever you want).

The problem highlighted is exclusively about the existence of inactive requests at the point of destroying a communicator (i.e. at the point of calling finalize or disconnect).

Axiom: The internal request structure must keep a pointer to the communicator (not strictly true, but let's pretend that is an axiom and argue about it separately).

Question: why is that pointer contributing to the ref-count on the communicator when the request is inactive?

If MPI_Start (and friends) incremented the ref-count on the communicator and MPI_Wait (and friends) decremented the ref-count on the communicator, then the existence of inactive requests would not contribute to the ref-count and would not prevent MPI from destroying the communicator when told to do so, e.g. by MPI_Comm_disconnect.
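
To make the two variants concrete, a hypothetical sketch of the internal bookkeeping (all names are invented and do not correspond to any particular MPI library):

typedef struct comm_s {
    int refcount;        /* communicator is destroyed when this reaches zero */
    int free_requested;  /* set by MPI_Comm_free / MPI_Comm_disconnect       */
} comm_t;

typedef struct request_s {
    comm_t *comm;        /* the pointer required by the "axiom" above        */
    int     active;
} request_t;

static void comm_release(comm_t *c) {
    if (--c->refcount == 0 && c->free_requested) {
        /* actually destroy the communicator */
    }
}

/* Variant in which only *active* requests hold a reference: */
static void request_start(request_t *r)    { r->active = 1; r->comm->refcount++;  }
static void request_complete(request_t *r) { r->active = 0; comm_release(r->comm); }
/* An inactive request then never prevents destruction of the communicator. */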

The obvious objection is that destroying a request possibly/sometimes involves destroying the objects to which it holds a pointer (i.e. when that pointer is the last ref-counted reference to that other object).

Question: why does destroying a request cascade to potentially destroying the communicator for which it has a pointer?

The answer is MPI_Comm_free -- when MPI_Comm_free only marks the communicator for later destruction (because the ref-count is nonzero), it expects that some later action will actually destroy the communicator (when the ref-count becomes zero).

If inactive requests have a reference to the communicator (as well as a pointer to it), then the existence of even one inactive request will prevent MPI_Comm_free from destroying the communicator. In this case, destroying inactive requests to remove the reference is necessary before the communicator can be destroyed -- which means an inactive request is a pending communication in the sense of the MPI Standard wording for MPI_Comm_disconnect.

If inactive requests have a pointer to the communicator, but not a reference (i.e., it is known by design that the communicator ref-count (if it still exists) is not counting this pointer at this time), then their existence does not prevent MPI_Comm_free from destroying the communicator; only active requests have a reference and can prevent destruction during MPI_Comm_free. In this case, destroying inactive requests is not necessary before the communicator can be destroyed (because they do not contribute to the ref-count) -- which means an inactive request is NOT a pending communication in the sense of the MPI Standard wording for MPI_Comm_disconnect.

Consequences, part 1:

If an inactive request does NOT have a reference to the communicator, then the communicator could be destroyed by MPI while it is inactive, and attempting to start it again after MPI_Comm_free is then going to be a potential problem. MPI_Start might attempt to dereference a dangling communicator pointer and bad things (UB) will likely happen. OTOH, MPI_Comm_free/MPI_Comm_disconnect can safely ignore the existence of inactive requests (MPI does not even need to know whether any inactive requests exist) and destroy the communicator if/when the ref-count (which counts active references for pending communication only) is zero.

Thus, if inactive requests are NOT pending communications then

  • we permit inactive requests to exist during MPI_Comm_disconnect and MPI_Finalize/MPI_Session_finalize (and those procedures will actually destroy the communicator before they return, as they are supposed to)
  • we lose the ability to start inactive requests after MPI_Comm_free, which is currently allowed by omission in the MPI Standard, works in some MPI libraries, and is used by some customers.

Consequences, part 2:

If an inactive request DOES have a reference to the communicator, then the communicator cannot be destroyed while an inactive request exists; attempting to start it again after MPI_Comm_free is then going to be fine (probably). MPI_Start would attempt to dereference the communicator pointer and good things will likely happen. OTOH, MPI_Comm_free/MPI_Comm_disconnect must not ignore the existence of inactive requests and must not destroy the communicator while they still exist, so they must contribute to the ref-count that triggers/prevents the communicator destruction.

Thus, if inactive requests ARE pending communications then

  • we permit inactive requests to be used normally after MPI_Comm_free (but not after MPI_Comm_disconnect, which should hang until these requests are destroyed)
  • we lose the ability to destroy the communicator whilst inactive requests exist, which is currently allowed by omission in the MPI Standard, works in some MPI libraries, and is used by some customers.

Conclusion

We have a choice: which group of customers do we disappoint? Those who want to use (start, wait, etc) inactive requests after freeing the communicator handle or those who want to destroy the communicator itself without freeing inactive requests beforehand?

Which is all a really long way of getting to the suggestion of permitting inactive requests to be freed after the communicator is destroyed. If "the request is inactive" means "has a pointer, but not a reference, to a communicator", then MPI_Request_free must not follow that pointer; it just destroys the request itself, treating the communicator pointer as if it were dangling (even if it is not). That can happen before the communicator is destroyed (the pointer would work, if used, but no-one cares because there must be at least one other reference to the communicator, so MPI_Request_free has no work to do) or afterwards (the pointer is dangling, but no-one cares because the communicator is already gone, so MPI_Request_free has no work to do).

To achieve valgrind cleanliness, the user must free all their requests and destroy all their communicators, of course, but not necessarily in that order.

We now have an option that doesn't disappoint either group of customers, but involves re-work in some MPI implementations.

@raffenet

raffenet commented May 5, 2023

So my view of the options is:

  1. Require the user to free inactive requests before disconnect/finalize
  2. Allow the user to free inactive requests after disconnect/finalize (or not and get a warning from debug builds)
  3. Allow MPI to free inactive requests during disconnect/finalize, thus allowing the user to free them, or not, before disconnect/finalize without warning. This requires some additional state tracking by the MPI library, but only for persistent request allocations, off the critical path. This option seemed palatable when discussing within the MPICH team, especially in the context of the sessions model.
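
A rough sketch of option 3, assuming a hypothetical per-communicator list of persistent requests (an illustration only, not MPICH's actual implementation; destroy_request is an invented helper):

#include <stdlib.h>

extern void destroy_request(void *request);   /* hypothetical internal helper */

typedef struct preq_node {
    struct preq_node *next;
    void             *request;   /* internal persistent request object */
} preq_node_t;

typedef struct comm_internal {
    /* ... other communicator state ... */
    preq_node_t *persistent;     /* appended to by MPI_Send_init, MPI_Recv_init, ... */
} comm_internal_t;

/* Called from MPI_Comm_disconnect / MPI_Finalize for each communicator being torn down. */
static void reclaim_inactive_persistent(comm_internal_t *c) {
    preq_node_t *n = c->persistent;
    while (n != NULL) {
        preq_node_t *next = n->next;
        destroy_request(n->request);   /* user did not free it; MPI does so now */
        free(n);
        n = next;
    }
    c->persistent = NULL;
}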

@Wee-Free-Scot

  1. Require the user to free inactive requests before disconnect/finalize

Well ... "before disconnect/finalize can return".

With thread multiple support, the user can free inactive requests on a different thread to release the thread blocked in disconnect/finalize. Without thread multiple support, then yes, exactly what Ken wrote, because my wording and Ken's are equivalent in that situation.
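
A minimal sketch of that scenario, assuming MPI_THREAD_MULTIPLE is available and assuming the reading in which an inactive persistent request counts as pending communication, so that MPI_Comm_disconnect blocks until the request is freed:

#include <mpi.h>
#include <pthread.h>
#include <unistd.h>

static MPI_Request req;               /* persistent request on 'comm', never started */

static void *free_req_later(void *arg) {
    (void)arg;
    sleep(1);                         /* let the main thread block in MPI_Comm_disconnect */
    MPI_Request_free(&req);           /* frees the inactive request; disconnect may return */
    return NULL;
}

int main(int argc, char **argv) {
    int provided, buf = 0;
    MPI_Comm comm;
    pthread_t t;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);  /* assume MPI_THREAD_MULTIPLE is granted */
    MPI_Comm_dup(MPI_COMM_WORLD, &comm);
    MPI_Send_init(&buf, 1, MPI_INT, 0, 0, comm, &req);              /* inactive: never started */

    pthread_create(&t, NULL, free_req_later, NULL);
    MPI_Comm_disconnect(&comm);       /* under this reading, waits until 'req' is freed */
    pthread_join(&t, NULL);
    MPI_Finalize();
    return 0;
}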

@RolfRabenseifner
Author

RolfRabenseifner commented May 6, 2023

  1. Require the user to free inactive requests before disconnect/finalize

Finalize and persistent requests have existed since MPI-1, and freeing before finalize was never required.
Invalidating all existing applications that use persistent requests without freeing them before finalize would be a no-go for me.

  2. Allow the user to free inactive requests after disconnect/finalize (or not and get a warning from debug builds)

After the discussion, I would really prefer Option 3:

  3. Allow MPI to free inactive requests during disconnect/finalize, thus allowing the user to free them, or not, before disconnect/finalize without warning. This requires some additional state tracking by the MPI library, but only for persistent request allocations, off the critical path. This option seemed palatable when discussing within the MPICH team, especially in the context of the sessions model.

This is for me the best option. It is very clear and symmetric:

  • We never required the freeing of inactive persistent requests, but we allow it, and this feature is kept.
  • It is symmetric for finalize and disconnect: in both cases, i.e., after finalize and after disconnect, the use of an inactive (= not freed) persistent request handle is erroneous.
  • And there is no difference after disconnect in the use of inactive persistent requests: both MPI_Request_free and MPI_Start are erroneous, i.e., forbidden.

If there is some consensus on this 3rd option, I would change the pull request accordingly.

@RolfRabenseifner
Author

In addition to my comment above on the handling of inactive persistent requests, here I want to comment on nonblocking requests:

The following example has been correct since MPI-2.0, although it might be seen as a corner case.

int snd_buf, rcv_buf, my_rank;
MPI_Comm my_comm;
MPI_Init(NULL, NULL);
MPI_Buffer_attach(...);   // attach a buffer large enough for the buffered send below
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_dup(MPI_COMM_WORLD, &my_comm);
// same ranks on my_comm and MPI_COMM_WORLD
if (my_rank==0) {
  snd_buf = 5;
  MPI_Bsend(&snd_buf, 1, MPI_INT, /*dest=*/1, /*tag=*/0, my_comm);
} else if (my_rank==1) {
  MPI_Request rq;
  MPI_Irecv(&rcv_buf, 1, MPI_INT, /*source=*/0, /*tag=*/0, my_comm, &rq);
  MPI_Request_free(&rq); // rq is set to MPI_REQUEST_NULL, but the operation need not be complete:
                         // rcv_buf must not be read or reused before the operation is complete
}
MPI_Comm_disconnect(&my_comm); // collective over the whole my_comm
          // all pending communication is complete, i.e.,
          //  - the Irecv has filled in rcv_buf
          //  - the Bsend must have sent out its buffered data
if (my_rank==0) {
  // the application can now reuse the attached buffer for other purposes
} else if (my_rank==1) {
  // rcv_buf is now filled with the value 5 and can be reused for other purposes
}
MPI_Finalize();

Agreed?

An even more extreme corner-case example, based on the advice to implementors for MPI_Finalize:

"The MPI implementation must ensure that a process has completed any involvement in MPI communication before MPI_FINALIZE returns."

int snd_buf, rcv_buf, my_rank;
MPI_Init(NULL, NULL);
MPI_Buffer_attach(...);   // attach a buffer large enough for the buffered send below
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (my_rank==1) {
  snd_buf = 5;
  MPI_Bsend(&snd_buf, 1, MPI_INT, /*dest=*/0, /*tag=*/0, MPI_COMM_WORLD);
} else if (my_rank==0) {
  MPI_Request rq;
  MPI_Irecv(&rcv_buf, 1, MPI_INT, /*source=*/1, /*tag=*/0, MPI_COMM_WORLD, &rq);
  MPI_Request_free(&rq); // rq is set to MPI_REQUEST_NULL, but the operation need not be complete:
                         // rcv_buf must not be read or reused before the operation is complete
}
MPI_Finalize();
          // all pending communication is complete, i.e.,
          //  - the Irecv has filled in rcv_buf
          //  - the Bsend must have sent out its buffered data
if (my_rank==1) { // if this process still exists after MPI_Finalize
  // the application can now reuse the attached buffer for other purposes
} else if (my_rank==0) { // rank 0 in MPI_COMM_WORLD must still exist after MPI_Finalize
  // rcv_buf is now filled with the value 5 and can be reused for other purposes
}

Agreed?

@Wee-Free-Scot

In the example codes from @RolfRabenseifner that use buffered mode send operations, there is no attempt to detach the buffer before the call to disconnect/finalize -- do we wish "communication is complete" to mean "there are no internal references to the communicator used in the buffered mode send operations that were using space in the attached buffer"? That is, disconnect/finalize must do all decoupled activities, which involves draining the attached buffer, which is all/most of the work of detaching the buffer, so the user can refuse to detach the buffer but still expect a valgrind clean state at the end. When can the user free the heap-allocated buffer space they attached to MPI?
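
For contrast, the conventional pattern with an explicit detach (a fragment; dest and comm are assumed to be set up elsewhere). MPI_Buffer_detach blocks until all buffered messages have been delivered, after which the user may free the heap-allocated space:

int value = 5;
int size = MPI_BSEND_OVERHEAD + (int)sizeof(int);
char *buffer = malloc(size);
MPI_Buffer_attach(buffer, size);
MPI_Bsend(&value, 1, MPI_INT, dest, /*tag=*/0, comm);
void *detached_buffer;
int   detached_size;
MPI_Buffer_detach(&detached_buffer, &detached_size);  // drains the attached buffer
free(detached_buffer);                                // safe only after detach has returned
MPI_Finalize();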

Aside: can I do a buffered mode send from within a delete-attribute callback that is called during disconnect/finalize -- is the buffer still guaranteed to be attached at the point in the execution of disconnect/finalize when those callbacks are called by MPI?

After the new proposal that permits a buffered mode buffer to be attached directly to the communicator, do we need to state that disconnect/finalize both do the work of detaching the buffer, if one is still attached? Of course, finalizing a session will also do the work of detaching the buffer attached to the session, if any, as well as doing the work of detaching all the buffers attached to all the communicators derived from that session, thereby releasing all the internal references to those communicators from the decoupled activities related to all the buffered mode send operations that used any of the communicators within that session. Do we need to state that too? Is it adequately covered by "decoupled activities"?

@Wee-Free-Scot

Option 3 in the summary from @raffenet gives me the heebie-jeebies. It's either the status quo or a dangerous security flaw.

MPI_Recv_init(..., comm, &req);
MPI_Comm_disconnect(&comm);
assert(req == MPI_REQUEST_NULL); // this assert should fail, even if the request has been destroyed

In the above code, let's assume we have option (3) with an implementation that chooses to avail itself of this new(?) freedom to "free inactive requests during disconnect/finalize".

  1. MPI destroys the request object/struct/whatever that is internal to MPI. The allocation and freeing of internal request objects/structs/whatevers is (and has always been) entirely the purview of MPI. So, that bit seems fine to me -- indeed I would argue that is NOT a new freedom -- it is known during disconnect/finalize that usage of requests that reference any communicator that is being destroyed will be A Bad Idea (tm) and destroying the request is unlikely to make things worse.

  2. After disconnect/finalize, EITHER the user has a stale request handle (not MPI_REQUEST_NULL but also not valid) OR somehow MPI set the user's variable (if it still exists) to MPI_REQUEST_NULL. Neither of these is a great situation. The best thing the user can do is let their request handle variable go out of scope without touching it.

    • How does MPI know that scribbling over a user-space memory location (that was valid some time ago, but which might not be now) is not A Bad Thing (tm)? Short answer: it doesn't, so I'm strongly against this idea. The assert in the code above MUST fail; the user's variable must not be scribbled over by MPI because it might have gone out of scope by the time MPI gets around to scribbling on it.
    • How does MPI protect itself against all the other possible user actions? Simples -- we can state that usage of request handles after disconnect/finalize is erroneous and punt all problems into a giant SEP field.

I think MPI libraries are permitted this level of freedom already -- this is the status quo (except for an explicit "subsequent usage of request handles is erroneous" statement). An MPI library that chooses NOT to destroy inactive requests during disconnect/finalize, and thereby leak MPI-allocated memory, is just not high quality.

Aside: this makes MPI_Comm_free very different, semantically, from disconnect/finalize. Is it still true that MPI_Comm_free is "kind of like MPI_Comm_idisconnect with a request that gets completed by MPI sometime after the notional communicator ref-count hits zero and before finalize returns"? No. Calling MPI_Comm_disconnect gives MPI permission to begin destroying inactive requests, so calling MPI_Comm_idisconnect would too. However, the users expect to be able to use request handles that were inactive at the point of calling MPI_Comm_free after that call returns, so we now have a race.

@RolfRabenseifner
Author

@Wee-Free-Scot

That is, disconnect/finalize must do all decoupled activities, which involves draining the attached buffer, which is all/most of the work of detaching the buffer, so the user can refuse to detach the buffer but still expect a valgrind clean state at the end. When can the user free the heap-allocated buffer space they attached to MPI?

With the new issue #586, it is the work of MPI_BUFFER_FLUSH (instead of detach+attach).

My comment

// the application can now reuse the attached buffer for other purposes

meant that the attached global buffer can be reused for Bsends on other communicators. It did not mean to say that the buffer is detached.

This means that here I do not want to discuss any automatic detach of the global buffer.
Furthermore, I would not expect that such an automatic detach was ever intended, because MPI_Finalize also does not free any objects.

I would really add something like

After calling MPI_COMM_DISCONNECT, any subsequent usage of an inactive persistent request handle is erroneous if this request was created with an operation on comm.

(It is always up to the programmer to fulfill such rules.)
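
For illustration, a minimal fragment of what the added sentence would forbid (buf, src, tag, and comm are placeholders):

MPI_Request req;
MPI_Recv_init(&buf, 1, MPI_INT, src, tag, comm, &req);  // persistent request on comm, inactive
MPI_Comm_disconnect(&comm);                             // comm handle becomes MPI_COMM_NULL
// Under the proposed rule, any subsequent use of req is erroneous, e.g.:
MPI_Start(&req);                                        // erroneous
MPI_Request_free(&req);                                 // erroneous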

Okay?

@RolfRabenseifner
Author

Here is the protocol from the virtual MPI Forum meeting on June 21, 2023:
mpi41-report_Issue710_PR823_update9_for-2023-6-21-meeting-protocol.pdf

@RolfRabenseifner
Author

The result from the sessions WG meeting on June 19, 2023 and the virtual MPI forum meeting on June 21, 2023 is now added to this PR 823.

Commented PDF file: mpi41-report_Issue710_PR823_update10.pdf

The changes by this PR 823 are marked with PR823 and also in yellow color on pages 100, 148, 477, 483-484, 527-528, 1027, 1056, 1059, 1091, 1094-1095, 1097-1099, with the following additions:

  • option C on pages 100 and 484 (light yellow / green color)
    • without option C: same behavior for inactive persistent requests for
      MPI_COMM_DISCONNECT, MPI_FINALIZE, and MPI_SESSION_FINALIZE.
    • with option C: when a session is finalized, associated inactive persistent request handles are disallowed.
  • options A or B on page 148: solving the problem about datatype handles, see page 484
    • with option A: advice to implementors on the relation between MPI_TYPE_COMMIT and session initialization and finalization.
    • with option B: the same text as advice to users: a hint that MPI libraries may have problems with the optimization of datatypes in the case of session initialization and finalization.

It will be the input for the errata reading and voting at the MPI forum meeting on July 10-13, 2023.
Before voting, two decisions are needed (e.g., by straw votes):

  • between option A and option B,
  • between without option C and with option C.

@schulzm

schulzm commented Jun 24, 2023

As for Option C: while I understand the motivation to make a change here to fix something that we don't like in the world model, I would strongly argue against this. It would create a difference between the default "world" session and all other sessions. This will not only cause issues for tools, as discussed in the last call, it will also cause problems for any library that exhibits that behavior - such a library could not be used safely in a session model, and this is hard to determine for the end user (and the library itself would not be aware of the fact that it is used from sessions). Also, not removing persistent requests at the end would not be unexpected behavior for a library that reuses persistent requests across library calls.

@RolfRabenseifner
Author

The result from the sessions WG meeting on June 19, 2023 and the virtual MPI forum meeting on June 21, 2023 is now added to this PR 823, plus a few additional English corrections (June 22).

Commented PDF file: mpi41-report_Issue710_PR823_update11.pdf

The changes by this PR 823 are marked with PR823 and also in yellow color on pages 100, 148, 477, 483-484, 527-528, 1027, 1056, 1059, 1091, 1094-1095, 1097-1099, with the following additions:

  • option C on pages 100 and 484 (light yellow / green color)
    • without option C: same behavior for inactive persistent requests for
      MPI_COMM_DISCONNECT, MPI_FINALIZE, and MPI_SESSION_FINALIZE.
    • with option C: when a session is finalized, associated inactive persistent request handles are disallowed.
  • options A or B on page 148: solving the problem about datatype handles, see page 484
    • with option A: advice to implementors on the relation between MPI_TYPE_COMMIT and session initialization and finalization.
    • with option B: the same text as advice to users: a hint that MPI libraries may have problems with the optimization of datatypes in the case of session initialization and finalization.

It will be the input for the errata reading and voting at the MPI forum meeting on July 10-13, 2023.
Before voting, two decisions are needed (e.g., by straw votes):

  • between option A and option B,
  • between without option C and with option C.

@Wee-Free-Scot

@schulzm makes three points against option C; with extensive editorializing, these are:

  1. it creates a difference between the "world session" and all other sessions -- this relies on the existence of lazy users who do not (wish to) clean up handles from a library
  2. the difference between world and other sessions causes problems for tools -- this is predicated on the existence of tools that intercept MPI_INIT & MPI_INIT_THREAD & MPI_FINALIZE and choose to implement them using PMPI_SESSION_INIT & PMPI_SESSION_FINALIZE, rather than the more traditional route of using PMPI_INIT & PMPI_INIT_THREAD & PMPI_FINALIZE
  3. the difference between world and other sessions causes problems for libraries -- this hinges on the existence of libraries that do not initialize either the world model or a session themselves, but use MPI objects passed to them from outside the library without being told whether those objects are derived from the world model or the sessions model

Rebuttals for these three objections might use the following arguments:

  1. this is the intent of the option and will be seen by those in favor of option C as A Good Thing (tm)
  2. this is a strange implementation choice for the tool; there is no need to incur this implementation difficulty unless the tool writer wishes to incur it
  3. this violates the intent of the sessions model, which invites each library to initialize its own session and not to share MPI objects externally (neither "in" nor "out"); also, handles that are created outwith the library are not the responsibility of the library (in this case, the library is a distraction), and libraries that create handles that tie up resources but fail to release them are not well written and are less usable than ones that clean up after themselves

How far should we go out of our way to support bad user code?

@schulzm

schulzm commented Jun 27, 2023

To 2) - this is not bad code or a bad design decision. Tools do subset COMM_WORLD and mask that upwards to the application, and one of the use cases for sessions was to support tools with extra processes, e.g., in overlay networks. Hence, this will not be an unusual case for tools - or rather, we would be preventing legitimate tool use cases.

To 3) - even a session-ized library will have to (for quite some time) rely on other existing libraries and hence falls into this category; again, this is not bad code. We would basically be saying: if you want to switch to sessions, you cannot rely on any existing MPI library/code anymore; you have to write everything from scratch unless you know exactly what is in each library. This will not help us promote the use of sessions - nobody will want to give up using the existing MPI ecosystem.

@RolfRabenseifner
Author

Protocol from the formal reading, July 10, 2023:
mpi41-report_Issue710_PR823_update12-from-reading.pdf

@RolfRabenseifner
Author

Final version for no-no-vote: mpi41-report_Issue710_PR823_update14-for-voting.pdf

@wesbland
Member

This passed a no-no vote.

Yes: 30, No: 0, Abstain: 4

@wesbland wesbland added had reading Completed the formal proposal reading and removed scheduled errata vote Errata vote is scheduled for the next meeting scheduled reading Reading is scheduled for the next meeting labels Jul 12, 2023
@wesbland
Member

This passed an errata vote.

Yes: 27, No: 2, Abstain: 5

@wesbland wesbland added the passed final vote Passed the final formal vote label Jul 12, 2023
@wesbland wesbland moved this from In Progress to Passed 2nd Vote in MPI 4.1 Jul 12, 2023
@wgropp wgropp closed this as completed Jul 13, 2023
@github-project-automation github-project-automation bot moved this from Passed 2nd Vote to Done in MPI 4.1 Jul 13, 2023
@hzhou

hzhou commented Jul 14, 2023

I am very surprised that one of the conclusions of this issue is -- users do not need to free a persistent request that they explicitly created. This is just odd, with no precedent. I am a bit upset that this got sneaked in as an erratum. It is one thing that the previous text neglected to mention the requirement, but it is another thing to issue an erratum to clarify that the neglect was intentional.

I guess I am ahead of myself -- Is my interpretation of the errata correct? And what is the logic that allows explicit MPI object creation without corresponding explicit free/destroy?

@RolfRabenseifner

PS: sorry to comment on an already closed issue.

@Wee-Free-Scot

Wee-Free-Scot commented Jul 14, 2023

I created a new issue to capture the ongoing discussion of this: #732

@hzhou

hzhou commented Jul 14, 2023

Thanks @Wee-Free-Scot for creating the new tracking issue.

Separately, @raffenet and I think the current errata procedure is questionable. An erratum should, by common sense, be a correction of a non-controversial mistake. The erratum in question had significant negative votes -- 2 No and 5 Abstain -- and thus, in my opinion, should not qualify as an erratum. We think passing errata should have a much higher bar than other votes. For example, I think it is reasonable to expect "No-No" for errata votes.

@hzhou

hzhou commented Jul 14, 2023

@wgropp What do you think of #710 (comment)?

hzhou pushed a commit to hzhou/mpich that referenced this issue Jul 14, 2023
hzhou pushed a commit to hzhou/mpich that referenced this issue Jul 15, 2023
Maintain a per-comm hash for persistent requests so we can check and
free them at MPI_Comm_disconnect or MPI_Finalize if the user forgets to free
them.

We need to apply per-VCI critical sections to prevent data races.

Reference: mpi-forum/mpi-issues#710.
hzhou pushed a commit to hzhou/mpich that referenced this issue Aug 7, 2023
hzhou pushed a commit to hzhou/mpich that referenced this issue Aug 15, 2023
hzhou pushed a commit to hzhou/mpich that referenced this issue Aug 16, 2023
@Wee-Free-Scot Wee-Free-Scot added the chap-p2p Point to Point Communication Chapter Committee label Sep 7, 2023
janciesko pushed a commit to janciesko/mpich that referenced this issue Mar 26, 2024