Question of the MPI Network Layer (is the order guaranteed for the message?) #397
Comments
BTW: What is the intuition behind the assert statement? Galois/libdist/src/NetworkBuffered.cpp Lines 226 to 229 in 59d5aa5
NetworkInterface does not guarantee in-order delivery (even for the same tag). Gluon(-Async) ensures that all messages are eventually received. You can read the Gluon-Async paper for more details. The assert in line 226 is just asserting that a message with all 0s is NOT sent/received.
Hi, @roshandathathri "In MPI, you are guaranteed that all messages on the same communicator/tag/rank combo will be received in the same order they were sent."
No, MPI guarantees in-order delivery for the same tag. NetworkInterface doesn't, because it can buffer messages and return smaller, later messages before larger, earlier messages are received.
Cool, @roshandathathri. We are on the same page regarding "in-order delivery for the same tag". That brings me back to my original question, which is about the implementation in the codebase. The receiver side actually uses a single workerThread running a while loop that keeps doing Send, Probe, and Receive with MPI. When something is probed, it calls MPI_Irecv to get the message and puts it into a FIFO queue (a std::deque); the incoming message is then dequeued in FIFO order when the other threads call receiveTagged with a tag parameter. My concern is that the tag parameter increases incrementally (it follows the evilPhase variable), so the application must receive tag=1 and then tag=2. But if the message with tag=2 is probed and enqueued earlier, it stays at the head of the queue. Whenever receiveTagged is called with tag=1, it checks rq.hasData, finds that the head message's tag is not 1, and returns false. Will it get stuck there (just like head-of-line blocking)?
Yes, the application is expected to check for both tags and NOT block/wait on one tag.
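To illustrate the "check for both tags" advice, here is a minimal sketch of a polling loop. It is not Galois code: `TaggedMsg`, `HeadOnlyQueue`, `tryReceive`, `drainAll`, and `demoDrainOrder` are hypothetical names, and the queue is only assumed to behave like the head-only tag check discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a received message (not a Galois type).
struct TaggedMsg {
  uint32_t tag;
  std::string payload;
};

// Hypothetical FIFO queue that only ever inspects the message at its head.
class HeadOnlyQueue {
  std::deque<TaggedMsg> q_;

public:
  void push(TaggedMsg m) { q_.push_back(std::move(m)); }

  // Non-blocking: succeeds only if the head message carries `tag`.
  bool tryReceive(uint32_t tag, TaggedMsg& out) {
    if (q_.empty() || q_.front().tag != tag)
      return false;
    out = std::move(q_.front());
    q_.pop_front();
    return true;
  }
};

// Polls every tag in `tags` on each round instead of waiting on one tag,
// and stops once a full round makes no progress.
std::vector<TaggedMsg> drainAll(HeadOnlyQueue& q,
                                const std::vector<uint32_t>& tags) {
  assert(!tags.empty());
  std::vector<TaggedMsg> received;
  bool progress = true;
  while (progress) {
    progress = false;
    for (uint32_t t : tags) {
      TaggedMsg m{0, ""};
      while (q.tryReceive(t, m)) {
        received.push_back(std::move(m));
        progress = true;
      }
    }
  }
  return received;
}

// Tag 2 is enqueued before tag 1; returns the tags in the order drained.
std::vector<uint32_t> demoDrainOrder() {
  HeadOnlyQueue q;
  q.push({2, "phase-2 data"});
  q.push({1, "phase-1 data"});
  std::vector<uint32_t> order;
  for (const TaggedMsg& m : drainAll(q, {1, 2}))
    order.push_back(m.tag);
  return order;
}
```

In this sketch the tag-2 message is taken first, which unblocks the tag-1 message behind it. What the caller does with an early tag-2 message (for example, holding it until the next phase) is left out here.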
Hi, Galois staff.
I am now studying the network layer of Galois (mainly NetworkBuffered.cpp and NetworkIOMPI.cpp).
I notice that, in async mode, Galois calls MPI_Isend and MPI_Irecv for communication, and messages are given different tags (based on galois::runtime::evilPhase).
I understand that messages are sent in ascending order of tag (evilPhase).
But I found here (see the link below) that MPI does not guarantee that messages are received in the same order if they are sent with different tags:
https://stackoverflow.com/questions/20909648/is-there-a-fifo-per-process-fifo-guarantee-on-mpi-isend
That means that when MPI_Isend sends two messages with tag1 and tag2 respectively, where tag1 < tag2 (say tag1=1, tag2=2), the other node may receive tag2 ahead of tag1, because in
Galois/libdist/src/NetworkIOMPI.cpp
Line 147 in 59d5aa5
it is probing any tag. Suppose tag2 is received first and put into the done queue, and the workerThread copies it to recvBuffer.data.
Galois/libdist/src/NetworkBuffered.cpp
Lines 414 to 425 in 59d5aa5
Later, syncNetRecv calls receiveTagged with tag=1,
Galois/libgluon/include/galois/graphs/GluonSubstrate.h
Lines 2490 to 2504 in 59d5aa5
But the message at the head of the queue has tag=2:
Galois/libdist/src/NetworkBuffered.cpp
Line 476 in 59d5aa5
Galois/libdist/src/NetworkBuffered.cpp
Lines 77 to 78 in 59d5aa5
In that case, we can never match the tag and get the message with tag=1 (because the data queue is a deque rather than a priority queue). Once reordering really occurs and the larger tag (tag=2) is received first, it will cause head-of-line blocking. I am not sure whether this scenario can actually happen, so I hope to get an explanation from the staff.
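The scenario described above can be reproduced with a small model. This is only a sketch under stated assumptions, not the Galois implementation: `Message`, `FifoRecvQueue`, and `tagOneReceived` are hypothetical names, and the queue is assumed to compare the requested tag against the head of a `std::deque` only.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a received message (not a Galois type).
struct Message {
  uint32_t tag;
  std::string payload;
};

// Hypothetical receive queue that, as assumed in the question, matches the
// requested tag against the head of the deque only.
class FifoRecvQueue {
  std::deque<Message> q_;

public:
  void push(Message m) { q_.push_back(std::move(m)); }

  // Returns false when the queue is empty or the head has a different tag.
  bool receiveTagged(uint32_t tag, Message& out) {
    if (q_.empty() || q_.front().tag != tag)
      return false;
    out = std::move(q_.front());
    q_.pop_front();
    return true;
  }
};

// Enqueues one tag-1 and one tag-2 message, in send order or reordered,
// then polls for tag 1 only, up to `attempts` times.
bool tagOneReceived(bool reordered, int attempts) {
  assert(attempts > 0);
  FifoRecvQueue q;
  if (reordered) {
    q.push({2, "phase-2 data"});
    q.push({1, "phase-1 data"});
  } else {
    q.push({1, "phase-1 data"});
    q.push({2, "phase-2 data"});
  }
  Message m{0, ""};
  for (int i = 0; i < attempts; ++i)
    if (q.receiveTagged(1, m))
      return true;
  return false;
}
```

With `reordered == true`, a caller that polls only for tag 1 never sees its message in this model, however many attempts it makes. That is the head-of-line blocking being asked about.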