
Commit

Merge pull request #2359 from JasonRuonanWang/mpihandshake
Added timeout back to MpiHandshake
JasonRuonanWang authored Jul 6, 2020
2 parents 4e6d572 + 6e84655 commit a6fb7fe
Showing 6 changed files with 13 additions and 29 deletions.
13 changes: 2 additions & 11 deletions docs/user_guide/source/engines/ssc.rst
@@ -6,23 +6,14 @@ The SSC engine is designed specifically for strong code coupling. Currently SSC

 The SSC engine takes the following parameters:

-1. ``RendezvousAppCount``: Default **2**. The number of applications, including both writers and readers, that will work on this stream. The SSC engine's open function will block until all these applications reach the open call. If there are multiple applications in a workflow, this parameter needs to be set respectively for every application. For example, in a three-app coupling scenario: App 0 writes Stream A to App 1; App 1 writes Stream B to App 0; App 2 writes Stream C to App 1; App 1 writes Stream D to App 2, the parameter RendezvousAppCount for engine instances of every stream should be all set to 2, because for each of the streams, two applications will work on it. In another example, where App 0 writes Stream A to App 1 and App 2; App 1 writes Stream B to App 2, the parameter RendezvousAppCount for engine instances of Stream A and B should be set to 3 and 2 respectively, because three applications will work on Stream A, while two applications will work on Stream B.
+1. ``OpenTimeoutSecs``: Default **10**. Timeout in seconds for opening a stream. The SSC engine's open function will block until the RendezvousAppCount is reached, or timeout, whichever comes first. If it reaches the timeout, SSC will throw an exception.

-2. ``MaxStreamsPerApp``: Default **1**. The maximum number of streams that all applications sharing this MPI_COMM_WORLD can possibly open. It is required that this number is consistent across all ranks from all applications. This is used for pre-allocating the vectors holding MPI handshake informations and due to the fundamental communication mechanism of MPI, this information must be set statically through engine parameters, and the SSC engine cannot provide any mechanism to check if this parameter is set correctly. If this parameter is wrongly set, the SSC engine's open function will either exit early than expected without gathering all applications' handshake information, or it will block until timeout. It may cause other unpredictable errors too.
-
-3. ``OpenTimeoutSecs``: Default **10**. Timeout in seconds for opening a stream. The SSC engine's open function will block until the RendezvousAppCount is reached, or timeout, whichever comes first. If it reaches the timeout, SSC will throw an exception.
-
-4. ``MaxFilenameLength``: Default **128**. The maximum length of filenames across all ranks from all applications. It is used for allocating the handshake buffer. Due to the limitation of MPI communication, this number must be set statically. The default number should work for most use cases. SSC will throw an exception if any rank opens a stream with a filename longer than this number.
-
-5. ``MpiMode``: Default **TwoSided**. MPI communication modes to use. Besides the default TwoSided mode using two sided MPI communications, MPI_Isend and MPI_Irecv, for data transport, there are four one sided MPI modes: OneSidedFencePush, OneSidedPostPush, OneSidedFencePull, and OneSidedPostPull. Modes with **Push** are based on the push model and use MPI_Put for data transport, while modes with **Pull** are based on the pull model and use MPI_Get. Modes with **Fence** use MPI_Win_fence for synchronization, while modes with **Post** use MPI_Win_start, MPI_Win_complete, MPI_Win_post and MPI_Win_wait.
+2. ``MpiMode``: Default **TwoSided**. MPI communication modes to use. Besides the default TwoSided mode using two sided MPI communications, MPI_Isend and MPI_Irecv, for data transport, there are four one sided MPI modes: OneSidedFencePush, OneSidedPostPush, OneSidedFencePull, and OneSidedPostPull. Modes with **Push** are based on the push model and use MPI_Put for data transport, while modes with **Pull** are based on the pull model and use MPI_Get. Modes with **Fence** use MPI_Win_fence for synchronization, while modes with **Post** use MPI_Win_start, MPI_Win_complete, MPI_Win_post and MPI_Win_wait.

 =============================== ================== ================================================
 **Key** **Value Format** **Default** and Examples
 =============================== ================== ================================================
-RendezvousAppCount integer **2**, 3, 5, 10
-MaxStreamsPerApp integer **1**, 2, 4, 8
 OpenTimeoutSecs integer **10**, 2, 20, 200
-MaxFilenameLength integer **128**, 32, 64, 512
 MpiMode string **TwoSided**, OneSidedFencePush, OneSidedPostPush, OneSidedFencePull, OneSidedPostPull
 =============================== ================== ================================================
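
The ``MpiMode`` description above maps each mode name to a transfer call and a synchronization scheme. A minimal sketch of that mapping follows, assuming only what the documentation text states; ``MpiModeInfo`` and ``DescribeMpiMode`` are hypothetical names for illustration and are not part of ADIOS2.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical illustration type, not an ADIOS2 structure.
struct MpiModeInfo
{
    std::string transfer; // MPI call(s) used to move data
    std::string sync;     // window synchronization call(s); empty for TwoSided
};

// Hypothetical helper: translates an MpiMode value from the table above
// into the MPI calls the documentation associates with it.
inline MpiModeInfo DescribeMpiMode(const std::string &mode)
{
    if (mode == "TwoSided")
    {
        // Two sided transport uses no MPI window, so no window sync is listed.
        return {"MPI_Isend/MPI_Irecv", ""};
    }
    const bool push =
        mode == "OneSidedFencePush" || mode == "OneSidedPostPush";
    const bool pull =
        mode == "OneSidedFencePull" || mode == "OneSidedPostPull";
    if (!push && !pull)
    {
        throw std::invalid_argument("unknown MpiMode: " + mode);
    }
    const bool fence = mode.find("Fence") != std::string::npos;
    return {push ? "MPI_Put" : "MPI_Get",
            fence ? "MPI_Win_fence"
                  : "MPI_Win_start/MPI_Win_complete/MPI_Win_post/MPI_Win_wait"};
}
```

Rejecting unknown strings early is a design choice of this sketch; how the SSC engine itself handles an unrecognized ``MpiMode`` value is not shown in this commit.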

6 changes: 0 additions & 6 deletions source/adios2/engine/ssc/SscReader.cpp
@@ -30,12 +30,6 @@ SscReader::SscReader(IO &io, const std::string &name, const Mode mode,

 helper::GetParameter(m_IO.m_Parameters, "MpiMode", m_MpiMode);
 helper::GetParameter(m_IO.m_Parameters, "Verbose", m_Verbosity);
-helper::GetParameter(m_IO.m_Parameters, "MaxFilenameLength",
-m_MaxFilenameLength);
-helper::GetParameter(m_IO.m_Parameters, "RendezvousAppCount",
-m_RendezvousAppCount);
-helper::GetParameter(m_IO.m_Parameters, "MaxStreamsPerApp",
-m_MaxStreamsPerApp);
 helper::GetParameter(m_IO.m_Parameters, "OpenTimeoutSecs",
 m_OpenTimeoutSecs);
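
The calls above appear to follow a lookup-with-default pattern: the member keeps its in-class default (for example ``m_OpenTimeoutSecs = 10``) unless the key is present in the IO parameters. A minimal sketch of that pattern, assuming a plain string-to-string map; ``Params`` and ``GetIntParameter`` are hypothetical stand-ins and do not reproduce the actual ``helper::GetParameter`` implementation.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the engine parameter map (string keys and values).
using Params = std::map<std::string, std::string>;

// Hypothetical helper: overwrites `value` only when `key` is present,
// so the caller's default survives when the user sets nothing.
// Returns true if the key was found.
inline bool GetIntParameter(const Params &params, const std::string &key,
                            int &value)
{
    const auto it = params.find(key);
    if (it == params.end())
    {
        return false;
    }
    value = std::stoi(it->second); // throws if the value is not an integer
    return true;
}
```

With this shape, a member declared as ``int timeout = 10;`` stays at 10 when ``OpenTimeoutSecs`` is absent and becomes 20 when the map holds ``{"OpenTimeoutSecs", "20"}``.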

3 changes: 0 additions & 3 deletions source/adios2/engine/ssc/SscReader.h
@@ -90,9 +90,6 @@ class SscReader : public Engine
 ssc::RankPosMap &allOverlapRanks);

 int m_Verbosity = 0;
-int m_MaxFilenameLength = 128;
-int m_MaxStreamsPerApp = 1;
-int m_RendezvousAppCount = 2;
 int m_OpenTimeoutSecs = 10;
 };

6 changes: 0 additions & 6 deletions source/adios2/engine/ssc/SscWriter.cpp
@@ -28,12 +28,6 @@ SscWriter::SscWriter(IO &io, const std::string &name, const Mode mode,

 helper::GetParameter(m_IO.m_Parameters, "MpiMode", m_MpiMode);
 helper::GetParameter(m_IO.m_Parameters, "Verbose", m_Verbosity);
-helper::GetParameter(m_IO.m_Parameters, "MaxFilenameLength",
-m_MaxFilenameLength);
-helper::GetParameter(m_IO.m_Parameters, "RendezvousAppCount",
-m_RendezvousAppCount);
-helper::GetParameter(m_IO.m_Parameters, "MaxStreamsPerApp",
-m_MaxStreamsPerApp);
 helper::GetParameter(m_IO.m_Parameters, "OpenTimeoutSecs",
 m_OpenTimeoutSecs);

3 changes: 0 additions & 3 deletions source/adios2/engine/ssc/SscWriter.h
@@ -88,9 +88,6 @@ class SscWriter : public Engine
 ssc::RankPosMap &allOverlapRanks);

 int m_Verbosity = 0;
-int m_MaxFilenameLength = 128;
-int m_MaxStreamsPerApp = 1;
-int m_RendezvousAppCount = 2;
 int m_OpenTimeoutSecs = 10;
 };

11 changes: 11 additions & 0 deletions source/adios2/helper/adiosMpiHandshake.cpp
@@ -68,11 +68,22 @@ const std::vector<std::vector<int>> Handshake(const std::string &filename,
 fsc << "completed";
 fsc.close();

+auto startTime = std::chrono::system_clock::now();
 while (true)
 {
 std::ifstream fs;
 try
 {
+auto nowTime = std::chrono::system_clock::now();
+auto duration =
+std::chrono::duration_cast<std::chrono::seconds>(
+nowTime - startTime);
+if (duration.count() > timeoutSeconds)
+{
+throw(std::runtime_error(
+"Mpi handshake timeout for Stream " + filename));
+}
+
 fs.open(filename + ".w.c");
 std::string line;
 std::getline(fs, line);
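
The hunk above adds an elapsed-time check to a loop that polls a marker file. A self-contained sketch of the same polling-with-timeout technique is given below, under stated assumptions: ``WaitForMarker`` is a hypothetical helper (not the ADIOS2 ``Handshake`` function), the timeout is taken in milliseconds rather than seconds, and ``std::chrono::steady_clock`` is swapped in for the ``system_clock`` used in the commit because it is monotonic.

```cpp
#include <chrono>
#include <fstream>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper, not part of ADIOS2: polls until the first line of
// `path` equals `expected`, and throws once `timeout` has elapsed.
inline void WaitForMarker(const std::string &path, const std::string &expected,
                          std::chrono::milliseconds timeout)
{
    // steady_clock never jumps backwards or forwards, so wall-clock
    // adjustments cannot shorten or extend the wait.
    const auto start = std::chrono::steady_clock::now();
    while (true)
    {
        std::ifstream fs(path);
        std::string line;
        if (fs && std::getline(fs, line) && line == expected)
        {
            return; // marker found
        }
        if (std::chrono::steady_clock::now() - start > timeout)
        {
            throw std::runtime_error("handshake timeout for stream " + path);
        }
        // Short sleep (an addition of this sketch) to avoid a busy loop
        // hammering the filesystem.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

A caller would wrap the call in ``try``/``catch`` for ``std::runtime_error``, which mirrors the documented behavior that SSC throws an exception when ``OpenTimeoutSecs`` is exceeded.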
