
A test for the core assumption of data + signal of MSCCL++ #91

Closed
saeedmaleki opened this issue May 31, 2023 · 2 comments

Comments

@saeedmaleki
Contributor

A test that checks whether our core assumption holds: a data write followed by signal()/wait() should always preserve the write order on the receiving GPU side.

Something like this:

__device__ void pingPongTest(volatile int* sendBuff, mscclpp::channel::SimpleDeviceChannel devChan, int rank,
                             int worldSize, int nranksPerNode, int remoteRank, size_t nelemsPerGPU) {
  const int nTries = 1000;
  int flusher = 0;
  // Offset added to rank 1's data so the two directions write distinct values.
  const int rank1Offset = 10000000;
  for (int i = 0; i < nTries; i++) {
    if (rank == 0) {
      if (i > 0) {
        // Wait for rank 1's reply from the previous iteration, then verify it.
        if (threadIdx.x == 0)
          devChan.wait();
        __syncthreads();
        for (int j = threadIdx.x; j < nelemsPerGPU; j += blockDim.x) {
          if (sendBuff[j] != rank1Offset + i - 1 + j) {
            printf("rank 0 ERROR: sendBuff[%d] = %d, expected %d\n", j, sendBuff[j], rank1Offset + i - 1 + j);
          }
        }
      }
      // Write fresh data and send it together with a signal.
      for (int j = threadIdx.x; j < nelemsPerGPU; j += blockDim.x) {
        sendBuff[j] = i + j;
      }
      __syncthreads();
      // __threadfence_system(); // not necessary if we make sendBuff volatile
      if (threadIdx.x == 0)
        devChan.putWithSignal(0, nelemsPerGPU * sizeof(int));
    }
    if (rank == 1) {
      // Wait for rank 0's data, then verify it.
      if (threadIdx.x == 0)
        devChan.wait();
      __syncthreads();
      for (int j = threadIdx.x; j < nelemsPerGPU; j += blockDim.x) {
        if (sendBuff[j] != i + j) {
          printf("rank 1 ERROR: sendBuff[%d] = %d, expected %d\n", j, sendBuff[j], i + j);
        }
      }
      if (i < nTries - 1) {
        // Send the reply with the rank-1 offset applied.
        for (int j = threadIdx.x; j < nelemsPerGPU; j += blockDim.x) {
          sendBuff[j] = rank1Offset + i + j;
        }
        __syncthreads();
        // __threadfence_system(); // not necessary if we make sendBuff volatile
        if (threadIdx.x == 0)
          devChan.putWithSignal(0, nelemsPerGPU * sizeof(int));
      }
    }
    // Flush outstanding puts periodically so the proxy queue does not fill up.
    flusher++;
    if (flusher == 100) {
      if (threadIdx.x == 0)
        devChan.flush();
      flusher = 0;
    }
  }
}
This was referenced May 31, 2023
@chhwang
Contributor

chhwang commented Jun 1, 2023

Hi @saeedmaleki, I'm trying to add this to our unit tests, and the code is here. This seems to be an orthogonal issue, but the test code returns the error message NonblockingFuture::get() called before ready (Mscclpp failure: InvalidUsage). Do you have any idea why? The command is:

mpirun --allow-run-as-root -tag-output -np 2 ./build/test/mp_unit_tests --gtest_filter=ChannelOneToOneTest.PingPongIb

@chhwang
Contributor

chhwang commented Jun 2, 2023

The issue is resolved, and the test has been added to the unit tests in #81.

@chhwang chhwang closed this as completed Jun 2, 2023