
invalid IPC handle when using CUDA-Aware MPI #12

Open
cwpearson opened this issue Feb 17, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@cwpearson
Owner

It may be that the CUDA-aware MPI machinery is initialized during MPI_Init, so cudaSetDevice must be called before MPI_Init. cudaSetDevice may also need to be called before MPI_Send.

If so, we can probably only use this with one GPU per rank.

https://devblogs.nvidia.com/benchmarking-cuda-aware-mpi/
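A minimal sketch of that workaround, along the lines of the NVIDIA blog post: select the device from a launcher-provided local-rank environment variable before MPI_Init runs. The variable name (OMPI_COMM_WORLD_LOCAL_RANK is Open MPI-specific) and the one-GPU-per-rank mapping are assumptions, not this project's actual code.

```cpp
#include <cstdlib>
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
  // Open MPI exports the node-local rank before MPI_Init; other launchers
  // use different names (e.g. MV2_COMM_WORLD_LOCAL_RANK, SLURM_LOCALID).
  int localRank = 0;
  if (const char *s = std::getenv("OMPI_COMM_WORLD_LOCAL_RANK")) {
    localRank = std::atoi(s);
  }

  // Bind this rank to one GPU *before* MPI_Init so that any CUDA state the
  // CUDA-aware MPI library sets up during initialization targets that GPU.
  int numDevices = 0;
  cudaGetDeviceCount(&numDevices);
  cudaSetDevice(localRank % numDevices);

  MPI_Init(&argc, &argv);

  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  std::printf("rank %d using device %d\n", rank, localRank % numDevices);

  MPI_Finalize();
  return 0;
}
```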

@cwpearson cwpearson added the bug Something isn't working label Feb 17, 2020
@cwpearson
Owner Author

Added cudaSetDevice before MPI_Isend and MPI_Irecv in CudaAwareMpiSender in e9ba952. This may fix the issue.
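Roughly what that change looks like, as a hypothetical sketch (the real CudaAwareMpiSender interface in e9ba952 may differ): remember which device owns the buffer and re-activate it immediately before posting the non-blocking call. The receive side would mirror this around MPI_Irecv.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical sender wrapper, not the project's actual CudaAwareMpiSender API.
class CudaAwareMpiSender {
  int device_;       // device that owns the send buffer
  MPI_Request req_;
public:
  explicit CudaAwareMpiSender(int device) : device_(device), req_(MPI_REQUEST_NULL) {}

  // Re-activate the buffer's device right before posting the send, so the
  // CUDA-aware MPI library resolves the device pointer against the right GPU.
  void send_d2d(const void *devBuf, int count, int dstRank, int tag) {
    cudaSetDevice(device_);
    MPI_Isend(devBuf, count, MPI_BYTE, dstRank, tag, MPI_COMM_WORLD, &req_);
  }

  void wait() { MPI_Wait(&req_, MPI_STATUS_IGNORE); }
};
```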

@cwpearson
Owner Author

cwpearson commented Feb 17, 2020

This fixes the issue for one rank per node or one rank per GPU, but multiple GPUs per rank still does not work.
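For concreteness, this is the kind of communication pattern that still fails: a single rank posting sends from buffers that live on different GPUs, even with cudaSetDevice before each call. A hypothetical sketch, not code from the repository:

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// One rank owns send buffers on two different GPUs. Even with cudaSetDevice
// before each MPI_Isend, this is the kind of transfer that still produces the
// "invalid IPC handle" error reported in this issue.
void send_from_two_gpus(int peer, int n) {
  float *buf0 = nullptr, *buf1 = nullptr;
  cudaSetDevice(0);
  cudaMalloc(&buf0, n * sizeof(float));
  cudaSetDevice(1);
  cudaMalloc(&buf1, n * sizeof(float));

  MPI_Request reqs[2];
  cudaSetDevice(0);
  MPI_Isend(buf0, n, MPI_FLOAT, peer, /*tag*/ 0, MPI_COMM_WORLD, &reqs[0]);
  cudaSetDevice(1);
  MPI_Isend(buf1, n, MPI_FLOAT, peer, /*tag*/ 1, MPI_COMM_WORLD, &reqs[1]);
  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

  // Free each buffer on the device that owns it.
  cudaSetDevice(0);
  cudaFree(buf0);
  cudaSetDevice(1);
  cudaFree(buf1);
}
```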
