
invalid IPC handle when using CUDA-Aware MPI #12

Open
cwpearson opened this issue Feb 17, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@cwpearson
Owner

It may be that the CUDA-aware MPI machinery is initialized during MPI_Init, so cudaSetDevice must be called before MPI_Init. cudaSetDevice may also need to be called before MPI_Send.

If so, we can probably only use this with one GPU per rank.

https://devblogs.nvidia.com/benchmarking-cuda-aware-mpi/
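A minimal sketch of that workaround, along the lines of the NVIDIA blog post: select the device from a launcher-provided local-rank environment variable before MPI_Init runs. The variable name (OMPI_COMM_WORLD_LOCAL_RANK is Open MPI-specific) and the one-GPU-per-rank mapping are assumptions, not this project's actual code.

```cpp
#include <cstdlib>
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
  // Open MPI exports the node-local rank before MPI_Init; other launchers
  // use different names (e.g. MV2_COMM_WORLD_LOCAL_RANK, SLURM_LOCALID).
  int localRank = 0;
  if (const char *s = std::getenv("OMPI_COMM_WORLD_LOCAL_RANK")) {
    localRank = std::atoi(s);
  }

  // Bind this rank to one GPU *before* MPI_Init so that any CUDA state the
  // CUDA-aware MPI library sets up during initialization targets that GPU.
  int numDevices = 0;
  cudaGetDeviceCount(&numDevices);
  cudaSetDevice(localRank % numDevices);

  MPI_Init(&argc, &argv);

  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  std::printf("rank %d using device %d\n", rank, localRank % numDevices);

  MPI_Finalize();
  return 0;
}
```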

@cwpearson cwpearson added the bug Something isn't working label Feb 17, 2020
@cwpearson
Owner Author

Added cudaSetDevice before MPI_Isend and MPI_Irecv in CudaAwareMpiSender in e9ba952. This may fix the issue.
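Roughly what that change looks like, as a hypothetical sketch (the real CudaAwareMpiSender interface in e9ba952 may differ): remember which device owns the buffer and re-activate it immediately before posting the non-blocking call. The receive side would mirror this around MPI_Irecv.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical sender wrapper, not the project's actual CudaAwareMpiSender API.
class CudaAwareMpiSender {
  int device_;       // device that owns the send buffer
  MPI_Request req_;
public:
  explicit CudaAwareMpiSender(int device) : device_(device), req_(MPI_REQUEST_NULL) {}

  // Re-activate the buffer's device right before posting the send, so the
  // CUDA-aware MPI library resolves the device pointer against the right GPU.
  void send_d2d(const void *devBuf, int count, int dstRank, int tag) {
    cudaSetDevice(device_);
    MPI_Isend(devBuf, count, MPI_BYTE, dstRank, tag, MPI_COMM_WORLD, &req_);
  }

  void wait() { MPI_Wait(&req_, MPI_STATUS_IGNORE); }
};
```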

@cwpearson
Owner Author

cwpearson commented Feb 17, 2020

This fixes the issue for one rank per node or one rank per GPU, but multiple GPUs per rank still does not work.
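For concreteness, this is the kind of communication pattern that still fails: a single rank posting sends from buffers that live on different GPUs, even with cudaSetDevice before each call. A hypothetical sketch, not code from the repository:

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// One rank owns send buffers on two different GPUs. Even with cudaSetDevice
// before each MPI_Isend, this is the kind of transfer that still produces the
// "invalid IPC handle" error reported in this issue.
void send_from_two_gpus(int peer, int n) {
  float *buf0 = nullptr, *buf1 = nullptr;
  cudaSetDevice(0);
  cudaMalloc(&buf0, n * sizeof(float));
  cudaSetDevice(1);
  cudaMalloc(&buf1, n * sizeof(float));

  MPI_Request reqs[2];
  cudaSetDevice(0);
  MPI_Isend(buf0, n, MPI_FLOAT, peer, /*tag*/ 0, MPI_COMM_WORLD, &reqs[0]);
  cudaSetDevice(1);
  MPI_Isend(buf1, n, MPI_FLOAT, peer, /*tag*/ 1, MPI_COMM_WORLD, &reqs[1]);
  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

  // Free each buffer on the device that owns it.
  cudaSetDevice(0);
  cudaFree(buf0);
  cudaSetDevice(1);
  cudaFree(buf1);
}
```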
