Handling CUDA_VISIBLE_DEVICES #24

Open
cwpearson opened this issue Aug 18, 2020 · 0 comments

Comments

cwpearson (Owner) commented Aug 18, 2020

On some platforms (e.g. OLCF Summit), MPI ranks' visibility of GPUs is typically restricted with CUDA_VISIBLE_DEVICES.
We currently require that all ranks be able to see all GPUs, so we can detect GPU distance, for example:

// recover the cuda device ID for this component
const int di = globalCudaIds[ranks[ri] * gpusPerRank + gi];
const int dj = globalCudaIds[ranks[rj] * gpusPerRank + gj];
bandwidth[ci][cj] = gpu_topo::bandwidth(di, dj);

If every rank's visible GPU has ID 0, our GPU topology code will treat all of those GPUs as the same device, since from any particular rank's point of view GPU 0 is simply GPU 0.
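
A minimal sketch of the failure mode, assuming the global table is built with an MPI_Allgather of each rank's local device ordinals (this gathering code is illustrative, not the project's actual implementation):

// Illustrative only (not the project's code): gather each rank's local CUDA
// device ordinals into a table indexed as globalCudaIds[rank * gpusPerRank + g].
// Under a restrictive CUDA_VISIBLE_DEVICES, every rank contributes ordinal 0,
// so the table cannot distinguish physical GPUs.
#include <mpi.h>
#include <vector>

std::vector<int> gatherCudaIds(MPI_Comm comm, int gpusPerRank) {
  int worldSize = 0;
  MPI_Comm_size(comm, &worldSize);

  std::vector<int> localIds(gpusPerRank);
  for (int g = 0; g < gpusPerRank; ++g) {
    localIds[g] = g; // local ordinal; 0 on every rank if only one GPU is visible
  }

  std::vector<int> globalCudaIds(worldSize * gpusPerRank);
  MPI_Allgather(localIds.data(), gpusPerRank, MPI_INT,
                globalCudaIds.data(), gpusPerRank, MPI_INT, comm);
  return globalCudaIds;
}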

It may be possible to have the ranks report a UUID for each GPU instead of its CUDA device ID, and use that UUID throughout to distinguish GPUs.
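
For example, the CUDA runtime exposes a per-device UUID (cudaDeviceProp::uuid, available since CUDA 10) that identifies the physical GPU regardless of how CUDA_VISIBLE_DEVICES remaps ordinals. One possible way to read it (a sketch, not existing project code):

#include <cuda_runtime.h>
#include <array>

// Read the 16-byte UUID of a visible CUDA device. Ranks could exchange these
// bytes instead of ordinals to identify physical GPUs uniquely.
std::array<unsigned char, 16> deviceUuid(int dev) {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, dev);
  std::array<unsigned char, 16> uuid;
  for (int i = 0; i < 16; ++i) {
    uuid[i] = static_cast<unsigned char>(prop.uuid.bytes[i]);
  }
  return uuid;
}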

Once this is supported, we could also allow users to pin CPU execution to the CPUs with affinity for a particular GPU, which could improve performance.
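
One possible route (again just a sketch, not something the project does today) is NVML's nvmlDeviceGetCpuAffinity, which reports the CPUs local to a GPU; the resulting mask could then be applied with sched_setaffinity:

#include <nvml.h>
#include <sched.h>

// Bind the calling thread to the CPUs that NVML reports as near gpuIndex.
// Error handling omitted; illustrative only.
void bindNearGpu(unsigned int gpuIndex) {
  nvmlInit();
  nvmlDevice_t dev;
  nvmlDeviceGetHandleByIndex(gpuIndex, &dev);

  const unsigned int kWords = 16; // room for 16 * 64 = 1024 logical CPUs
  unsigned long cpuMask[kWords] = {0};
  nvmlDeviceGetCpuAffinity(dev, kWords, cpuMask);

  cpu_set_t set;
  CPU_ZERO(&set);
  for (unsigned int cpu = 0; cpu < kWords * 64; ++cpu) {
    if (cpuMask[cpu / 64] & (1UL << (cpu % 64))) {
      CPU_SET(cpu, &set);
    }
  }
  sched_setaffinity(0, sizeof(set), &set);
  nvmlShutdown();
}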
