AvA already supports the single-node multi-GPU case, where a single process can access multiple GPUs on one GPU node.
The CUDA process must call cudaSetDevice explicitly at runtime to choose which GPU is in use, and this mechanism can be leveraged to support the multi-node multi-GPU case.
The basic idea is to run one worker per GPU (the GPUs can reside on different nodes). When the application calls cudaSetDevice, guestlib switches the worker address dynamically, and all subsequent CUDA API calls are forwarded to that worker. This assumes there is no inter-GPU data transfer over channels such as NVLink.
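A minimal sketch of this routing idea, assuming one worker per GPU and a hypothetical guestlib transport. WorkerChannel, worker_endpoints, and guestlib_cudaSetDevice are illustrative names, not AvA's actual guestlib internals.

```cpp
#include <string>
#include <unordered_map>

// Stand-ins for the CUDA runtime types the real shim would take from cuda_runtime.h.
using cudaError_t = int;
constexpr cudaError_t cudaSuccess = 0;
constexpr cudaError_t cudaErrorInvalidDevice = 101;

struct WorkerChannel {
    std::string address;  // e.g. "node1:4000"; one worker per (node, GPU) pair
};

// Global device ID -> worker endpoint; the GPUs may live on different nodes.
static std::unordered_map<int, WorkerChannel> worker_endpoints = {
    {0, {"node0:4000"}}, {1, {"node0:4001"}}, {2, {"node1:4000"}},
};
static WorkerChannel* current_worker = &worker_endpoints.at(0);

// Guestlib's interposed cudaSetDevice: instead of touching a local GPU, it
// retargets the channel that every subsequent forwarded CUDA API call uses.
cudaError_t guestlib_cudaSetDevice(int device) {
    auto it = worker_endpoints.find(device);
    if (it == worker_endpoints.end()) return cudaErrorInvalidDevice;
    current_worker = &it->second;  // all following CUDA APIs go to this worker
    return cudaSuccess;            // assumes no NVLink-style inter-GPU traffic
}
```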
A further improvement is to let one worker manage multiple local GPUs: guestlib then switches the worker address and forwards cudaSetDevice(adjusted GPU ID) to that worker.
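A sketch of this refinement, again with hypothetical names: each worker owns several local GPUs, so guestlib maps the application's global device ID to a (worker endpoint, local device ID) pair and forwards cudaSetDevice with the adjusted ID. switch_worker and forward_cuda_set_device are placeholders for the guestlib channel switch and the forwarded call.

```cpp
#include <string>
#include <unordered_map>

struct WorkerTarget {
    std::string worker_address;  // which worker (possibly on another node)
    int local_device_id;         // GPU index as that worker sees it
};

// Global device ID -> (worker, local ID); here node1 hosts two local GPUs.
static std::unordered_map<int, WorkerTarget> device_map = {
    {0, {"node0:4000", 0}},
    {1, {"node1:4000", 0}},
    {2, {"node1:4000", 1}},
};

static void switch_worker(const std::string& address) { /* reconnect the channel */ }
static void forward_cuda_set_device(int local_id)     { /* send the call over the channel */ }

// Route the application's cudaSetDevice(global_device) to the owning worker.
void route_set_device(int global_device) {
    const WorkerTarget& target = device_map.at(global_device);
    switch_worker(target.worker_address);
    forward_cuda_set_device(target.local_device_id);  // cudaSetDevice(adjusted GPU ID)
}
```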