Enable GPU access with DeviceRequests #7929

Merged: 1 commit into docker:master on Nov 17, 2020

Conversation

aiordache (Contributor)

Convert compose-spec devices mapping to DeviceRequest to enable GPU access to containers.
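
For reference, a minimal sketch of what this conversion amounts to with the Docker SDK for Python (the helper name device_reservation_to_request is hypothetical; docker.types.DeviceRequest is the SDK type that maps to the Engine API's DeviceRequests):

from docker.types import DeviceRequest

def device_reservation_to_request(device):
    # `device` is one mapping from deploy.resources.reservations.devices.
    count = device.get('count')
    if count == 'all':
        count = -1  # the Engine API uses -1 to mean "all available devices"
    capabilities = device.get('capabilities')
    return DeviceRequest(
        driver=device.get('driver'),
        count=count,
        device_ids=device.get('device_ids'),
        # The Engine API takes capabilities as a list of AND-sets (an OR of
        # ANDs), so the flat compose-level list is wrapped once.
        capabilities=[capabilities] if capabilities else None,
    )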

Tested on a GPU host:

$ cat /etc/docker/daemon.json 
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Sample compose files:

services:
  test:
    image: nvidia/cuda
    command: nvidia-smi
    runtime: nvidia

or

services:
  test:
    image: nvidia/cuda
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
          - 'driver': 'nvidia'
            'count': 1
            'capabilities': ['gpu', 'utility']
$ docker-compose up
Creating network "gpu_default" with the default driver
Creating gpu_test_1 ... done
Attaching to gpu_test_1
test_1  | Fri Nov 13 20:46:11 2020       
test_1  | +-----------------------------------------------------------------------------+
test_1  | | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.1     |
test_1  | |-------------------------------+----------------------+----------------------+
test_1  | | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
test_1  | | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
test_1  | |                               |                      |               MIG M. |
test_1  | |===============================+======================+======================|
test_1  | |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
test_1  | | N/A   23C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
test_1  | |                               |                      |                  N/A |
test_1  | +-------------------------------+----------------------+----------------------+
test_1  |                                                                                
test_1  | +-----------------------------------------------------------------------------+
test_1  | | Processes:                                                                  |
test_1  | |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
test_1  | |        ID   ID                                                   Usage      |
test_1  | |=============================================================================|
test_1  | |  No running processes found                                                 |
test_1  | +-----------------------------------------------------------------------------+
gpu_test_1 exited with code 0
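
For comparison, the same request issued directly against the Engine API with the Docker SDK for Python (a sketch, not part of this PR) would look like:

import docker
from docker.types import DeviceRequest

client = docker.from_env()
# One NVIDIA GPU with the 'gpu' and 'utility' capabilities, mirroring the
# deploy.resources.reservations.devices entry above.
output = client.containers.run(
    'nvidia/cuda',
    'nvidia-smi',
    device_requests=[DeviceRequest(driver='nvidia', count=1,
                                   capabilities=[['gpu', 'utility']])],
    remove=True,
)
print(output.decode())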

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;print(tf.test.gpu_device_name())"
    deploy:
      resources:
        reservations:
          devices:
          - 'driver': 'nvidia'
            'capabilities': ['gpu']
$ docker-compose up
Creating network "gpu_default" with the default driver
Creating gpu_test_1 ... done
Attaching to gpu_test_1
test_1  | 2020-11-13 20:49:54.444634: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
.....
test_1  | 2020-11-13 20:49:56.048674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 13970 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
test_1  | /device:GPU:0
gpu_test_1 exited with code 0

Tested on a multi-GPU host:

$ nvidia-smi 
Fri Nov 13 20:57:48 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   72C    P8    12W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   67C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
| N/A   74C    P8    12W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   62C    P8    11W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Enable access only to the GPU-0 and GPU-3 devices:

services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;print(tf.test.gpu_device_name())"
    deploy:
      resources:
        reservations:
          devices:
          - 'driver': 'nvidia'
            'device_ids': ['0','3']
            'capabilities': ['gpu']
$ docker-compose up
...
test_1  | 2020-11-13 21:02:52.076151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 13970 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1b.0, compute capability: 7.5)
test_1  | 2020-11-13 21:02:52.076752: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1  | 2020-11-13 21:02:52.077844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:1 with 13970 MB memory) -> physical GPU (device: 1, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
test_1  | /device:GPU:0
gpu_test_1 exited with code 0
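
Pinning the same two devices through the SDK (again, just a sketch for comparison) uses device_ids in place of count:

from docker.types import DeviceRequest

# Select only GPU-0 and GPU-3; `device_ids` and `count` are alternative
# ways of choosing devices, so only one of the two is set.
request = DeviceRequest(driver='nvidia',
                        device_ids=['0', '3'],
                        capabilities=[['gpu']])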

Requires compose-spec/compose-spec#109
Closes #6691

@opptimus commented:

Wonderful and significant feature; looking forward to it being released!

@chris-crone (Member) left a comment:

One minor change but LGTM

@@ -179,6 +180,7 @@ def __init__(
ipc_mode=None,
pid_mode=None,
default_platform=None,
device_requests=None,
@chris-crone (Member) commented on the diff:

Careful of changing function parameter ordering. It's possible that users are calling this function with positional parameters, so it's best to add new parameters to the end of the list (i.e., after extra_labels).
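
To illustrate the concern with a toy example (the names here are illustrative, not the actual signature):

# A toy model of the concern, not code from this PR.
def create_service(pid_mode=None, default_platform=None):
    print(pid_mode, default_platform)

create_service(None, 'linux/amd64')   # default_platform == 'linux/amd64'

# If the new parameter is inserted *before* default_platform ...
def create_service(pid_mode=None, device_requests=None, default_platform=None):
    print(pid_mode, device_requests, default_platform)

create_service(None, 'linux/amd64')   # device_requests == 'linux/amd64' (!)

# Appending device_requests after the last existing parameter instead keeps
# every old positional call working unchanged.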

@ndeloof ndeloof merged commit 854c003 into docker:master Nov 17, 2020
@aiordache aiordache added this to the 1.28.0 milestone Dec 7, 2020
facebook-github-bot pushed a commit to facebookresearch/detectron2 that referenced this pull request on Feb 9, 2021
Summary:
Updating GPU access from docker-compose, as pointed out by this comment:

https://github.com/facebookresearch/detectron2/blob/45a8bfb64053d71d9d7f136fb25a6abe841dc91f/docker/docker-compose.yml#L9

The solution comes from this [issue](docker/compose#6691) and has been working since the 1.28.0 [release](https://github.com/docker/compose/releases). It's the [official](docker/compose#7929) replacement for `runtime: nvidia`.

This way, we don't need to install nvidia-docker (fewer prerequisites 🎉), though nvidia-container-toolkit still seems to be required.

Pull Request resolved: #2584

Reviewed By: theschnitz

Differential Revision: D26318490

Pulled By: ppwwyyxx

fbshipit-source-id: f732a8d05dbd42cd72d228719507ac45caa86ea4