
containers.cdi.dynamic.nvidia with docker_25 does not enable gpu use in containers #305312

Closed
collinarnett opened this issue Apr 19, 2024 · 2 comments · Fixed by #306337

@collinarnett

Describe the bug

Cannot run NVIDIA-compatible containers with Docker 25 and `virtualisation.containers.cdi.dynamic.nvidia.enable = true;`.

Steps To Reproduce

Steps to reproduce the behavior:
Enable the following

virtualisation.docker.package = pkgs.docker_25;
virtualisation.containers.cdi.dynamic.nvidia.enable = true;

nixpkgs.config.allowUnfree = true;
services.xserver.videoDrivers = [ "nvidia" ];
hardware.opengl.enable = true;

Run the following command:

sudo docker run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L

Result:

$ sudo docker run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
[sudo] password for collin: 
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].

Expected behavior

nvidia-smi prints the GPU.

If the same command above is run with podman, it correctly prints the GPU.

virtualisation.podman.enable = true;
virtualisation.podman.dockerCompat = true;
virtualisation.containers.cdi.dynamic.nvidia.enable = true;
$ sudo docker run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
✔ docker.io/library/ubuntu:latest
Trying to pull docker.io/library/ubuntu:latest...
Getting image source signatures
Copying blob 3c645031de29 done   | 
Copying config 7af9ba4f0a done   | 
Writing manifest to image destination
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-3f344b2c-09ed-da45-01f1-7a7ed22e6494)

Additional context

My use case revolves around using this feature in docker compose:

https://docs.docker.com/compose/gpu-support/

I am running this GPU via PCIe passthrough on a libvirt/QEMU VM, if that's relevant.
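For reference, the Compose GPU syntax described on the linked page looks roughly like the sketch below (service name, image, and device count are illustrative, not taken from my setup):

```yaml
services:
  gpu-test:
    image: ubuntu
    command: nvidia-smi -L
    deploy:
      resources:
        reservations:
          devices:
            # Request GPU access through the container runtime
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```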

Notify maintainers

@SomeoneSerge
@ereslibre

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.27, NixOS, 24.05 (Uakari), 24.05.20240416.5672bc9`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.21.2`
 - nixpkgs: `/nix/store/3kwj19dbdfxnjbcns4hw307ylhz3wgrm-source`

Add a 👍 reaction to issues you find important.


ereslibre commented Apr 19, 2024

Hello @collinarnett!

I haven't had time to try to reproduce this yet (I use podman myself). Looking at the source code, it seems CDI needs to be enabled explicitly:

https://github.com/moby/moby/blob/615dfdf67264ed5b08dd5e86657bf0e580731cea/cmd/dockerd/daemon.go#L279-L281

Also documented in: https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices

Have you enabled it explicitly, either in daemon.json or by some NixOS means (virtualisation.docker.daemon.settings, maybe)?
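Per the dockerd documentation linked above, CDI is behind an opt-in feature flag; a minimal daemon.json enabling it would look something like this (a sketch based on that documentation, not a config I have tested here):

```json
{
  "features": {
    "cdi": true
  }
}
```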


ereslibre commented Apr 23, 2024

Good news! I was able to reproduce your issue (thank you for the detailed description @collinarnett!), and I can also confirm that adding the following to my NixOS configuration fixed it:

virtualisation.docker.daemon.settings = {
  features = {
    cdi = true;
  };
};

Now I'm going to open a PR to fix this problem, so that CDI works out of the box for Docker users as well.
