Unable to run NVIDIA containers #20257

Closed
muety opened this issue Oct 4, 2023 · 2 comments

Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

muety commented Oct 4, 2023

Issue Description

I installed Podman on Ubuntu 22.04 from the official sources. In addition, I installed the NVIDIA Container Toolkit as described here and generated a CDI spec as described here. Running nvidia-ctk cdi list properly gives the following output:

INFO[0000] Found 2 CDI devices                          
nvidia.com/gpu=0
nvidia.com/gpu=all

However, when trying to run

podman run --rm --device=nvidia.com/gpu=0 ubuntu nvidia-smi -L

I'm getting the following error.

Error: stat nvidia.com/gpu=0: no such file or directory

The above NVIDIA docs mention that the --device argument is only available from Podman 4.1.0, while I'm on 3.4.4, so maybe that's an issue? If yes, what is the correct way of running GPU-dependent containers with my Podman version?
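
For reference, the version requirement mentioned above can be checked up front. This is only a minimal sketch, assuming GNU `sort -V` is available (it is part of coreutils on Ubuntu) and that `podman --version` prints a line of the form `podman version X.Y.Z`. The helper names `version_at_least` and `cdi_device_supported` are hypothetical and not part of Podman or the NVIDIA Container Toolkit; the 4.1.0 threshold is the one quoted from the NVIDIA docs.

```shell
#!/bin/sh
# Sketch: compare the installed Podman version against the 4.1.0 minimum
# that the NVIDIA docs cite for CDI device names in --device.
# Assumption: GNU sort with -V (version sort) is available.

# version_at_least HAVE WANT
# Succeeds (exit 0) if dotted version HAVE is >= WANT.
version_at_least() {
    have="$1"
    want="$2"
    # With version sort, WANT comes first (or ties) exactly when HAVE >= WANT.
    lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
    [ "$lowest" = "$want" ]
}

# cdi_device_supported VERSION (hypothetical helper)
# Succeeds if VERSION meets the 4.1.0 minimum named in the NVIDIA docs.
cdi_device_supported() {
    version_at_least "$1" "4.1.0"
}

# Only query Podman if it is actually installed.
if command -v podman >/dev/null 2>&1; then
    # Assumption: output looks like "podman version 3.4.4".
    current="$(podman --version | awk '{print $3}')"
    if cdi_device_supported "$current"; then
        echo "Podman $current: CDI names in --device should be accepted"
    else
        echo "Podman $current: older than 4.1.0, CDI names in --device not supported"
    fi
fi
```

With the 3.4.4 version reported below, `cdi_device_supported` fails, which would be consistent with the `stat` error: the `--device` value appears to be treated as a host device path rather than a CDI name.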

Steps to reproduce the issue

See above.

Describe the results you received

Error: stat nvidia.com/gpu=0: no such file or directory

Describe the results you expected

Actual nvidia-smi output.

podman info output

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 48
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: ufo
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 44350
      size: 1
    - container_id: 1
      host_id: 1000000
      size: 1024
    uidmap:
    - container_id: 0
      host_id: 246740
      size: 1
    - container_id: 1
      host_id: 1000000
      size: 1024
  kernel: 6.2.0-34-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 128253014016
  memTotal: 134896099328
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/246740/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.6.1
  swapFree: 2147479552
  swapTotal: 2147479552
  uptime: 4h 10m 47.75s (Approximately 0.17 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/ws/pu0288/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.10.5
        fuse-overlayfs: version 1.7.1
        FUSE library version 3.10.5
        using FUSE kernel interface version 7.31
  graphRoot: /var/containers/pu0288
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 5
  runRoot: /run/user/246740/containers
  volumePath: /var/containers/pu0288/volumes
version:
  APIVersion: 3.4.4
  Built: 0
  BuiltTime: Thu Jan  1 01:00:00 1970
  GitCommit: ""
  GoVersion: go1.18.1
  OsArch: linux/amd64
  Version: 3.4.4

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

muety added the kind/bug label on Oct 4, 2023.

Luap99 (Member) commented Oct 4, 2023

You need to use a newer Podman; 4.6 or 4.7 is strongly recommended.
We do not support older versions upstream.

Luap99 closed this as not planned on Oct 4, 2023.

muety (Author) commented Oct 4, 2023

What is the recommended way to get a newer Podman on Ubuntu 22.04? Since we have a whole pool of workstations to manage, I'd prefer not to have to compile it myself.

github-actions bot added the "locked - please file new issue/PR" label on Jan 3, 2024.
github-actions bot locked the conversation as resolved and limited it to collaborators on Jan 3, 2024.