
[podman-next] podman container restore fails: Can't fstat inherit fd 6: Bad file descriptor #1367

Closed
martinpitt opened this issue Dec 4, 2023 · 2 comments · Fixed by #1368

Comments

@martinpitt
Contributor

martinpitt commented Dec 4, 2023

Issue Description

Around Friday last week, cockpit-podman's TestApplication.testCheckpointRestore test case started to fail in our nightly runs against the podman-next COPR. The same test also started to fail in podman PRs, e.g. in containers/podman#20857 (comment) or containers/podman#20647 (comment).

The earliest log where this started to fail shows only a few upgraded packages from COPR:

Upgrading:
 aardvark-dns                   x86_64  102:1.9.0-1.20231130212258085812.main.6.g045f64e.fc39         copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next  904 k
 container-selinux              noarch  102:2.226.0-1.20231130004103069762.main.0.gcff8553.fc39       copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next   49 k
 containers-common              noarch  4:1-100.fc39                                                  copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next   92 k
 containers-common-extra        noarch  4:1-100.fc39                                                  copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next   13 k
 crun                           x86_64  102:1.12-1.20231201155948394913.main.11.gab6fd3f.fc39         copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next  215 k
 gvisor-tap-vsock               x86_64  103:0.7.1-1.20231115155950872776.main.18.gc25d478.fc39        copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next  3.7 M
 gvisor-tap-vsock-gvforwarder   x86_64  103:0.7.1-1.20231115155950872776.main.18.gc25d478.fc39        copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next  1.8 M
 netavark                       x86_64  102:1.9.0-1.20231201104225764279.main.9.gdbe870c.fc39         copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next  3.2 M
 podman                         x86_64  102:5.0.0~dev-1.20231201211626708423.main.2603.bc124dd13.fc39 copr:copr.fedorainfracloud.org:rhcontainerbot:podman-next   15 M

I'll bisect them in a bit, but crun and podman are the most "promising" candidates.

Steps to reproduce the issue

Upgrade to the podman-next COPR:

dnf -y copr enable rhcontainerbot/podman-next
dnf -y update --repo 'copr*'

Then try checkpoint/restore:

podman run -d --name t1 quay.io/libpod/busybox sleep 1000
podman container checkpoint t1
podman container restore t1

This is the most basic test case that I can imagine. How did that not break your tests?

Describe the results you received

Error: OCI runtime error: crun: CRIU restoring failed -52.  Please check CRIU logfile `/var/lib/containers/storage/overlay-containers/adaf4ca81a74bdd67af49226c5105a3eb263d059b82a25d330c6b973cc0b8fca/userdata/restore.log`

That restore.log says:

(00.000019) Version: 3.18 (gitid 0)
(00.000038) Running on fedora-39-127-0-0-2-2201 Linux 6.6.2-201.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 22 21:31:42 UTC 2023 x86_64
(00.000058) Loaded kdat cache from /run/criu/criu.kdat
(00.000100) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.000114) Hugetlb size 1024 Mb is supported but cannot get dev's number
(00.000631) Error (criu/files.c:1590): Can't fstat inherit fd 6: Bad file descriptor

Describe the results you expected

podman container restore t1 succeeds.

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.2-dev
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-2.fc39.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 97.75
    systemPercent: 1.23
    userPercent: 1.01
  cpus: 1
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: cloud
    version: "39"
  eventLogger: journald
  freeLocks: 2047
  hostname: fedora-39-127-0-0-2-2201
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.6.2-201.fc39.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 372948992
  memTotal: 1135865856
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.20231130212258085812.main.6.g045f64e.fc39.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0-dev
    package: netavark-1.9.0-1.20231201104225764279.main.9.gdbe870c.fc39.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.0-dev
  ociRuntime:
    name: crun
    package: crun-1.12-1.20231201155948394913.main.11.gab6fd3f.fc39.x86_64
    path: /usr/bin/crun
    version: |-
      crun version UNKNOWN
      commit: c789fac2bb4b415068289490b73276458f9f9c5c
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231119.g4f1709d-1.fc39.x86_64
    version: |
      pasta 0^20231119.g4f1709d-1.fc39.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 1123282944
  swapTotal: 1135603712
  uptime: 0h 27m 57.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 12798898176
  graphRootUsed: 2070368256
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.0.0-dev-8d0be6409
  Built: 1701656353
  BuiltTime: Mon Dec  4 02:19:13 2023
  GitCommit: ""
  GoVersion: go1.21.4
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.0-dev-8d0be6409

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

Standard F39 cloud image, or Testing Farm AWS instance

Additional information

No response

@martinpitt
Contributor Author

A-ha! Downgrading crun from 102:1.12-1.20231201155948394913.main.11.gab6fd3f.fc39 to 1.12-1.20231201111844695411.main.9.gffa74d0.fc39.x86_64.rpm still fails, but a second dnf downgrade crun moves it to 1.12-1.20231127173625176963.main.5.g6187359.fc39, and with that version the restore succeeds.

I'm fairly sure this is fallout from #1360. Can you please transfer the issue there if appropriate? (Or perhaps podman itself needs to be adjusted to that change?)

giuseppe transferred this issue from containers/podman Dec 4, 2023
giuseppe added a commit to giuseppe/crun that referenced this issue Dec 4, 2023
these fds must be passed to the child process.

commit 3ad89be was overzealous and
introduced the regression.

Closes: containers#1367

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member

giuseppe commented Dec 4, 2023

thanks, I've opened a PR: #1368
