
privileged podman ps broken after reboot #22159

Closed
ubergeek42 opened this issue Mar 25, 2024 · 11 comments
Labels
kind/bug · locked - please file new issue/PR

Comments

@ubergeek42
Contributor

Issue Description

A running container killed with SIGKILL, and with its network namespace/tmpfs state removed (say, because of a reboot), leaves podman stuck in a broken state. In our environment, this seems to happen reliably after rebooting with a container running (systemd appears to SIGKILL our containers, and the reboot cleans up the network namespace/tmpfs state files). Once the host is in this broken state, running podman ps --sync or similar commands results in an error like: error joining network namespace for container CONTAINERID. Additionally, podman inspect reports that the container is still running. I would expect podman to discover this and recover in a sane manner; currently the only fix we have is to run podman rm -f <broken container id>

Some details that might be relevant/differ from stock deployment:

  • We're running podman 3.4.4 on Ubuntu 22.04 (the latest version from the Ubuntu repos).
  • /etc/containers/containers.conf points to a non-default crun (crun version 1.14.1, commit: de537a7965bfbe9992e2cfae0baeb56a08128171), because we run a 6.5.x kernel and the stock crun is broken with regard to symlinks.
  • We have a custom graphroot specified in /etc/containers/storage.conf, which points to a directory on an xfs filesystem (a loopback device mounted sometime after boot by our application). This is to support storage quotas.
  • Our containers are run as systemd units, and systemd seems to be SIGKILL-ing them on reboot, which is not ideal, but I would expect podman to recover from this. Our containers are ephemeral, so we don't actually care if they're SIGKILL'd/corrupted/whatever, but we do expect podman to do the right thing here.

I believe I tracked down (at least part of) the underlying problem, and it appears to still be present in main. Even if it's not my specific problem, it's definitely incorrect code and should be fixed. I am not yet able to compile and test a new version to confirm whether it fixes my issue. The issue is with this block of code:

if err := cmd.Start(); err != nil {
	out, err2 := io.ReadAll(errPipe)
	if err2 != nil {
		return fmt.Errorf("getting container %s state: %w", ctr.ID(), err)
	}
	if strings.Contains(string(out), "does not exist") || strings.Contains(string(out), "No such file") {
		if err := ctr.removeConmonFiles(); err != nil {
			logrus.Debugf("unable to remove conmon files for container %s", ctr.ID())
		}
		ctr.state.ExitCode = -1
		ctr.state.FinishedTime = time.Now()
		ctr.state.State = define.ContainerStateExited
		return ctr.runtime.state.AddContainerExitCode(ctr.ID(), ctr.state.ExitCode)
	}
	return fmt.Errorf("getting container %s state. stderr/out: %s: %w", ctr.ID(), out, err)
}

The problem is that this code assumes cmd.Start() will return a non-nil error if the program exits abnormally with some stderr output, which is what cmd.Run() would do, but cmd.Start() never will. cmd.Start() only reports whether the program could be started at all, and it returns before the program has finished. So we never check the stderr output and never update the container to exited when it's actually dead; instead we charge ahead, leading to the failure we see later.
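To illustrate (a standalone sketch, not Podman code): cmd.Start() returns nil as long as the child can be launched, even if it immediately exits non-zero with output on stderr; only cmd.Wait() (or cmd.Run(), which is Start followed by Wait) surfaces the abnormal exit.

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// A child that launches fine but then exits non-zero with stderr output,
	// similar to `crun state` complaining about a missing status file.
	cmd := exec.Command("sh", "-c", "echo 'No such file or directory' >&2; exit 1")

	if err := cmd.Start(); err != nil {
		// Not reached: Start only fails if the process cannot be launched at all.
		fmt.Println("Start failed:", err)
		return
	}
	fmt.Println("Start returned nil even though the command is about to fail")

	// Only Wait (or Run, which is Start+Wait) reports the abnormal exit.
	if err := cmd.Wait(); err != nil {
		fmt.Println("Wait reported:", err) // "exit status 1"
	}
}

So the "does not exist"/"No such file" branch in the snippet above can effectively never run for a dead container; the stderr check would have to happen after waiting for the command to finish.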

c554672 is where this error was introduced (~6 years ago)

For my broken containers, crun state returns "No such file or directory" for the status file:

$ sudo /usr/local/bin/crun-1.14.1-linux-amd64 state 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f 
2024-03-25T14:55:07.133949Z: error opening file `/run/crun/63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f/status`: No such file or directory

Steps to reproduce the issue

  1. Reboot (or maybe systemctl kill podman-unit)
  2. Run podman ps (or podman ps --sync)
  3. Experience error

I did try to reproduce this in an easier manner; the following steps resulted in the same brokenness:

  1. $ sudo podman run --rm -it --name test-container docker.io/library/bash:latest
  2. In another terminal, run ps ax | grep podman, then kill the conmon and podman run processes with kill -9 <PIDS>
  3. Clean up the network namespace - ip netns del cni-... (look in /run/netns to figure out the name)
  4. Clean up the /run/crun/ directory (this seems optional; with it remaining, podman ps still spits out the error)
  5. Run podman ps
  6. Experience error

Describe the results you received

Error messages and podman thinking the container is still alive:

$ sudo podman ps
ERRO[0000] error joining network namespace for container 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f: error retrieving network namespace at /run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e: failed to Statfs "/run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e": no such file or directory 
Error: error joining network namespace of container 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f: error retrieving network namespace at /run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e: failed to Statfs "/run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e": no such file or directory
$ sudo podman inspect 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f | jq .[0].State
ERRO[0000] error joining network namespace for container 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f: error retrieving network namespace at /run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e: failed to Statfs "/run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e": no such file or directory 
{
  "OciVersion": "1.0.2-dev",
  "Status": "running",
  "Running": true,
  "Paused": false,
  "Restarting": false,
  "OOMKilled": false,
  "Dead": false,
  "Pid": 962562,
  "ConmonPid": 962558,
  "ExitCode": 0,
  "Error": "",
  "StartedAt": "2024-03-22T16:04:42.994617267Z",
  "FinishedAt": "0001-01-01T00:00:00Z",
  "Healthcheck": {
    "Status": "",
    "FailingStreak": 0,
    "Log": null
  }
}

Describe the results you expected

Podman to correctly discover that the container is no longer running, update its status appropriately, and not return any error messages (or possibly return a non-fatal error message).

podman info output

$ sudo podman info
ERRO[0000] error joining network namespace for container 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f: error retrieving network namespace at /run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e: failed to Statfs "/run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e": no such file or directory 
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 16
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: my-hostname
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.5.0-26-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 13532495872
  memTotal: 134967214080
  ociRuntime:
    name: crun
    package: Unknown
    path: /usr/local/bin/crun-1.14.1-linux-amd64
    version: |-
      crun version 1.14.1
      commit: de537a7965bfbe9992e2cfae0baeb56a08128171
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.6.1
  swapFree: 0
  swapTotal: 0
  uptime: 69h 52m 6.38s (Approximately 2.88 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/podman-storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 6
  runRoot: /run/containers/storage
  volumePath: /var/lib/podman-storage/volumes
version:
  APIVersion: 3.4.4
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.18.1
  OsArch: linux/amd64
  Version: 3.4.4

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

Additional information

@ubergeek42 ubergeek42 added the kind/bug label Mar 25, 2024
@ubergeek42 ubergeek42 changed the title rootful podman ps broken after reboot privileged podman ps broken after reboot Mar 25, 2024
@baude
Member

baude commented Mar 25, 2024

assigned to @giuseppe as he wrote the original code ... mind taking a look? if you don't have time, unassign yourself.

@baude
Member

baude commented Mar 25, 2024

actually it looks like a PR was provided: #22160

@mheon
Member

mheon commented Mar 26, 2024

I don't disagree with your patch, but I see you discussing how reboots are breaking Podman and that absolutely should not be a problem.

@ubergeek42
Contributor Author

Right, as best I can tell, the patch I made probably only matters for the case of podman ps --sync; if it were functional, then maybe the container state would be set to exited and podman wouldn't try to join the (missing) netns. But without --sync I think my patch is irrelevant to the error.

To clarify, a reboot appears to cause the issue in my environment because systemd is going wild with SIGKILL on my containers.

I did some more digging. The error message I see appears to come from replaceNetNS: https://github.com/containers/podman/blob/v3.4/libpod/boltdb_state_linux.go#L13
The call to replaceNetNS is made from UpdateContainer here (and appears to only affect privileged containers): https://github.com/containers/podman/blob/v3.4/libpod/boltdb_state.go#L777

I do see that this has been changed by this PR and is in v4.4 and newer. If I'm reading it right, podman ps would no longer try to join the network namespace, but I'm still not sure whether it would figure out/update that the container was dead.

The linked PR says:

The old version required us to always open the netns before we could attach it to the container state struct, which caused problems in some cases where the netns was no longer valid.

Now we use the netns as a string throughout the code; this allows us to only open it when needed, reducing possible errors.

I've figured out how to make systemd stop SIGKILL-ing my containers: Add Delegate=yes so that the PostStop commands run properly; podman is launched with --cgroups=split --systemd=always --sdnotify=conmon inside the systemd unit. Without Delegate=yes systemd fails with status=219/CGROUP when trying to launch the PostStop command, I presume because the cgroup it's trying to launch the process into has been modified in some way by podman that systemd doesn't like. I don't fully understand what podman does to the cgroup to make systemd unhappy with it.
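For reference, a rough sketch of the kind of unit this ends up as (the description, container name, and image are placeholders; Type=notify and NotifyAccess=all are my assumption for pairing with --sdnotify=conmon, and I'm assuming the PostStop command above corresponds to systemd's ExecStopPost=):

[Unit]
Description=Example ephemeral container

[Service]
# Delegate=yes tells systemd this service manages its own cgroup subtree,
# which podman needs when launched with --cgroups=split.
Delegate=yes
Type=notify
NotifyAccess=all
ExecStart=/usr/bin/podman run --name example-ctr --cgroups=split --systemd=always --sdnotify=conmon docker.io/library/bash:latest
# The "PostStop" cleanup described above, assuming it maps to ExecStopPost=.
ExecStopPost=/usr/bin/podman rm -f example-ctr

[Install]
WantedBy=multi-user.target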

I guess I'll still get into this situation if my PostStop command running podman rm -f <cid> fails to terminate the container within the systemd timeout, resulting in a SIGKILL, but for now this seems like a good enough workaround for me.

Also, for what it's worth, this error doesn't prevent launching/inspecting new containers, just the ps functionality, but that makes it hard to know what you have running.


Given that this happens on Ubuntu 22.04, that the distro-released version of podman available there is v3.4.4 (which I understand is rather old at this point), and that there appear to have been at least some PRs/changes to the code path resulting in the error I'm seeing, I'm OK if you want to close this issue. I also don't have a newer environment where I can test with the latest podman to try to reproduce this error, nor an easy way to get a newer version of podman onto my older hosts, so I think this is as far as I can get debugging/troubleshooting it.

A couple final questions just to make sure I didn't miss something obvious:

  • Is there anything that needs to be done after a reboot/crash/etc. for podman to refresh its internal state? Is that what --sync is for?
    • Does a non-standard graphroot location make any difference for this? (I just add --root=/var/lib/podman-storage to all my podman commands; it isn't set in /etc/containers/storage.conf)
  • Does the use of Delegate=yes make sense for the systemd arguments? Should that be the recommended setting when using --cgroups=split? I couldn't find a definitive answer for this; I see it referenced in some places where people are discussing podman-systemd-generate, but not in others, so I'm not completely sure whether it's necessary (or why it helps to fix my problem).

@mheon
Member

mheon commented Mar 26, 2024

We should be wiping container state on a reboot, though? A systemd SIGKILL shouldn't matter; when we detect a reboot, we should reset container state to a sane value automatically, unless podman's tmpdir is not a tmpfs.

@mheon
Member

mheon commented Mar 26, 2024

Sync is basically an escape hatch for things having gone really wrong. It should not be mandatory at all.

I'll defer to @giuseppe on the Delegate question

@ubergeek42
Contributor Author

ubergeek42 commented Mar 26, 2024

I did find some other discussion here on GitHub before opening this issue, and confirmed that the runroot is on tmpfs (/run is a tmpfs, and that's where my runroot is) and that it's wiped on reboot. I even poked around other tmp directories that I thought might be related and couldn't find any state persisting after the reboot, outside of, I guess, the boltdb database. So as best I can tell I'm hitting a bug, probably fixed by #16756, where podman tries to join the network namespace even though it doesn't exist.

@mheon
Member

mheon commented Mar 26, 2024

For this, we actually care about tmpdir - e.g. from podman info --log-level=debug:

DEBU[0000] Using graph root /home/mheon/.local/share/containers/storage 
DEBU[0000] Using run root /run/user/1000/containers     
DEBU[0000] Using static dir /home/mheon/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /home/mheon/tmp                
DEBU[0000] Using volume path /home/mheon/.local/share/containers/storage/volumes

The fourth line, tmp dir, is presently pointing at a completely nonsensical /home/mheon/tmp directory, which is certainly not a tmpfs (probably because I didn't have a systemd user session active when I first ran Podman). That is where we place our sentry file to detect reboots.

We're discussing improving this behavior a lot in #22141 (actively reporting bad configs to users in 5.0, and refusing to even start if an unhandled reboot is detected in 5.1) which should alleviate this for good in the future.

@ubergeek42
Contributor Author

ubergeek42 commented Mar 26, 2024

Ah, I have --log-level=trace output (buried in my notes; I forgot to post this earlier), which shows the tmp dir was /run/libpod, which again is a tmpfs. So that shouldn't have been the issue. Including the full trace below.

DEBU[0000] Using graph root /var/lib/podman-storage 
DEBU[0000] Using run root /run/containers/storage       
DEBU[0000] Using static dir /var/lib/podman-storage/libpod 
DEBU[0000] Using tmp dir /run/libpod                    
DEBU[0000] Using volume path /var/lib/podman-storage/volumes         

$ df -h /run/libpod
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            13G  3.1M   13G   1% /run
Full `podman ps --log-level=trace --sync` output
$ sudo podman ps --log-level=trace --sync
INFO[0000] podman filtering at log level trace          
DEBU[0000] Called ps.PersistentPreRunE(podman ps --log-level=trace --sync) 
TRAC[0000] Reading configuration file "/usr/share/containers/containers.conf" 
DEBU[0000] Merged system config "/usr/share/containers/containers.conf" 
TRAC[0000] &{Containers:{Devices:[] Volumes:[] ApparmorProfile:containers-default-0.44.4 Annotations:[] CgroupNS:private Cgroups:enabled DefaultCapabilities:[CHOWN DAC_OVERRIDE FOWNER FSETID KILL NET_BIND_SERVICE SETFCAP SETGID SETPCAP SETUID SYS_CHROOT] DefaultSysctls:[net.ipv4.ping_group_range=0 0] DefaultUlimits:[nproc=4194304:4194304] DefaultMountsFile: DNSServers:[] DNSOptions:[] DNSSearches:[] EnableKeyring:true EnableLabeling:false Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm] EnvHost:false HTTPProxy:true Init:false InitPath: IPCNS:private LogDriver:journald LogSizeMax:-1 LogTag: NetNS: NoHosts:false PidsLimit:2048 PidNS:private PrepareVolumeOnCreate:false RootlessNetworking:slirp4netns SeccompProfile: ShmSize:65536k TZ: Umask:0022 UTSNS:private UserNS:host UserNSSize:65536} Engine:{CgroupCheck:false CgroupManager:systemd ConmonEnvVars:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] ConmonPath:[/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] DetachKeys:ctrl-p,ctrl-q EnablePortReservation:true Env:[] EventsLogFilePath:/run/libpod/events/events.log EventsLogger:journald HelperBinariesDir:[/usr/local/libexec/podman /usr/local/lib/podman /usr/libexec/podman /usr/lib/podman] HooksDir:[/usr/share/containers/oci/hooks.d] ImageBuildFormat:oci ImageDefaultTransport:docker:// ImageParallelCopies:0 ImageDefaultFormat: InfraCommand: InfraImage:k8s.gcr.io/pause:3.5 InitPath:/usr/libexec/podman/catatonit LockType:shm MachineEnabled:false MultiImageArchive:false Namespace: NetworkCmdPath: NetworkCmdOptions:[] NoPivotRoot:false NumLocks:2048 OCIRuntime:crun OCIRuntimes:map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc] runsc:[/usr/bin/runsc /usr/sbin/runsc /usr/local/bin/runsc /usr/local/sbin/runsc /bin/runsc /sbin/runsc /run/current-system/sw/bin/runsc]] PullPolicy:missing Remote:false RemoteURI: RemoteIdentity: ActiveService: ServiceDestinations:map[] RuntimePath:[] RuntimeSupportsJSON:[crun runc kata runsc] RuntimeSupportsNoCgroups:[crun] RuntimeSupportsKVM:[kata kata-runtime kata-qemu kata-fc] SetOptions:{StorageConfigRunRootSet:false StorageConfigGraphRootSet:false StorageConfigGraphDriverNameSet:false StaticDirSet:false VolumePathSet:false TmpDirSet:false} SignaturePolicyPath:/etc/containers/policy.json SDNotify:false StateType:3 StaticDir:/var/lib/podman-storage/libpod StopTimeout:10 TmpDir:/run/libpod VolumePath:/var/lib/podman-storage/volumes VolumePlugins:map[] ChownCopiedFiles:true} Machine:{CPUs:1 DiskSize:10 Image:testing Memory:2048} Network:{CNIPluginDirs:[/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] DefaultNetwork:podman DefaultSubnet:10.88.0.0/16 NetworkConfigDir:/etc/cni/net.d/} Secrets:{Driver:file Opts:map[]}} 
TRAC[0000] Reading configuration file "/etc/containers/containers.conf" 
DEBU[0000] Merged system config "/etc/containers/containers.conf" 
TRAC[0000] &{Containers:{Devices:[] Volumes:[] ApparmorProfile:containers-default-0.44.4 Annotations:[] CgroupNS:private Cgroups:enabled DefaultCapabilities:[CHOWN DAC_OVERRIDE FOWNER FSETID KILL NET_BIND_SERVICE SETFCAP SETGID SETPCAP SETUID SYS_CHROOT] DefaultSysctls:[net.ipv4.ping_group_range=0 0] DefaultUlimits:[nproc=4194304:4194304] DefaultMountsFile: DNSServers:[] DNSOptions:[] DNSSearches:[] EnableKeyring:true EnableLabeling:false Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm] EnvHost:false HTTPProxy:true Init:false InitPath: IPCNS:private LogDriver:journald LogSizeMax:-1 LogTag: NetNS: NoHosts:false PidsLimit:2048 PidNS:private PrepareVolumeOnCreate:false RootlessNetworking:slirp4netns SeccompProfile: ShmSize:65536k TZ: Umask:0022 UTSNS:private UserNS:host UserNSSize:65536} Engine:{CgroupCheck:false CgroupManager:systemd ConmonEnvVars:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] ConmonPath:[/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] DetachKeys:ctrl-p,ctrl-q EnablePortReservation:true Env:[] EventsLogFilePath:/run/libpod/events/events.log EventsLogger:journald HelperBinariesDir:[/usr/local/libexec/podman /usr/local/lib/podman /usr/libexec/podman /usr/lib/podman] HooksDir:[/usr/share/containers/oci/hooks.d] ImageBuildFormat:oci ImageDefaultTransport:docker:// ImageParallelCopies:0 ImageDefaultFormat: InfraCommand: InfraImage:k8s.gcr.io/pause:3.5 InitPath:/usr/libexec/podman/catatonit LockType:shm MachineEnabled:false MultiImageArchive:false Namespace: NetworkCmdPath: NetworkCmdOptions:[] NoPivotRoot:false NumLocks:2048 OCIRuntime:crun OCIRuntimes:map[crun:[/usr/local/bin/crun-1.14.1-linux-amd64 /usr/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc] runsc:[/usr/bin/runsc /usr/sbin/runsc /usr/local/bin/runsc /usr/local/sbin/runsc /bin/runsc /sbin/runsc /run/current-system/sw/bin/runsc]] PullPolicy:missing Remote:false RemoteURI: RemoteIdentity: ActiveService: ServiceDestinations:map[] RuntimePath:[] RuntimeSupportsJSON:[crun runc kata runsc] RuntimeSupportsNoCgroups:[crun] RuntimeSupportsKVM:[kata kata-runtime kata-qemu kata-fc] SetOptions:{StorageConfigRunRootSet:false StorageConfigGraphRootSet:false StorageConfigGraphDriverNameSet:false StaticDirSet:false VolumePathSet:false TmpDirSet:false} SignaturePolicyPath:/etc/containers/policy.json SDNotify:false StateType:3 StaticDir:/var/lib/podman-storage/libpod StopTimeout:10 TmpDir:/run/libpod VolumePath:/var/lib/podman-storage/volumes VolumePlugins:map[] ChownCopiedFiles:true} Machine:{CPUs:1 DiskSize:10 Image:testing Memory:2048} Network:{CNIPluginDirs:[/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] DefaultNetwork:podman DefaultSubnet:10.88.0.0/16 NetworkConfigDir:/etc/cni/net.d/} Secrets:{Driver:file Opts:map[]}}
DEBU[0000] Using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /var/lib/podman-storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /var/lib/podman-storage 
DEBU[0000] Using run root /run/containers/storage       
DEBU[0000] Using static dir /var/lib/podman-storage/libpod 
DEBU[0000] Using tmp dir /run/libpod                    
DEBU[0000] Using volume path /var/lib/podman-storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] cached value indicated that overlay is supported 
DEBU[0000] cached value indicated that metacopy is not being used 
DEBU[0000] NewControl(/var/lib/podman-storage/overlay): nextProjectID = 76297524 
DEBU[0000] cached value indicated that native-diff is usable 
DEBU[0000] backingFs=xfs, projectQuotaSupported=true, useNativeDiff=true, usingMetacopy=false 
DEBU[0000] Initializing event backend journald          
TRAC[0000] found runtime ""                             
DEBU[0000] configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument 
DEBU[0000] configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument 
TRAC[0000] found runtime ""                             
DEBU[0000] Using OCI runtime "/usr/local/bin/crun-1.14.1-linux-amd64" 
INFO[0000] Found CNI network custom-podman (type=bridge) at /etc/cni/net.d/02-custom-podman.conflist 
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
DEBU[0000] Default CNI network name podman is unchangeable 
INFO[0000] Setting parallel job count to 49             
ERRO[0000] error joining network namespace for container 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f: error retrieving network namespace at /run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e: failed to Statfs "/run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e": no such file or directory 
Error: failed to Statfs "/run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e": no such file or directory
error retrieving network namespace at /run/netns/cni-61c28b4e-386e-41a4-fd05-59c58bd9af0e
github.com/containers/podman/libpod.joinNetNS
	github.com/containers/podman/libpod/networking_linux.go:774
github.com/containers/podman/libpod.replaceNetNS
	github.com/containers/podman/libpod/boltdb_state_linux.go:26
github.com/containers/podman/libpod.(*BoltState).UpdateContainer
	github.com/containers/podman/libpod/boltdb_state.go:777
github.com/containers/podman/libpod.(*Container).syncContainer
	github.com/containers/podman/libpod/container_internal.go:340
github.com/containers/podman/libpod.(*Container).Batch
	github.com/containers/podman/libpod/container_api.go:653
github.com/containers/podman/pkg/ps.ListContainerBatch
	github.com/containers/podman/pkg/ps/ps.go:129
github.com/containers/podman/pkg/ps.GetContainerLists
	github.com/containers/podman/pkg/ps/ps.go:65
github.com/containers/podman/pkg/domain/infra/abi.(*ContainerEngine).ContainerList
	github.com/containers/podman/pkg/domain/infra/abi/containers.go:883
github.com/containers/podman/cmd/podman/containers.getResponses
	github.com/containers/podman/cmd/podman/containers/ps.go:176
github.com/containers/podman/cmd/podman/containers.ps
	github.com/containers/podman/cmd/podman/containers/ps.go:200
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext
	github.com/spf13/cobra/command.go:895
main.Execute
	github.com/containers/podman/cmd/podman/root.go:91
main.main
	github.com/containers/podman/cmd/podman/main.go:39
runtime.main
	runtime/proc.go:250
runtime.goexit
	runtime/asm_amd64.s:1571
error joining network namespace of container 63552f5ee0df008f197c5c005f002d6938a6a616e0f7fa417b900f601562232f
github.com/containers/podman/libpod.replaceNetNS
	github.com/containers/podman/libpod/boltdb_state_linux.go:31
github.com/containers/podman/libpod.(*BoltState).UpdateContainer
	github.com/containers/podman/libpod/boltdb_state.go:777
github.com/containers/podman/libpod.(*Container).syncContainer
	github.com/containers/podman/libpod/container_internal.go:340
github.com/containers/podman/libpod.(*Container).Batch
	github.com/containers/podman/libpod/container_api.go:653
github.com/containers/podman/pkg/ps.ListContainerBatch
	github.com/containers/podman/pkg/ps/ps.go:129
github.com/containers/podman/pkg/ps.GetContainerLists
	github.com/containers/podman/pkg/ps/ps.go:65
github.com/containers/podman/pkg/domain/infra/abi.(*ContainerEngine).ContainerList
	github.com/containers/podman/pkg/domain/infra/abi/containers.go:883
github.com/containers/podman/cmd/podman/containers.getResponses
	github.com/containers/podman/cmd/podman/containers/ps.go:176
github.com/containers/podman/cmd/podman/containers.ps
	github.com/containers/podman/cmd/podman/containers/ps.go:200
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext
	github.com/spf13/cobra/command.go:895
main.Execute
	github.com/containers/podman/cmd/podman/root.go:91
main.main
	github.com/containers/podman/cmd/podman/main.go:39
runtime.main
	runtime/proc.go:250
runtime.goexit
	runtime/asm_amd64.s:1571

@giuseppe
Member

Without Delegate=yes systemd fails with status=219/CGROUP when trying to launch the PostStop command, I presume because the cgroup it's trying to launch the process into has been modified in some way by podman that systemd doesn't like. I don't fully understand what podman does to the cgroup to make systemd unhappy with it.

it creates two sub-cgroups in the current cgroup.

Delegate is necessary because it tells systemd that the service can modify cgroups, as is the case with Podman when using --cgroups=split.

@Luap99
Member

Luap99 commented Apr 3, 2024

The code in question no longer exists and 3.4.4 is way too old for us to support, so closing.

@Luap99 Luap99 closed this as not planned Apr 3, 2024
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR label Jul 3, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jul 3, 2024