Pod level cgroup resource limits are not assigned to containers by default in Quadlet #23664

Open
ruihe774 opened this issue Aug 19, 2024 · 11 comments · Fixed by #23675
Labels
kind/bug Categorizes issue or PR as related to a bug. quadlet stale-issue

Comments

@ruihe774
Contributor

Issue Description

Containers created by Quadlet implicitly use --cgroups=split. When such a container joins a pod, it is not placed in the pod's cgroup, so pod-level cgroup resource limits do not apply to it.

Steps to reproduce the issue

Create a Quadlet .pod file with cgroup resource limits:

# parent.pod
[Unit]
Description=Parent Pod

[Pod]
PodmanArgs=--cpus=2

Create a Quadlet .container file which joins the pod:

# child.container
[Unit]
Description=Child Container

[Container]
Image=docker.io/busybox
Exec=sleep infinity
Pod=parent.pod

Start the container (or pod):

$ systemctl start --user child.service

Describe the results you received

The container and the pod belong to different cgroups:

$ systemd-cgls
...
└─child.service
  ├─libpod-payload-CONTAINER_ID
  │ └─ sleep infinity
  └─runtime
    └─ /usr/bin/conmon
...
└─user-libpod_pod_POD_ID.slice
  └─libpod-INFRA_ID.scope
    └─container
      └─ /catatonit -P
...

The CPU limit (cpu.max) is set on the pod's cgroup:

$ cat /sys/fs/cgroup/path/to/user-libpod_pod_POD_ID.slice/cpu.max
200000 100000

But it is not set on the container's cgroup:

$ cat /sys/fs/cgroup/path/to/libpod-payload-CONTAINER_ID/cpu.max
max 100000
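
For readers unfamiliar with the format: cpu.max holds two fields, a quota and a period (both in microseconds), and a quota of max means no limit. So 200000 100000 corresponds to the 2 CPUs requested with --cpus=2, while max 100000 is unlimited. A minimal shell sketch for decoding such a line (the function name cpu_max_to_cpus is made up for illustration):

```shell
# cpu_max_to_cpus: read one cgroup v2 cpu.max line ("QUOTA PERIOD") on stdin
# and print the effective CPU limit, or "unlimited" when QUOTA is "max".
# The function name is hypothetical, not part of any tool.
cpu_max_to_cpus() {
    read -r quota period
    if [ "$quota" = "max" ]; then
        echo "unlimited"
    else
        # awk does the fractional division that POSIX sh arithmetic cannot.
        awk -v q="$quota" -v p="$period" 'BEGIN { printf "%g\n", q / p }'
    fi
}

echo "200000 100000" | cpu_max_to_cpus   # prints 2
echo "max 100000" | cpu_max_to_cpus      # prints unlimited
```

On a live system the input would come from the cpu.max file of the cgroup in question instead of echo.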

Describe the results you expected

With the CLI, using --cgroups=enabled (the default) or --cgroups=no-conmon, the container's cgroup is inside the pod's cgroup, so the resource limits take effect. IMO this is the expected behavior.
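
One way to check this on a running system is to compare the cgroup paths of the two processes (for example as read from /proc/PID/cgroup) and test whether one is nested under the other. A small sketch; the helper name is_cgroup_inside and the example paths are hypothetical placeholders, not real output:

```shell
# is_cgroup_inside CHILD PARENT: succeed if CHILD is a strict descendant
# of PARENT in the cgroup hierarchy (plain path-prefix comparison).
# The helper name is hypothetical.
is_cgroup_inside() {
    child="${1%/}"    # drop a trailing slash, if any
    parent="${2%/}"
    case "$child" in
        "$parent"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical placeholder paths, for illustration only.
pod="/user.slice/example-pod.slice"
if is_cgroup_inside "$pod/libpod-example.scope/container" "$pod"; then
    echo "inside pod cgroup"
fi
if ! is_cgroup_inside "/user.slice/child.service/libpod-payload-example" "$pod"; then
    echo "outside pod cgroup"
fi
```

The first case matches the nesting described for the CLI; the second matches the layout shown in the systemd-cgls output above, where the payload sits under child.service instead.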

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 95.71
    systemPercent: 0.82
    userPercent: 3.47
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: silverblue
    version: "40"
  eventLogger: journald
  freeLocks: 2037
  hostname: msk-silverblue
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.10.4-200.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 79935655936
  memTotal: 101100756992
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.1-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.1
  ociRuntime:
    name: crun
    package: crun-1.15-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240814.g61c0b0d-1.fc40.x86_64
    version: |
      pasta 0^20240814.g61c0b0d-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-2.fc40.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 101100548096
  swapTotal: 101100548096
  uptime: 1h 34m 47.00s (Approximately 0.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /var/home/msk/.config/containers/storage.conf
  containerStore:
    number: 6
    paused: 0
    running: 6
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/msk/.local/share/containers/storage
  graphRootAllocated: 499857358848
  graphRootUsed: 344491053056
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 16
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /var/home/msk/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.0
  Built: 1722556800
  BuiltTime: Fri Aug  2 08:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.5
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

None

Additional information

None

@ruihe774 ruihe774 added the kind/bug Categorizes issue or PR as related to a bug. label Aug 19, 2024
@Luap99 Luap99 added the quadlet label Aug 19, 2024
@rhatdan
Member

rhatdan commented Aug 19, 2024

@giuseppe PTAL

@giuseppe
Member

A container using --cgroups=split cannot join the cgroup of the pod, since split requests to use the current cgroup. We should either disallow that combination or warn users that the pod limits won't be honored.

@ruihe774
Contributor Author

ruihe774 commented Aug 19, 2024

The problem is that Quadlet implicitly sets --cgroups=split and does not mention this behavior anywhere. This is somewhat confusing.

Another problem is that Quadlet does not have a specific key (CGroups=) to let users set cgroups behavior. Users have to set PodmanArgs=--cgroups=no-conmon to make limits work, which is indirect and does not look neat.
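
For anyone landing here, the workaround applied to the child.container from the reproduction steps might look like the sketch below. The helper name write_child_unit is made up for illustration, and the target directory is passed in by the caller:

```shell
# write_child_unit DIR: write child.container into DIR with the
# PodmanArgs workaround from this thread. The helper name is hypothetical.
write_child_unit() {
    dir="$1"
    mkdir -p "$dir"
    cat > "$dir/child.container" <<'EOF'
[Unit]
Description=Child Container

[Container]
Image=docker.io/busybox
Exec=sleep infinity
Pod=parent.pod
# Workaround: override the --cgroups=split that Quadlet sets implicitly
PodmanArgs=--cgroups=no-conmon
EOF
}
```

For a rootless setup this would typically be called as write_child_unit "$HOME/.config/containers/systemd", followed by systemctl --user daemon-reload and a restart of child.service.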

@rhatdan
Member

rhatdan commented Aug 19, 2024

Care to open a PR to fix the documentation or to add a CGroups= key?

Not sure if we want to allow anything but --cgroups=split, though. I would leave this up to @giuseppe @mheon @alexlarsson @ygalblum

@mheon
Member

mheon commented Aug 19, 2024

I wonder if we can get pod-level limits working with --cgroups=split given we own the pod cgroup and it seems reasonable that Conmon would be included in the overall pod limits...

@mheon
Member

mheon commented Aug 19, 2024

Ah, Giuseppe already indicated no. Hm. Maybe containers in a pod should default to non-split even for Quadlet?

@giuseppe
Member

I think Cgroups= makes sense. split is the correct default IMO, but I don't see a reason to block other valid configurations.

@mheon
Member

mheon commented Aug 19, 2024

We can probably log a warning if Cgroups=split is used in a pod with resource limits set, to give people some breadcrumbs to figure out what is going wrong.

@ruihe774
Contributor Author

Given that #23680 is closed, @mheon please help me reopen this issue. (I cannot reopen it on my side.)

@mheon mheon reopened this Aug 27, 2024

A friendly reminder that this issue had no activity for 30 days.

@Shadow8472

Shadow8472 commented Oct 22, 2024

I suspect my issue (#24130) is related. I've been tracking down a rootless NFS volume permissions issue on the stable branch (v4.9.4 as of writing), and the key difference between Quadlet's non-functional unit file and the working one from podman generate systemd appears to be Quadlet hard-coding --cgroups=split.

EDIT: Further testing points my permissions issues elsewhere.
