libct/cg/sd/v2: rely on systemd to set device access rules #3847

kolyshkin · 2023-04-26T21:20:36Z

It seems that the code added by commit b810da1 had cgroup v1
in mind, where runc overwrites the rules set by systemd. It is all
different in v2, because both ebpf programs (systemd's and runc's) have
to say "allow" for the device to get access.

So, when using cgroup v2 with the systemd cgroup driver, access to devices
whose rules can't be translated to systemd properties is not possible
at all, and it makes sense to error out (rather than warn) in such a case,
as the container won't work as intended.

With this change in mind, provided that runc correctly translates all
the device access rules, and systemd correctly applies them, we no
longer have to create and apply a second eBPF program with its own rules.
Let's stop doing that and rely on systemd only.

Having two sets of rules (two eBPF programs) for cgroup v2/eBPF is
problematic for two reasons:

1. Both sets should say "ok" for access to be granted.
2. After systemd daemon-reload (which happens during routine systemd
   upgrade) the program runc adds is removed, so it's a time-bomb.
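
The two points above can be illustrated with a toy model. This is only a sketch: `program` and `allowed` are hypothetical names, and each eBPF device program is reduced to a set of allowed device paths, which is an assumption made for brevity.

```go
package main

import "fmt"

// program is a hypothetical model of a device filter program:
// the set of device paths it allows.
type program map[string]bool

// allowed reports whether every attached program allows the device,
// mirroring the "both have to say allow" behaviour described for cgroup v2.
func allowed(dev string, attached ...program) bool {
	for _, p := range attached {
		if !p[dev] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed scenario: one rule could not be translated for systemd,
	// so systemd's program lacks /dev/fuse while runc's has it.
	systemdProg := program{"/dev/null": true}
	runcProg := program{"/dev/null": true, "/dev/fuse": true}

	// Both programs attached: runc allowing /dev/fuse is not enough.
	fmt.Println(allowed("/dev/fuse", systemdProg, runcProg)) // false
	fmt.Println(allowed("/dev/null", systemdProg, runcProg)) // true

	// After a daemon-reload only systemd's program remains attached,
	// so the effective policy is whatever systemd alone enforces.
	fmt.Println(allowed("/dev/fuse", systemdProg)) // false
	fmt.Println(allowed("/dev/null", systemdProg)) // true
}
```

In this model the second program can only ever narrow what systemd already permits, and it silently stops contributing once it is detached.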

Note 1: by the way, this difference in cgroup v1 vs v2 behavior explains failures seen in #3620, #3708, #3671.

Note 2: since this may be a breaking change (container won't run vs device won't be accessible), let's not backport this one, but make it part of runc 1.2.

kolyshkin · 2023-05-03T21:18:33Z

As for systemd, cgroup v2 has been its default since v243 (though most distros reverted that). This means that if we're running on cgroup v2 + systemd, we can assume systemd is at least v243.

Also, here's the list of distros having cgroup v2 by default, and their respective systemd versions.

| Distro | Oldest systemd version | Source of info |
| --- | --- | --- |
| Fedora (since 31) | 243 | http://mirror.math.princeton.edu/pub/fedora-archive/fedora/linux/releases/31/Everything/source/tree/Packages/s/ |
| Arch Linux (since April 2021) | 248 | https://wiki.archlinux.org/index.php?title=Cgroups&diff=next&oldid=653999 |
| openSUSE Tumbleweed (since c. 2021) | ? | |
| Debian GNU/Linux (since 11) | 247 | https://packages.debian.org/bullseye/systemd |
| Ubuntu (since 21.10) | 248 | https://launchpad.net/ubuntu/+source/systemd/248.3-1ubuntu4 |
| RHEL and RHEL-like distributions (since 9) | 249 | https://git.centos.org/rpms/systemd/history/SPECS/systemd.spec?identifier=c9-beta |

kolyshkin · 2023-05-03T22:40:02Z

Based on the info from the previous comment, it does not make sense to check for a minimum systemd version in the systemd + cgroup v2 driver, as it will be > v240 in any case.

kolyshkin · 2023-05-10T22:53:14Z

@opencontainers/runc-maintainers @cyphar PTAL

kolyshkin · 2023-08-03T06:00:21Z

@cyphar what do you think about this? In my view, this fixes the mess of duplicate bpf programs.

The alternative is to switch to using the BPFProgram= property, but it requires systemd >= v249, which is somewhat new, meaning we'll have to keep all this device rule conversion code, too, for quite some time. In other words, it's needed anyway.

kolyshkin · 2024-07-03T16:13:15Z

Rebased, ptal @cyphar @AkihiroSuda

thaJeztah · 2024-07-03T17:54:53Z

Ubuntu (since 21.10)

Worth noting that Ubuntu 20.04 is still quite popular, although I don't know whether it uses cgroup v2 (not near a computer to check right now).

It seems that the code added by commit b810da1 had cgroup v1 in mind, where runc overwrites the rules set by systemd. It is all different in v2, because both eBPF programs (systemd's and runc's) have to say "allow" for the device to get access.

So, when using cgroup v2 with the systemd cgroup driver, access to devices whose rules can't be translated to systemd properties is not possible at all, and it makes sense to error out (rather than warn) in such a case, as the container won't work as intended.

With this change in mind, provided that runc correctly translates all the device access rules, and systemd correctly applies them, we no longer have to create and apply a second eBPF program with its own rules. Let's stop doing that and rely on systemd only.

Having two sets of rules (two eBPF programs) for cgroup v2/eBPF is problematic for two reasons:

1. Both sets should say "ok" for access to be granted (as explained by the previous commit).
2. After systemd daemon-reload (which happens during routine systemd upgrade) the program runc adds is removed, so it's a time-bomb.

Signed-off-by: Kir Kolyshkin <[email protected]>

cyphar · 2024-10-21T07:16:47Z

libcontainer/cgroups/devices/systemd.go

+					err = fmt.Errorf("systemd older than v240 does not support wildcard-minor rules for devices not listed in /proc/devices: +%v", *rule)
+					if cgroupVer == 2 {
+						return nil, err
+					}


Surely we don't want to always return an error in this case, even if systemd supports it? I guess it's more consistent but now we're going to start returning errors...

cyphar · 2024-10-21T07:17:46Z

I'm going to move the milestone to 1.3.0. I'm not sure this is a critically needed fix for 1.2.0, and given that (AFAICS) we could return errors for configurations that were previously allowed, we should probably let the code stew in a 1.3.0-rc1 first.

kolyshkin requested review from cyphar and AkihiroSuda April 26, 2023 21:20

kolyshkin force-pushed the systemd-dev-error branch from d1b211e to 27ecf36 Compare April 26, 2023 21:32

kolyshkin added this to the 1.2.0 milestone Apr 26, 2023

kolyshkin added the area/systemd label Apr 26, 2023

kolyshkin mentioned this pull request Apr 26, 2023

Release 1.1.7 #3846

Merged

kolyshkin force-pushed the systemd-dev-error branch from 27ecf36 to 021cc72 Compare May 3, 2023 01:10

kolyshkin changed the title ~~libct/cg/sd: error on untranslatable dev rules in v2~~ libct/cg/sd/v2: rely on systemd to set device access rules May 3, 2023

kolyshkin force-pushed the systemd-dev-error branch from 021cc72 to 1a7c643 Compare May 3, 2023 01:28

kolyshkin added impact/changelog area/cgroupv2 labels May 3, 2023

kolyshkin marked this pull request as draft May 3, 2023 20:30

kolyshkin marked this pull request as ready for review May 3, 2023 22:40

kolyshkin mentioned this pull request May 4, 2023

docs/systemd: describe device rules #3853

Closed

kolyshkin added the area/docs label May 4, 2023

kolyshkin force-pushed the systemd-dev-error branch 2 times, most recently from 7ffb6e3 to 31ff5a6 Compare August 3, 2023 05:56

AkihiroSuda reviewed Mar 27, 2024

libcontainer/cgroups/devices/systemd.go (resolved)

AkihiroSuda reviewed Jul 2, 2024

libcontainer/cgroups/devices/systemd.go (outdated, resolved)

kolyshkin force-pushed the systemd-dev-error branch from 31ff5a6 to c335671 Compare July 3, 2024 16:12

kolyshkin added 3 commits October 21, 2024 18:13

docs/systemd: add references

7b4206c

Signed-off-by: Kir Kolyshkin <[email protected]>

docs/systemd: describe device rules

4b358f4

Signed-off-by: Kir Kolyshkin <[email protected]>

cyphar force-pushed the systemd-dev-error branch from c335671 to 4b358f4 Compare October 21, 2024 07:13

cyphar reviewed Oct 21, 2024

cyphar modified the milestones: 1.2.0, 1.3.0 Oct 21, 2024
