libct/cg/sd/v2: rely on systemd to set device access rules #3847
base: main
As for systemd, cgroup v2 has been the default since systemd v243 (although most distros reverted that default). This means that if we're running on cgroup v2 + systemd, we can assume systemd is at least v243. Also, here's the list of distros having cgroup v2 by default, and their respective systemd versions.
Based on the info from the previous comment, it does not make sense to check for a minimal systemd version in the systemd + cgroup v2 driver, as it will be > v240 in any case.
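A minimal Go sketch of the reasoning above. It assumes, as the comment argues, that systemd on a cgroup v2 host is at least v243; `parseSystemdVersion` and `oldSystemdCheckNeeded` are hypothetical helpers for illustration, not runc functions. The point is that a `< 240` branch can only matter on cgroup v1.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSystemdVersion extracts the leading version number from strings
// such as "245", "v245.4-1.fc32" or "248-rc1". Hypothetical helper.
func parseSystemdVersion(s string) (int, error) {
	s = strings.TrimPrefix(s, "v")
	// Keep only the leading run of digits.
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, fmt.Errorf("invalid systemd version %q", s)
	}
	return strconv.Atoi(s[:end])
}

// oldSystemdCheckNeeded reports whether the "systemd older than v240"
// special case has to be considered at all. Assumption (from the
// discussion): with cgroup v2, systemd is at least v243, so only v1 matters.
func oldSystemdCheckNeeded(cgroupVer, sdVer int) bool {
	if cgroupVer == 2 {
		return false
	}
	return sdVer < 240
}

func main() {
	v, err := parseSystemdVersion("v245.4-1.fc32")
	if err != nil {
		panic(err)
	}
	fmt.Println(v, oldSystemdCheckNeeded(2, v), oldSystemdCheckNeeded(1, 239)) // 245 false true
}
```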
@opencontainers/runc-maintainers @cyphar PTAL
@cyphar what do you think about this? In my view, this fixes the mess of duplicate bpf programs. The alternative is to switch to using
Rebased, PTAL @cyphar @AkihiroSuda
Worth noting that Ubuntu 20.04 is still quite popular, although I don't know whether it uses cgroup v2 (not near a computer to check right now).
Signed-off-by: Kir Kolyshkin <[email protected]>
```go
err = fmt.Errorf("systemd older than v240 does not support wildcard-minor rules for devices not listed in /proc/devices: %+v", *rule)
if cgroupVer == 2 {
	return nil, err
}
```
Surely we don't want to always return an error in this case, even if systemd supports it? I guess it's more consistent but now we're going to start returning errors...
I'm going to move the milestone to 1.3.0. I'm not sure this is a critically needed fix for 1.2.0, and given that (AFAICS) we could return errors for configurations that were previously allowed, we should probably let the code stew in a 1.3.0-rc1 first.
It seems that the code added by commit b810da1 had cgroup v1
in mind, where runc overwrites the rules set by systemd. It is all
different in v2, because both eBPF programs (systemd's and runc's) have
to say "allow" for the device to be accessible.
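To make the v1/v2 difference concrete, here is a toy Go model (not real eBPF): each attached program is reduced to a function that allows or denies a device. On v2 every attached program must allow access; on v1, as described above, the rules runc writes last win. The `Device` and `Filter` types and the example devices are illustrative assumptions.

```go
package main

import "fmt"

// Device identifies a device by type and major:minor numbers.
type Device struct {
	Type         rune // 'c' (char) or 'b' (block)
	Major, Minor int64
}

// Filter stands in for one device-access program: it reports whether that
// program would allow access to d. Toy model only.
type Filter func(d Device) bool

// allowedV2 models cgroup v2: with several programs attached, access is
// granted only if every one of them allows it.
func allowedV2(filters []Filter, d Device) bool {
	for _, f := range filters {
		if !f(d) {
			return false
		}
	}
	return true
}

// allowedV1 models the v1 situation described above: the rules written
// last (runc's) replace the earlier ones (systemd's).
func allowedV1(filters []Filter, d Device) bool {
	if len(filters) == 0 {
		return true
	}
	return filters[len(filters)-1](d)
}

func main() {
	null := Device{'c', 1, 3}    // /dev/null
	fuse := Device{'c', 10, 229} // /dev/fuse
	// Hypothetical: systemd only knows about /dev/null, runc also allows /dev/fuse.
	systemdProg := func(d Device) bool { return d == null }
	runcProg := func(d Device) bool { return d == null || d == fuse }
	progs := []Filter{systemdProg, runcProg}
	fmt.Println(allowedV1(progs, fuse), allowedV2(progs, fuse)) // true false
}
```

The `/dev/fuse` case shows why a rule that systemd never received cannot take effect on v2, no matter what runc's own program allows.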
So, when using cgroup v2 with the systemd cgroup driver, access to
devices whose rules can't be translated to systemd properties is not
possible at all, and it makes sense to error out (rather than warn) in
such a case, as the container won't work as intended.
With this change in mind, provided that runc correctly translates all
the device access rules, and systemd correctly applies them, we no
longer have to create and apply a second eBPF program with its own rules.
Let's stop doing that and rely on systemd only.
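The translation step this relies on can be sketched roughly as follows. This is a simplified model under stated assumptions: `Rule` is a stand-in type (not runc's `devices.Rule`), the `/proc/devices` contents are passed in as a map rather than read from disk, and the output forms (`/dev/char/MAJOR:MINOR` paths for exact rules, `char-<group>` names for wildcard-minor rules) are assumptions about how such rules map onto systemd `DeviceAllow=` entries.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical device access rule. A Major or Minor
// equal to wildcard means "any", as in "c 10:* rwm".
type Rule struct {
	Type         byte   // 'c' (char) or 'b' (block)
	Major, Minor int64  // device numbers, or wildcard
	Perms        string // a subset of "rwm"
}

// wildcard marks an "any" major or minor number in this sketch.
const wildcard = -1

// toDeviceAllow converts a rule into a DeviceAllow= value. groups maps a
// major number to its name as listed in /proc/devices (assumed input).
func toDeviceAllow(r Rule, groups map[int64]string) (string, error) {
	kind := "char"
	if r.Type == 'b' {
		kind = "block"
	}
	switch {
	case r.Major == wildcard:
		// Rules with a wildcard major are simply rejected in this sketch.
		return "", fmt.Errorf("wildcard-major rule for type %c cannot be translated", r.Type)
	case r.Minor == wildcard:
		// Wildcard-minor rules need a device group name from /proc/devices.
		name, ok := groups[r.Major]
		if !ok {
			return "", fmt.Errorf("major %d is not listed in /proc/devices", r.Major)
		}
		return fmt.Sprintf("%s-%s %s", kind, name, r.Perms), nil
	default:
		// Exact rules become a /dev/{char,block}/MAJOR:MINOR path.
		return fmt.Sprintf("/dev/%s/%d:%d %s", kind, r.Major, r.Minor, r.Perms), nil
	}
}

func main() {
	// Hypothetical /proc/devices contents: char major 136 is "pts".
	groups := map[int64]string{136: "pts"}
	rules := []Rule{
		{'c', 1, 3, "rwm"},          // /dev/null
		{'c', 136, wildcard, "rwm"}, // all pseudo-terminals
		{'c', 10, wildcard, "rwm"},  // major 10 is not in the map: error
	}
	for _, r := range rules {
		s, err := toDeviceAllow(r, groups)
		fmt.Printf("%q %v\n", s, err)
	}
}
```

Any rule that ends up in the error branch is exactly the kind of rule that, on cgroup v2, can no longer be silently covered by a second runc-owned program.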
Having two sets of rules (two eBPF programs) for cgroup v2/eBPF is
problematic for two reasons:
1. Both sets should say "ok" for access to be granted.
2. After a systemd daemon-reload (which happens during a routine systemd
   upgrade), the program runc adds is removed, so it's a time bomb.
Note 1: by the way, this difference in cgroup v1 vs v2 behavior explains failures seen in #3620, #3708, #3671.
Note 2: since this may be a breaking change (the container won't run, whereas previously only the device would be inaccessible), let's not backport this one, but make it part of runc 1.2.