-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cgroupv2: fix fs2 driver initialization #2299
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--- FAIL: TestUsernsCheckpoint (0.26s)
checkpoint_test.go:130: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestCheckpoint
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestCheckpoint (0.03s)
checkpoint_test.go:130: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestExecPS
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestExecPS (0.03s)
exec_test.go:52: |: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestUsernsExecPS
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestUsernsExecPS (0.03s)
exec_test.go:52: |: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestIPCPrivate
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestIPCPrivate (0.03s)
utils_test.go:55: exec_test.go:82: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestIPCHost
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestIPCHost (0.02s)
utils_test.go:55: exec_test.go:108: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestIPCJoinPath
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestIPCJoinPath (0.03s)
utils_test.go:55: exec_test.go:135: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestIPCBadPath
--- PASS: TestIPCBadPath (0.03s)
=== RUN TestRlimit
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestRlimit (0.03s)
utils_test.go:55: exec_test.go:200: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestUsernsRlimit
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestUsernsRlimit (0.03s)
utils_test.go:55: exec_test.go:200: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestEnter
--- SKIP: TestEnter (0.00s)
exec_test.go:211: cgroup v2 is not supported
=== RUN TestProcessEnv
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestProcessEnv (0.03s)
utils_test.go:55: exec_test.go:326: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestProcessEmptyCaps
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestProcessEmptyCaps (0.03s)
utils_test.go:55: exec_test.go:370: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestProcessCaps
time="2020-04-07T21:54:51Z" level=warning msg="signal: killed"
--- FAIL: TestProcessCaps (0.03s)
utils_test.go:55: exec_test.go:423: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
=== RUN TestAdditionalGroups
time="2020-04-07T21:54:52Z" level=warning msg="signal: killed"
--- FAIL: TestAdditionalGroups (0.03s)
utils_test.go:55: exec_test.go:487: unexpected error: container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy\""
|
the errors are happening on Travis |
The problem is we're enabling all the controllers no matter what limits are set. Two ways to fix this are:
I like 2 much better. Let's see if this can be done (and it will fix the above tests since looks like they do not set any limits) |
Added a commit to enable controllers only when required and on a best-effort basis. cgroup-v2 is passing now. The tests I add/enable in #2295 are passing, too, on my test box. |
@crosbymichael @mrunalp PTAL |
@cyphar PTAL (at least for those two patches that add/remove securejoin usage) |
Rebased to resolve conflict after #2288 is merged |
Let me rework the last patch (enable controllers on demand) -- while it works fine, I don't like that Update: done |
@AkihiroSuda please let me know if you want me to split into two PRs, as this one might be too big to review. I tried hard to split it into small logical commits but there are too many. |
fixed a bug in last commit |
func setCpu(dirPath string, cgroup *configs.Cgroup) error { | ||
if !isCpuSet(cgroup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether this validation is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I added it is peculiar -- if someone will be modifying setCpu()
, adding more fields to set, they will be reminded to also modify isCpuSet()
. Having this check here might also help to uncover bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. someone adds cpuFoobar
setting to setCpu(), but forgets to modify isCpuSet()
. They add a unit test to check if cpuFoobar
is applied, and the test fails! Hope it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be just code comment lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be just code comment lines?
What do you mean? Add a comment like "if you're modifying this, don't forget to modify isSetCpu"? In my experience, this never works. The added runtime check might actually help to uncover such a bug.
Or do you mean adding a comment stating the reason why this check is added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or do you mean adding a comment stating the reason why this check is added?
I have amended the commit telling why is*Set
is added to set*
:
-
Amend all setFoo() functions to call isFooSet(). While this might
seem unnecessary, it might actually help to uncover a bug.
Imagine someone:- adds a cgroup.Resources.CpuFoo setting;
- modifies setCpu() to apply the new setting;
- but forgets to amend isCpuSet() accordingly <-- BUG
In this case, a test case modifying CpuFoo will help
to uncover the BUG. This is the reason why it's added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the former, but not a blocker
// systemd represents slice hierarchy using `-`, so we need to follow suit when | ||
// generating the path of slice. Essentially, test-a-b.slice becomes | ||
// /test.slice/test-a.slice/test-a-b.slice. | ||
func ExpandSlice(slice string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not new in this PR, but can we have UT for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do (maybe in a separate PR?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, can be a separate PR
4edf797
to
95e0b4e
Compare
Looks good, but I'd like to merge the integration tests (#2295) before this. |
Those needs to be run on the (Vagrant Fedora 31) host (since we need real systemd running), and so we have to have all the tools needed to compile runc and run the tests. The good news is Fedora packages a decent and recent release of bats-core (1.1.0), which we can use (Debian does not), and we can also use golang (currently 1.13.9) from Fedora. The bad news are 1. Currently cgroups tests are only working with RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305) 2. Some tests that are not yet working: - ps.bats with RUNC_USE_SYSTEMD=yes (investigating) - events.bats (need cgroupv2 memory.events support) [v2: add -t to ssh to enforce pty] Signed-off-by: Kir Kolyshkin <[email protected]>
Those needs to be run on the (Vagrant Fedora 31) host (since we need real systemd running), and so we have to have all the tools needed to compile runc and run the tests. The good news is Fedora packages a decent and recent release of bats-core (1.1.0), which we can use (Debian does not), and we can also use golang (currently 1.13.9) from Fedora. The bad news are 1. Currently cgroups tests are only working with RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305) 2. Tests in events.bats do not work (need cgroupv2 memory.events support) [v2: add -t to ssh to enforce pty] [v3: disable events tests for cgroupv2] [v4: update container-selinux to fix ps tests] Signed-off-by: Kir Kolyshkin <[email protected]>
95e0b4e
to
dcdc45b
Compare
Those needs to be run on the (Vagrant Fedora 31) host (since we need real systemd running), and so we have to have all the tools needed to compile runc and run the tests. The good news is Fedora packages a decent and recent release of bats-core (1.1.0), which we can use (Debian does not), and we can also use golang (currently 1.13.9) from Fedora. The bad news are 1. Currently cgroups tests are only working with RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305) 2. Tests in events.bats do not work (need cgroupv2 memory.events support) 3. Fedora 31 image is 6 months old (and has broken container-selinux policy) so we need `dnf update`. [v2: add -t to ssh to enforce pty] [v3: disable events tests for cgroupv2] Signed-off-by: Kir Kolyshkin <[email protected]>
Disabled checkpoint tests (runc currently does not support cgroup v2 c/r). All others should pass |
Something ugly with CI -- logs for failed jobs are empty. Restarting |
a040eda
to
4fe6a28
Compare
This one is ready for merge. |
Having map of per-subsystem paths in systemd unified cgroups driver does not make sense and makes the code less readable. To get rid of it, move the systemd v1-or-v2 init code to libcontainer/factory_linux.go which already has a function to deduce unified path out of paths map. End result is much cleaner code. Besides, we no longer write pid to the same cgroup file 7 times in Apply() like we did before. While at it - add `rootless` flag which is passed on to fs2 manager - merge getv2Path() into GetUnifiedPath(), don't overwrite path if it is set during initialization (on Load). Signed-off-by: Kir Kolyshkin <[email protected]>
It seems that some paths are coming from user and are therefore untrusted. Signed-off-by: Kir Kolyshkin <[email protected]>
... and its Cgroup field. There is no sense to keep it public. This was generated by gorename. Signed-off-by: Kir Kolyshkin <[email protected]>
The fs2 cgroup driver has a sanity check for path. Since systemd driver is relying on the same path, it makes sense to move this check to the common code. Signed-off-by: Kir Kolyshkin <[email protected]>
fs2 cgroup driver was not working because it did not enable controllers while creating cgroup directory; instead it was merely doing MkdirAll() and gathered the list of available controllers in NewManager(). Also, cgroup should be created in Apply(), not while creating a new manager instance. To fix: 1. Move the createCgroupsv2Path function from systemd driver to fs2 driver, renaming it to CreateCgroupPath. Use in Apply() from both fs2 and systemd drivers. 2. Delay available controllers map initialization to until it is needed. With this patch: - NewManager() only performs minimal initialization (initializin m.dirPath, if not provided); - Apply() properly creates cgroup path, enabling the controllers; - m.controllers is initialized lazily on demand. Signed-off-by: Kir Kolyshkin <[email protected]>
This check was removed in commit 5406833. Now, when this function is called from a few places, it is no longer obvious that the path always starts with /sys/fs/cgroup/, so reinstate the check just to be on the safe side. This check also ensures that elements[3:] can be used safely. Signed-off-by: Kir Kolyshkin <[email protected]>
This slightly improves code readability. Signed-off-by: Kir Kolyshkin <[email protected]>
1. Rename the files - v1.go: cgroupv1 aka legacy; - v2.go: cgroupv2 aka unified hierarchy; - unsupported.go: when systemd is not available. 2. Move the code that is common between v1 and v2 to common.go Signed-off-by: Kir Kolyshkin <[email protected]>
1. Instead of enabling all available controllers, figure out which ones are required, and only enable those. 2. Amend all setFoo() functions to call isFooSet(). While this might seem unnecessary, it might actually help to uncover a bug. Imagine someone: - adds a cgroup.Resources.CpuFoo setting; - modifies setCpu() to apply the new setting; - but forgets to amend isCpuSet() accordingly <-- BUG In this case, a test case modifying CpuFoo will help to uncover the BUG. This is the reason why it's added. This patch *could be* amended by enabling controllers on a best-effort basis, i.e. : - do not return an error early if we can't enable some controllers; - if we fail to enable all controllers at once (usually because one of them can't be enabled), try enabling them one by one. Currently this is not implemented, and it's not clear whether this would be a good way to go or not. [v2: add/use is${Controller}Set() functions] [v3: document neededControllers()] [v4: drop "best-effort" part] Signed-off-by: Kir Kolyshkin <[email protected]>
Run in the same environment as systemd tests. Disable CRIU tests because: - they all fail with cgroup v2; - CRIU v3.14 is required and it's not yet released, and rebuilding it from sources with patches applied (like it is currently done in Dockerfile) is too much work. Signed-off-by: Kir Kolyshkin <[email protected]>
1. There is no need to try removing it recursively. 2. Do not treat ENOENT as an error (similar to fs and systemd v1 drivers). Signed-off-by: Kir Kolyshkin <[email protected]>
4fe6a28
to
ab276b1
Compare
This commit caused regression #2339 |
The main goal of this patchset is to fix fs2 driver initialization. With these patches, the tests that I added in #2295 are now passing with fs2 driver (i.e. when
RUNC_USE_SYSTEMD
is not set), except for one case (#2298).I also did some major cleanups, mostly for systemd driver. A high-level overview:
Please see individual commits for details.
@AkihiroSuda PTAL (this was discussed before in #2281 (comment))