runc delete: call systemd's reset-failed #3888
Conversation
Force-pushed from f228e7b to 860d7fa.
Here's a bigger picture. Currently, cri-o (possibly containerd, too, I haven't checked) sets […]. Since the second reason (the runc bug) is fixed in 1.1.6, the only remaining reason is the leftover failed units. In fact, it happens because […]. Retrospectively, setting […]. The solution to this race is to check the systemd unit status, but due to the above […].
@opencontainers/runc-maintainers PTAL
Going to add an integration test first.
Force-pushed from 6196fa1 to 6ef989c.
@opencontainers/runc-maintainers PTAL
I feel like this one needs to be backported to 1.1.
There is no such thing as linux.resources.memorySwap (the mem+swap is set as linux.resources.memory.swap). As it is not used in this test anyway, remove it. Fixes: 4929c05 Signed-off-by: Kir Kolyshkin <[email protected]>
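For reference, the memory+swap limit belongs under linux.resources.memory.swap; a minimal sketch using jq, with illustrative byte values (not taken from the test):

----
# The field is .linux.resources.memory.swap, not .linux.resources.memorySwap.
# Set the memory limit to 32 MiB and memory+swap to 64 MiB in config.json.
jq '.linux.resources.memory = {"limit": 33554432, "swap": 67108864}' config.json > config.json.tmp \
  && mv config.json.tmp config.json
----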
Sometimes we call resetFailedUnit as a cleanup measure, and we don't care if it fails or not. So, move error reporting to its callers, and ignore error in cases we don't really expect it to succeed. Signed-off-by: Kir Kolyshkin <[email protected]>
runc delete is supposed to remove all the container's artefacts. In case systemd cgroup driver is used, and the systemd unit has failed (e.g. oom-killed), systemd won't remove the unit (that is, unless the "CollectMode: inactive-or-failed" property is set). Call reset-failed from manager.Destroy so the failed unit will be removed during "runc delete". Signed-off-by: Kir Kolyshkin <[email protected]>
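To see the systemd behavior this change works around, here is a minimal standalone sketch (no runc involved; the unit name and timings are arbitrary). A failed transient unit stays loaded until reset-failed is called, after which systemd garbage-collects it:

----
# Start a transient unit that systemd kills after 1 second; it then fails with result 'timeout'.
systemd-run --unit=demo-fail -p RuntimeMaxSec=1 sleep 10
sleep 3
systemctl status demo-fail; echo "exit=$?"   # 3: the failed unit is still loaded
systemctl reset-failed demo-fail             # what manager.Destroy now asks systemd to do
systemctl status demo-fail; echo "exit=$?"   # 4: the unit is gone
----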
Signed-off-by: Kir Kolyshkin <[email protected]>
The passing run (with the fix) looks like this:

----
delete.bats
 ✓ runc delete removes failed systemd unit [4556]

runc spec (status=0):
runc run -d --console-socket /tmp/bats-run-B08vu1/runc.lbQwU5/tty/sock test-failed-unit (status=0):

Warning: The unit file, source configuration file or drop-ins of runc-cgroups-integration-test-12869.scope changed on disk. Run 'systemctl daemon-reload' to reload units.
× runc-cgroups-integration-test-12869.scope - libcontainer container integration-test-12869
     Loaded: loaded (/run/systemd/transient/runc-cgroups-integration-test-12869.scope; transient)
  Transient: yes
    Drop-In: /run/systemd/transient/runc-cgroups-integration-test-12869.scope.d
             └─50-DevicePolicy.conf, 50-DeviceAllow.conf
     Active: failed (Result: timeout) since Tue 2023-06-13 14:41:38 PDT; 751ms ago
   Duration: 2.144s
        CPU: 8ms

Jun 13 14:41:34 kir-rhat systemd[1]: Started runc-cgroups-integration-test-12869.scope - libcontainer container integration-test-12869.
Jun 13 14:41:37 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Scope reached runtime time limit. Stopping.
Jun 13 14:41:38 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Stopping timed out. Killing.
Jun 13 14:41:38 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Killing process 1107438 (sleep) with signal SIGKILL.
Jun 13 14:41:38 kir-rhat systemd[1]: runc-cgroups-integration-test-12869.scope: Failed with result 'timeout'.

runc delete test-failed-unit (status=0):
Unit runc-cgroups-integration-test-12869.scope could not be found.
----

Before the fix, the test was failing like this:

----
delete.bats
 ✗ runc delete removes failed systemd unit
   (in test file tests/integration/delete.bats, line 194)
     `run -4 systemctl status "$SD_UNIT_NAME"' failed, expected exit code 4, got 3
....
----

Signed-off-by: Kir Kolyshkin <[email protected]>
@opencontainers/runc-maintainers PTAL (want to include the backport into runc v1.1.8). This is relatively easy to review.
LGTM
According to the OCI runtime spec [1], the runtime's delete is supposed to remove all the container's artefacts. In case the systemd cgroup driver is used, and the systemd unit has failed (e.g. oom-killed), systemd won't remove the unit (that is, unless the "CollectMode: inactive-or-failed" property is set). Leaving a leftover failed unit is a violation of the runtime spec; in addition, a leftover unit results in the inability to start a container with the same systemd unit name (such an operation will fail with a "unit already exists" error).

Call reset-failed from systemd's cgroup manager destroy_cgroup call, so the failed unit will be removed (by systemd) after "crun delete". This change is similar to the one in runc (see [2]). A (slightly modified) test case from runc, added by the above change, was used to check that the bug is fixed. For the bigger picture, see [3] (issue A) and [4].

To test manually, systemd >= 244 is needed. Create a container config that runs "sleep 10" and has the following systemd annotations:

org.systemd.property.RuntimeMaxUSec: "uint64 2000000"
org.systemd.property.TimeoutStopUSec: "uint64 1000000"

Start a container using the --systemd-cgroup option. The container will be killed by systemd in 2 seconds, thus its systemd unit status will be "failed". Once it has failed, "systemctl status $UNIT_NAME" should have an exit code of 3 (meaning "unit is not active"). Now, run "crun delete $CTID" and repeat "systemctl status $UNIT_NAME". It should result in an exit code of 4 (meaning "no such unit").

[1] https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#delete
[2] opencontainers/runc#3888
[3] opencontainers/runc#3780
[4] cri-o/cri-o#7035

Signed-off-by: Kir Kolyshkin <[email protected]>
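A rough shell transcript of that manual check (the container ID and unit name below are placeholders, the actual unit name depends on your cgroupsPath, and config.json is assumed to carry the annotations listed above):

----
CTID=test-failed-unit
UNIT=crun-$CTID.scope                  # placeholder; substitute the actual unit name
crun --systemd-cgroup run -d "$CTID"   # killed by systemd after 2s (RuntimeMaxUSec)
sleep 3
systemctl status "$UNIT"; echo $?      # expect 3: unit failed but still loaded
crun delete "$CTID"                    # destroy_cgroup now calls reset-failed
systemctl status "$UNIT"; echo $?      # expect 4: no such unit
----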
runc delete is supposed to remove all the container's artefacts. In case systemd cgroup driver is used, and the systemd unit has failed (e.g. oom-killed), systemd won't remove the unit (that is, unless the "CollectMode: inactive-or-failed" property is set).
Call reset-failed from manager.Destroy so the failed unit will be removed during "runc delete".
This fixes Issue A from #3780 (which, in its original form, can only be reproduced with RHEL/CentOS 9 systemd version < 252.14, i.e. before they added redhat-plumbers/systemd-rhel9#149). A test case that works with any recent systemd version is also added.
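For completeness, a rough sketch of how such a container can be set up with runc itself, mirroring the annotations quoted above (the jq edit, $CONSOLE_SOCKET, and $SD_UNIT_NAME are placeholders, not the exact integration-test code):

----
# Make the unit fail on its own: cap the scope runtime at 2s, stop timeout at 1s.
runc spec
jq '.process.args = ["sleep", "10"] |
    .annotations += {
      "org.systemd.property.RuntimeMaxUSec": "uint64 2000000",
      "org.systemd.property.TimeoutStopUSec": "uint64 1000000"
    }' config.json > config.json.tmp && mv config.json.tmp config.json

runc --systemd-cgroup run -d --console-socket "$CONSOLE_SOCKET" test-failed-unit
sleep 3                                     # the scope hits RuntimeMaxUSec and fails
systemctl status "$SD_UNIT_NAME"; echo $?   # 3: unit failed, still loaded
runc delete test-failed-unit
systemctl status "$SD_UNIT_NAME"; echo $?   # 4 with the fix: reset-failed removed the unit
----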