Multiple containers in the same cgroup #3132

Closed
kolyshkin opened this issue Aug 5, 2021 · 2 comments · Fixed by #3223
kolyshkin commented Aug 5, 2021

Currently, runc allows multiple containers to share the same cgroup. While such a shared configuration might be OK in some cases, there are some issues.

  1. When each container has its own resource limits, the order in which the containers are started determines which limits are effectively applied (see the sketch below).
  2. When one of the containers is paused, all the others are paused, too.
  3. When a container is paused, any attempt to do runc create/run/exec ends up with runc init stuck inside a frozen cgroup.
  4. When the systemd cgroup manager is used, this does not work at all: a stop (or even a failed start) of any container results in a stop-unit request being sent to systemd, and so (depending on unit properties) the other containers can receive SIGTERM, be killed after a timeout, etc.

All this may lead to various hard-to-debug situations in production (runc init stuck, cgroup removal errors, wrong resource limits, etc.).
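
To make problem 1 concrete, here is a minimal sketch (not runc code; the path and sizes are made up) of what happens on cgroup v2 when two containers are configured with the same cgroup path: whichever container starts last overwrites the shared limit files.

```go
// Illustrative only: two "containers" that point at the same cgroupsPath
// write their limits to the same directory, so the last writer wins.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// applyMemoryLimit writes a cgroup v2 memory limit (memory.max).
func applyMemoryLimit(cgroupDir string, limitBytes int64) error {
	return os.WriteFile(filepath.Join(cgroupDir, "memory.max"),
		[]byte(fmt.Sprintf("%d", limitBytes)), 0o644)
}

func main() {
	shared := "/sys/fs/cgroup/shared-ctr" // hypothetical path used by both containers

	// Container A asks for 512 MiB, container B for 128 MiB.
	_ = applyMemoryLimit(shared, 512*1024*1024)
	_ = applyMemoryLimit(shared, 128*1024*1024)

	// Both containers now run under the 128 MiB limit written last.
}
```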

I originally found issue 3 from the list above and tried to solve it in #3131, which is how I found issue 4.

What can be done to avoid these bad situations?

  1. Require that the cgroup (systemd unit) of a container being created is not present.
  2. Require that the cgroup (systemd unit) of a container being created is either not present or empty (no processes); a sketch of such a check follows the plan below.
  3. Require that the cgroup is not frozen before running runc init (this is what #3131, "runc run/create: refuse non-empty cgroup; runc exec: refuse frozen cgroup", does).

Admittedly, these measures might break some existing usage scenarios. This is why:

  • runc 1.1 will warn about a non-empty cgroup for a new container (runc run/create);
  • runc 1.2 will turn this warning into an error.
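
Options 1 and 2 boil down to checking that the container's cgroup has no member processes before using it. The sketch below is a rough illustration of that check, assuming a cgroup v2 unified hierarchy and a hypothetical path and helper name; it is not runc's actual implementation (which also has to handle cgroup v1 and the systemd driver).

```go
// Minimal sketch of the "cgroup is absent or empty" check from option 2,
// assuming cgroup v2 where member PIDs are listed in cgroup.procs.
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupIsUsable returns nil if the cgroup either does not exist yet or
// exists but contains no processes; otherwise it returns an error.
func cgroupIsUsable(cgroupDir string) error {
	data, err := os.ReadFile(filepath.Join(cgroupDir, "cgroup.procs"))
	if errors.Is(err, os.ErrNotExist) {
		return nil // not present yet -- fine for a new container
	}
	if err != nil {
		return err
	}
	if strings.TrimSpace(string(data)) != "" {
		return fmt.Errorf("cgroup %s is not empty: refusing to reuse it", cgroupDir)
	}
	return nil
}

func main() {
	if err := cgroupIsUsable("/sys/fs/cgroup/mycontainer"); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```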
kolyshkin commented:

It seems that crun is equally affected (filed containers/crun#716).

kolyshkin added a commit to kolyshkin/runtime-spec that referenced this issue Sep 27, 2021
It makes sense for the runtime to reject a cgroup that is frozen
(for both new and existing containers), otherwise the runtime
command (create/run/exec) will just end up stuck.

It makes sense for the runtime to make sure the cgroup for a new container
is empty (i.e. there are no processes in it), and reject it otherwise.
The scenario in which a non-empty cgroup is used for a new container
has multiple problems, for example:

 * If two or more containers share the same cgroup, and each container
   has its own limits configured, the order of container starts
   ultimately determines whose limits will be effectively applied.

 * If two or more containers share the same cgroup, and one of the containers
   is paused/unpaused, all the others are paused, too.

 * If cgroup.kill is used to forcefully kill the container, it will also
   kill other processes that are not part of this container but merely
   belong to the same cgroup.

 * When the systemd cgroup manager is used, this becomes even worse: a
   stop (or even a failed start) of any container results in a
   stopTransientUnit command being sent to systemd, and so (depending
   on unit properties) other containers can receive SIGTERM, be killed
   after a timeout, etc.

 * Many other bad scenarios are possible, as the implicit assumption
   of 1:1 container:cgroup mapping is broken.

opencontainers/runc#3132
containers/crun#716

Signed-off-by: Kir Kolyshkin <[email protected]>
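
The frozen-cgroup check that this commit message (and #3131) argue for can be sketched roughly as follows, assuming cgroup v2, where the effective freeze state is reported in the cgroup.events file as a "frozen 0|1" line. The path and function name are illustrative, not runc's actual code.

```go
// Minimal sketch of refusing to operate on a frozen cgroup (cgroup v2).
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupIsFrozen reports whether the cgroup is effectively frozen by
// parsing the "frozen" line of cgroup.events.
func cgroupIsFrozen(cgroupDir string) (bool, error) {
	f, err := os.Open(filepath.Join(cgroupDir, "cgroup.events"))
	if err != nil {
		return false, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) == 2 && fields[0] == "frozen" {
			return fields[1] == "1", nil
		}
	}
	return false, s.Err()
}

func main() {
	frozen, err := cgroupIsFrozen("/sys/fs/cgroup/mycontainer")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	if frozen {
		fmt.Fprintln(os.Stderr, "refusing to run in a frozen cgroup")
		os.Exit(1)
	}
}
```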