Multiple containers in the same cgroup #3132

Closed
kolyshkin opened this issue Aug 5, 2021 · 2 comments · Fixed by #3223
kolyshkin commented Aug 5, 2021

Currently, runc allows multiple containers to share the same cgroup. While such a shared configuration might be OK in some cases, there are some issues.

  1. When each container has its own resource limits, the order in which the containers are started determines which limits are effectively applied (see the sketch below).
  2. When one of the containers is paused, all the others are paused, too.
  3. When a container is paused, any attempt to do runc create/run/exec ends up with runc init stuck inside a frozen cgroup.
  4. When the systemd cgroup manager is used, this does not work at all: a stop (or even a failed start) of any container results in a stop-unit request being sent to systemd, and so (depending on unit properties) the other containers can receive SIGTERM, be killed after a timeout, etc.

All this may lead to various hard-to-debug situations in production (runc init stuck, cgroup removal errors, wrong resource limits, etc.).
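
To make problem 1 concrete, here is a minimal sketch (not runc code; the path and sizes are made up) of what happens on cgroup v2 when two containers are configured with the same cgroup path: whichever container starts last overwrites the shared limit files.

```go
// Illustrative only: two "containers" that point at the same cgroupsPath
// write their limits to the same directory, so the last writer wins.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// applyMemoryLimit writes a cgroup v2 memory limit (memory.max).
func applyMemoryLimit(cgroupDir string, limitBytes int64) error {
	return os.WriteFile(filepath.Join(cgroupDir, "memory.max"),
		[]byte(fmt.Sprintf("%d", limitBytes)), 0o644)
}

func main() {
	shared := "/sys/fs/cgroup/shared-ctr" // hypothetical path used by both containers

	// Container A asks for 512 MiB, container B for 128 MiB.
	_ = applyMemoryLimit(shared, 512*1024*1024)
	_ = applyMemoryLimit(shared, 128*1024*1024)

	// Both containers now run under the 128 MiB limit written last.
}
```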

I originally found issue 3 from the list above and tried to solve it in #3131, which is how I found issue 4.

What can be done to avoid these bad situations?

  1. Require that the cgroup (systemd unit) of a container being created is not present.
  2. Require that the cgroup (systemd unit) of a container being created is either not present or empty (no processes); a sketch of such a check follows the plan below.
  3. Require that the cgroup is not frozen before running runc init (this is what #3131, "runc run/create: refuse non-empty cgroup; runc exec: refuse frozen cgroup", does).

Admittedly, these measures might break some existing usage scenarios. This is why:

  • runc 1.1 will warn about a non-empty cgroup for a new container (runc run/create);
  • runc 1.2 will turn this warning into an error.
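
Options 1 and 2 boil down to checking that the container's cgroup has no member processes before using it. The sketch below is a rough illustration of that check, assuming a cgroup v2 unified hierarchy and a hypothetical path and helper name; it is not runc's actual implementation (which also has to handle cgroup v1 and the systemd driver).

```go
// Minimal sketch of the "cgroup is absent or empty" check from option 2,
// assuming cgroup v2 where member PIDs are listed in cgroup.procs.
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupIsUsable returns nil if the cgroup either does not exist yet or
// exists but contains no processes; otherwise it returns an error.
func cgroupIsUsable(cgroupDir string) error {
	data, err := os.ReadFile(filepath.Join(cgroupDir, "cgroup.procs"))
	if errors.Is(err, os.ErrNotExist) {
		return nil // not present yet -- fine for a new container
	}
	if err != nil {
		return err
	}
	if strings.TrimSpace(string(data)) != "" {
		return fmt.Errorf("cgroup %s is not empty: refusing to reuse it", cgroupDir)
	}
	return nil
}

func main() {
	if err := cgroupIsUsable("/sys/fs/cgroup/mycontainer"); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```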
kolyshkin commented:

It seems that crun is equally affected (filed containers/crun#716).

kolyshkin added a commit to kolyshkin/runtime-spec that referenced this issue Sep 27, 2021
It makes sense for the runtime to reject a cgroup that is frozen
(for both new and existing containers), otherwise the runtime
command (create/run/exec) will just end up stuck.

It makes sense for the runtime to make sure the cgroup for a new container
is empty (i.e. there are no processes in it), and reject it otherwise.
The scenario in which a non-empty cgroup is used for a new container
has multiple problems, for example:

 * If two or more containers share the same cgroup, and each container
   has its own limits configured, the order of container starts
   ultimately determines whose limits will be effectively applied.

 * If two or more containers share the same cgroup, and one of the containers
   is paused/unpaused, all the others are paused, too.

 * If cgroup.kill is used to forcefully kill the container, it will also
   kill other processes that are not part of this container but merely
   belong to the same cgroup.

 * When the systemd cgroup manager is used, this becomes even worse: a
   stop (or even a failed start) of any container results in a
   stopTransientUnit command being sent to systemd, and so (depending
   on unit properties) other containers can receive SIGTERM, be killed
   after a timeout, etc.

 * Many other bad scenarios are possible, as the implicit assumption
   of 1:1 container:cgroup mapping is broken.

opencontainers/runc#3132
containers/crun#716

Signed-off-by: Kir Kolyshkin <[email protected]>
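
The frozen-cgroup check that this commit message (and #3131) argue for can be sketched roughly as follows, assuming cgroup v2, where the effective freeze state is reported in the cgroup.events file as a "frozen 0|1" line. The path and function name are illustrative, not runc's actual code.

```go
// Minimal sketch of refusing to operate on a frozen cgroup (cgroup v2).
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// cgroupIsFrozen reports whether the cgroup is effectively frozen by
// parsing the "frozen" line of cgroup.events.
func cgroupIsFrozen(cgroupDir string) (bool, error) {
	f, err := os.Open(filepath.Join(cgroupDir, "cgroup.events"))
	if err != nil {
		return false, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) == 2 && fields[0] == "frozen" {
			return fields[1] == "1", nil
		}
	}
	return false, s.Err()
}

func main() {
	frozen, err := cgroupIsFrozen("/sys/fs/cgroup/mycontainer")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	if frozen {
		fmt.Fprintln(os.Stderr, "refusing to run in a frozen cgroup")
		os.Exit(1)
	}
}
```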