Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config-linux: MAY reject an unfit cgroup #1125

Merged
merged 1 commit into from
Dec 14, 2021

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Sep 27, 2021

It makes sense for runtime to reject a cgroup which is frozen
(for both new and existing container), otherwise the runtime
command will just end up stuck.

It makes sense for runtime to make sure the cgroup for a new container
is empty (i.e. there are no processes it in), and reject it otherwise.
The scenario in which a non-empty cgroup is used for a new container
has multiple problems, for example:

  • If two or more containers share the same cgroup, and each container
    has its own limits configured, the order of container starts
    ultimately determines whose limits will be effectively applied.

  • If two or more containers share the same cgroup, and one of containers
    is paused/unpaused, all others are paused, too.

  • If cgroup.kill is used to forcefully kill the container, it will also
    kill other processes that are not part of this container but merely
    belong to the same cgroup.

  • When a systemd cgroup manager is used, this becomes even worse. Such
    as, stop (or even failed start) of any container results in
    stopTransientUnit command being sent to systemd, and so (depending
    on unit properties) other containers can receive SIGTERM, be killed
    after a timeout etc.

  • Many other bad scenarios are possible, as the implicit assumption
    of 1:1 container:cgroup mapping is broken.

opencontainers/runc#3132
containers/crun#716

Signed-off-by: Kir Kolyshkin [email protected]

config-linux.md Outdated
Comment on lines 188 to 189
Runtimes MAY reject a cgroup which it deems unfit; in particular, a frozen
cgroup or, for a new container, a non-empty cgroup.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"reject a cgroup" needs to be worded more explicitly because it's a bit ambigous what it means. I guess you'd want to say something more like:

Runtimes MAY refuse to create or start a container which has been configured to use a cgroupsPath which it considers would not be fit for purpose. Examples include a frozen cgroup or (for a new container) a non-empty cgroup. The reason for this is that such cgroup settings could result in container operations on one container inadvertently affecting other containers in a way that users may not anticipate or understand.

There is a slight concern that technically this permission would allow you to create a runtime-spec-compliant runtime which simply refuses to start any containers but I guess that's a purely theoretical concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which has been configured to use a cgroupsPath

This is not entirely true. Even when cgroupsPath is not set, some default cgroup path is used, and that path is also checked against being empty and not frozen.

So, maybe this belongs to a different part of the spec. Let me take a look...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed, PTAL

@kolyshkin kolyshkin force-pushed the bad-cgroups branch 2 times, most recently from ef2aaeb to 8527607 Compare September 28, 2021 19:17
config-linux.md Outdated
@@ -171,6 +171,14 @@ Also known as cgroups, they are used to restrict resource usage for a container
cgroups provide controls (through controllers) to restrict cpu, memory, IO, pids, network and RDMA resources for the container.
For more information, see the [kernel cgroups documentation][cgroup-v1].

A runtime MAY refuse to create or start a new container, or a process inside an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to be explicit that "refuse" here means "MUST return an error" and that the runtime can't choose to just silently not do anything (and still be compliant)?

(I'm not sure whether we already have similar language elsewhere.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I could've sworn I used the phrase "return an error" in my suggestion but I guess not. 😅

Yeah we usually use "return an error" explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the phrase used elsewhere is "MUST generate an error".

Changed accordingly, PTAL @tianon @cyphar @AkihiroSuda

It makes sense for runtime to reject a cgroup which is frozen
(for both new and existing container), otherwise the runtime
command (create/run/exec) may end up being stuck.

It makes sense for runtime to make sure the cgroup for a new container
is empty (i.e. there are no processes it in), and reject it otherwise.
The scenario in which a non-empty cgroup is used for a new container
has multiple problems, for example:

 * If two or more containers share the same cgroup, and each container
   has its own limits configured, the order of container starts
   ultimately determines whose limits will be effectively applied.

* If two or more containers share the same cgroup, and one of containers
  is paused/unpaused, all others are paused, too.

* If cgroup.kill is used to forcefully kill the container, it will also
  kill other processes that are not part of this container but merely
  belong to the same cgroup.

  * When a systemd cgroup manager is used, this becomes even worse. Such
  as, stop (or even failed start) of any container results in
  stopTransientUnit command being sent to systemd, and so (depending
  on unit properties) other containers can receive SIGTERM, be killed
  after a timeout etc.

* Many other bad scenarios are possible, as the implicit assumption
  of 1:1 container:cgroup mapping is broken.

opencontainers/runc#3132
containers/crun#716

Signed-off-by: Kir Kolyshkin <[email protected]>
Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ♥️

@kolyshkin
Copy link
Contributor Author

@cyphar PTAL

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin
Copy link
Contributor Author

@opencontainers/runtime-spec-maintainers PTAL 🙏🏻

Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- @cyphar are you good with this now? 🙏

Copy link
Contributor

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think putting a MUST inside a MAY is not quite right but I think we do it in other places and I think it's fairly obvious what that construction means.

cyphar added a commit that referenced this pull request Dec 14, 2021
Kir Kolyshkin (1):
  config-linux: MAY reject an unfit cgroup

LGTMs: guiseppe tianon cyphar
Closes #1125
@cyphar cyphar closed this in 8958f93 Dec 14, 2021
@cyphar cyphar merged commit 8958f93 into opencontainers:main Dec 14, 2021
@AkihiroSuda AkihiroSuda mentioned this pull request Jan 24, 2023
@AkihiroSuda AkihiroSuda added this to the v1.1.0 milestone Feb 1, 2023
@AkihiroSuda AkihiroSuda mentioned this pull request Jun 26, 2023
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants