-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config-linux: MAY reject an unfit cgroup #1125
Conversation
config-linux.md
Outdated
Runtimes MAY reject a cgroup which it deems unfit; in particular, a frozen | ||
cgroup or, for a new container, a non-empty cgroup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"reject a cgroup" needs to be worded more explicitly because it's a bit ambigous what it means. I guess you'd want to say something more like:
Runtimes MAY refuse to create or start a container which has been configured to use a
cgroupsPath
which it considers would not be fit for purpose. Examples include a frozen cgroup or (for a new container) a non-empty cgroup. The reason for this is that such cgroup settings could result in container operations on one container inadvertently affecting other containers in a way that users may not anticipate or understand.
There is a slight concern that technically this permission would allow you to create a runtime-spec-compliant runtime which simply refuses to start any containers but I guess that's a purely theoretical concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which has been configured to use a cgroupsPath
This is not entirely true. Even when cgroupsPath
is not set, some default cgroup path is used, and that path is also checked against being empty and not frozen.
So, maybe this belongs to a different part of the spec. Let me take a look...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed, PTAL
ef2aaeb
to
8527607
Compare
config-linux.md
Outdated
@@ -171,6 +171,14 @@ Also known as cgroups, they are used to restrict resource usage for a container | |||
cgroups provide controls (through controllers) to restrict cpu, memory, IO, pids, network and RDMA resources for the container. | |||
For more information, see the [kernel cgroups documentation][cgroup-v1]. | |||
|
|||
A runtime MAY refuse to create or start a new container, or a process inside an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to be explicit that "refuse" here means "MUST return an error" and that the runtime can't choose to just silently not do anything (and still be compliant)?
(I'm not sure whether we already have similar language elsewhere.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I could've sworn I used the phrase "return an error" in my suggestion but I guess not. 😅
Yeah we usually use "return an error" explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the phrase used elsewhere is "MUST generate an error".
Changed accordingly, PTAL @tianon @cyphar @AkihiroSuda
8527607
to
e97c62b
Compare
It makes sense for runtime to reject a cgroup which is frozen (for both new and existing container), otherwise the runtime command (create/run/exec) may end up being stuck. It makes sense for runtime to make sure the cgroup for a new container is empty (i.e. there are no processes it in), and reject it otherwise. The scenario in which a non-empty cgroup is used for a new container has multiple problems, for example: * If two or more containers share the same cgroup, and each container has its own limits configured, the order of container starts ultimately determines whose limits will be effectively applied. * If two or more containers share the same cgroup, and one of containers is paused/unpaused, all others are paused, too. * If cgroup.kill is used to forcefully kill the container, it will also kill other processes that are not part of this container but merely belong to the same cgroup. * When a systemd cgroup manager is used, this becomes even worse. Such as, stop (or even failed start) of any container results in stopTransientUnit command being sent to systemd, and so (depending on unit properties) other containers can receive SIGTERM, be killed after a timeout etc. * Many other bad scenarios are possible, as the implicit assumption of 1:1 container:cgroup mapping is broken. opencontainers/runc#3132 containers/crun#716 Signed-off-by: Kir Kolyshkin <[email protected]>
e97c62b
to
104385d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@cyphar PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@opencontainers/runtime-spec-maintainers PTAL 🙏🏻 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- @cyphar are you good with this now? 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think putting a MUST inside a MAY is not quite right but I think we do it in other places and I think it's fairly obvious what that construction means.
Kir Kolyshkin (1): config-linux: MAY reject an unfit cgroup LGTMs: guiseppe tianon cyphar Closes #1125
It makes sense for runtime to reject a cgroup which is frozen
(for both new and existing container), otherwise the runtime
command will just end up stuck.
It makes sense for runtime to make sure the cgroup for a new container
is empty (i.e. there are no processes it in), and reject it otherwise.
The scenario in which a non-empty cgroup is used for a new container
has multiple problems, for example:
If two or more containers share the same cgroup, and each container
has its own limits configured, the order of container starts
ultimately determines whose limits will be effectively applied.
If two or more containers share the same cgroup, and one of containers
is paused/unpaused, all others are paused, too.
If cgroup.kill is used to forcefully kill the container, it will also
kill other processes that are not part of this container but merely
belong to the same cgroup.
When a systemd cgroup manager is used, this becomes even worse. Such
as, stop (or even failed start) of any container results in
stopTransientUnit command being sent to systemd, and so (depending
on unit properties) other containers can receive SIGTERM, be killed
after a timeout etc.
Many other bad scenarios are possible, as the implicit assumption
of 1:1 container:cgroup mapping is broken.
opencontainers/runc#3132
containers/crun#716
Signed-off-by: Kir Kolyshkin [email protected]