Clarify cgroups path handling behavior #834
You can configure a container's cgroups via the `resources` field of the Linux configuration.
Do not specify `resources` unless limits have to be updated.
The runtime MUST create the cgroups specified by the `cgroupsPath` if they don't exist.
Which cgroups? Only those needed to fulfil `resources`, no? Or MUST they create a new `devices` cgroup even if there is nothing in `linux.resources.devices`?
They must create all of them, for the use case where the orchestrator prepares the cgroups for you and just asks the runtime to add the container process to the cgroups.
> They must create all of them, for the use case where the orchestrator prepares the cgroups for you and just asks the runtime to add the container process to the cgroups.

How should the runtime distinguish between v1 and v2 cgroups (more on this here)? I think a safer approach for orchestrators would be to tell the runtime to leave cgroups alone and set them up completely in a hook (both creating/initializing any required cgroups and joining the runtime to them). On the other hand, with #237 rejected, there's no clear way to warn the runtime that you'll be messing with cgroups in the hook, so the runtime might be surprised by:

1. No `cgroupsPath` or `resources`, so the runtime creates a new devices cgroup and adds the container process.
2. Orchestration hook (or other post-create action) creates the cgroups that the orchestrator wants and moves the container process into them.
3. The runtime looks in the devices cgroup it set up in step 1 and… where did that process go?
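
To make step 3 concrete, here is a minimal sketch of how a runtime could notice that a process has left the cgroup it was placed in, by parsing the contents of `/proc/<pid>/cgroup`. It relies on the kernel's documented line format (`hierarchy-ID:controller-list:path`, with `0::<path>` for the v2 unified hierarchy). The function names and the sample text are hypothetical, and this is not taken from runC or any other runtime.

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup contents into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Split on the first two colons only; the path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries


def membership(text):
    """Map each v1 controller (or 'unified' for the v2 hierarchy) to its cgroup path."""
    result = {}
    for hierarchy_id, controllers, path in parse_proc_cgroup(text):
        if hierarchy_id == 0 and not controllers:
            # A "0::<path>" entry is the cgroup v2 unified hierarchy.
            result["unified"] = path
        for name in controllers:
            result[name] = path
    return result


def still_in(text, controller, expected_path):
    """Return True if the process is still in expected_path for the given controller."""
    return membership(text).get(controller) == expected_path


# Hypothetical sample of what /proc/<pid>/cgroup might contain on a hybrid host.
SAMPLE = (
    "12:devices:/mycontainer\n"
    "5:cpu,cpuacct:/mycontainer\n"
    "1:name=systemd:/user.slice\n"
    "0::/user.slice/session-1.scope\n"
)
```

With a check like `still_in(SAMPLE, "devices", "/mycontainer")`, a runtime could at least detect the move described above, although the thread does not say what it should do once it has detected it.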
> They must create all of them, for the use case where the orchestrator prepares the cgroups for you and just asks the runtime to add the container process to the cgroups.

But should the runtime join any extra cgroups? The old language handled this case.
You can configure a container's cgroups via the `resources` field of the Linux configuration.
Do not specify `resources` unless limits have to be updated.
The runtime MUST create the cgroups specified by the `cgroupsPath` if they don't exist.
If `cgroupsPath` is empty, then the behavior is runtime implementation specific.
I think you mean “implementation-defined”.
And I liked the old language (where the runtime could pick a default `cgroupsPath`) much more than the huge hole you poke with an unqualified “implementation-defined”. The runtime could drive all sorts of crazy things unrelated to cgroups through this big loophole.
i think this new sentence is fine and not any more of a "loophole" than the old language
> i think this new sentence is fine and not any more of a "loophole" than the old language

With the new language, config authors should not leave `cgroupsPath` empty unless they have read the implementation-specific docs for any runtime that might consume their config. With the old language… I'm not sure. It looks like this PR (as of 81c9f31) still has the “MAY define the default” line. So what is this new line about? Is it just underlining that the default may not be a fixed string (e.g. maybe the runtime defaults to the container ID), so if you leave this unset you might get a new cgroup or might join an existing cgroup? If so, I'd rather clarify that that is what is being implementation-defined. The current language in 81c9f31 supports runtimes which error out on unset `cgroupsPath` (or ignore `hooks`, etc., etc.), and I don't think we want to give that much leeway here.
With the old language, config authors could safely leave `cgroupsPath` empty if they didn't care
I too prefer the old language. What is the reason for this change @mrunalp?
config-linux.md
Outdated
The runtime MUST ensure that the container process is attached to the cgroups specified by `cgroupsPath`.
If any property is set under `resources` then the runtime MUST set it for the container.
Check individual properties for any specific handling.
Do we need this line? It sounds like “and don't forget to read the rest of the spec!” :p.
fair. this sentence is a bit ambiguous.
Sure, I can take this line out.
config-linux.md
Outdated
For example, to run a new process in an existing container without updating limits, `resources` need not be specified.

Runtimes MAY attach the container process to additional cgroup controllers beyond those necessary to fulfill the `resources` settings.
I don't think we want to drop this line. I think the additional attaching is surprising, so if we don't forbid runtimes from doing it (which I don't think we can with runC), I'd like to keep mentioning this as a runtime prerogative.
i think this sentence can stay
Given my comment above, we would want the runtime to join all the cgroups specified by the cgroupsPath.
Removing this line is clearer in light of this comment. So I think this line should stay if runtimes remain free to not join all cgroups (known to the runtime? Mentioned in the spec?), but should be dropped if we start requiring them to join all of those cgroups.
Signed-off-by: Mrunal Patel <[email protected]>
The runtime MUST create the cgroups specified by the `cgroupsPath` if they don't exist.
If `cgroupsPath` is empty, then the behavior is runtime implementation specific.

The runtime MUST ensure that the container process is attached to the cgroups specified by `cgroupsPath`.
In cgroupv1 (which we do support but not explicitly) it is not clear what cgroups are specified by a single path. Could you clarify what was wrong with the old language (which had similar semantics but was more explicit about how different controllers are handled)?
Also with cgroupv2 it's not clear what `cgroup.controllers` should be set to in `cgroupsPath`. Should it be `+everything` or just `+whatever_is_necessary`? The former makes rootless containers no longer possible in an OCI runtime, while the latter should also allow for runtimes to join extra cgroups.
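
The difference between `+everything` and `+whatever_is_necessary` can be sketched as below. The sketch assumes the cgroup v2 kernel interface, where `cgroup.controllers` is a read-only list of the available controllers and controllers are enabled for children by writing `+name` tokens to the parent's `cgroup.subtree_control`. The function name and the sample controller list are hypothetical, and nothing here is specified by the PR.

```python
def subtree_control_request(available, needed=None):
    """Build the string one might write to a parent's cgroup.subtree_control.

    available: whitespace-separated contents of the parent's cgroup.controllers
    needed:    iterable of controller names, or None to request everything available
    """
    avail = available.split()
    if needed is None:
        # The "+everything" policy: enable every controller the parent offers.
        wanted = avail
    else:
        needed = set(needed)
        missing = sorted(needed - set(avail))
        if missing:
            raise ValueError("controllers not available: " + ", ".join(missing))
        # The "+whatever_is_necessary" policy, kept in the kernel's listing order.
        wanted = [name for name in avail if name in needed]
    return " ".join("+" + name for name in wanted)


# Hypothetical contents of a cgroup.controllers file.
AVAILABLE = "cpuset cpu io memory pids"

everything = subtree_control_request(AVAILABLE)
necessary = subtree_control_request(AVAILABLE, ["memory", "pids"])
```

Under these assumptions the narrower request only touches the controllers a config actually needs, which seems closer to what an unprivileged (rootless) runtime could be permitted to do.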
NACK, I'm not sure I understand the loosening of some of the semantics of |
@cyphar @wking Okay, let us consider the 4 cases and then comment on what we would want for each. We can then add/remove/keep the language:
|
On Thu, May 18, 2017 at 05:47:26PM -0700, Mrunal Patel wrote:
> 1. cgroupsPath NOTSET resources NOTSET: We have two options here:
>    don't do anything or make it implementation-defined. Making it
>    implementation defined, gives more flexibility so I am leaning
>    towards that.

But it means that you might confuse runtimes that decide to put the
container process in a cgroup and expect it to stay there [1]. And
there may be problems with rootless containers [2] if the runtime
decides to error out if it can't accomplish whatever it would like to
do in this case. My preference is to revive #237 in this case, which was
designed to make it easy for folks to experiment with alternative
cgroup managers. Lots more discussion on this front in [3].

> 2. cgroupsPath SET resources NOTSET: Considering the orchestrator
>    use case, I was leaning towards joining all the cgroups
>    controllers. We don't know what all resources the orchestrator
>    has set so safest would be to join all. An option would be to
>    additionally allow specifying what controllers should be joined.

“Join all the cgroups” still suffers from “v1 or v2?” and “including
ones the runtime may not be aware of?” (e.g. /sys/fs/cgroup/systemd?)
ambiguities. It's nice to have a way to join existing cgroups for the
‘exec’ and sub-container use-cases, but as long as v1 cgroups are
available and cases 3 and 4 are ambiguous, I'm not sure how to
accomplish that.

> 3. cgroupsPath NOTSET resources SET: Here we could have the runtime
>    only join the controllers (at a path of its own choice) required
>    to satisfy the resource requirements or all.

I would prefer this (easier to succeed for the rootless case) and this
is what the master wording already encourages.

> 4. cgroupsPath SET resources SET: Again similar to option 2, we need
>    to figure out whether runtime should join all controllers or only
>    the ones necessary to satisfy the resources. Also, could have an
>    option (like in 2) to specify the controllers explicitly (which
>    would need to be validated as being a superset of controllers
>    required to satisfy resources).

I think “has that cgroup property set” is sufficient for explicitly
wanting to join the cgroup. So “I'd like to join a devices cgroup but
not tweak it” would be:

    "linux": {"resources": {"devices": {}}}

possibly including a cgroupsPath. As long as we collect properties by
cgroup (we'd want to address #806), that gives a clear way to request
joining cgroups known to the spec. There would be no way to request
to join cgroups not known to the spec (e.g. a systemd cgroup), but you
can handle that in hooks. And if the runtime decided to join some
oddball cgroup as well, then it's up to the runtime to not get
surprised by [1].

[1]: #834 (comment)
[2]: #834 (comment)
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
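
The “has that cgroup property set” rule proposed in the reply above can be sketched as follows. This is an illustration only: the mapping from `linux.resources` keys to v1 controller names is an assumption made for the example, and the function name is hypothetical rather than anything defined by the spec or runC.

```python
# Assumed mapping from `linux.resources` keys to cgroup v1 controller names.
# This table is illustrative; the spec discussion does not define it.
RESOURCE_CONTROLLERS = {
    "devices": ["devices"],
    "memory": ["memory"],
    "cpu": ["cpu", "cpuset"],
    "blockIO": ["blkio"],
    "hugepageLimits": ["hugetlb"],
    "network": ["net_cls", "net_prio"],
    "pids": ["pids"],
}


def controllers_to_join(config):
    """Return the controllers requested by the presence of keys under linux.resources.

    A key that is present but empty (e.g. "devices": {}) still counts as a
    request to join that cgroup without changing any of its settings.
    """
    resources = config.get("linux", {}).get("resources") or {}
    joined = []
    for key in resources:
        for controller in RESOURCE_CONTROLLERS.get(key, []):
            if controller not in joined:
                joined.append(controller)
    return joined


# The "join a devices cgroup but don't tweak it" config from the comment above.
join_only = {"linux": {"resources": {"devices": {}}}}
```

Under this reading, `controllers_to_join(join_only)` yields only the devices controller, and a config with no `resources` at all yields nothing, leaving cases 1 and 2 to whatever the spec eventually decides.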
|
I feel like “what should happen in cgroup case $X?” for a number of
cases is a discussion that could go on for a while (and has been going
on for about as long as there has been a runtime spec). If the
current master wording is not clear or ideal, and with rc6 and a
feature freeze imminent, I think we may be in the same boat on this
subject as the Windows spec discussed in #817. The easy out is to
provide a way to say “please leave the cgroups alone; I'll handle them
myself” in the config (whether that's a revived #237, some other
property combination, or a new property in its own right). Then users
concerned about portability (if the spec is not clear enough on the
cgroup specification, #746) or with use cases not addressed by the
currently specified cgroup config have a portable way to use their own
cgroup manager. We can cut rc6 and 1.0 with that, and continue the
discussion about what should happen in any other cases without holding
up the 1.0 release. If subsequent discussion on whether cgroup
joining is greedy or not settles down close to whatever goes out with
rc6/1.0, it may be able to go out as 1.1 without the breaking-change
2.0 bump.
|
Closes: #745
I'll remove line 188 and add more language to clarify what cgroups to join.
@mrunalp What exact issue is this PR trying to fix? It seems that some concerns are still not clarified; can this be moved to post 1.0?
@hqhq I am fine closing this for now.
Signed-off-by: Mrunal Patel [email protected]