-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
podman: add new cgroup mode split #6666
podman: add new cgroup mode split #6666
Conversation
@vrothberg PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit messages states conmon-delegated
but the code adds a no-conmon-delegated
. I'd guess we should update the docs as well.
Do we need to update the systemd unit files in pkg/system/generate
?
libpod/container_internal_linux.go
Outdated
@@ -1505,6 +1506,17 @@ func (c *Container) getOCICgroupPath() (string, error) { | |||
case (rootless.IsRootless() && !unified) || c.config.NoCgroups: | |||
return "", nil | |||
case c.runtime.config.Engine.CgroupManager == config.SystemdCgroupsManager: | |||
if c.config.CgroupsMode == "no-conmon-delegated" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make "no-conmon-delegated" a constant somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit messages states
conmon-delegated
but the code adds ano-conmon-delegated
. I'd guess we should update the docs as well.
Thanks, I've renamed it to conmon-delegated
and added a constant
Do we need to update the systemd unit files in
pkg/system/generate
?
I don't think we can do that yet. crun needs containers/crun#409
0148073
to
8832f34
Compare
docs/source/markdown/podman-run.1.md
Outdated
|
||
Determines whether the container will create CGroups. | ||
|
||
Default is **enabled**. The **disabled** option will force the container | ||
to not create CGroups, and thus conflicts with CGroup options | ||
(**--cgroupns** and **--cgroup-parent**). | ||
The **no-conmon** option disables a new CGroup only for the **conmon** process. | ||
The **conmon-delegated** option reuses the current cgroup for both the conmon and the container payload, it works only on cgroup v2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's also specific to systemd cgroups (looks like it is) we should also mention that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it is so specific to systemd. It can be used also without systemd, even if it is less useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will also break remote. We probably need a check to not permit the option being used as remote as it changes the cgroup of the Podman process itself
Overall, I like this. I think it does break |
e168a7c
to
173e414
Compare
Is there a reason why you did cgroups 2 only? This approach works for cgroups 1: I'd happily make this coexist if need be. |
For that matter, why is cgroup-parent so draconian? It only allows items to end in .slice. And yet this works: in config.json Container cgroups could be implemented with --cgroup-parent /machine.slice/%n/container, if the cgroup-parent check weren't rejecting it. This commit proves it works. Would a better implementation be implementing --conmon-cgroup-parent /machine.slice/%n/supervisor? |
we could relax them.
The current implementation works well also without systemd, i.e. you have a cgroup you own and you want to create both conmon+container cgroup in it. |
just simplicity. We can surely add cgroup v1 support. It must be done for each controller separately, it is not enough to get the cgroup from the pid controller |
I read the cgroup from the pid controller, but in the Move call, I move all of them... that should do it, right? I don't know if pids will always be 3 - I can fix the find if need be. |
there could be different cgroups for each controller, the safest is to read each controller and then do the move as part of the same loop for that controller |
... except you were using the same call to determine the name of the supervisor cgroup - so I could only return 1 So should the supervisor cgroup name be based on the pids namespace, and the move do something else? i.e. just tack "container" onto the end? tack "container" onto the end if it's not /? |
A couple thoughts here:
|
FYI top is working for me - it properly shows the /slice/unit/container cgroup Stats is not. |
for cgroup v2 there cannot be processes in the inner groups. We need to make sure the conmon process is already in the right sub cgroup before it starts (or have a way to communicate to conmon to move itself). We need to do such steps before setting up the container, the previous code path cannot be used.
there is only one hierarchy in cgroup v2, so that is enough. In any case, I think I can just make my PR more generic to handle cgroup v1 too, even if it is not necessary to move the conmon process in a sub cgroup |
pr updated w/ podman stats working I'll be the first to admit, I don't fully understand all the implications, and I don't have a cgv2 machine to test on. I do know this all works on cgv1, and it looks simpler and reuses code paths that were already present and seem to make sense. But I don't see how podman stats is going to work if you never pass the cgroup parent back through the container object... Thanks! |
FCOS appears to be running cg1/2 hybrid, and pkg/cgroups doesn't adjust /sys/fs/cgroup/unified. Is the long term plan to fix pkg/cgroups or deprecate it? |
25e14a5
to
66c9e08
Compare
LGTM |
@mheon @vrothberg PTAL |
We want to hold this until Monday - we're in a semi-feature-freeze state for v2.0.0 release, which should happen in a few hours. Will do a full review once that is cut. |
@@ -80,7 +80,7 @@ If the host uses cgroups v1, the default is set to **host**. On cgroups v2 the | |||
Determines whether the container will create CGroups. | |||
Valid values are *enabled*, *disabled*, *no-conmon*, which the default being *enabled*. | |||
The *disabled* option will force the container to not create CGroups, and thus conflicts with CGroup options (**--cgroupns** and **--cgroup-parent**). | |||
The *no-conmon* option disables a new CGroup only for the conmon process. | |||
The *no-conmon* option disables a new CGroup only for the conmon process. The *split-current* option reuses the current cgroup for both the conmon and the container payload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just call this current? I don't see where split- adds any additional information to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may need current
in future to really mean current
and use the cgroup where Podman is running instead of creating two sub-cgroups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reuse-current
or perhaps just reuse
then?
66c9e08
to
c555172
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c555172
to
ac92674
Compare
Can you add the cgroup-parent validation to |
When running under systemd there is no need to create yet another cgroup for the container. With conmon-delegated the current cgroup will be split in two sub cgroups: - supervisor - container The supervisor cgroup will hold conmon and the podman process, while the container cgroup is used by the OCI runtime (using the cgroupfs backend). Closes: containers#6400 Signed-off-by: Giuseppe Scrivano <[email protected]>
ac92674
to
6ee5f74
Compare
added and pushed a new version |
ready for review |
LGTM |
1 similar comment
LGTM |
The **no-conmon** option disables a new CGroup only for the **conmon** process. | ||
The **split** option splits the current cgroup in two sub-cgroups: one for conmon and one for the container payload. It is not possible to set **--cgroup-parent** with **split**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really, really, really need to create a conmon man page that we can add a link to for pages like this. @haircommander any chance of making this happen, even if the page is terribly short and sweet?
This is not a holdback for this review at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened containers/conmon#184
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/lgtm |
add a new cgroup mode split.
When running under systemd there is no need to create yet another cgroup for the container.
With conmon-delegated the current cgroup will be split in two sub cgroups:
The supervisor cgroup will hold conmon and the podman process, while the container cgroup is used by the OCI runtime (using the cgroupfs backend).
Closes: #6400
Signed-off-by: Giuseppe Scrivano [email protected]