[RFC] runc exec --cgroup #3040

Closed
kolyshkin opened this issue Jun 25, 2021 · 9 comments · Fixed by #3059

@kolyshkin
Contributor

kolyshkin commented Jun 25, 2021

Any container can have sub-cgroups

  • created by the container itself, if access is provided (cgroupns is enabled and the cgroup mount is not read-only -- see examples for cgroup v1 and v2);
  • created by some other tools (higher-level orchestrators, systemd, etc.);
  • created by runc itself (if a future runtime-spec adds the ability to define sub-cgroups and their resources).

This is a proposal to let runc exec run a process in a sub-cgroup of a container, rather than in the container's top-level cgroup as happens now (except on cgroup v2, which has a fallback to join the container init's cgroup).

For example:

runc exec -d --cgroup /foo/bar CID cmd args...

will run cmd args inside container CID, putting cmd in the container's foo/bar cgroup, relative to the container's top-level cgroup.

The default value for --cgroup is /, which matches how runc exec works now.
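
As a rough illustration of the path semantics described above, here is a minimal sketch of resolving a --cgroup value against the container's top-level cgroup. The helper name `resolve_sub_cgroup` and the example paths are hypothetical; this is not runc's implementation, only one way the "relative to the container's cgroup, default /" rule could be expressed.

```python
import posixpath


def resolve_sub_cgroup(container_cgroup: str, sub_cgroup: str = "/") -> str:
    """Resolve a --cgroup value relative to the container's top-level cgroup.

    Hypothetical helper for illustration only; not taken from runc.
    """
    # Root the value at "/" and normalize it. For an absolute path,
    # normpath drops any ".." that would climb above "/", so the result
    # cannot escape the container's own cgroup.
    cleaned = posixpath.normpath("/" + sub_cgroup.lstrip("/"))
    if cleaned == "/":
        # Default case: the container's top-level cgroup itself.
        return container_cgroup
    return posixpath.join(container_cgroup, cleaned.lstrip("/"))


# Example with an assumed container cgroup path:
top = "/sys/fs/cgroup/mycontainer"
print(resolve_sub_cgroup(top, "/foo/bar"))
print(resolve_sub_cgroup(top))
```

Normalizing before joining is a design choice in this sketch: it means a value such as `../..` resolves to the container's top-level cgroup rather than to a cgroup outside the container.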

In a similar manner, runtime-spec's Process structure needs a Cgroup field with the same meaning as runc exec's --cgroup. For container init, it doesn't make sense to set Cgroup to any value other than /. For other execs, it can be changed.

One other implementation detail: I guess runc exec --cgroup should NOT create the sub-cgroup if it does not exist, but return an error instead.
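
The "do not create, return an error" behavior could look roughly like the sketch below. The function name `join_existing_cgroup` is hypothetical and the code is not runc's; the only kernel interface it relies on is that a process is moved into a cgroup by writing its PID to that cgroup's `cgroup.procs` file.

```python
import os


def join_existing_cgroup(cgroup_dir: str, pid: int) -> None:
    """Move pid into cgroup_dir, refusing to create the cgroup if it is missing.

    Hypothetical helper for illustration only; not taken from runc.
    """
    # Deliberately no os.makedirs() here: a missing sub-cgroup is an error.
    if not os.path.isdir(cgroup_dir):
        raise FileNotFoundError(f"cgroup {cgroup_dir!r} does not exist")
    # Writing a PID to cgroup.procs moves that process into the cgroup.
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```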

@kolyshkin kolyshkin self-assigned this Jun 25, 2021
@mrunalp
Contributor

mrunalp commented Jun 25, 2021

The use case for this is DPDK applications that take over a subset of the cpuset configured for the main container and can't tolerate k8s probes (runc exec) running on those CPUs.
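
To make the use case concrete, here is a small sketch that checks whether two cpusets overlap, using the comma-separated list-with-ranges format found in `cpuset.cpus` (for example `0-3,8`). The CPU numbers and the function names are assumptions made up for illustration; the idea is that probes would be exec'ed into a sub-cgroup whose cpuset does not intersect the CPUs used by the DPDK workload.

```python
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse a cpuset list such as "0-3,8" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # tolerate an empty cpuset
        lo, _, hi = part.partition("-")
        # A single CPU ("8") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpusets_overlap(a: str, b: str) -> bool:
    """Return True if the two cpuset lists share at least one CPU."""
    return bool(parse_cpu_list(a) & parse_cpu_list(b))


# Assumed layout: container cpuset is 0-3, DPDK threads are pinned to 2-3,
# and a sub-cgroup for probes is restricted to 0-1.
assert not cpusets_overlap("0-1", "2-3")
```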

@AkihiroSuda
Member

How will this relate to subtree_controllers?

@kolyshkin
Contributor Author

> How will this relate to subtree_controllers?

I think it should be orthogonal (except for the "runtime-spec will add an ability to define sub-cgroups" item which is actually not part of this proposal, and only given as an example).

runc exec --cgroup merely uses an existing in-container cgroup, and errors out if it's not available.

@giuseppe
Member

How will this work with cgroup v2? Processes can only be in the leaf nodes; do we need to ensure the existing processes are moved to a sibling cgroup first?
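
For context, the cgroup v2 constraint referred to here is that a non-root cgroup which has domain controllers enabled for its children (a non-empty `cgroup.subtree_control`) cannot hold processes itself. A minimal sketch of checking that condition follows; the function name `can_hold_processes` is hypothetical and the check ignores threaded-mode cgroups.

```python
import os


def can_hold_processes(cgroup_dir: str) -> bool:
    """Return True if no controllers are delegated to children of cgroup_dir.

    Hypothetical helper for illustration only. It is a simplification:
    the root cgroup and threaded cgroups follow different rules.
    """
    path = os.path.join(cgroup_dir, "cgroup.subtree_control")
    try:
        with open(path) as f:
            # The file lists enabled controllers separated by spaces;
            # empty means nothing is distributed to child cgroups.
            return f.read().strip() == ""
    except FileNotFoundError:
        # No such file, e.g. on a cgroup v1 hierarchy, where the
        # constraint does not apply.
        return True
```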

@kolyshkin
Contributor Author

> do we need to ensure the existing processes are moved to a sibling cgroup first?

I don't think it's a runc task (unless the sub-cgroups are created by runc itself, but that's not what this RFC proposes).

Currently, if the runc exec process fails to join the top-level container cgroup, it retries with the cgroup of the container init. I guess this fallback should be disabled when --cgroup is set explicitly.
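
The fallback rule described above can be sketched as follows. Everything here is hypothetical (the function `join_for_exec`, the `join` callback, the use of `OSError` to signal a failed join); it only illustrates the decision logic: fall back to the init's cgroup in the default case, and fail outright when --cgroup was given explicitly.

```python
from typing import Callable, Optional


def join_for_exec(
    join: Callable[[str], None],
    container_cgroup: str,
    init_cgroup: str,
    sub_cgroup: Optional[str] = None,
) -> str:
    """Pick and join the cgroup for an exec'ed process; return the path used.

    Hypothetical sketch of the decision logic only; `join` stands in for
    whatever actually moves the process and is assumed to raise OSError
    on failure.
    """
    if sub_cgroup is not None:
        # --cgroup was set explicitly: no fallback, errors propagate.
        sub = sub_cgroup.strip("/")
        target = f"{container_cgroup}/{sub}" if sub else container_cgroup
        join(target)
        return target
    try:
        # Default: try the container's top-level cgroup first.
        join(container_cgroup)
        return container_cgroup
    except OSError:
        # Fallback: retry with the cgroup of the container init.
        join(init_cgroup)
        return init_cgroup
```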

@kolyshkin
Contributor Author

An initial implementation is available (#3059). Will work on more tests next week but it's good enough to take a look.

@giuseppe
Member

giuseppe commented Jul 2, 2021

> > do we need to ensure the existing processes are moved to a sibling cgroup first?
>
> I don't think it's a runc task (unless the sub-cgroups are created by runc itself, but that's not what this RFC proposes).

So does it expect the cgroupfs to be mounted writeable? Otherwise how would the container process live in a separate sub-cgroup?

@kolyshkin
Contributor Author

> So does it expect the cgroupfs to be mounted writeable? Otherwise how would the container process live in a separate sub-cgroup?

Yes (in the future though we might implement sub-cgroup support in runtime spec, in which case runc will pre-create those sub-cgroups and writable cgroupfs won't be needed in case the cgroup tree is static).

@kolyshkin
Contributor Author

> in the future though we might implement sub-cgroup support in runtime spec, in which case runc will pre-create those sub-cgroups and writable cgroupfs won't be needed in case the cgroup tree is static

Obviously, the use case for that (with a read-only cgroupfs) would be limited to putting the container init in a non-top cgroup and to runc exec --cgroup, as the container itself won't be able to manage anything. Not sure whether that would be useful.
