This repository has been archived by the owner on May 12, 2021. It is now read-only.

"docker run --sysctl ..." not supported #185

Closed
jodh-intel opened this issue Apr 5, 2018 · 11 comments
Labels
limitation Issue cannot be resolved


@jodh-intel
Contributor

Previously raised as clearcontainers/runtime#15.

@jodh-intel jodh-intel added the limitation Issue cannot be resolved label Apr 5, 2018
@WeiZhang555
Member

I think this could be a good starting point for contributors, so I'm adding a "help wanted" label 😄

@caoruidong
Member

caoruidong commented Apr 23, 2018

Is this a duplicate of #163?

@grahamwhaley
Contributor

@caoruidong it looks the same to me. I think #163 has more info, and also has the help wanted label. @jodh-intel - shall we close this one in preference to #163 then?

@jodh-intel
Contributor Author

@grahamwhaley
Contributor

Ah, yeah, OK - let me copy some stuff over from #163 then, and then we can close that one instead...

@grahamwhaley
Contributor

Copying info from #163, as a duplicate:


From @sameo on May 11, 2017 13:57

From @mcastelino on April 12, 2017 0:46

Docker supports setting namespaced kernel parameters at runtime, and runc honors this. We do not honor the setting, yet we report success:

docker run --runtime=cor --sysctl net.ipv4.ip_forward=1 -it alpine sh
/ # sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0

docker run --runtime=runc --sysctl net.ipv4.ip_forward=1 -it alpine sh
/ # sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
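The difference above comes down to what the runtime does with the sysctl settings it receives. Docker hands `--sysctl` values to the runtime through the `linux.sysctl` map of the OCI runtime spec (`config.json`), a plain string-to-string object. The Python sketch below only illustrates that shape: `sysctl_args_to_oci` is a hypothetical helper, not Docker's own code, and it builds just the `linux.sysctl` fragment rather than a full spec.

```python
import json


def sysctl_args_to_oci(args):
    """Build the linux.sysctl fragment of an OCI config.json from
    docker-style "key=value" --sysctl arguments (hypothetical helper)."""
    sysctl = {}
    for arg in args:
        # partition splits on the first "=", so values may contain "="
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"invalid sysctl {arg!r}: expected key=value")
        sysctl[key] = value  # the OCI spec stores keys and values as strings
    return {"linux": {"sysctl": sysctl}}


if __name__ == "__main__":
    spec = sysctl_args_to_oci(["net.ipv4.ip_forward=1"])
    print(json.dumps(spec))  # {"linux": {"sysctl": {"net.ipv4.ip_forward": "1"}}}
```

A runtime that honors the spec has to apply every entry of that map; returning success without doing so is the behavior shown for `--runtime=cor`.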

Copied from original issue: intel/cc-oci-runtime#817

Copied from original issue: containers/virtcontainers#246


From @sameo on May 11, 2017 13:57

From @mcastelino on April 12, 2017 1:8

Note that we can actually support setting more kernel configuration variables with Clear Containers, as we have an independent instance of the kernel running inside the virtual machine. However, we also need to communicate that sysctl settings for Clear Containers are not propagated from the host.

This non-propagation will matter in the case of Kubernetes (https://kubernetes.io/docs/concepts/cluster-administration/sysctl-cluster/), where certain unsafe sysctl settings can be safely performed in the case of Clear Containers.

Also, some parameters are not namespaced, for example:

sysctl -w net.bridge.bridge-nf-call-arptables=0

which is not namespaced today even though it lives under net, which is namespaced.

@caoruidong
Member

So, are we designing this to set the host's config, or each container's?

@amshinde
Member

amshinde commented Mar 5, 2019

I took a look at this issue, to see what would be needed to support this.
Both Docker and Kubernetes support setting namespaced sysctls. For Kubernetes, sysctls are set per pod; for Docker, of course, they are set per container.

Docker:
https://docs.docker.com/engine/reference/commandline/run/#configure-namespaced-kernel-parameters-sysctls-at-runtime

For k8s:
https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
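On the Kubernetes side, pod-level sysctls go in `spec.securityContext.sysctls` as a list of name/value pairs. A minimal sketch that builds such a Pod manifest as a Python dict and prints it as JSON (which `kubectl apply -f` accepts); the helper name, pod name, image, and the sysctl chosen are illustrative assumptions.

```python
import json


def pod_with_sysctls(name, image, sysctls):
    """Return a Pod manifest requesting the given sysctls at pod level
    (hypothetical helper). Sysctls are set per pod, not per container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "securityContext": {
                "sysctls": [{"name": k, "value": v} for k, v in sysctls.items()]
            },
            "containers": [{"name": "app", "image": image}],
        },
    }


if __name__ == "__main__":
    pod = pod_with_sysctls("sysctl-demo", "alpine", {"kernel.shm_rmid_forced": "1"})
    print(json.dumps(pod, indent=2))
```

Sysctls outside the kubelet's safe set must additionally be allowed on the node (kubelet `--allowed-unsafe-sysctls`) before such a pod is admitted.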

Both allow only namespaced sysctls to be set. Broadly, these are the ones supported in k8s (Docker supports fewer, I think):
- kernel.shm*
- kernel.msg*
- kernel.sem
- fs.mqueue.*
- net.*
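The list above can be turned into a simple prefix check. This is a rough sketch assuming plain prefix matching is good enough; it is not the exact validation logic of Docker or Kubernetes. Because, as noted earlier in the thread, not everything under `net.` is namespaced, it carries a small hypothetical exception set seeded with `net.bridge.bridge-nf-call-arptables`.

```python
# Prefixes from the Kubernetes list above ("kernel.sem" is an exact match).
NAMESPACED_PREFIXES = ("kernel.shm", "kernel.msg", "fs.mqueue.", "net.")
NAMESPACED_EXACT = {"kernel.sem"}

# Hypothetical exception set: keys under net.* that are NOT namespaced.
# Only the example mentioned in this thread is listed; it is not exhaustive.
NOT_NAMESPACED = {"net.bridge.bridge-nf-call-arptables"}


def is_namespaced(key: str) -> bool:
    """Rough guess at whether a sysctl key is namespaced."""
    if key in NOT_NAMESPACED:
        return False
    if key in NAMESPACED_EXACT:
        return True
    return key.startswith(NAMESPACED_PREFIXES)


if __name__ == "__main__":
    for key in ("kernel.shmmax", "net.ipv4.ip_forward",
                "net.bridge.bridge-nf-call-arptables", "vm.max_map_count"):
        print(key, is_namespaced(key))
```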

I have verified that, since we simply pass the sysctl config in the OCI spec file, all the kernel.* sysctls are applied by the agent using the libcontainer library.
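Applying a sysctl from the spec amounts to writing its value into the matching file under `/proc/sys`, with the dots in the key turned into path separators (libcontainer does this in Go, from inside the container's namespaces). A Python sketch of that mapping; `sysctl_path` and `apply_sysctls` are hypothetical names, and the `proc_sys` parameter exists only so the demo can target a scratch directory instead of the real `/proc/sys`.

```python
import os
import tempfile


def sysctl_path(key: str, proc_sys: str = "/proc/sys") -> str:
    """Map a dotted sysctl key to its file, e.g.
    net.ipv4.ip_forward -> /proc/sys/net/ipv4/ip_forward."""
    return os.path.join(proc_sys, *key.split("."))


def apply_sysctls(sysctls: dict, proc_sys: str = "/proc/sys") -> None:
    """Write each value into its /proc/sys file (hypothetical helper)."""
    for key, value in sysctls.items():
        with open(sysctl_path(key, proc_sys), "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Demo against a scratch directory rather than the real /proc/sys.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "net", "ipv4"))
        apply_sysctls({"net.ipv4.ip_forward": "1"}, proc_sys=root)
        with open(sysctl_path("net.ipv4.ip_forward", root)) as f:
            print(f.read())  # 1
```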

For the net.* sysctls, libcontainer checks whether a new network namespace has been created and only then applies them. If the container process is not running in a new network namespace and the OCI spec contains net.* sysctls, it errors out.
I have added a fix for this in PR: kata-containers/agent#473
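The libcontainer rule described above can be sketched as a validation step. This is a simplified illustration with hypothetical names (`validate_net_sysctls`, `has_new_netns`, `SysctlError`); the real check is Go code inside libcontainer and may differ in detail, and the sketch does not reproduce the Kata fix in kata-containers/agent#473.

```python
class SysctlError(ValueError):
    """Raised when a requested sysctl cannot be applied safely."""


def validate_net_sysctls(sysctls, has_new_netns: bool) -> None:
    """Reject net.* sysctls unless the container has its own network
    namespace; otherwise they would modify a namespace it does not own."""
    for key in sysctls:
        if key.startswith("net.") and not has_new_netns:
            raise SysctlError(
                f"sysctl {key!r} not allowed: no new network namespace"
            )


if __name__ == "__main__":
    # Non-net sysctls pass regardless of the network namespace setup.
    validate_net_sysctls({"kernel.shmmax": "4096"}, has_new_netns=False)
    try:
        validate_net_sysctls({"net.ipv4.ip_forward": "1"}, has_new_netns=False)
    except SysctlError as err:
        print(err)
```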

There are certain sysctls (e.g. vm.max_map_count) that are not namespaced, so neither Docker nor Kubernetes whitelists them.
In Kata's case this is an advantage, since sysctls are applied inside the VM rather than on the host. We should introduce a mechanism by which non-namespaced sysctls can be applied when using a sandboxed runtime.

@egernst Is this something that has already been discussed or implemented for RuntimeClasses?

@egernst
Member

egernst commented Mar 6, 2019

@amshinde - no this has not come up yet -- runtimeClass has pretty limited capability right now.

You're right that this would be an interesting advantage, though how to expose this to the end user will be difficult. These would be considered "node level" sysctls, AFAICT, and wouldn't be available to set explicitly via pod.spec?

Can the end user just run a privileged container using a Kata runtimeClass in order to set them in the guest, as a workaround?

/cc @tallclair

@tallclair

Yeah, I agree with @egernst. I'd recommend using a privileged init container to do this.
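A minimal sketch of that workaround as a Pod manifest built in Python: a privileged init container runs `sysctl -w` before the app container starts, and the idea discussed here is that, because the pod runs under Kata, the write lands in the guest kernel rather than the host's. The RuntimeClass name `kata`, the `busybox` image, the helper name, and the `vm.max_map_count` value are assumptions for illustration.

```python
import json


def pod_with_sysctl_init(name, image, key, value, runtime_class="kata"):
    """Pod whose privileged init container sets a (possibly non-namespaced)
    sysctl inside the sandbox before the main container starts."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class,  # assumed RuntimeClass name
            "initContainers": [{
                "name": "set-sysctl",
                "image": "busybox",  # busybox ships a sysctl applet
                "command": ["sysctl", "-w", f"{key}={value}"],
                "securityContext": {"privileged": True},
            }],
            "containers": [{"name": "app", "image": image}],
        },
    }


if __name__ == "__main__":
    pod = pod_with_sysctl_init("sysctl-demo", "alpine", "vm.max_map_count", "262144")
    print(json.dumps(pod, indent=2))
```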

@amshinde
Member

amshinde commented Apr 8, 2019

Closing this, as we now support sysctls with Kata.

@amshinde amshinde closed this as completed Apr 8, 2019

7 participants