'runc exec' errors with 'failed to setns into net namespace: Operation not permitted' #4390
Comments
The order in which you join namespaces is important. Every namespace has an associated user namespace that is considered its "owner", and all permission checks are done against that owning namespace. runc joins the user namespace first and then all other namespaces (this is necessary for rootless containers to work -- an unprivileged user can't create/join any namespace other than user namespaces, so you need to create/join the user namespace first). However, once you create/join a user namespace your privileges are forcefully scoped to that namespace and so you no longer have host

There are three things we can do:
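As an aside on the ownership rule described in this comment, here is a minimal toy model of the check, not the kernel's actual implementation. It assumes a fully privileged process and ignores everything else the kernel looks at; the names `has_cap_in`, `can_setns`, and the `"host"`/`"container"` labels are hypothetical and exist only for illustration.

```python
from typing import Dict, Optional

def has_cap_in(process_userns: str, target_userns: str,
               parent_of: Dict[str, str]) -> bool:
    """Toy rule: a privileged process holds its capabilities in its own
    user namespace and in every descendant of it, but not in ancestors."""
    current: Optional[str] = target_userns
    while current is not None:
        if current == process_userns:
            return True
        current = parent_of.get(current)  # walk up towards the root userns
    return False

def can_setns(process_userns: str, ns_owner: str,
              parent_of: Dict[str, str]) -> bool:
    """Joining a non-user namespace is allowed only if the caller is
    privileged in the user namespace that owns the target namespace."""
    return has_cap_in(process_userns, ns_owner, parent_of)

# Assumption: the container userns is a child of the host userns, and the
# netns was created on the host, so the host userns owns it.
parent_of = {"container": "host"}
netns_owner = "host"

before = can_setns("host", netns_owner, parent_of)       # still on the host
after = can_setns("container", netns_owner, parent_of)   # after joining the userns
```

In this model `before` is true and `after` is false, which mirrors the symptom in the report: the same `setns` on the network namespace is permitted from the host but returns `EPERM` once the caller has already moved into the container's user namespace.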
Thank you for your excellent comment. I think I got blocked because I wasn't testing the path dependence of joining

I'll pursue the direction you point at in your second bullet point. I think doing so will allow me to become more comfortable with the details of namespace sharing.

I'll send the patch in a week or two.
@thundergolfer a fourth option is to let runc create the userns and the netns. This way, runc makes sure to create them in the right order (so they have the right ownership) and it is quite simple for you. Combining this with the fact that runc has

This is what we are doing in containerd, although we might change it, maybe for 2.0, because other redesigns in containerd made it a better fit to create the userns+netns in containerd.

In case you are interested in that, what we are considering in containerd now is this: containerd/containerd#10607. IOW, let containerd create the userns AND netns and specify that to runc. This would be option 2.

The "trick" we are using there is to create a new process with the CLONE_NEWUSER and CLONE_NEWNET flags (https://github.com/containerd/containerd/pull/10607/files#diff-106945d93d68e955471ccab149a1302ebb7214c1832b7df0bbd8855992ddf397R49-R55). Linux does the right thing regarding ownership and that (there was a bug I think in very old kernels, like 3.x). We then open the fd of the namespace (open("/proc/pid/ns/net"), similar for user) and mount it in the fs. This makes the namespace persistent (the process can crash and it won't be destroyed) and we use that path for the namespace in the config.json.
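To make the last step concrete, here is a rough sketch of how persisted namespace paths might be referenced from the `linux.namespaces` section of `config.json`. The helper name `namespaces_section` and the paths under `/run/example-runtime/` are hypothetical; only the `type`/`path` entry shape follows the OCI runtime spec, and a real config needs many more fields.

```python
import json
from typing import Dict, List

def namespaces_section(userns_path: str, netns_path: str) -> List[Dict[str, str]]:
    """Build a linux.namespaces list that joins an existing user and
    network namespace by path and asks the runtime to create the rest."""
    return [
        {"type": "user", "path": userns_path},      # join the pre-created userns
        {"type": "network", "path": netns_path},    # join the netns it owns
        {"type": "mount"},                          # no path: created fresh
        {"type": "pid"},
    ]

# Hypothetical bind-mount locations for the two persisted namespaces.
section = namespaces_section(
    "/run/example-runtime/ns/ta-123/user",
    "/run/example-runtime/ns/ta-123/net",
)
fragment = json.dumps({"linux": {"namespaces": section}}, indent=2)
```

Because both namespaces were created together by the same process, the network namespace is owned by the user namespace the container will run in, so the order in which the runtime joins them no longer matters for the permission check.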
@rata That is the third option I suggested, though maybe I could've phrased it better 😅. You can do the same thing with
We should join as many namespaces as possible first, except the user namespace. Then we can join the remaining namespaces after we join/unshare the user ns. (opencontainers#4390)
Signed-off-by: lifubang <[email protected]>
We should join as many namespaces as possible first, except the user namespace, because some ns paths may not be owned by the user namespace we want to join. We can then join the remaining namespaces after we join/unshare the user ns. Please see opencontainers#4390.
Signed-off-by: lifubang <[email protected]>
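The two-phase ordering described in these commit messages can be sketched roughly as follows. This is an illustration of the idea only, not runc's actual code (which lives in C in `nsexec.c`); the function `join_namespaces` and the injected `setns` callable are hypothetical names used so the logic can be run without privileges.

```python
from typing import Callable, Dict, List

def join_namespaces(ns_paths: Dict[str, str],
                    setns: Callable[[str, str], None]) -> List[str]:
    """Join every non-user namespace that can be joined right away, then
    the user namespace, then retry whatever was refused the first time.
    Returns the namespace types in the order they were joined."""
    joined: List[str] = []
    deferred: List[str] = []

    # Phase 1: everything except the user namespace, while we still hold
    # the privileges of the namespace we started in.
    for ns_type, path in ns_paths.items():
        if ns_type == "user":
            continue
        try:
            setns(ns_type, path)
            joined.append(ns_type)
        except PermissionError:
            deferred.append(ns_type)  # likely owned by the target userns

    # Phase 2: the user namespace itself.
    if "user" in ns_paths:
        setns("user", ns_paths["user"])
        joined.append("user")

    # Phase 3: namespaces that are only reachable from inside the userns.
    for ns_type in deferred:
        setns(ns_type, ns_paths[ns_type])
        joined.append(ns_type)
    return joined

if __name__ == "__main__":
    # Toy stand-in for setns: "net" is only joinable before the userns
    # switch, "mnt" only after it.
    state = {"in_userns": False}

    def fake_setns(ns_type: str, path: str) -> None:
        if ns_type == "user":
            state["in_userns"] = True
        elif (ns_type == "net") == state["in_userns"]:
            raise PermissionError(ns_type)

    print(join_namespaces({"user": "/u", "net": "/n", "mnt": "/m"}, fake_setns))
    # prints ['net', 'user', 'mnt']
```

Injecting `setns` keeps the sketch testable. On Python 3.12 or newer, a real callable could presumably open the path and pass the descriptor to `os.setns`, but that needs root and is left out here.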
Description
At modal.com we run a custom multi-tenant container runtime which can use `runc` or `runsc` (gVisor). For us `runsc exec` is working but we're hitting a failure on doing `runc exec` which I've debugged for a long time and can't root cause.

Doing `runc exec ta-01J5P4BZS64CE57EXK048QMNE1 bash` fails because of `EPERM` on attempting to enter the `runc` container's network namespace.

Using `sudo strace -ft runc exec -cap CAP_SYS_ADMIN ta-01J5P4BZS64CE57EXK048QMNE1 bash` I can see that specifically it's failing on the `setns` syscall like this:

Oddly running `sudo nsenter --all --target=267854 ls` from the same terminal works. If I `strace` that command I can see that it makes the same syscalls as `runc exec` albeit in a different order.

Things I've looked into:
`sudo` so this shouldn't be a problem

I'm stuck on figuring out what's wrong here. My next move was going to be compiling my own `runc` to add debugging code into `nsexec.c`.

Steps to reproduce the issue
I fear this is tricky to reproduce, but I will provide details on what we're doing:
1. `runc --system-cgroup run ta-123 --bundle $BUNDLE_PATH`
   a. `config.json` given below
2. `sudo runc --debug exec -c CAP_SYS_ADMIN ta-01J6NQG0GEHAQ07FTVHC4GAS64 ls`
The container's network namespace is created from our container runtime with `ip netns add ta-123` prior to container creation, and inside a `CreateRuntime` hook we use the CNI Bridge and Loopback plugins to set up `lo` and `eth0`.

config.json
Describe the results you received and expected
I expect that `runc exec` will succeed, but it fails on entering the network namespace. Full failure:

What version of runc are you using?
and
Host OS information
Also reproduced on Oracle Linux.
Host kernel information
Linux ip-10-1-1-198 5.15.0-1068-aws #74~20.04.1-Ubuntu SMP Tue Aug 6 19:32:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux