containerd's TestPodUserNS fails with runc v1.2 (succeeds with crun) on SELinux distro: setxattr /[...]/dev/mqueue: operation not permitted #4466

Closed
AkihiroSuda opened this issue Oct 23, 2024 · 5 comments · Fixed by #4473 or #4477
@AkihiroSuda
Member

AkihiroSuda commented Oct 23, 2024

On Fedora 40 and Rocky Linux 9, containerd's TestPodUserNS fails with the following change on top of the main branch of containerd (containerd/containerd@bc3ce87):

diff --git a/script/setup/runc-version b/script/setup/runc-version
index 6a99dbb7fd74..79127d85a49f 100644
--- a/script/setup/runc-version
+++ b/script/setup/runc-version
@@ -1 +1 @@
-v1.1.14
+v1.2.0

Failure:

    default: === RUN   TestPodUserNS
    default: === RUN   TestPodUserNS/userns_uid_mapping
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default: E1022 10:38:44.240499   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e/rootfs/dev/mqueue: operation not permitted
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/ed8348b9215a10dba3ef48191f37dfa41c7a4648bbdf7fba9365fdf8a4c1ed4e/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/userns_gid_mapping
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/rootfs_permissions
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default: E1022 10:38:44.623562   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/d89053afdbbc20f3b11b2eae107e3d70213b21f473707dd5baa762d1a317aa3c/rootfs/dev/mqueue: operation not permitted
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/volumes_permissions
    default: E1022 10:38:44.971328   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/83bda990b49619f5e98b41dd6fa5c6178264677bd3a2735debb66fb114ce0859/rootfs/dev/mqueue: operation not permitted
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default:     pod_userns_linux_test.go:251: Unexpected RunPodSandbox error: rpc error: code = Unknown desc = failed to start sandbox "e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b/rootfs/dev/mqueue: operation not permitted
    default: === RUN   TestPodUserNS/fails_with_several_mappings
    default:     pod_userns_linux_test.go:246: Create a sandbox with userns
    default: E1022 10:38:45.379638   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox "e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b": failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": setxattr /run/containerd-test/io.containerd.runtime.v2.task/k8s.io/e579723f7f6ece7cc7e6c5294fa73308777f15baacd3f0f317225c0911b9c01b/rootfs/dev/mqueue: operation not permitted
    default: E1022 10:38:45.401499   45870 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create network namespace for sandbox "461d0d64ea29a2c2b36262ad005d0ebaed8f1ea1d969f6944b575165caebc8a2": required only one uid mapping, but got 2 uid mapping(s)
    default: --- FAIL: TestPodUserNS (1.51s)
    default:     --- FAIL: TestPodUserNS/userns_uid_mapping (0.35s)
    default:     --- FAIL: TestPodUserNS/userns_gid_mapping (0.38s)
    default:     --- FAIL: TestPodUserNS/rootfs_permissions (0.35s)
    default:     --- FAIL: TestPodUserNS/volumes_permissions (0.41s)
    default:     --- PASS: TestPodUserNS/fails_with_several_mappings (0.02s)

https://github.com/containerd/containerd/actions/runs/11457221604/job/31880030218?pr=10877

This failure does not happen after reverting:

However, since the same test has been passing with crun without reverting them, this issue probably needs to be fixed on runc's side.

@AkihiroSuda AkihiroSuda added area/selinux SELinux area/userns User Namespaces labels Oct 23, 2024
@AkihiroSuda AkihiroSuda changed the title from "containerd's TestPodUserNS fails with runc v1.2 (succeeds with crun): setxattr /[...]/dev/mqueue: operation not permitted" to "containerd's TestPodUserNS fails with runc v1.2 (succeeds with crun) on SELinux distro: setxattr /[...]/dev/mqueue: operation not permitted" Oct 23, 2024
@AkihiroSuda
Member Author

AkihiroSuda commented Oct 23, 2024

Sorry, "succeeds with crun" wasn't true. It wasn't tested with crun + SELinux.

@AkihiroSuda AkihiroSuda closed this as not planned Oct 23, 2024
@rata
Member

rata commented Oct 23, 2024

Just for the record, the bug seems to be in containerd, see here for more info: containerd/containerd#10877 (comment)

The summary is: containerd recently changed how user namespaces are created, and this appears to be a bug in that change. Before rc5, it asked runc to create the userns (and all the other namespaces), but in rc5 containerd switched to creating the userns itself (which makes sense for various reasons, although making the change as late as rc5 is more questionable). However, the userns and netns are created together in one unshare call, and that seems to break on SELinux distros when we later want to do the mqueue mount. When runc creates the namespaces, it has special handling for this (apparently a kernel bug that would be nice to fix at some point):

* A specific case of this is that the SELinux label of the
* internal kern-mount that mqueue uses will be incorrect if the
* UTS namespace is cloned before the USER namespace is mapped.
* I've also heard of similar problems with the network namespace
* in some scenarios. This also mirrors how LXC deals with this
* problem.
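
To make the ordering in the quoted comment concrete, here is a minimal Go sketch. It is not runc's actual code (that logic lives in runc's C nsexec code); `splitUnshare` is a hypothetical helper that only computes which flags would go into which `unshare` call, assuming the goal is to create and map the user namespace before any other namespace. The flag values are the ones from the Linux `sched.h` UAPI header.

```go
package main

import "fmt"

// Clone flag values as defined in the Linux UAPI header linux/sched.h.
const (
	cloneNewNS   = 0x00020000
	cloneNewUTS  = 0x04000000
	cloneNewIPC  = 0x08000000
	cloneNewUser = 0x10000000
	cloneNewPID  = 0x20000000
	cloneNewNet  = 0x40000000
)

// splitUnshare is a hypothetical helper (not a runc or containerd API).
// It splits a combined set of namespace flags into two phases:
//   - first: the user namespace alone, so uid_map/gid_map can be written
//     before anything else is created;
//   - rest: every other requested namespace, unshared afterwards so that
//     they end up owned by the already-mapped user namespace.
func splitUnshare(flags uintptr) (first, rest uintptr) {
	first = flags & cloneNewUser
	rest = flags &^ cloneNewUser
	return first, rest
}

func main() {
	// The problematic pattern from the comment above is a single call
	// that creates the user and network namespaces together.
	combined := uintptr(cloneNewUser | cloneNewNet | cloneNewIPC)

	first, rest := splitUnshare(combined)
	fmt.Printf("step 1: unshare(%#x), then write uid_map/gid_map\n", first)
	fmt.Printf("step 2: unshare(%#x)\n", rest)
}
```

The sketch deliberately performs no system calls, so it runs without privileges; it only illustrates the two-phase split.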

@lifubang
Member

lifubang commented Oct 23, 2024

If we could run integration tests with containerd and other well-known downstream projects in runc's CI, it would help us detect bugs before we make a release, not after.

@AkihiroSuda
Member Author

I actually ran the test with crun + SELinux, and it seems green.

@AkihiroSuda AkihiroSuda reopened this Oct 23, 2024
lifubang added a commit to lifubang/runc that referenced this issue Oct 24, 2024
fix: opencontainers#4466, in containerd, the net and user ns have been created
before starting the container, and runc is asked to join these two ns when
starting the init process. This works on normal systems, except on
systems with SELinux enabled and a mount label configured.
We can resolve it in two steps:
1. Join the user ns after joining all other namespaces;
2. If we have joined a user ns path, we should also become root
in the namespace, like what we do when unsharing a new user ns.

Signed-off-by: lifubang <[email protected]>
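
The two steps in this commit message can be sketched in Go as follows. This is only an illustration of the ordering idea, not the code from the patch: `Namespace` and `planJoin` are hypothetical names, the type strings follow the OCI runtime-spec convention (`"user"`, `"network"`, ...), and the `/proc/1234/ns/...` paths are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Namespace is a hypothetical type describing a namespace to join by path.
type Namespace struct {
	Type string // e.g. "user", "network", "ipc"
	Path string // path to an existing namespace, e.g. under /proc/<pid>/ns
}

// planJoin is a hypothetical helper. It returns the namespaces in the
// order they would be joined, with any user namespace moved to the end
// (step 1), and reports whether the process should become root inside
// the namespace afterwards because a user namespace path was joined
// (step 2).
func planJoin(nss []Namespace) (ordered []Namespace, becomeRoot bool) {
	// Copy so the caller's slice is left untouched.
	ordered = append([]Namespace(nil), nss...)

	// Stable sort: non-user namespaces keep their relative order and
	// come before the user namespace.
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Type != "user" && ordered[j].Type == "user"
	})

	for _, ns := range ordered {
		if ns.Type == "user" && ns.Path != "" {
			becomeRoot = true
		}
	}
	return ordered, becomeRoot
}

func main() {
	ordered, becomeRoot := planJoin([]Namespace{
		{Type: "user", Path: "/proc/1234/ns/user"},
		{Type: "network", Path: "/proc/1234/ns/net"},
	})
	for _, ns := range ordered {
		fmt.Printf("join %s namespace at %s\n", ns.Type, ns.Path)
	}
	if becomeRoot {
		fmt.Println("then become uid 0 / gid 0 inside the user namespace")
	}
}
```

The sketch only computes a plan and makes no system calls; actually joining a user namespace has to happen before the Go runtime starts extra threads, which is one reason runc does this part in C.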
@lifubang
Member

@AkihiroSuda Could you please check whether #4473 fixes your issue?
The issue seems to be gone with this patch on my local Fedora 40 with SELinux enabled.

```bash
root@iZrj92lfz91pzit984cd5tZ:~/go/src/github.com/containerd/containerd# /usr/local/go/bin/go test --count=1 -v -test.v -timeout 30s -run ^TestPodUserNS$ github.com/containerd/containerd/v2/integration
=== RUN   TestPodUserNS
=== RUN   TestPodUserNS/userns_gid_mapping
    pod_userns_linux_test.go:246: Create a sandbox with userns
time="2024-10-24T23:15:45+08:00" level=info msg="Using the following image list: {Alpine:ghcr.io/containerd/alpine:3.14.0 BusyBox:ghcr.io/containerd/busybox:1.36 Pause:registry.k8s.io/pause:3.10 ResourceConsumer:registry.k8s.io/e2e-test-images/resource-consumer:1.10 VolumeCopyUp:ghcr.io/containerd/volume-copy-up:2.2 VolumeOwnership:ghcr.io/containerd/volume-ownership:2.1 ArgsEscaped:cplatpublic.azurecr.io/args-escaped-test-image-ns:1.0 DockerSchema1:registry.k8s.io/busybox@sha256:4bdd623e848417d96127e16037743f0cd8b528c026e9175e22a84f639eca58ff}"
    main_test.go:731: Image "ghcr.io/containerd/busybox:1.36" already exists, not pulling.
    pod_userns_linux_test.go:274: Create a container for userns
    pod_userns_linux_test.go:283: Start the container
    pod_userns_linux_test.go:286: Wait for container to finish running
    pod_userns_linux_test.go:301: Running check function
=== RUN   TestPodUserNS/rootfs_permissions
    pod_userns_linux_test.go:246: Create a sandbox with userns
    main_test.go:731: Image "ghcr.io/containerd/busybox:1.36" already exists, not pulling.
    pod_userns_linux_test.go:274: Create a container for userns
    pod_userns_linux_test.go:283: Start the container
    pod_userns_linux_test.go:286: Wait for container to finish running
    pod_userns_linux_test.go:301: Running check function
=== RUN   TestPodUserNS/volumes_permissions
    pod_userns_linux_test.go:246: Create a sandbox with userns
    main_test.go:731: Image "ghcr.io/containerd/busybox:1.36" already exists, not pulling.
    pod_userns_linux_test.go:274: Create a container for userns
    pod_userns_linux_test.go:283: Start the container
    pod_userns_linux_test.go:286: Wait for container to finish running
    pod_userns_linux_test.go:301: Running check function
=== RUN   TestPodUserNS/fails_with_several_mappings
    pod_userns_linux_test.go:246: Create a sandbox with userns
E1024 23:15:49.408259  255944 remote_runtime.go:132] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create network namespace for sandbox "b2c5d638eff48193d7d4188a22c938845f4dacaa7828196d03668d4534d540e8": required only one uid mapping, but got 2 uid mapping(s)
=== RUN   TestPodUserNS/userns_uid_mapping
    pod_userns_linux_test.go:246: Create a sandbox with userns
    main_test.go:731: Image "ghcr.io/containerd/busybox:1.36" already exists, not pulling.
    pod_userns_linux_test.go:274: Create a container for userns
    pod_userns_linux_test.go:283: Start the container
    pod_userns_linux_test.go:286: Wait for container to finish running
    pod_userns_linux_test.go:301: Running check function
--- PASS: TestPodUserNS (5.91s)
    --- PASS: TestPodUserNS/userns_gid_mapping (1.51s)
    --- PASS: TestPodUserNS/rootfs_permissions (1.45s)
    --- PASS: TestPodUserNS/volumes_permissions (1.46s)
    --- PASS: TestPodUserNS/fails_with_several_mappings (0.01s)
    --- PASS: TestPodUserNS/userns_uid_mapping (1.48s)
PASS
ok  	github.com/containerd/containerd/v2/integration	5.938s
root@iZrj92lfz91pzit984cd5tZ:~/go/src/github.com/containerd/containerd# cat /etc/containerd/config.toml
version = 2

[plugins]
[plugins."io.containerd.grpc.v1.cri"]
enable_selinux = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/usr/libexec/cni/"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.internal.v1.opt"]
path = "/var/lib/containerd/opt"
root@iZrj92lfz91pzit984cd5tZ:~/go/src/github.com/containerd/containerd# sestatus -v
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Memory protection checking: actual (secure)
Max kernel policy version: 33

Process contexts:
Current context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Init context: system_u:system_r:init_t:s0
/usr/sbin/sshd system_u:system_r:sshd_t:s0-s0:c0.c1023

File contexts:
Controlling terminal: unconfined_u:object_r:user_devpts_t:s0
/etc/passwd system_u:object_r:passwd_file_t:s0
/etc/shadow system_u:object_r:shadow_t:s0
/bin/bash system_u:object_r:shell_exec_t:s0
/bin/login system_u:object_r:login_exec_t:s0
/bin/sh system_u:object_r:bin_t:s0 -> system_u:object_r:shell_exec_t:s0
/sbin/agetty system_u:object_r:getty_exec_t:s0
/sbin/init system_u:object_r:bin_t:s0 -> system_u:object_r:init_exec_t:s0
/usr/sbin/sshd system_u:object_r:sshd_exec_t:s0

```

@lifubang lifubang added this to the 1.2.1 milestone Oct 24, 2024
lifubang added a commit to lifubang/runc that referenced this issue Oct 24, 2024
fix: opencontainers#4466, in containerd, for a user ns pod, the net and user ns have been
created before starting the container, and runc is asked to join these two ns when
starting the init process. This works on normal systems, except on systems
with SELinux enabled and a mount label configured.

We can resolve it in two steps:
1. Join the user ns after joining all other namespaces, as some
namespaces may not be owned by the user namespace;
2. Also become root in the namespace if we have joined a user ns
path, like what we do when unsharing a new user ns.

Signed-off-by: lifubang <[email protected]>
kolyshkin added a commit to kolyshkin/containerd that referenced this issue Oct 24, 2024
This is just to run CI in order to check if
 opencontainers/runc#4474
fixes
 opencontainers/runc#4466.

Signed-off-by: Kir Kolyshkin <[email protected]>
@cyphar cyphar closed this as completed in c78f3f2 Oct 25, 2024
kolyshkin added a commit to kolyshkin/runc that referenced this issue Oct 26, 2024
Containerd pre-creates userns and netns before calling runc, which
results in the current code not working when SELinux is enabled,
resulting in the following error:

> runc create failed: unable to start container process: error during
container init: error mounting "mqueue" to rootfs at "/dev/mqueue":
setxattr /path/to/rootfs/dev/mqueue: operation not permitted

The solution is to become root in the user namespace right after
we join it.

Fixes opencontainers#4466.

Co-authored-by: Wei Fu <[email protected]>
Co-authored-by: Kir Kolyshkin <[email protected]>
Co-authored-by: Aleksa Sarai <[email protected]>
Signed-off-by: lifubang <[email protected]>
(cherry picked from commit c78f3f2)
Signed-off-by: Kir Kolyshkin <[email protected]>
@rata rata mentioned this issue Nov 1, 2024