"Error: crun: mount proc to proc: Operation not permitted: OCI permission denied" in nested rootless container
#20453
Comments
Usually this is because you are trying to run a container within a container. Try running the outer container with podman run --security-opt unmask=/proc ...
Hm. This isn't an SELinux thing, right? I'm not using SELinux. What is masking?
No, this is not an SELinux issue. When Podman or any container engine creates a container, it masks over sections of /proc. If you then run another container engine within that container, the inner engine also attempts to modify /proc, and the kernel does not allow modification of an already modified /proc. The --security-opt unmask=/proc option tells the outer container engine not to modify /proc.
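To check whether /proc has been masked in the outer container, one option is to look for mounts sitting below /proc. The following is a minimal sketch, not something taken from Podman itself: the helper name `proc_submounts` is hypothetical, and it assumes the standard `/proc/self/mountinfo` layout in which the fifth whitespace-separated field is the mount point.

```python
def proc_submounts(mountinfo_text):
    """Return the mount points located strictly below /proc.

    Each entry is a path that something (for example a container
    engine's masking) has mounted over part of /proc.
    """
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 5 (index 4) of a mountinfo record is the mount point.
        if len(fields) > 4 and fields[4].startswith("/proc/"):
            found.append(fields[4])
    return found
```

Inside the outer container, passing the contents of `/proc/self/mountinfo` to this function lists the masked paths. An empty list only shows that nothing is mounted below /proc; it does not cover other ways /proc could be restricted.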
@deliciouslytyped does
Sorry, I haven't had time to look at this. I likely won't have time for a few weeks; we'll see. Maybe over the weekend if I can repro the issue quickly.
Edit: I forgot that you stated that the unmask needs to be done on the outer container, which makes sense.
Unmasking appears to have helped with what I'm trying currently. I still need to get back to what the issue is originally about. 👍
I am closing the issue since the error is expected with a masked /proc.
I have the same error using Podman inside a Kubernetes container. I use Buildah as part of the CI routine to build the image, then I use Podman to do some testing. Everything runs in the same CI container. Buildah is fine and builds the image correctly, but when I try to
podman info:
buildah info:
I've tested the solutions proposed here and in various other issues, and nothing seems to work. One important thing is that Kubernetes is running on Bottlerocket OS with EKS. Any idea about that? Maybe a kernel lockdown at the host level? Basically, the idea is to be able to run unprivileged containers and run Podman tests through Testcontainers and other testing tools.
If /proc inside of the k8s container has been modified, then Podman will not be able to modify it. The kernel enforces this. Bottom line: you have to tell k8s not to modify the /proc that is being mounted into the container.
How would /proc be modified inside the container? Does Buildah modify it during the build? And how do I tell k8s "to not modify /proc"? I didn't understand that correctly. Another thing I didn't quite understand is how Buildah can run the
Are you running with --isolation=chroot? You could try running with --pid=host inside of the container.
Yes, thank you very much. I spent two days reviewing the Podman documentation and repository, and I iterated on my initial Podman configuration many times to understand how Podman works. This is the final result:
I wanted to test how far things can go, and I stopped at three levels of nested Podman inside a Kubernetes container, which is already incredible.
The last thing that I can't understand is why I have to:
Without that I get:
Any ideas as to why Podman doesn't like that sticky bit on the newuidmap binary (it comes by default on Debian-based systems with the uidmap package)?
That's not sticky, that's suid, no?
This is an issue with the Fedora base image not setting up the "filecap" bit correctly.
This is explained in other issues (I don't have a link on hand), but what is it about the filecap not being set correctly that actually results in the problem? (possibly explained in the aforementioned )
The file cap is like a setuid setting that allows newuidmap and newgidmap to configure the user namespace based on the contents of /etc/subuid and /etc/subgid. This is a privileged operation. Without file caps, rootless Podman no longer works.
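To see which of these markers a binary such as newuidmap actually carries, the mode bits and the capability extended attribute can be inspected directly. This is a rough sketch under stated assumptions: the function name `privilege_markers` is made up for illustration, and it relies on Linux storing file capabilities in the `security.capability` extended attribute. It only reports presence; it does not decode which capabilities are set.

```python
import os
import stat


def privilege_markers(path):
    """Report the setuid bit, the sticky bit, and file capabilities on path."""
    mode = os.stat(path).st_mode
    try:
        # On Linux, file capabilities live in the security.capability xattr.
        has_caps = bool(os.getxattr(path, "security.capability"))
    except (OSError, AttributeError):
        # OSError: attribute absent or unsupported by the filesystem.
        # AttributeError: os.getxattr is not available on this platform.
        has_caps = False
    return {
        "setuid": bool(mode & stat.S_ISUID),   # what "suid" refers to
        "sticky": bool(mode & stat.S_ISVTX),   # a different bit entirely
        "file_caps": has_caps,
    }
```

Running it on the newuidmap and newgidmap paths inside the container shows whether the image ships them with setuid, with file capabilities, or with neither.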
@rhatdan According to this PR https://github.com/containers/storage/pull/1188/files
Podman does nothing other than execute the programs; it is up to the kernel to enforce these bits. A couple of questions, though: the kernel will not honor setuid or file caps on file systems mounted with NOSUID (for example, a rootless home directory mounted NOSUID), and if you are running code in emulation, the kernel will also ignore them. Does either of these apply to your case?
Yes, it sounds like Bottlerocket OS enforces this. Thanks a lot. I'll have to do some digging on Bottlerocket.
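Whether the NOSUID case applies can be checked from inside the container by looking at the mount flags of the filesystem that holds the binary. A small sketch, assuming a Unix platform where Python exposes `os.statvfs` and `os.ST_NOSUID`; the helper name `mounted_nosuid` is hypothetical.

```python
import os


def mounted_nosuid(path):
    """Return True if the filesystem holding path is mounted nosuid."""
    flags = os.statvfs(path).f_flag
    # ST_NOSUID is set when setuid bits and file capabilities are ignored
    # for executables on this mount.
    return bool(flags & os.ST_NOSUID)
```

Calling it on the location of newuidmap (wherever that is in your image) indicates whether the kernel would ignore the setuid bit or file caps there. It says nothing about the emulation case.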
Wouldn't it be possible to keep the /proc modifications done by k8s while still providing a fresh, PID-namespaced proc mount? In our case, we are not able to run nested containers when there is a tmpfs mounted on /proc/scsi.
Try it, but I doubt it.
@rhatdan from the host I observe that Podman is able to create a fresh procfs when /proc is modified, while buildah/unshare is not able to. Would you know how this works? Here is a reproducer on fedora-39:
No, it is not possible. You must have a fully visible proc file system before you can mount a new one in a user namespace. We need this in Kubernetes: kubernetes/enhancements#4265
@giuseppe why is Podman successful, but Buildah not? I would have thought Podman would have blown up the same way as Buildah in the example above.
Podman joins the existing user+mount namespace. If you run
Issue Description
I have way too many terminals open with various facets of this, and I'm currently not set up to reproduce and run this, so I will have to come back and fill in the details properly when I get around to reproing.
These may be related:
#10864
#9813
I have the following error:
This is, IIRC, running rootless podman in rootless podman.
The external podman version is 4.5.0 and the internal is 4.7.0.
The internal podman is, I think, using systemd cgroup management (I had to do some extra configuration of the container I'm using).
Findmnt shows the following for /proc in a container that I think should be similar:
I read something in the mentioned possibly related issues that proc can't be mounted over if it has mounts in it. I didn't find the kernel documentation for this.
I don't quite understand what is supposed to be happening - this may help diagnose.
TODO
Describe the results you received
Describe the results you expected
podman info output
TODO
Podman in a container
Yes
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
TODO
Additional information