
EPERM mounting sysfs with rootless/userns container #3672

Open
maleadt opened this issue Nov 29, 2022 · 10 comments

maleadt commented Nov 29, 2022

I'm trying out runc for simple unprivileged containerized execution, but I'm having issues mounting sysfs:

"mounts": [
    {
        "destination": "/sys",
        "type": "sysfs",
        "source": "sysfs",
        "options": [
            "nosuid",
            "noexec",
            "nodev"
        ]
    }
]
❯ runc run test
ERRO[0000] runc run failed: unable to start container process: error during container init: error mounting "sysfs" to rootfs at "/sys": mount sysfs:/sys (via /proc/self/fd/7), flags: 0xe: operation not permitted

Meanwhile, crun manages fine:

❯ crun run test
root@test:~# mount | grep sysfs
sys on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
Full config
{
    "ociVersion": "1.0.1",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "root": {
        "path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
        "readonly": true
    },
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": [
                "nosuid",
                "strictatime",
                "mode=755",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": [
                "nosuid",
                "noexec",
                "newinstance",
                "ptmxmode=0666",
                "mode=0620"
            ]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "mode=1777",
                "size=65536k"
            ]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev"
            ]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "relatime",
                "ro"
            ]
        }
    ],
    "process": {
        "terminal": true,
        "cwd": "/root",
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "args": [
            "/bin/bash", "--login"
        ],
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ],
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL"
            ],
            "ambient": [
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "noNewPrivileges": true
    },
    "user": {
        "uid": 0,
        "gid": 0
    },
    "hostname": "test",
    "linux": {
        "resources": {
            "devices": [
                {
                    "allow": false,
                    "access": "rwm"
                }
            ]
        },
        "namespaces": [
            {
                "type": "pid"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            },
            {
                "type": "user"
            },
            {
                "type": "cgroup"
            }
        ],
        "uidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "gidMappings": [
            {
                "containerID": 0,
                "hostID": 1000,
                "size": 1
            }
        ],
        "devices": null
    }
}
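One detail in this config may matter: it creates a `user` namespace but no `network` namespace. As far as we understand the kernel's permission check, mounting a fresh sysfs requires CAP_SYS_ADMIN in the user namespace that owns the caller's network namespace, so a container that stays in the host's network namespace is refused. A minimal Go sketch that flags this combination in a config.json; the `spec` type and `sysfsWithoutOwnNetns` helper are hypothetical and parse only the fields they need:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spec models only the config.json fields this check reads; the JSON
// names follow the OCI runtime spec, everything else is ignored.
type spec struct {
	Mounts []struct {
		Destination string `json:"destination"`
		Type        string `json:"type"`
	} `json:"mounts"`
	Linux struct {
		Namespaces []struct {
			Type string `json:"type"`
			Path string `json:"path"`
		} `json:"namespaces"`
	} `json:"linux"`
}

// sysfsWithoutOwnNetns reports whether the config requests a fresh sysfs
// mount together with a user namespace but without creating a network
// namespace. Assumption: in that case the container stays in a network
// namespace owned by a user namespace where it lacks CAP_SYS_ADMIN, and
// the kernel refuses the sysfs mount with EPERM.
func sysfsWithoutOwnNetns(raw []byte) (bool, error) {
	var s spec
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	wantsSysfs := false
	for _, m := range s.Mounts {
		if m.Type == "sysfs" {
			wantsSysfs = true
		}
	}
	userNS, newNetNS := false, false
	for _, ns := range s.Linux.Namespaces {
		switch {
		case ns.Type == "user":
			userNS = true
		case ns.Type == "network" && ns.Path == "":
			// An entry with a path joins an existing namespace whose
			// owner is unknown here, so only a new one counts.
			newNetNS = true
		}
	}
	return wantsSysfs && userNS && !newNetNS, nil
}

func main() {
	// Trimmed-down version of the config from the report.
	cfg := []byte(`{
		"mounts": [{"destination": "/sys", "type": "sysfs", "source": "sysfs"}],
		"linux": {"namespaces": [{"type": "pid"}, {"type": "mount"}, {"type": "user"}]}
	}`)
	risky, err := sysfsWithoutOwnNetns(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("fresh sysfs mount likely refused:", risky)
}
```

With a `{"type": "network"}` entry added to `namespaces`, the check no longer fires; whether that is acceptable depends on whether the container needs host networking.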

Bind-mounting the host's /sys instead works around the issue:

"mounts": [
    {
        "destination": "/sys",
        "type": "none",
        "source": "/sys",
        "options": [
            "rbind",
            "nosuid",
            "noexec",
            "nodev",
            "ro"
        ]
    }
]

kolyshkin (Contributor) commented

I vaguely remember that this depends on the kernel version: some kernels (mistakenly) deny this mount.

Two possible solutions are:

  1. Upgrade the kernel
  2. Do not use rootless+userns+sysfs (lack of /sys might be OK for some containers).

I am not sure what the implications of bind-mounting the host's /sys are, so I would not recommend doing that (at least not without some security analysis first).

kolyshkin (Contributor) commented

Now,

  1. This is not a runc bug (but rather a kernel bug)
  2. There's nothing runc can do about this (there's no easy workaround, and bind-mounting /sys is questionable)

Based on these two points, I am closing this as not-a-bug.

Let me know if you feel differently.


maleadt commented Nov 30, 2022

> There's nothing runc can do about this (there's no easy workaround, and bind-mounting /sys is questionable)

But crun manages fine? I'm unfamiliar with the exact logic taking care of mounting sysfs, but this seems to indicate that there is a way to deal with this from the runtime's side.

Also, I'm happy to upgrade my kernel, but I'm using 5.15 -- the latest LTS -- which isn't exactly ancient. It's still what e.g. Ubuntu 22.04 is using/supporting for the next 5 years or so.


maleadt commented Nov 30, 2022

Also, this reproduces on kernel 6.0.10 (Arch Linux)...

kolyshkin reopened this Nov 30, 2022
kolyshkin (Contributor) commented

OK, please tell us how to repro this (what is your environment and the steps to repro) and we'll take a look.


maleadt commented Dec 2, 2022

> OK, please tell us how to repro this (what is your environment and the steps to repro) and we'll take a look.

There's not much more to it than what I've reported here:

./runc.amd64 run test
ERRO[0000] runc run failed: unable to start container process: error during container init: error mounting "sysfs" to rootfs at "/sys": mount sysfs:/sys (via /proc/self/fd/7), flags: 0xe: operation not permitted


g0dA commented Dec 6, 2022

This is not a runc bug: the kernel denies this mount, and that is correct.

Why can crun mount sysfs?

Because when it runs in a user namespace, crun bind-mounts /sys instead of mounting a new sysfs:

https://github.com/containers/crun/blob/2700598aa9df55945d09084ca035e1d140bc7f73/src/libcrun/linux.c#L1084
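A runtime that switches strategy like this needs to know whether it is running inside a user namespace. A common heuristic, similar to the one runc's libcontainer has used as far as we know, reads `/proc/self/uid_map`: the initial user namespace maps the whole 32-bit UID range onto itself. A standalone Go sketch; the `inInitialUserNS` helper is hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inInitialUserNS reports whether the contents of /proc/self/uid_map
// describe the initial user namespace, whose only mapping is
// "0 0 4294967295": the whole 32-bit UID range mapped onto itself.
func inInitialUserNS(uidMap string) bool {
	lines := strings.Split(strings.TrimSpace(uidMap), "\n")
	if len(lines) != 1 {
		return false
	}
	f := strings.Fields(lines[0])
	return len(f) == 3 && f[0] == "0" && f[1] == "0" && f[2] == "4294967295"
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		// Not Linux, or /proc is not mounted.
		fmt.Println("cannot read uid_map:", err)
		return
	}
	if inInitialUserNS(string(data)) {
		fmt.Println("initial user namespace")
	} else {
		fmt.Println("inside a user namespace (per this thread, crun bind-mounts /sys here)")
	}
}
```

Inside the container from this report, `uid_map` would hold a single mapping of container UID 0 to host UID 1000 with size 1, so the check reports a user namespace.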


maleadt commented Dec 6, 2022

I see; thanks!

maleadt closed this as completed Dec 6, 2022
kolyshkin (Contributor) commented

containers/crun@6785cef

We should do the same for runc, I guess.

kolyshkin (Contributor) commented

Note that runc spec --rootless generates a spec which has /sys as a bind mount. I guess that is why we never saw this error. The code was added by #744 (specifically, commit d04cbc4).

I think we still have to support replacing a proper /sys mount with a bind mount because crun does it.
