Rootless podman loses capabilities for binaries #1550

Closed
debarshiray opened this issue Sep 26, 2018 · 12 comments
Labels: locked - please file new issue/PR, rootless

Comments

@debarshiray
Member

/kind bug

Description

This is similar to #1526, but about capabilities rather than the SUID bit.

Once you enter the Silverblue toolbox you see:

[rishi@bollard fedora-toolbox]$ ./fedora-toolbox enter
🔹[rishi@toolbox ~]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+eip
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1000(rishi)
gid=1000(rishi)
groups=10(wheel)
🔹[rishi@toolbox ~]$ getcap /usr/bin/ping
🔹[rishi@toolbox ~]$ ls -l /usr/bin/ping
-rwxr-xr-x. 1 root root 63224 Feb  8  2018 /usr/bin/ping
🔹[rishi@toolbox ~]$ ping fedoraproject.org
ping: socket: Operation not permitted
🔹[rishi@toolbox ~]$ sudo su -
[root@toolbox ~]# ping fedoraproject.org
ping: socket: Operation not permitted

Since /usr/bin/ping isn't present in the base fedora image, we need to use an image that layers in sudo, e.g., the fedora-toolbox image.

$ podman run -it --rm --uidmap 1000:0:1 --uidmap 0:1:1000 --uidmap 1001:1001:64536 fedora-toolbox:28 bash
[root@b18ed3028937 /]# capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=
[root@b18ed3028937 /]# getcap /usr/bin/ping
Failed to get capabilities of file `/usr/bin/ping' (Numerical result out of range)

Output of podman version:

Version:       0.9.3.1
Go Version:    go1.10.4
OS/Arch:       linux/amd64

Note that this is podman-0.9.3.1 with the fix for #1526 cherry-picked on top. I also have the patch from opencontainers/runc#1862 in my runc build.

Output of podman info:

host:
  Conmon:
    package: podman-0.9.3.1-1.1.git1cd906d.fc28.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.12.0-dev, commit: 8cc84bd282d7badb733d4d1e041b5d7ef7a63190-dirty'
  MemFree: 4333494272
  MemTotal: 16696311808
  OCIRuntime:
    package: runc-1.0.0-53.1.dev.git70ca035.fc28.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.0'
  SwapFree: 4208979968
  SwapTotal: 4208979968
  arch: amd64
  cpus: 4
  hostname: bollard
  kernel: 4.18.9-200.fc28.x86_64
  os: linux
  uptime: 3h 57m 37.86s (Approximately 0.12 days)
insecure registries:
  registries: []
registries:
  registries:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ContainerStore:
    number: 1
  GraphDriverName: vfs
  GraphOptions: []
  GraphRoot: /var/home/rishi/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 7
  RunRoot: /run/user/1000/run

Additional environment details (AWS, VirtualBox, physical, etc.):

This is a physical laptop running Fedora 28 Silverblue 28.20180923.0.

@rhatdan
Member

rhatdan commented Sep 27, 2018

@giuseppe @nalind Looks like we need to maintain file capabilities in addition to setuid bits when we chown.

@giuseppe
Member

giuseppe commented Oct 5, 2018

I don't think this depends on the file caps not being set, or that containers/storage#217 is going to fix it.

If you'd like to use ping, then you need to set up unprivileged ICMP on the host:

# sysctl -w "net.ipv4.ping_group_range=0 429296729"

This way, ICMP can be used without requiring additional caps (as the ping binary otherwise does).
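
To see whether unprivileged ICMP is already enabled for the current user, one can compare the user's group IDs against the range the kernel exposes in /proc/sys/net/ipv4/ping_group_range. The following is only a rough sketch: the helper name `gid_in_range` is made up for illustration, and it assumes a POSIX shell on Linux.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any of the given GIDs lies within
# the inclusive range LOW..HIGH.
gid_in_range() {
    low=$1
    high=$2
    shift 2
    for gid in "$@"; do
        if [ "$gid" -ge "$low" ] && [ "$gid" -le "$high" ]; then
            return 0
        fi
    done
    return 1
}

# The kernel exposes the range as two whitespace-separated numbers.
# The default "1 0" is an empty range: no group may open ICMP echo sockets.
range_file=/proc/sys/net/ipv4/ping_group_range
if [ -r "$range_file" ]; then
    read -r low high < "$range_file"
    # id -G prints all group IDs (primary and supplementary) of the user.
    if gid_in_range "$low" "$high" $(id -G); then
        echo "unprivileged ICMP is enabled for this user"
    else
        echo "unprivileged ICMP is not enabled for this user"
    fi
fi
```

With the sysctl setting above, the range starts at 0 and so covers the usual group IDs, which is why ping then works without file capabilities.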

@giuseppe
Member

giuseppe commented Oct 5, 2018

@debarshiray could you verify if that command solves the issue you've seen?

@giuseppe
Member

@debarshiray have you had a chance to look at my proposed fix?

@debarshiray
Member Author

Sorry for the delay. I was trying this out last Friday, when I realized that IPv6 doesn't quite work from inside the Red Hat Brno network even though ping(8) was trying to use it. Then I got distracted testing the new podman builds and forgot about this.

> If you'd like to use ping, then you need to set up unprivileged ICMP on the host:
>
> # sysctl -w "net.ipv4.ping_group_range=0 429296729"

Yes, this makes ping work.

I didn't know about net.ipv4.ping_group_range. (Reading https://lwn.net/Articles/422330/ and http://man7.org/linux/man-pages/man7/icmp.7.html now.)

> This way, ICMP can be used without requiring additional caps (as the ping binary otherwise does).

I wonder if it will be OK to install a /etc/sysctl.d/fedora-toolbox.conf file as part of the fedora-toolbox package. The ping binary from iputils still seems to be using capabilities, which makes me wonder if the ping_group_range approach might have any side-effects:

$ getcap /usr/bin/ping
/usr/bin/ping = cap_net_admin,cap_net_raw+p
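
For illustration, such a drop-in could look roughly like the sketch below. The helper name `write_ping_dropin` is hypothetical, the file name is the one floated above, and the upper bound 2147483647 is the largest value the kernel accepts according to the systemd commit message quoted further down in this thread.

```shell
#!/bin/sh
# Hypothetical helper: write a sysctl drop-in that enables unprivileged
# ICMP echo sockets for all groups. The target directory defaults to
# /etc/sysctl.d and can be overridden for a dry run.
write_ping_dropin() {
    dir=${1:-/etc/sysctl.d}
    printf '%s\n' 'net.ipv4.ping_group_range = 0 2147483647' \
        > "$dir/fedora-toolbox.conf"
}

# Usage (as root), then reload all sysctl configuration files:
#   write_ping_dropin && sysctl --system
```

Whether a package should ship such a file by default is a separate question.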

@giuseppe
Member

I think it should be left to the admin of the machine to enable unprivileged ICMP, and it should not be done by default (it might also be limited to only a subset of the users).

@debarshiray
Member Author

The user-facing issue is that ping works out-of-the-box on the host, but not the fedora-toolbox container. Can we do something to remove this distinction?

@giuseppe
Member

On the host, ping is installed by root, whereas in a rootless container it is installed by an unprivileged user, so even adding the capabilities to the file doesn't really map to the user having them. In general there is no solution to this issue: a rootless container cannot get more privileges than the user has on the host. For this case, though, you could probably just bind mount the ping executable from the host (which is owned by root) and see if that works from the container.

@rhatdan
Member

rhatdan commented Oct 15, 2018

@debarshiray Seems the best we can do is better documentation. Perhaps we add something to podman run describing what will not work well as a non-root user.

@debarshiray
Member Author

> you could probably just bind mount the ping executable from the host (which is owned by root) and see if that works from the container

To avoid weird ABI issues, I guess, one will also have to bind mount the shared objects from the host.

> Seems the best we can do is better documentation. Perhaps we add something to podman run describing what will not work well as a non-root user.

That could be useful. In the worst case, I wonder what might be the best way to convey such a limitation to a person trying to use ping from the toolbox container.

This is now going off-topic for this issue tracker, though. :)

@halfline

halfline commented Oct 26, 2018

> I think it should be left to the admin of the machine to enable unprivileged ICMP, and it should not be done by default (it might also be limited to only a subset of the users).

I think I disagree here. Weren't ICMP echo sockets added specifically so these caps could get dropped? We should probably just get the global default distro sysctl.conf fixed, rather than having a drop-in file, though.

@debarshiray
Copy link
Member Author

I filed systemd/systemd#13141 to get net.ipv4.ping_group_range enabled by default.

poettering pushed a commit to systemd/systemd that referenced this issue Jul 24, 2019
This makes ping(8) work without CAP_NET_ADMIN and CAP_NET_RAW because
those aren't effective inside rootless Podman containers.

It's quite useful when using OSTree based operating systems like Fedora
Silverblue, where development environments are often set up using
rootless Podman containers with helpers like Toolbox [1]. Not having
a basic network utility like ping(8) work inside the development
environment can be inconvenient.

See:
https://lwn.net/Articles/422330/
http://man7.org/linux/man-pages/man7/icmp.7.html
containers/podman#1550

The upper limit of the range of group identifiers is set to 2147483647,
which is 2^31-1. Values greater than that get rejected by the kernel
because of this definition in linux/include/net/ping.h:
  #define GID_T_MAX (((gid_t)~0U) >> 1)

That's not so bad because values between 2^31 and 2^32-1 are reserved
on systemd-based systems anyway [2].

[1] https://github.com/debarshiray/toolbox
[2] https://systemd.io/UIDS-GIDS.html#summary
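
The arithmetic behind that upper limit can be checked with plain shell arithmetic. This is a small sketch, assuming a shell with 64-bit integer arithmetic (as in bash or dash on common platforms) and that gid_t is a 32-bit unsigned type, as it is on Linux. The variable name `gid_t_max` is only for illustration.

```shell
#!/bin/sh
# (gid_t)~0U is all 32 bits set, i.e. 0xFFFFFFFF. Shifting right by one
# clears the top bit, which mirrors the kernel's GID_T_MAX definition.
gid_t_max=$(( 0xFFFFFFFF >> 1 ))
echo "$gid_t_max"              # prints 2147483647

# The same value expressed as 2^31 - 1.
echo $(( (1 << 31) - 1 ))      # prints 2147483647

# The value used earlier in this thread fits below that limit.
[ 429296729 -le "$gid_t_max" ] && echo "429296729 is accepted"
```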
Yamakuzure pushed a commit to elogind/elogind that referenced this issue Sep 23, 2019
github-actions bot added the "locked - please file new issue/PR" label Sep 24, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023