podman 3.4.4 and Fedora 40 filesystem permission problems #21012

Closed
bcl opened this issue Dec 13, 2023 · 16 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@bcl

bcl commented Dec 13, 2023

Issue Description

I've been using podman to run the test suite for Lorax and weldr-client for a number of years with no problems until recently.
Everything works fine locally using Fedora 38 and podman 4.7.2. The problem is that I use the same setup in my GitHub Actions, and with Fedora Rawhide (f40) I've started getting permission denied errors when rsync and tar try to modify files in the container.

Steps to reproduce the issue

I've set up a simplified repo here that demonstrates what I'm seeing:
https://github.com/bcl/podman-fedora-action/actions
It uses rsync to make a copy of the source tree in the container. The failed action run is with Rawhide; the passing one is with Fedora 39.

Describe the results you received

You can see it complain about permissions in the output:

podman run --rm -v `pwd`/.test-results/:/test-results \
	-v `pwd`:/lorax-ro:ro --security-opt label=disable \
	--env RUN_TESTS="ci" \
	welder/lorax-tests:rawhide make test-in-copy
rsync -a --exclude=.git /lorax-ro/ /lorax/
rsync: [generator] failed to set permissions on "/lorax/.github": Operation not permitted (1)
...

If I add --no-perms to rsync it runs just fine.

Describe the results you expected

I expected no permission errors :)

podman info output

Run podman info && make test-in-podman
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 4
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: fv-az777-825
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 127
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 6.2.0-1018-azure
  linkmode: dynamic
  logDriver: journald
  memFree: 14859001856
  memTotal: 16757788672
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.6.1
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 2m 8.92s
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/runner/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/runner/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 0
  runRoot: /run/user/1001/containers
  volumePath: /home/runner/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.17.3
  OsArch: linux/amd64
  Version: 3.4.4

Podman in a container

Yes

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

This is running via GitHub Actions on ubuntu-latest (22.04) with podman 3.4.4. I think it's in a container, but I really don't know how GitHub runs the action.

Additional information

If I switch the container to use Fedora 39 (as you can see in the passing actions in my example repo) it works fine. So I suspect this is some kind of interaction between podman 3.4.4 on Ubuntu and Fedora Rawhide (40) but I don't know how to track it any further than that.

With rsync I can work around it by passing --no-perms but after seeing similar issues with tar in the weldr-client project I figured it might be better to track down what's actually happening.

@bcl bcl added the kind/bug label Dec 13, 2023
@giuseppe
Member

Can you show the output of id and ls -ld /lorax/.github?

@bcl
Author

bcl commented Dec 15, 2023

id
uid=0(root) gid=0(root) groups=0(root)
ls -l /lorax-ro/
total 16
-rw-r--r-- 1 root root  194 Dec 15 17:01 Dockerfile.test
-rw-r--r-- 1 root root 6031 Dec 15 17:01 Makefile
-rw-r--r-- 1 root root   11 Dec 15 17:01 test-packages
ls -ld /lorax-ro/.github
drwxr-xr-x 3 root root 4096 Dec 15 17:01 /lorax-ro/.github
rsync -a --exclude=.git /lorax-ro/ /lorax/
rsync: [generator] failed to set permissions on "/lorax/.github": Operation not permitted (1)
rsync: [generator] failed to set permissions on "/lorax/.github/workflows": Operation not permitted (1)
rsync: [generator] failed to set permissions on "/lorax/.test-results": Operation not permitted (1)
rsync: [receiver] failed to set permissions on "/lorax/.Dockerfile.test.kUToFZ": Operation not permitted (1)
rsync: [receiver] failed to set permissions on "/lorax/.Makefile.gxalWE": Operation not permitted (1)
rsync: [receiver] failed to set permissions on "/lorax/.test-packages.x4Ol6W": Operation not permitted (1)
rsync: [receiver] failed to set permissions on "/lorax/.github/workflows/.tests.yml.lTnJPc": Operation not permitted (1)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1336) [sender=3.2.7]
make: *** [Makefile:122: test-in-copy] Error 23
make: *** [Makefile:131: test-in-podman] Error 2
Error: Process completed with exit code 2.

From https://github.com/bcl/podman-fedora-action/actions/runs/7224879680/job/19687140051

@giuseppe
Member

thanks, permissions seem correct.

Is there a way to strace the rsync process to see exactly what syscall is failing?

Do commands like touch /lorax/foo; chown 0:0 /lorax/foo; chmod 700 /lorax/foo work?

What capabilities do you have inside the container (grep ^Cap /proc/self/status)? What is the underlying file system for /lorax? Is it on the container overlay?
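
The manual checks suggested above can be bundled into a small function to run inside the container. This is only a sketch: the name `check_perm_ops` and the scratch-file name are invented for illustration, and it chowns to the caller's own UID:GID (rather than `0:0` as in the comment) so that it also runs outside a root container.

```shell
# check_perm_ops DIR
# Repeat the touch/chown/chmod checks on a scratch file in DIR and
# report which step, if any, is refused.
check_perm_ops() {
    dir=$1
    f="$dir/perm-check.$$"   # hypothetical scratch file name

    touch "$f" || { echo "touch: FAILED"; return 1; }
    echo "touch: ok"

    # Assumption: chown to our own IDs; as root in the container this is 0:0.
    if chown "$(id -u):$(id -g)" "$f"; then
        echo "chown: ok"
    else
        echo "chown: FAILED"
    fi

    if chmod 700 "$f"; then
        echo "chmod: ok"
    else
        echo "chmod: FAILED"
    fi

    rm -f "$f"
}
```

Inside the container, `check_perm_ops /lorax` followed by `grep ^Cap /proc/self/status` would answer the first two questions in one run.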

@bcl
Author

bcl commented Dec 18, 2023

Manually changing permissions seems fine. /lorax is created on the overlay filesystem inside the container. /lorax-ro/ appears to be on an ext4 filesystem on the host. See the output at https://github.com/bcl/podman-fedora-action/actions/runs/7251169583/job/19752950663

grep ^Cap /proc/self/status
CapInh:	00000000800405fb
CapPrm:	00000000800405fb
CapEff:	00000000800405fb
CapBnd:	00000000800405fb
CapAmb:	0000000000000000
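
The hex masks above are easier to read once decoded into capability names. The following is a minimal sketch, assuming the bit order from the kernel's `linux/capability.h` for bits 0 to 31; the function name `decode_caps` is made up here. Where libcap is installed, `capsh --decode=00000000800405fb` should give the same answer.

```shell
# Capability names for bits 0..31, in kernel order (linux/capability.h).
CAP_NAMES="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE
AUDIT_WRITE AUDIT_CONTROL SETFCAP"

# decode_caps HEXMASK
# Print the names of the capabilities whose bits are set in HEXMASK.
decode_caps() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        # Test one bit per name, walking up from bit 0.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out}${out:+ }CAP_${name}"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

decode_caps 00000000800405fb
```

For this mask the result is the same set of eleven capabilities listed under `security.capabilities` in the podman info output. CAP_FOWNER is included and CAP_SYS_PTRACE is not.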

I'll see if I can strace things...

ETA: strace isn't allowed in this environment:

strace rsync -a --exclude=.git /lorax-ro/ /lorax/
strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Operation not permitted
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
strace: PTRACE_SETOPTIONS: Operation not permitted
strace: detach: waitpid(18): No child processes
strace: Process 18 detached
make: *** [Makefile:124: test-in-copy] Error 1
make: *** [Makefile:134: test-in-podman] Error 2

@ikerexxe

The SSSD project is facing a similar issue in its CI. We have an automated system running in GitHub Actions that generates a set of containers that are used within this CI. The Fedora 40 container generation has been failing for the last 3 weeks. All other versions (i.e. Fedora 39, 38, Debian 12, Ubuntu) are generated correctly.

@rhatdan
Member

rhatdan commented Dec 19, 2023

Could you try adding --cap-add CAP_SYS_PTRACE to the podman command?

@rhatdan
Member

rhatdan commented Dec 19, 2023

So we are stuck until Debian updates and F38 goes away.

@bcl
Author

bcl commented Dec 19, 2023

> Could you try adding --cap-add CAP_SYS_PTRACE to the podman command?

I didn't think github would let that work, but it did:
https://github.com/bcl/podman-fedora-action/actions/runs/7267989029/job/19802969813

But I can't tell what's going on. It looks like rsync talks to itself over a socket (fd 3 and 4) so that may be making it harder to follow. I don't see any obvious failures trying to change permissions other than the errors being printed.

@giuseppe
Member

I suggest adding the flags -f and -s 1000 to strace, so that it also follows any child processes that are forked.

@rhatdan
Member

rhatdan commented Dec 20, 2023

CAP_SYS_PTRACE is needed when a process running with one UID attempts to examine the memory of a process running with a different UID. UID==0 examining UID==1000 requires CAP_SYS_PTRACE.

@bcl
Author

bcl commented Dec 20, 2023

Ah, there we go - https://github.com/bcl/podman-fedora-action/actions/runs/7277447619/job/19829467928#step:3:750

[pid    19] fchmodat2(AT_FDCWD, ".github", 0755, AT_SYMLINK_NOFOLLOW <unfinished ...>
[pid    18] fstat(3, {st_mode=S_IFREG|0644, st_size=6194, ...}) = 0
[pid    19] <... fchmodat2 resumed>)    = -1 EPERM (Operation not permitted)
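
With `strace -f`, a call interrupted by another process is split into an `<unfinished ...>` line and a later `resumed>` line, which makes failures easy to miss in a long log. A rough filter for that, assuming every line carries the `[pid N]` prefix shown above (the function name `failed_syscalls` is invented for this sketch):

```shell
# failed_syscalls < strace.log
# Rejoin split calls per pid and print only those that returned -1.
failed_syscalls() {
    awk '
    {
        pid = $2; sub(/\]$/, "", pid)          # "19]" -> "19"
        line = $0
        sub(/^\[pid +[0-9]+\] /, "", line)     # drop the "[pid N] " prefix
    }
    / <unfinished \.\.\.>$/ {
        # Remember the first half of the call until this pid resumes it.
        sub(/ <unfinished \.\.\.>$/, "", line)
        pending[pid] = line
        next
    }
    / resumed>/ {
        # Glue the remembered first half onto the text after "resumed>".
        rest = substr(line, index(line, " resumed>") + length(" resumed>"))
        line = pending[pid] rest
        delete pending[pid]
    }
    line ~ / = -1 E/ { print "pid " pid ": " line }
    '
}
```

Fed the three lines above, it prints a single line for pid 19 with the rejoined `fchmodat2(...)` call and its `EPERM` result; the successful `fstat` from pid 18 is dropped.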

@justin-stephenson

We see the rsync failure reported specifically for dot files (names starting with "."):

https://github.com/SSSD/sssd-ci-containers/actions/runs/7242610512/job/19728275381

fatal: [base-ground]: FAILED! => changed=false 
  cmd: /usr/bin/rsync --delay-updates -F --compress --archive --blocking-io --rsh='podman exec -i' --out-format='<<CHANGED>>%i %n%L' /home/runner/work/sssd-ci-containers/sssd-ci-containers/src/ansible/roles/common/../../../../data/ sssd-wip-base:/data/
  msg: |-
    rsync: [generator] failed to set permissions on "/data/.": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.ca.crt.KNZQuI": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.ca.key.kiPXhU": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.dc.samba.test.crt.N81BLl": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.dc.samba.test.key.8HKjkU": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.master.keycloak.test.crt.eggudG": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.master.keycloak.test.key.hEYfps": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.master.ldap.test.crt.CBvrWH": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/certs/.master.ldap.test.key.gXFCTW": Operation not permitted (1)
    rsync: [receiver] failed to set permissions on "/data/configs/.dnsmasq.conf.RDJAIU": Operation not permitted (1)

@giuseppe
Member

I think what is happening is that on Fedora 40 there is a new kernel with the fchmodat2 syscall, while it is not present on older kernels and rsync falls back to different syscalls.

Please try disabling seccomp: --security-opt seccomp=unconfined
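
Applied to the command from the issue description, the suggestion might look like the sketch below. The helper `lorax_test_cmd` and the `SECCOMP_WORKAROUND` toggle are hypothetical names introduced here, not podman settings; the function only prints the command line so it can be reviewed or pasted into the workflow step.

```shell
# lorax_test_cmd
# Print the podman invocation from the issue, adding
# --security-opt seccomp=unconfined unless SECCOMP_WORKAROUND=0.
lorax_test_cmd() {
    opts="--security-opt label=disable"
    if [ "${SECCOMP_WORKAROUND:-1}" = "1" ]; then
        # Runs the container without the default seccomp filter.
        opts="$opts --security-opt seccomp=unconfined"
    fi
    printf 'podman run --rm -v %s/.test-results/:/test-results -v %s:/lorax-ro:ro %s --env RUN_TESTS=ci welder/lorax-tests:rawhide make test-in-copy\n' \
        "$PWD" "$PWD" "$opts"
}

lorax_test_cmd
```

Keeping the flag behind a toggle makes it simple to drop once the runner's podman no longer needs it, since running unconfined removes the syscall filtering for the whole container.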

@giuseppe
Member

opened a PR to allow fchmodat2 in the default profile: containers/common#1773

@bcl
Author

bcl commented Dec 20, 2023

> I think what is happening is that on Fedora 40 there is a new kernel with the fchmodat2 syscall, while it is not present on older kernels and rsync falls back to different syscalls.

Ah! That would explain why f39 works. Did a new run with seccomp disabled and rsync is happy:
https://github.com/bcl/podman-fedora-action/actions/runs/7281687496/job/19842695719#step:3:984

@giuseppe
Member

thanks for confirming it.

I am closing the issue since it is fixed in newer Podman versions, where the default error for unknown syscalls was changed to ENOSYS, so callers can fall back to the older syscalls.

pbrezina added a commit to pbrezina/sssd-ci-containers that referenced this issue Jan 5, 2024
To workaround [1] in Github actions environment. Github Actions Runners
do not yet have up to date podman version that contains this fix.

[1] containers/podman#21012
pbrezina added a commit to SSSD/sssd-ci-containers that referenced this issue Jan 8, 2024
To workaround [1] in Github actions environment. Github Actions Runners
do not yet have up to date podman version that contains this fix.

[1] containers/podman#21012
bcl added a commit to bcl/weldr-client that referenced this issue Feb 12, 2024
See containers/podman#21012 for the full
background. Basically, running build-in-podman as part of the github
workflow doesn't work when using Fedora 40 or later because of the new
fchmodat2 syscall.

Disable seccomp until this works again.
@github-actions github-actions bot added the locked - please file new issue/PR label Mar 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
5 participants