
Rootless Podman dropping packets to external network #20429

Closed
phil-flip opened this issue Oct 20, 2023 · 9 comments
Labels
kind/bug · locked - please file new issue/PR · network · stale-issue

Comments

@phil-flip

Issue Description

A friend of mine and I have been putting quite a lot of hours into this issue, and we have not been able to find any information about it on the web.

Podman seems to drop ICMP echo requests to the internet. The issue does not occur when pinging another container. Could this be a problem with Podman's routing? Or maybe IPv6? Or the Podman version? Or… maybe all of them? :D
At this point we are out of ideas, so I thought I would open an issue here to see if someone recognizes this pattern.

Steps to reproduce the issue

  1. Deploy Rocky Linux 9.1
  2. Install:
  • podman
  • podman-docker
  • docker-compose binary (manually)
  • slirp4netns
  • fuse-overlayfs
  3. Follow the rootless_tutorial.md guide
  4. Disable and stop the system-wide podman systemd units
  5. Enable and start podman as the user with systemd
  6. (My colleague also needed to do something additional to keep pods running, because the podman socket kept stopping when the SSH sessions were closed; see the sketch below.)
  7. Set DOCKER_HOST to the user's podman socket (unix:///run/user/1000/podman/podman.sock) for docker-compose
  8. Deploy Traefik, Authentik, Outline and more
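
For reference, the rootless socket setup from steps 4–7 looks roughly like this (a sketch, not the exact commands we ran; the UID 1000 path matches our setup, and enabling lingering is my guess at the extra step mentioned in point 6):

# stop the system-wide Podman API socket
sudo systemctl disable --now podman.socket

# enable the per-user socket instead
systemctl --user enable --now podman.socket

# keep user services running after the SSH session ends
loginctl enable-linger $USER

# point docker-compose at the rootless socket
export DOCKER_HOST=unix:///run/user/1000/podman/podman.sock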

Describe the results you received

We were trying to debug why our OIDC login was so flaky and sometimes failed completely. After a lot of debugging we found out that the Outline app pod was dropping packets. We became suspicious when I switched the OIDC endpoints to reach Authentik directly and things worked much better.

$ docker exec -it traefik-traefik-1 sh
/ # ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=1 ttl=42 time=1.135 ms
64 bytes from 1.1.1.1: seq=3 ttl=42 time=1.310 ms
64 bytes from 1.1.1.1: seq=5 ttl=42 time=0.910 ms
64 bytes from 1.1.1.1: seq=7 ttl=42 time=0.906 ms
64 bytes from 1.1.1.1: seq=8 ttl=42 time=1.292 ms
^C
--- 1.1.1.1 ping statistics ---
9 packets transmitted, 5 packets received, 44% packet loss
round-trip min/avg/max = 0.906/1.110/1.310 ms

(The same happens with other pods; “docker” is just an alias for Podman here.)

Describe the results you expected

No ping drops, a working authentication process, and web pages that do not hang.

podman info output

host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-1.el9_2.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: e6cdc9a4d6319e039efa13e532c1e58b713c904d'
  cpuUtilization:
    idlePercent: 95.06
    systemPercent: 1.09
    userPercent: 3.85
  cpus: 4
  distribution:
    distribution: '"rocky"'
    version: "9.2"
  eventLogger: journald
  hostname: vmd119246.contaboserver.net
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.0-284.30.1.el9_2.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 1506439168
  memTotal: 8048586752
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.4-1.el9_2.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-3.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 23h 35m 24.00s (Approximately 0.96 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/emfadmin/.config/containers/storage.conf
  containerStore:
    number: 31
    paused: 0
    running: 30
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/emfadmin/.local/share/containers/storage
  graphRootAllocated: 208172843008
  graphRootUsed: 77387190272
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 46
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/emfadmin/.local/share/containers/storage/volumes
version:
  APIVersion: 4.4.1
  Built: 1695770580
  BuiltTime: Wed Sep 27 01:23:00 2023
  GitCommit: ""
  GoVersion: go1.19.10
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

The problem first appeared on a cheap, somewhat sketchy VPS, but when we reproduced the setup described above on hardware at home, the issue was still present.
The issue does not happen with Docker.

Additional information

We had the same issue on Ubuntu, but I wouldn't count that attempt, as that setup wasn't running very stably. (Come to think of it, we probably messed something up and forgot the slirp4netns package.)

If someone can tell me how to get version 4.7.1 running on Rocky 9.1 instead of the “stable” 4.4.1, and it fixes our issue, I would be more than happy.

We tried many things, and I have probably forgotten to mention some of them here, but I will do my best to keep this issue updated with potential ideas.

@phil-flip phil-flip added the kind/bug Categorizes issue or PR as related to a bug. label Oct 20, 2023
@Luap99
Member

Luap99 commented Oct 23, 2023

Can you narrow this down somehow? What happens if you try a simple podman run --rm alpine ping 1.1.1.1? If this works, try podman run --rm --network bridge alpine ping 1.1.1.1.

Also you can try up to date versions from https://copr.fedorainfracloud.org/coprs/rhcontainerbot/podman-next/
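
For anyone following along, the suggested narrowing-down steps and the COPR install look roughly like this (a sketch; -c is added so ping terminates on its own, and on EL9 the dnf copr plugin may first need dnf-plugins-core):

# test with the rootless default network (slirp4netns)
podman run --rm alpine ping -c 10 1.1.1.1

# test with the bridge (netavark) network
podman run --rm --network bridge alpine ping -c 10 1.1.1.1

# newer Podman builds from the podman-next COPR
sudo dnf install -y dnf-plugins-core
sudo dnf copr enable rhcontainerbot/podman-next
sudo dnf upgrade podman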

@Luap99 Luap99 added the network Networking related issue or feature label Oct 23, 2023
@1player

1player commented Oct 24, 2023

Looks a lot like the issue I'm having here. I'm losing every other packet, on a container connected to two networks:

[root@bernard ~]# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=5.47 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=57 time=5.47 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=57 time=5.44 ms
64 bytes from 1.1.1.1: icmp_seq=7 ttl=57 time=5.45 ms

I did open a Q&A question on the netavark forum about this: containers/netavark#828

@FlipperLP is your container attached to more than one network as well?
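
For anyone trying to reproduce the multi-network case outside of compose, a minimal sketch with the Podman CLI (network and container names are just examples):

# create two user-defined networks and attach one container to both
podman network create net1
podman network create net2
podman run -d --name multinet --network net1 --network net2 alpine sleep 86400

# confirm which networks the container is attached to
podman inspect --format '{{json .NetworkSettings.Networks}}' multinet

# test connectivity from inside the container
podman exec multinet ping -c 10 1.1.1.1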

@1player

1player commented Oct 24, 2023

In my case the issue has been fixed by setting net.ipv4.conf.default.rp_filter=2 as suggested by @Luap99
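
For reference, applying that setting persistently looks roughly like this (a sketch; the sysctl.d file name is arbitrary, and also setting conf.all.rp_filter is my assumption, since the "default" value only applies to interfaces created afterwards):

# apply immediately
sudo sysctl -w net.ipv4.conf.default.rp_filter=2
sudo sysctl -w net.ipv4.conf.all.rp_filter=2

# persist across reboots
printf 'net.ipv4.conf.default.rp_filter = 2\nnet.ipv4.conf.all.rp_filter = 2\n' | sudo tee /etc/sysctl.d/99-rp-filter.conf
sudo sysctl --system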

@phil-flip
Author

@FlipperLP is your container attached to more than one network as well?

YES, that might be why I was not able to replicate the issue with plain alpine containers! I was just about to test it with docker-compose and the alpine image, but multiple networks might explain the issue.
Unfortunately your solution does not work for me, @1player. I tried restarting podman.socket and also rebooted the whole VM.
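
One thing that might be worth checking (an assumption on my side, not something confirmed in this thread) is whether the sysctl is actually in effect where the rootless bridge lives, since netavark sets it up inside a separate rootless network namespace:

# values on the host; the maximum of the "all" and per-interface values is used
sysctl net.ipv4.conf.default.rp_filter net.ipv4.conf.all.rp_filter

# inspect the rootless network namespace (flag name per podman-unshare docs; older releases called it --rootless-cni)
podman unshare --rootless-netns sysctl -a 2>/dev/null | grep rp_filter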


A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Nov 25, 2023

@Luap99 any update on this?

@Luap99
Member

Luap99 commented Nov 27, 2023

As mentioned above, this is likely due to the sysctl setting, but the reporter never confirmed whether it only happens with multiple networks.


A friendly reminder that this issue had no activity for 30 days.

@Luap99
Member

Luap99 commented Apr 4, 2024

Because I never got a reply, closing.

@Luap99 Luap99 closed this as not planned on Apr 4, 2024
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jul 4, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jul 4, 2024