
Dangling file in /var/lib/cni/networks/podman prevents container from starting (Podman 3.0) #9465

Closed
skitoxe opened this issue Feb 22, 2021 · 23 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
stale-issue

Comments

skitoxe commented Feb 22, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description
When running podman stop <container> && sleep 5 && podman start <container>, or just a plain stop and start, I occasionally end up with an orphaned file in /var/lib/cni/networks/podman/.
This is exactly the behavior described in issue #3759, which is now closed, but later comments there indicate it is still a problem, and I can confirm it still occurs in 3.0. I have to delete the file manually to get the container to start again. The error message received is:

ERRO[0000] Error adding network: failed to allocate for range 0: 10.89.0.64 has been allocated to 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe, duplicate allocation is not allowed
ERRO[0000] Error while adding pod to CNI network "podman_default": failed to allocate for range 0: 10.89.0.64 has been allocated to 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe, duplicate allocation is not allowed
Error: unable to start container "5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe": error configuring network namespace for container 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe: failed to allocate for range 0: 10.89.0.64 has been allocated to 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe, duplicate allocation is not allowed
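
For reference, the manual cleanup I do looks roughly like this (a sketch, assuming the default host-local IPAM data directory; host-local keeps one file per allocated IP, named after the IP, whose contents are the ID of the container holding the lease):

# Inspect the leaked reservations for the network:
ls /var/lib/cni/networks/podman_default/
# The file for the stuck IP contains the container ID from the error above:
cat /var/lib/cni/networks/podman_default/10.89.0.64
# Removing it releases the stale lease so the container can start again:
rm /var/lib/cni/networks/podman_default/10.89.0.64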

Steps to reproduce the issue:

  1. podman stop <container> && sleep 5 && podman start <container>

Describe the results you received:
The container does not start, and the following error appears when trying to start it:

ERRO[0000] Error adding network: failed to allocate for range 0: 10.89.0.64 has been allocated to 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe, duplicate allocation is not allowed
ERRO[0000] Error while adding pod to CNI network "podman_default": failed to allocate for range 0: 10.89.0.64 has been allocated to 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe, duplicate allocation is not allowed
Error: unable to start container "5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe": error configuring network namespace for container 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe: failed to allocate for range 0: 10.89.0.64 has been allocated to 5e1c23b05eee19f0833d7ff116fa50886ead824f944652239640c8aefdd1e5fe, duplicate allocation is not allowed

Describe the results you expected:
Container starts

Additional information you deem important (e.g. issue happens only occasionally):
It seems to be more common when the commands are run from a crontab, but that might be my imagination.
Output of podman version:

Version:      3.0.0
API Version:  3.0.0
Go Version:   go1.15.7
Built:        Fri Feb 12 00:12:57 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.19.2
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.26-1.fc33.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.26, commit: 777074ecdb5e883b9bec233f3630c5e7fa37d521'
  cpus: 40
  distribution:
    distribution: fedora
    version: "33"
  eventLogger: journald
  hostname: xxxx
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.15-200.fc33.x86_64
  linkmode: dynamic
  memFree: 190546157568
  memTotal: 201951862784
  ociRuntime:
    name: crun
    package: crun-0.17-1.fc33.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    selinuxEnabled: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 92h 34m 59.04s (Approximately 3.83 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 3
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 3
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 1613085177
  BuiltTime: Fri Feb 12 00:12:57 2021
  GitCommit: ""
  GoVersion: go1.15.7
  OsArch: linux/amd64
  Version: 3.0.0

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.0.0-1.fc33.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
Bare-metal server with dual Xeon CPUs and 192 GB RAM.

I also have this in my /usr/lib/tmpfiles.d/podman.conf:

# /tmp/podman-run-* directory can contain content for Podman containers that have run
# for many days. This following line prevents systemd from removing this content.
x /tmp/podman-run-*
D! /run/podman 0700 root root
D! /var/lib/cni/networks
openshift-ci-robot added the kind/bug label Feb 22, 2021
Luap99 (Member) commented Feb 22, 2021

I assume you have a static IP defined for your container.
Can you add --log-level debug to the podman stop command? Maybe that will give some insight into what is going on.
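
For example (with <container_name> as a placeholder, as in the log below):

podman stop <container_name> --log-level debug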

skitoxe (Author) commented Feb 22, 2021

No, I do not have a static IP configured for the container.

INFO[0000] podman filtering at log level debug
DEBU[0000] Called stop.PersistentPreRunE(podman stop <container_name> --log-level debug)
DEBU[0000] Reading configuration file "/usr/share/containers/containers.conf"
DEBU[0000] Merged system config "/usr/share/containers/containers.conf": &{Containers:{Devices:[] Volumes:[] ApparmorProfile:containers-default-0.33.4 Annotations:[] CgroupNS:private Cgroups:enabled DefaultCapabilities:[CHOWN DAC_OVERRIDE FOWNER FSETID KILL NET_BIND_SERVICE SETFCAP SETGID SETPCAP SETUID SYS_CHROOT] DefaultSysctls:[net.ipv4.ping_group_range=0 0] DefaultUlimits:[nproc=4194304:4194304] DefaultMountsFile: DNSServers:[] DNSOptions:[] DNSSearches:[] EnableKeyring:true EnableLabeling:true Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm] EnvHost:false HTTPProxy:true Init:false InitPath: IPCNS:private LogDriver:k8s-file LogSizeMax:-1 NetNS:bridge NoHosts:false PidsLimit:2048 PidNS:private SeccompProfile:/usr/share/containers/seccomp.json ShmSize:65536k TZ: Umask:0022 UTSNS:private UserNS:host UserNSSize:65536} Engine:{ImageBuildFormat:oci CgroupCheck:false CgroupManager:systemd ConmonEnvVars:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] ConmonPath:[/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] DetachKeys:ctrl-p,ctrl-q EnablePortReservation:true Env:[] EventsLogFilePath:/run/libpod/events/events.log EventsLogger:journald HooksDir:[/usr/share/containers/oci/hooks.d] ImageDefaultTransport:docker:// InfraCommand: InfraImage:k8s.gcr.io/pause:3.2 InitPath:/usr/libexec/podman/catatonit LockType:shm MultiImageArchive:false Namespace: NetworkCmdPath: NetworkCmdOptions:[] NoPivotRoot:false NumLocks:2048 OCIRuntime:crun OCIRuntimes:map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] PullPolicy:missing Remote:false RemoteURI: RemoteIdentity: ActiveService: ServiceDestinations:map[] RuntimePath:[] RuntimeSupportsJSON:[crun runc] RuntimeSupportsNoCgroups:[crun] RuntimeSupportsKVM:[kata kata-runtime kata-qemu kata-fc] SetOptions:{StorageConfigRunRootSet:false StorageConfigGraphRootSet:false StorageConfigGraphDriverNameSet:false StaticDirSet:false VolumePathSet:false TmpDirSet:false} SignaturePolicyPath:/etc/containers/policy.json SDNotify:false StateType:3 StaticDir:/var/lib/containers/storage/libpod StopTimeout:10 TmpDir:/run/libpod VolumePath:/var/lib/containers/storage/volumes VolumePlugins:map[]} Network:{CNIPluginDirs:[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] DefaultNetwork:podman NetworkConfigDir:/etc/cni/net.d/}}
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] cached value indicated that metacopy is being used
DEBU[0000] cached value indicated that native-diff is not being used
INFO[0000] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled
DEBU[0000] backingFs=btrfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=true
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/bin/crun"
DEBU[0000] using runtime "/usr/bin/runc"
INFO[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
INFO[0000] Found CNI network podman_default (type=bridge) at /etc/cni/net.d/podman_default.conflist
DEBU[0000] Default CNI network name podman is unchangeable
INFO[0000] Setting parallel job count to 121
DEBU[0000] Starting parallel job on container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0000] Stopping ctr 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 (timeout 0)
DEBU[0000] Stopping container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 (PID 1734021)
DEBU[0000] Sending signal 9 to container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0000] Cleaning up container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0000] Tearing down network namespace at /run/netns/cni-91364a4a-2782-b19a-3eda-41f64e471856 for container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
INFO[0000] Got pod network &{Name:<container_name> Namespace:<container_name> ID:3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 NetNS:/run/netns/cni-91364a4a-2782-b19a-3eda-41f64e471856 Networks:[{Name:podman_default Ifname:}] RuntimeConfig:map[podman_default:{IP: MAC: PortMappings:[{HostPort:25565 ContainerPort:25565 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}
INFO[0000] About to del CNI network podman_default (type=bridge)
DEBU[0000] Successfully cleaned up container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0000] unmounted container "3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7"
3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0000] Called stop.PersistentPostRunE(podman stop <container_name> --log-level debug)

The versions of some other relevant components are as follows:

Package crun-0.17-1.fc33.x86_64 is already installed.
Package runc-2:1.0.0-279.dev.gitdedadbf.fc33.x86_64 is already installed.
Package fuse-overlayfs-1.4.0-1.fc33.x86_64 is already installed.

Could this be caused by the container being launched by one user via sudo and then stopped and re-launched by root's crontab?

mheon (Member) commented Feb 22, 2021

Can you check if podman restart resolves the issue? It could be that the cleanup process isn't firing, or isn't finishing - restart does not have that issue, and would narrow things down if it works.

skitoxe (Author) commented Feb 22, 2021

Funny you should ask. The reason I went with stop and start in the first place is that restart is so fast that it doesn't leave time for the port binding to be released. That is the original issue that led me to using stop && wait && start.
I get the failure below on restart, every single time, and I can immediately issue another restart or start and the container will come back up.

INFO[0000] podman filtering at log level debug
DEBU[0000] Called restart.PersistentPreRunE(podman restart <container_name> --log-level debug)
DEBU[0000] Reading configuration file "/usr/share/containers/containers.conf"
DEBU[0000] Merged system config "/usr/share/containers/containers.conf": &{Containers:{Devices:[] Volumes:[] ApparmorProfile:containers-default-0.33.4 Annotations:[] CgroupNS:private Cgroups:enabled DefaultCapabilities:[CHOWN DAC_OVERRIDE FOWNER FSETID KILL NET_BIND_SERVICE SETFCAP SETGID SETPCAP SETUID SYS_CHROOT] DefaultSysctls:[net.ipv4.ping_group_range=0 0] DefaultUlimits:[nproc=4194304:4194304] DefaultMountsFile: DNSServers:[] DNSOptions:[] DNSSearches:[] EnableKeyring:true EnableLabeling:true Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin TERM=xterm] EnvHost:false HTTPProxy:true Init:false InitPath: IPCNS:private LogDriver:k8s-file LogSizeMax:-1 NetNS:bridge NoHosts:false PidsLimit:2048 PidNS:private SeccompProfile:/usr/share/containers/seccomp.json ShmSize:65536k TZ: Umask:0022 UTSNS:private UserNS:host UserNSSize:65536} Engine:{ImageBuildFormat:oci CgroupCheck:false CgroupManager:systemd ConmonEnvVars:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] ConmonPath:[/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] DetachKeys:ctrl-p,ctrl-q EnablePortReservation:true Env:[] EventsLogFilePath:/run/libpod/events/events.log EventsLogger:journald HooksDir:[/usr/share/containers/oci/hooks.d] ImageDefaultTransport:docker:// InfraCommand: InfraImage:k8s.gcr.io/pause:3.2 InitPath:/usr/libexec/podman/catatonit LockType:shm MultiImageArchive:false Namespace: NetworkCmdPath: NetworkCmdOptions:[] NoPivotRoot:false NumLocks:2048 OCIRuntime:crun OCIRuntimes:map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] PullPolicy:missing Remote:false RemoteURI: RemoteIdentity: ActiveService: ServiceDestinations:map[] RuntimePath:[] RuntimeSupportsJSON:[crun runc] RuntimeSupportsNoCgroups:[crun] RuntimeSupportsKVM:[kata kata-runtime kata-qemu kata-fc] SetOptions:{StorageConfigRunRootSet:false StorageConfigGraphRootSet:false StorageConfigGraphDriverNameSet:false StaticDirSet:false VolumePathSet:false TmpDirSet:false} SignaturePolicyPath:/etc/containers/policy.json SDNotify:false StateType:3 StaticDir:/var/lib/containers/storage/libpod StopTimeout:10 TmpDir:/run/libpod VolumePath:/var/lib/containers/storage/volumes VolumePlugins:map[]} Network:{CNIPluginDirs:[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] DefaultNetwork:podman NetworkConfigDir:/etc/cni/net.d/}}
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] cached value indicated that overlay is supported
DEBU[0000] cached value indicated that metacopy is being used
DEBU[0000] cached value indicated that native-diff is not being used
INFO[0000] Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled
DEBU[0000] backingFs=btrfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=true
DEBU[0000] Initializing event backend journald
DEBU[0000] using runtime "/usr/bin/crun"
DEBU[0000] using runtime "/usr/bin/runc"
INFO[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
INFO[0000] Found CNI network podman_default (type=bridge) at /etc/cni/net.d/podman_default.conflist
DEBU[0000] Default CNI network name podman is unchangeable
INFO[0000] Setting parallel job count to 121
DEBU[0000] Stopping ctr 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 (timeout 0)
DEBU[0000] Stopping container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 (PID 1768486)
DEBU[0000] Sending signal 9 to container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0000] Tearing down network namespace at /run/netns/cni-17639dc8-c383-44bf-f6ed-6a3fa5ee7a20 for container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
INFO[0000] Got pod network &{Name:<container_name> Namespace:<container_name> ID:3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 NetNS:/run/netns/cni-17639dc8-c383-44bf-f6ed-6a3fa5ee7a20 Networks:[{Name:podman_default Ifname:}] RuntimeConfig:map[podman_default:{IP: MAC: PortMappings:[{HostPort:25565 ContainerPort:25565 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}
INFO[0000] About to del CNI network podman_default (type=bridge)
DEBU[0000] Created root filesystem for container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 at /var/lib/containers/storage/overlay/203c50b0d3fb2bfc01cef7400ba5d4fd8394059a959cbe7b64fd3ec112733ea3/merged
DEBU[0000] Made network namespace at /run/netns/cni-5e42f9e8-8a36-c4ac-5d20-e1cce9d5f3fd for container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
INFO[0000] Got pod network &{Name:<container_name> Namespace:<container_name> ID:3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 NetNS:/run/netns/cni-5e42f9e8-8a36-c4ac-5d20-e1cce9d5f3fd Networks:[{Name:podman_default Ifname:eth0}] RuntimeConfig:map[podman_default:{IP: MAC: PortMappings:[{HostPort:25565 ContainerPort:25565 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}] Aliases:map[podman_default:[3d7d3511df63 <container_name>]]}
INFO[0000] About to add CNI network podman_default (type=bridge)
DEBU[0001] [0] CNI result: &{0.4.0 [{Name:cni-podman1 Mac:26:3d:6e:a2:0e:ce Sandbox:} {Name:vethb35434a3 Mac:0a:f8:b1:25:67:df Sandbox:} {Name:eth0 Mac:46:d9:4d:d9:1b:a9 Sandbox:/run/netns/cni-5e42f9e8-8a36-c4ac-5d20-e1cce9d5f3fd}] [{Version:4 Interface:0xc0006b5748 Address:{IP:10.89.0.86 Mask:ffffff00} Gateway:10.89.0.1}] [{Dst:{IP:0.0.0.0 Mask:00000000} GW:<nil>}] {[10.89.0.1]  [] []}}
DEBU[0001] Workdir "/data" resolved to a volume or mount
DEBU[0001] Recreating container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 in OCI runtime
DEBU[0001] Successfully cleaned up container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0001] adding nameserver(s) from cni response of '["10.89.0.1"]'
DEBU[0001] /etc/system-fips does not exist on host, not mounting FIPS mode subscription
DEBU[0001] Setting CGroups for container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 to machine.slice:libpod:3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0001] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0001] Created OCI spec for container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 at /var/lib/containers/storage/overlay-containers/3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7/userdata/config.json
DEBU[0001] /usr/bin/conmon messages will be logged to syslog
DEBU[0001] running conmon: /usr/bin/conmon               args="[--api-version 1 -c 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 -u 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7/userdata -p /run/containers/storage/overlay-containers/3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7/userdata/pidfile -n <container_name> --exit-dir /run/libpod/exits --socket-dir-path /run/libpod/socket -s -l k8s-file:/var/lib/containers/storage/overlay-containers/3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7/userdata/ctr.log --log-level debug --syslog --conmon-pidfile /run/containers/storage/overlay-containers/3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg info --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7]"
DEBU[0001] Cleaning up container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
DEBU[0001] Tearing down network namespace at /run/netns/cni-5e42f9e8-8a36-c4ac-5d20-e1cce9d5f3fd for container 3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7
INFO[0001] Got pod network &{Name:<container_name> Namespace:<container_name> ID:3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7 NetNS:/run/netns/cni-5e42f9e8-8a36-c4ac-5d20-e1cce9d5f3fd Networks:[{Name:podman_default Ifname:}] RuntimeConfig:map[podman_default:{IP: MAC: PortMappings:[{HostPort:25565 ContainerPort:25565 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}
INFO[0001] About to del CNI network podman_default (type=bridge)
DEBU[0001] unmounted container "3d7d3511df63951ca773e18b251d3d49c3185f88d0618e4726a7ad64b180a5e7"
Error: cannot listen on the TCP port: listen tcp4 :25565: bind: address already in use
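
A quick way to see what is still holding the port (a diagnostic sketch, not output from the run above):

# Show the process currently listening on TCP 25565:
ss -tlnp | grep 25565
# Alternatively:
lsof -iTCP:25565 -sTCP:LISTEN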

mheon (Member) commented Feb 22, 2021

Alright, that's very strange. podman restart should already handle that case (manually killing Conmon if required).

baude (Member) commented Feb 23, 2021

This is a secondary issue and should be split off.

baude (Member) commented Feb 23, 2021

@mheon does cleanup remove the files in /var/lib/cni/networks/podman/?

baude (Member) commented Feb 23, 2021

Or how about when we detect that a restart has occurred?

mheon (Member) commented Feb 23, 2021

As of Podman 3.0, we ship a systemd-tmpfiles file instructing systemd to wipe the CNI reservations directory on reboot.
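
For reference, that is the D! rule in the tmpfiles.d snippet quoted earlier in this issue; roughly, in systemd-tmpfiles syntax, D marks a directory whose contents are removed, and the ! suffix restricts the action to boot time:

# From /usr/lib/tmpfiles.d/podman.conf (quoted above):
D! /var/lib/cni/networks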

mheon (Member) commented Feb 23, 2021

I doubt this is actually a CNI reservation issue given the way this is being described, though. If podman restart shows it as well, something rather more serious is wrong.

@skitoxe Can you check if there are any orphan conmon processes hanging around? Stopping all containers should remove them all, so any hanging around after that would be interesting to know about.
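
Something like this should be enough to check (a sketch):

podman stop --all    # stop all containers; no conmon should survive this
pgrep -fa conmon     # any output here is an orphaned conmon process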

baude (Member) commented Feb 23, 2021

@mheon could SELinux be interrupting things?

mheon (Member) commented Feb 23, 2021

Possibly. Checking the audit log could be helpful.

I'm still not certain what's actually going on here, though. We're getting port-in-use errors from Conmon, which usually indicate that Conmon is hanging around for a while and blocking the ports - but we fixed podman restart to kill -9 the old Conmon process, so that shouldn't be happening. With the manual delay added, we instead see CNI complain about an already-allocated IP - but the container is not using a static IP, and CNI is smart enough not to reuse already-allocated IPs, so either something is telling it to use an in-use IP or CNI is getting exceptionally confused. Something tells me these two are related, but I can't put my finger on how.

Luap99 (Member) commented Feb 23, 2021

Can you please paste your CNI config file for the podman_default network?

skitoxe (Author) commented Feb 23, 2021

Can you please paste your CNI config file for the podman_default network?

You mean this file, right? It's in /etc/cni/net.d:

cat podman_default.conflist
{
   "args": {
      "podman_labels": {
         "com.docker.compose.network": "default",
         "com.docker.compose.project": "podman",
         "com.docker.compose.version": "1.27.4"
      }
   },
   "cniVersion": "0.4.0",
   "name": "podman_default",
   "plugins": [
      {
         "type": "bridge",
         "bridge": "cni-podman1",
         "isGateway": true,
         "ipMasq": true,
         "hairpinMode": true,
         "ipam": {
            "type": "host-local",
            "routes": [
               {
                  "dst": "0.0.0.0/0"
               }
            ],
            "ranges": [
               [
                  {
                     "subnet": "10.89.0.0/24",
                     "gateway": "10.89.0.1"
                  }
               ]
            ]
         }
      },
      {
         "type": "portmap",
         "capabilities": {
            "portMappings": true
         }
      },
      {
         "type": "firewall",
         "backend": ""
      },
      {
         "type": "tuning"
      },
      {
         "type": "dnsname",
         "domainName": "dns.podman",
         "capabilities": {
            "aliases": true
         }
      }
   ]
}

skitoxe (Author) commented Feb 23, 2021

I doubt this is actually a CNI reservation issue given the way this is being described, though. If podman restart shows it as well, something rather more serious is wrong.

@skitoxe Can you check if there are any orphan conmon processes hanging around? Stopping all containers should remove them all, so any hanging around after that would be interesting to know about.

I didn't appear to have any, though admittedly I could have been too slow to catch them if they only stick around briefly; it depends on how quickly the restart reallocates the port compared with how fast I can grep the process list.

I did, however, get this error while stopping this time:

Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
ERRO[0000] unable to cleanup network for container cd67e977bf281eeab3b192e18aceb5d614335962d0770f0f50360ea8cdb2472f: "error tearing down CNI namespace configuration for container cd67e977bf281eeab3b192e18aceb5d614335962d0770f0f50360ea8cdb2472f: Error while removing pod from CNI network \"podman_default\": running [/usr/sbin/iptables -t nat -D POSTROUTING -s 10.89.0.80 -j CNI-dd5462ad46da9543b99f4cff -m comment --comment name: \"podman_default\" id: \"cd67e977bf281eeab3b192e18aceb5d614335962d0770f0f50360ea8cdb2472f\" --wait]: exit status 2: iptables v1.8.5 (legacy): Couldn't load target `CNI-dd5462ad46da9543b99f4cff':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information.\n"
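
One way to check whether that CNI NAT chain is really gone (a diagnostic sketch; the chain name is taken from the error above):

# Look for the per-container CNI chains in the NAT table:
iptables -t nat -S | grep CNI-
# Or query the specific chain named in the error:
iptables -t nat -L CNI-dd5462ad46da9543b99f4cff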

mheon (Member) commented Feb 23, 2021

Hm. Any chance the firewall was reloaded since the container was created? That could be related, though I'm not 100% sure how yet.

Luap99 (Member) commented Feb 23, 2021

Yes, that's the correct file, and it also looks correct to me.

cpus: 40

Do you really have 40 cores on your system? This could be a race condition, which is more likely to happen with so many CPU cores.

baude (Member) commented Feb 23, 2021

A couple of observations here. The network config provided above appears to have been created by docker-compose (based on the labels). So my first question is whether that is true. And then I wonder whether you should instead be using a network that you create with podman network create, as sketched below.
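
For example, something along these lines (a sketch; the network name is hypothetical, and the subnet/gateway are taken from the config above):

podman network create --subnet 10.89.0.0/24 --gateway 10.89.0.1 mynetwork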

skitoxe (Author) commented Feb 23, 2021

Hm. Any chance the firewall was reloaded since the container was created? That could be related, though I'm not 100% sure how yet.

Yes, it has been reloaded.

Do you really have 40 cores on your system? This could be a race condition, which is more likely to happen with so many CPU cores.

10 cores and 20 threads per CPU; 2x Xeon E5-2640 v4.

A couple of observations here. The network config provided above appears to have been created by docker-compose (based on the labels). So my first question is whether that is true. And then I wonder whether you should instead be using a network that you create with podman network create.

The containers were created by docker-compose, but with no additional network configuration in the .yml file apart from DNS configuration passed to the container.

github-actions bot commented

A friendly reminder that this issue had no activity for 30 days.

rhatdan (Member) commented Mar 26, 2021

@skitoxe @mheon @baude is this still an issue?

github-actions bot commented

A friendly reminder that this issue had no activity for 30 days.

rhatdan (Member) commented Apr 27, 2021

Since I was never answered, I am going to assume no and close. Reopen if I am mistaken.

rhatdan closed this as completed Apr 27, 2021
github-actions bot added the locked - please file new issue/PR label Sep 22, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023