
Rebooted, dangling file in /var/lib/cni/networks/podman prevents container starting #3759

Closed
space88man opened this issue Aug 8, 2019 · 46 comments · Fixed by #4086
Labels: kind/bug, locked - please file new issue/PR


space88man commented Aug 8, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I did a systemctl reboot and encountered an orphan /var/lib/cni/networks/podman/10.88.0.20 file which prevented the container from starting.

ERRO[0000] Error adding network: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
Error: unable to start container "freeswitch-init_1": error configuring network namespace for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254

Steps to reproduce the issue:

  1. Upgraded Fedora 30 host, rebooted, tried to start a container

Describe the results you received:

INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist 
DEBU[0000] Made network namespace at /var/run/netns/cni-76dce0ef-599f-b625-7710-a7eeef4159e5 for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 
INFO[0000] Got pod network &{Name:freeswitch-init_1 Namespace:freeswitch-init_1 ID:7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 NetNS:/var/run/netns/cni-76dce0ef-599f-b625-7710-a7eeef4159e5 PortMappings:[] Networks:[podman] NetworkConfig:map[podman:{IP:10.88.0.20}]} 
INFO[0000] About to add CNI network cni-loopback (type=loopback) 
DEBU[0000] overlay: mount_data=nodev,metacopy=on,lowerdir=/var/lib/containers/storage/overlay/l/Z6S3A3PN6PO5DCD4E3FZZOFLO6:/var/lib/containers/storage/overlay/l/II26Q5SUSAWGOVHLKY3MKC7GNU,upperdir=/var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/diff,workdir=/var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/work,context="system_u:object_r:container_file_t:s0:c362,c945" 
DEBU[0000] mounted container "7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7" at "/var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/merged" 
DEBU[0000] Created root filesystem for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 at /var/lib/containers/storage/overlay/8cc72886f3925230caa200d3bce8b24599897b0c64ca5a6a4d6d56871f118d4d/merged 
INFO[0000] Got pod network &{Name:freeswitch-init_1 Namespace:freeswitch-init_1 ID:7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 NetNS:/var/run/netns/cni-76dce0ef-599f-b625-7710-a7eeef4159e5 PortMappings:[] Networks:[podman] NetworkConfig:map[podman:{IP:10.88.0.20}]} 
INFO[0000] About to add CNI network podman (type=bridge) 
ERRO[0000] Error adding network: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 
DEBU[0000] Network is already cleaned up, skipping...   
DEBU[0000] unmounted container "7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7" 
DEBU[0000] Cleaning up container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 
DEBU[0000] Network is already cleaned up, skipping...   
DEBU[0000] Container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7 storage is already unmounted, skipping... 
ERRO[0000] unable to start container "freeswitch-init_1": error configuring network namespace for container 7690a7bc4960b76799d906dbc07a32d174a5e75581886349d3604d455e093cf7: failed to allocate for range 0: requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254 

Describe the results you expected:
Container starts

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:            1.4.4
RemoteAPI Version:  4
Go Version:         go1.12.7
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.7
  podman version: 1.4.4
host:
  BuildahVersion: 1.9.0
  Conmon:
    package: podman-1.4.4-4.fc30.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.0.0-dev, commit: 164df8af4e62dc759c312eab4b97ea9fb6b5f1fc'
  Distribution:
    distribution: fedora
    version: "30"
  MemFree: 7664599040
  MemTotal: 8340746240
  OCIRuntime:
    package: runc-1.0.0-93.dev.gitb9b6cc6.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: e3b4c1108f7d1bf0d09ab612ea09927d9b59b4e3
      spec: 1.0.1-dev
  SwapFree: 4294963200
  SwapTotal: 4294963200
  arch: amd64
  cpus: 4
  hostname: containers.localdomain
  kernel: 5.2.5-200.fc30.x86_64
  os: linux
  rootless: false
  uptime: 7m 27.03s
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mountopt=nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 6
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):
In /var/lib/cni/networks/podman I have:

  1. 10.88.0.20: contains the ID of the container
  2. last_reserved_ip.0: contains 10.88.0.20
  3. lock

The guard file 10.88.0.20 must have been left behind by the reboot. After deleting the file, the container could start.

openshift-ci-robot added the kind/bug label Aug 8, 2019
space88man changed the title from "requested IP address 10.88.0.20 is not available in range set 10.88.0.1-10.88.255.254" to "Rebooted, dangling file in /var/lib/cni/networks/podman prevents container starting" Aug 8, 2019
mheon (Member) commented Aug 8, 2019 via email

space88man (Author) commented:

No; this was after a graceful reboot.

mheon (Member) commented Aug 8, 2019

@mccv1r0 Can you take a look at this? It looks like CNI didn't clean up address reservations after a clean reboot?

dcbw (Contributor) commented Aug 8, 2019

@mheon the runtime is solely responsible for calling CNI DEL for every container that is no longer running but has not had DEL called on it. I know that Podman/CRI-O keep a cache of containers somewhere on-disk. When that cache is reconciled with what is actually running at CRI-O/Podman startup, they need to call CNI DEL on every container in that cache list that is not currently running, to allow the network plugin to clean up.

rhatdan (Member) commented Aug 8, 2019

@dcbw Is this something we could configure to put these files on a tmpfs, so that we don't have to clean up after a reboot?

dcbw (Contributor) commented Aug 8, 2019

@rhatdan you have no idea what kind of information the network plugin has to clean up, so you have to tell the network plugin to clean up, which is what CNI DEL does.

If your container is no longer running, and it wasn't given a CNI DEL, you must call CNI DEL to clean up.

dcbw (Contributor) commented Aug 8, 2019

@mheon @rhatdan you have a container database:

// save container state to the database
func (c *Container) save() error {
	if err := c.runtime.state.SaveContainer(c); err != nil {
		return errors.Wrapf(err, "error saving container %s state", c.ID())
	}
	return nil
}

Presumably that doesn't get blown away on restart. So the next time podman runs (for any pod), it'll need to reconcile that database list with what's actually running, prune out the old entries, and call CNI DEL on those that aren't running. Or something like that.

mheon (Member) commented Aug 8, 2019

We do actually blow away most database state on reboot, on the assumption that none of it survived the reboot - what was running no longer is, what was mounted no longer is. It may be possible to work a CNI DEL into the process of refreshing the state post-reboot - I'll investigate.

mheon (Member) commented Aug 8, 2019

Alright, this bit promises to be complicated.

We wipe container state very early in the refresh process, because we can't retrieve containers otherwise - the state is not sane because of the restart, so all of our validation fails. The runtime never touches a state with the CNI result available after a reboot - it's gone by the time we have the ability to actually get containers.

We can change this to preserve the cached CNI result, but some bits of what we pass to OCICNI as part of TeardownPod are not going to survive - network namespace path, for example, is definitely gone, and I don't see a way of preserving it.

mheon (Member) commented Aug 8, 2019

Hmmm. If the network namespace path isn't strictly mandatory - I think we can probably call OCICNI's TeardownPod without preserving the old CNI result.

mheon (Member) commented Sep 23, 2019

Should be fixed by #4086

mheon added a commit to mheon/libpod that referenced this issue Sep 24, 2019
CNI expects that a DELETE be run before re-creating container
networks. If a reboot occurs quickly enough that containers can't
stop and clean up, that DELETE never happens, and Podman
currently wipes the old network info and thinks the state has
been entirely cleared. Unfortunately, that may not be the case on
the CNI side. Some things - like IP address reservations - may
not have been cleared.

To solve this, manually re-run CNI Delete on refresh. If the
container has already been deleted this seems harmless. If not,
it should clear lingering state.

Fixes: containers#3759

Signed-off-by: Matthew Heon <[email protected]>
sunyitao commented:

When two or more containers are started at the same time, the later ones always fail

ERRO[0000] Error adding network: failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed
Error: unable to start container "gitlab": error configuring network namespace for container 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c: failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed

mheon (Member) commented Jul 23, 2020

Can you provide a reproducer for this? What Podman version are you on?

AlexGluck commented Jul 24, 2020

CentOS 7, same error: after a graceful reboot some containers don't start.

- Logs begin at Пт 2020-07-24 17:09:33 UTC, end at Пт 2020-07-24 17:10:11 UTC. --
июл 24 17:09:44 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: Starting Podman container-pdns-rec-02.service...
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Reading configuration file \"/usr/share/containers/libpod.conf\""
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Merged system config \"/usr/share/containers/libpod.conf\": &{{false false false false false true} 0 {   [] [] []}  docker://  runc map[crun:[/usr/bin/crun /usr/local/bin/crun] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] [crun runc] [crun] [] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] systemd   /var/run/libpod -1 false /etc/cni/net.d/ [/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman []   k8s.gcr.io/pause:3.1 /pause false false  2048 shm    false}"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using conmon: \"/usr/bin/conmon\""
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using graph driver overlay"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using graph root /var/lib/containers/storage"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using run root /var/run/containers/storage"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using static dir /var/lib/containers/storage/libpod"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using tmp dir /var/run/libpod"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Using volume path /var/lib/containers/storage/volumes"
июл 24 17:09:47 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:47Z" level=debug msg="Set libpod namespace to \"\""
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="[graphdriver] trying provided driver \"overlay\""
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="cached value indicated that overlay is supported"
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="cached value indicated that metacopy is not being used"
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="cached value indicated that native-diff is usable"
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="backingFs=extfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false"
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="Initializing event backend journald"
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=debug msg="using runtime \"/usr/bin/runc\""
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=warning msg="Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument"
июл 24 17:09:48 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:48Z" level=info msg="Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist"
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="Made network namespace at /var/run/netns/cni-1c3935ab-eaf9-7c6c-6255-b35aca30a60f for container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805"
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="Got pod network &{Name:pdns-rec-02 Namespace:pdns-rec-02 ID:217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 NetNS:/var/run/netns/cni-1c3935ab-eaf9-7c6c-6255-b35aca30a60f Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[{HostPort:5354 ContainerPort:53 Protocol:udp HostIP:} {HostPort:5354 ContainerPort:53 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}]}"
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="About to add CNI network cni-loopback (type=loopback)"
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="Got pod network &{Name:pdns-rec-02 Namespace:pdns-rec-02 ID:217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 NetNS:/var/run/netns/cni-1c3935ab-eaf9-7c6c-6255-b35aca30a60f Networks:[] RuntimeConfig:map[podman:{IP: PortMappings:[{HostPort:5354 ContainerPort:53 Protocol:udp HostIP:} {HostPort:5354 ContainerPort:53 Protocol:tcp HostIP:}] Bandwidth:<nil> IpRanges:[]}]}"
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=info msg="About to add CNI network podman (type=bridge)"
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="overlay: mount_data=lowerdir=/var/lib/containers/storage/overlay/l/YGHO4H3I3TNXW65T6OCIN3DC4R:/var/lib/containers/storage/overlay/l/W23YXYEJN45PNRY7CYYCC3KIHG,upperdir=/var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/diff,workdir=/var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/work"
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="mounted container \"217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805\" at \"/var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/merged\""
июл 24 17:09:51 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:51Z" level=debug msg="Created root filesystem for container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 at /var/lib/containers/storage/overlay/0a7d1dce43b21da9c68d7c7e57556b41a541959a2ad7b6d96a8d1b9c7ba8e0a3/merged"
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=error msg="Error adding network: unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=error msg="Error while adding pod to CNI network \"podman\": unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="unmounted container \"217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805\""
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Network is already cleaned up, skipping..."
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Cleaning up container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805"
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Network is already cleaned up, skipping..."
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=debug msg="Container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805 storage is already unmounted, skipping..."
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local podman[631]: time="2020-07-24T17:09:52Z" level=error msg="unable to start container \"pdns-rec-02\": error configuring network namespace for container 217606977e3670839ce1b56647b776b92aaa36365c2ea2b9a4b3b3afe059c805: unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: container-pdns-rec-02.service: control process exited, code=exited status=125
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: Failed to start Podman container-pdns-rec-02.service.
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: Unit container-pdns-rec-02.service entered failed state.
июл 24 17:09:52 epp-dev-03-apache-02-pub.12.voskhod.local systemd[1]: container-pdns-rec-02.service failed.

Steps to reproduce:

podman run --name test-01 -d centos:7
podman run --name test-02 -d centos:7
podman run --name test-03 -d centos:7
podman generate systemd -t 5 test-01 > /etc/systemd/system/container-test-01.service
podman generate systemd -t 5 test-02 > /etc/systemd/system/container-test-02.service
podman generate systemd -t 5 test-03 > /etc/systemd/system/container-test-03.service
systemctl daemon-reload
systemctl enable --now container-test-01.service container-test-02.service container-test-03.service
systemctl stop container-test-01.service container-test-02.service container-test-03.service
rm -rf /var/lib/cni/networks/podman
ip link delete cni-podman0
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
iptables -X -t nat
iptables -L -n
iptables -L -n -t nat
systemctl start container-test-01.service & systemctl start container-test-02.service & systemctl start container-test-03.service

mheon (Member) commented Jul 24, 2020

I think your log trace cut off some longer lines - in particular, the actual error message from CNI is missing. Can you try again?

AlexGluck commented:

@mheon Updated the log and the steps to reproduce without a reboot. Podman version is 1.6.4.

AlexGluck commented Jul 24, 2020

podman version

# podman version
Version:            1.6.4
RemoteAPI Version:  1
Go Version:         go1.12.12
OS/Arch:            linux/amd64

podman info

# podman info
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.8-1.el7.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.8, commit: f85c8b1ce77b73bcd48b2d802396321217008762'
  Distribution:
    distribution: '"centos"'
    version: "7"
  MemFree: 584450048
  MemTotal: 1039032320
  OCIRuntime:
    name: runc
    package: runc-1.0.0-67.rc10.el7_8.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 4294963200
  SwapTotal: 4294963200
  arch: amd64
  cpus: 1
  eventlogger: journald
  hostname: epp-dev-03-apache-02-int.12.voskhod.local
  kernel: 3.10.0-1127.13.1.el7.x86_64
  os: linux
  rootless: false
  uptime: 4m 0.05s
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 4
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 6
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

AlexGluck commented:

Basically, I thought that when systemd starts all the containers together, CNI network initialization creates a race condition. But now I think the runtime environment sometimes isn't handled correctly. In any case, repeatedly creating the same chain in the iptables nat table is a race.

msg="Error adding network: unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"
msg="Error while adding pod to CNI network \"podman\": unable to create chain CNI-HOSTPORT-SETMARK: running [/usr/sbin/iptables -t nat -N CNI-HOSTPORT-MASQ --wait]: exit status 1: iptables: Chain already exists.\n"

AlexGluck commented:

As a quick fix, I hacked together a CNI init service.

/etc/systemd/system/podman-init.service

[Unit]
Description=Podman hack for init CNI before start containers
Documentation=man:podman-run
[Service]
ExecStartPre=/usr/bin/podman run --rm --name podman-init -d -p 65500:65500/udp centos:centos7 sleep 30
ExecStart=/usr/bin/sleep 10
Type=simple
[Install]
WantedBy=multi-user.target

/etc/systemd/system/container-test-01.service.d/after.conf

[Unit]
After=podman-init.service

mheon (Member) commented Jul 25, 2020 via email

ephracis commented Dec 9, 2020

I am seeing that exact error message as root. The problem occurs after a reboot.

My CI labs just got 2.0.5 a couple of days ago but according to @sunyitao this won't fix it, unfortunately.

I have however implemented the workaround suggested by @AlexGluck and created a small podman-init.service, which seems to have solved the issue for good in some of my Ansible roles.

mheon (Member) commented Dec 9, 2020

@rhatdan This seems like a good candidate for systemd-tmpfiles? It's a directory not on a tmpfs that we want wiped after each reboot.

rhatdan (Member) commented Dec 9, 2020

Sure, what is the path?

mheon (Member) commented Dec 9, 2020

All subfolders of /var/lib/cni/networks

mheon (Member) commented Dec 9, 2020

I think we also need to prevent files named last_reserved_ip.0 and lock from being removed.

rhatdan (Member) commented Dec 10, 2020

Won't CNI recreate them if they are removed?

rhatdan (Member) commented Dec 10, 2020

# rm -rf  /var/lib/cni/networks/*
# podman run fedora echo hello
hello
# ls /var/lib/cni/networks/*
last_reserved_ip.0  lock

mheon (Member) commented Dec 10, 2020

Nice. OK, safe to remove everything in those directories then.

rhatdan added a commit to rhatdan/podman that referenced this issue Dec 11, 2020
CNI sometimes leaves network information in /var/lib/cni/networks
when the system crashes or containers do not shut down properly.

This PR will clean up these leftover files, so that container engines
get a clean environment when the system reboots.

Related to: containers#3759

Signed-off-by: Daniel J Walsh <[email protected]>
pmoogi-redhat pushed a commit to pmoogi-redhat/podman that referenced this issue Dec 15, 2020
vizv commented Jan 24, 2021

Still occurs on Fedora 33, podman 2.2.1.

podman info:

host:    
  arch: amd64                 
  buildahVersion: 1.18.0      
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.21-3.fc33.x86_64  
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21, commit: 0f53fb68333bdead5fe4dc5175703e22cf9882ab'
  cpus: 4    
  distribution:
    distribution: fedora
    version: "33"         
  eventLogger: journald
  hostname: valkyrie.viz.network       
  idMappings:                           
    gidmap: null
    uidmap: null             
  kernel: 5.10.9-201.fc33.x86_64
  linkmode: dynamic        
  memFree: 7206068224     
  memTotal: 8145199104
  ociRuntime:
    name: crun                        
    package: crun-0.16-3.fc33.x86_64                                                           
    path: /usr/bin/crun
    version: |-    
      crun version 0.16
      commit: eb0145e5ad4d8207e84a327248af76663d4e50dd
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux          
  remoteSocket: 
    exists: true
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 4071616512
  swapTotal: 4071616512
  uptime: 10m 14.05s
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 7
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 2.1.0
  Built: 1607438270
  BuiltTime: Tue Dec  8 22:37:50 2020
  GitCommit: ""
  GoVersion: go1.15.5
  OsArch: linux/amd64
  Version: 2.2.1

/var/lib/cni/networks/podman doesn't get cleaned after a graceful reboot, so previously stopped containers won't start (CNI fails to allocate IPs).

Temporarily fixed by mounting a tmpfs on /var/lib/cni/networks/podman:

# append to /etc/fstab
tmpfs /var/lib/cni/networks/podman tmpfs defaults 0 0

gunchev commented Jan 24, 2021

Also happens with podman 3.0.0-0.1.rc1.fc32 (rebuilt src.rpm from Koji). Downgraded to podman-2.2.1-1.fc32.x86_64 and had to remove the files by hand. I have a custom network:

podman network create --gateway 172.31.13.1 --ip-range 172.31.13.0/24 --subnet 172.31.13.0/24 cni-dmz

What triggered the problem was stopping and starting a container. The same thing locked the IP of another container too. Downgrading did not help, nor did a reboot, so I had to remove the files by hand, and then the containers started again.

rhatdan (Member) commented Jan 25, 2021

Could you check whether you have this file?

cat /usr/lib/tmpfiles.d/podman.conf
# /tmp/podman-run-* directory can contain content for Podman containers that have run
# for many days. This following line prevents systemd from removing this content.
x /tmp/podman-run-*
D! /run/podman 0700 root root
D! /var/lib/cni/networks

Is it supposed to be deleting content in the /var/lib/cni/networks directory?

vizv commented Jan 25, 2021

I only got the following:

# /tmp/podman-run-* directory can contain content for Podman containers that have run
# for many days. This following line prevents systemd from removing this content.
x /tmp/podman-run-.*
d /run/podman 0700 root root

vizv commented Jan 25, 2021

You can also install podman in a container with the fedora:33 image and verify the content of /usr/lib/tmpfiles.d/podman.conf.
It's from podman-2:2.2.1-1.fc33.x86_64

Since the current version doesn't contain D! /var/lib/cni/networks, I'll just use my workaround until a new version gets released and picked up by the Fedora devs.

rhatdan (Member) commented Jan 25, 2021

I guess the fix is in podman 3.0.

gunchev commented Jan 26, 2021

It is:

$ rpm -qf  /usr/lib/tmpfiles.d/podman.conf
podman-3.0.0-0.159.dev.git3ba1a8d.fc32.x86_64

$ cat /usr/lib/tmpfiles.d/podman.conf
# /tmp/podman-run-* directory can contain content for Podman containers that have run
# for many days. This following line prevents systemd from removing this content.
x /tmp/podman-run-*
D! /run/podman 0700 root root
D! /var/lib/cni/networks

So my case is different: it happened without rebooting. I guess I'll have to open a separate issue if I can reproduce it again...

rhatdan (Member) commented Jan 26, 2021

It was working for me the last time I tried it.

bcarpio commented Jan 27, 2021

When two or more containers are started at the same time, the later ones always fail

ERRO[0000] Error adding network: failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed
Error: unable to start container "gitlab": error configuring network namespace for container 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c: failed to allocate for range 0: 10.88.1.230 has been allocated to 5a920437e66254079842109be75ba0f082ad85761f017e1673c527827abeb91c, duplicate allocation is not allowed

This is probably more for the RHEL 8 bug tracker, but I'll post it here nonetheless. The problem exists in RHEL 8, so if you are trying to start two or more docker containers using systemctl, none will start due to this issue, so I was putting:

ExecStartPre=sleep 10

in each of the systemd service files.

rhatdan (Member) commented Jan 27, 2021

docker containers? No need for the docker adjective.

Podman 3.0 will be in the RHEL 8.4 release.

colorsakura commented Dec 4, 2021

Error: unable to start container 95bfae788dd9dc358a9b0c1e7513c42cc39a7a040e7df75e8023bb7d9b74fe9d: error configuring network namespace for container 95bfae788dd9dc358a9b0c1e7513c42cc39a7a040e7df75e8023bb7d9b74fe9d: failed to allocate for range 0: 172.16.16.37 has been allocated to 95bfae788dd9dc358a9b0c1e7513c42cc39a7a040e7df75e8023bb7d9b74fe9d, duplicate allocation is not allowed

host:
  arch: amd64
  buildahVersion: 1.19.6
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.25, commit: unknown'
  cpus: 12
  distribution:
    distribution: debian
    version: "11"
  eventLogger: journald
  hostname: debian
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.0-0.bpo.2-amd64
  linkmode: dynamic
  memFree: 15043907584
  memTotal: 16641290240
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version 0.17
      commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.1
      commit: 6a7b16babc95b6a3056b33fb45b74a6f62262dd4
      libslirp: 4.4.0
  swapFree: 1024454656
  swapTotal: 1024454656
  uptime: 14m 40.67s
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: 9gzrq6w1.mirror.aliyuncs.com
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: docker.io
  search:
  - docker.io
store:
  configFile: /home/chauncey/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.10.3
        fuse-overlayfs: version 1.4
        FUSE library version 3.10.3
        using FUSE kernel interface version 7.31
  graphRoot: /home/chauncey/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 0
  runRoot: /run/user/1000/containers
  volumePath: /home/chauncey/.local/share/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 0
  BuiltTime: Thu Jan  1 08:00:00 1970
  GitCommit: ""
  GoVersion: go1.15.9
  OsArch: linux/amd64
  Version: 3.0.1

mheon (Member) commented Dec 6, 2021

This has been fixed upstream. There are presently no plans to backport the fix to v3.0. You can work around it by mounting a tmpfs on /var/lib/cni/networks.

soar commented Aug 3, 2022

I'm still facing this issue in 3.0:

# podman version
Version:      3.0.1
API Version:  3.0.0
Go Version:   go1.15.15
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64
time="2022-08-03T09:32:59Z" level=error msg="Error adding network: failed to allocate for range 0: 172.16.16.20 has been allocated to 2f38e5cbcd2599febde3c97f799e442edf044f17ec2da1c6c76ab5de6b9babe1, duplicate allocation is not allowed"
time="2022-08-03T09:32:59Z" level=error msg="Error while adding pod to CNI network \"podman\": failed to allocate for range 0: 172.16.16.20 has been allocated to 2f38e5cbcd2599febde3c97f799e442edf044f17ec2da1c6c76ab5de6b9babe1, duplicate allocation is not allowed"
Error: unable to start container "2f38e5cbcd2599febde3c97f799e442edf044f17ec2da1c6c76ab5de6b9babe1": error configuring network namespace for container 2f38e5cbcd2599febde3c97f799e442edf044f17ec2da1c6c76ab5de6b9babe1: failed to allocate for range 0: 172.16.16.20 has been allocated to 2f38e5cbcd2599febde3c97f799e442edf044f17ec2da1c6c76ab5de6b9babe1, duplicate allocation is not allowed

rhatdan (Member) commented Aug 3, 2022

As @mheon pointed out, this is fixed in Podman 4.

github-actions bot added the locked - please file new issue/PR label Sep 20, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023