CrashLoopBackOff Error in kube-proxy with kernel versions 5.12.2.arch1-1 and 5.10.35-1-lts #2240
Comments
I'm getting the same results. Quick workaround if a cluster is needed fast: manually set the parameter yourself.
This workaround works because of kubernetes/kubernetes#44919 (kube-proxy will not try to write if the existing value is high enough, despite the logs suggesting that it set it). We could possibly configure the max explicitly to 0 or some very small value in kind's kube-proxy, but I think you would still want to increase the actual value for things to work well. In normal usage kind is not setting this and is relying on the host kernel to have a suitable value, as we've encountered here. cc @aojea
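The exact command wasn't captured above, but a minimal sketch of that host-level workaround, assuming the 131072 value that kube-proxy computes in the failing logs further down (any equal or larger value should also work):

```sh
# Raise the host's conntrack limit to at least what kube-proxy wants;
# since the existing value is then high enough, kube-proxy skips the
# write that fails inside the container (see the explanation above).
sudo sysctl -w net.netfilter.nf_conntrack_max=131072

# Then recreate the cluster (or let the crashlooping kube-proxy restart).
kind delete cluster && kind create cluster
```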
@hyutota @BenTheElder I don't think this is an Arch Linux-only issue. According to the changelog of Linux 5.12.2, this commit (torvalds/linux@671c54e) changed the behaviour of netfilter conntrack. I believe this is the commit that caused this issue after upgrading to Linux 5.12.2.
Wow, so it seems that we can't set nf_conntrack_max in kind; it will fail for kernels 5.12.2+ 🤔 The good thing is that the fix seems simple: just enable it by default in kind/pkg/cluster/internal/kubeadm/config.go, lines 414 to 420 in b6bc112.
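While waiting for a release with that change, a hedged user-side sketch of the same idea: pass a kubeadm config patch through kind so kube-proxy leaves the limit alone. This assumes kind's kubeadmConfigPatches can target the KubeProxyConfiguration document (the config file name is arbitrary); if it can't on your version, the ConfigMap edit mentioned later in the thread achieves the same effect.

```sh
# Sketch: a cluster config that tells kube-proxy not to touch
# nf_conntrack_max (maxPerCore: 0 leaves the limit as-is).
cat > kind-conntrack.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: KubeProxyConfiguration
  conntrack:
    maxPerCore: 0
EOF

kind create cluster --config kind-conntrack.yaml
```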
I can confirm that #2241 fixes the issue for me on this kernel.
Thanks all, #2241 should be in shortly, and since we're quite overdue for a release it should be released soon.
Hello, thanks for sharing. Can you elaborate further on what to do exactly? I deleted all cluster configs (.minikube, .kube). I assume there is no other solution at the moment? I am using Minikube with kernel 5.10.36-2-MANJARO.
@muellerti Could you try the following steps and see if they work?
Man! Thanks! That worked; I should have thought of that myself.
It contains a fix for kubernetes-sigs/kind#2240, which we've hit when running GitHub Actions (actions/runner-images#3673).
- use Ubuntu 20.04 - set nf_conntrack_max to avoid CrashLoopBackOff for kube proxy (see kubernetes-sigs/kind#2240 (comment)) - print cluster info as soon as KinD is up - use kind-action v1.4.0 - bump KinD to v0.10.0 - use kube-tools v1.5.0 Signed-off-by: Mattia Mazzucato <[email protected]>
For kube-proxy not becoming ready, like this:

semaphore@semaphore-vm:~$ kubectl logs kube-proxy-42v55 -n kube-system
I0727 19:55:26.230888 1 node.go:135] Successfully retrieved node IP: 172.17.0.2
I0727 19:55:26.230923 1 server_others.go:172] Using ipvs Proxier.
I0727 19:55:26.230930 1 server_others.go:174] creating dualStackProxier for ipvs.
W0727 19:55:26.232364 1 proxier.go:420] IPVS scheduler not specified, use rr by default
W0727 19:55:26.232522 1 proxier.go:420] IPVS scheduler not specified, use rr by default
W0727 19:55:26.232538 1 ipset.go:107] ipset name truncated; [KUBE-6-LOAD-BALANCER-SOURCE-CIDR] -> [KUBE-6-LOAD-BALANCER-SOURCE-CID]
W0727 19:55:26.232546 1 ipset.go:107] ipset name truncated; [KUBE-6-NODE-PORT-LOCAL-SCTP-HASH] -> [KUBE-6-NODE-PORT-LOCAL-SCTP-HAS]
I0727 19:55:26.232648 1 server.go:571] Version: v1.17.0
I0727 19:55:26.232963 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
F0727 19:55:26.232982 1 server.go:485] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

See kubernetes-sigs/kind#2240 and kubernetes-sigs/kind#2241.
Resolves a crash issue on Linux machines noted here: kubernetes-sigs/kind#2240 Co-authored-by: Ken Sipe <[email protected]>
I'm using the Ubuntu operating system: add the line to the sysctl config file, make the parameter take effect immediately, and restart minikube.
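Spelled out for an Ubuntu-style host (the drop-in file name is arbitrary; 131072 is the value kube-proxy asks for in the logs):

```sh
# Persist the parameter across reboots.
echo 'net.netfilter.nf_conntrack_max = 131072' | sudo tee /etc/sysctl.d/99-conntrack.conf

# Apply all sysctl config files immediately.
sudo sysctl --system

# Restart minikube so kube-proxy starts against the new value.
minikube stop && minikube start
```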
* Inject sysctl changing nf_conntrack_max to 131072. This addresses #18 kubernetes-sigs/kind#2240
* Need to load the nf_conntrack kmod for the sysctl setting.
* Add nf_conntrack to modules-load.d to ensure the sysctl works. This is required to be reboot safe.
Signed-off-by: Kurt Garloff <[email protected]>
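The module-loading part of that, in shell terms (the modules-load.d file name is illustrative; the sysctl drop-in itself is the same as in the earlier minikube comment):

```sh
# The net.netfilter.nf_conntrack_max key only exists once the nf_conntrack
# module is loaded, so load it now...
sudo modprobe nf_conntrack

# ...and on every boot, so the sysctl drop-in keeps applying after reboots.
echo nf_conntrack | sudo tee /etc/modules-load.d/nf_conntrack.conf
```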
Hi all, with the latest kind binary and Kubernetes images I am no longer seeing this issue. Bumping the max number to the same value we observed in the kube-proxy logs solved the issue, and we are able to create the cluster fine. Kind version: 0.11.1. I'm wondering if there is a long-term solution that avoids the need for this? Thanks in advance!
You should not see this issue with any number of nodes in the latest release. Can you check whether this is still minimally reproducible with the latest release, and file a new issue if so?
Thanks for your response, Ben. On further investigation, the old kind executable was taking precedence in the PATH in that particular environment. Removing it showed no issues; the cluster is up and running as expected.
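For anyone hitting the same thing, a quick way to spot a stale binary shadowing the new one (purely illustrative):

```sh
# List every kind binary on the PATH in precedence order,
# then check which one actually runs and its version.
type -a kind
command -v kind
kind version
```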
We also update the base image version to v1.21.1. kubernetes-sigs/kind#2240 Signed-off-by: Hajime Tazaki <[email protected]>
How do I fix this issue on macOS?
Change maxPerCore to 0 in the kube-proxy ConfigMap to leave the limit as-is and ignore conntrack-min.
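A rough sketch of that change on an already-running cluster; on kubeadm-based clusters (including kind) the ConfigMap and DaemonSet are both named kube-proxy in kube-system:

```sh
# In the editor, find the conntrack section of config.conf and set:
#   conntrack:
#     maxPerCore: 0
kubectl -n kube-system edit configmap kube-proxy

# Restart the kube-proxy pods so they pick up the new configuration.
kubectl -n kube-system rollout restart daemonset kube-proxy
```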
kubernetes-sigs/kind#2240 Kind v0.11.1 is going to fix this issue, but upgrading breaks bank-vaults, which would take a lot of effort to upgrade, so that was postponed. Instead I added a post-kind-create-clusters step to fix the issue for the macOS context, where a ConfigMap change is enough.
@yharish991 run
As mentioned in kubernetes-sigs/kind#2240, there was a change in the Linux kernel starting with 5.12.2 that makes nf_conntrack_max read-only in non-init network namespaces, which prevents kind's kube-proxy container from working correctly on kind versions older than v0.11.1. This PS updates the script to download v0.11.1 to avoid this issue. If older versions are needed, the kind URL can be set as an environment variable, as shown in airshipctl/tools/deployment.provider_common/01_install_kind.sh. Relates-To: #583 Change-Id: Icd9e649fa112e9f6307034ec69dde5d4a7ad613d
Kind was silently failing to come up, likely due to kubernetes-sigs/kind#2240. Bumping versions appears to have fixed the issue.
What happened: After creating the cluster with kind create cluster, the kube-proxy pod has a CrashLoopBackOff error. This happens with kernel versions 5.12.2.arch1-1 and 5.10.35-1-lts. With kernel versions 5.12.1.arch1-1 and 5.10.34-1-lts I didn't have the issue.

What you expected to happen: All pods in the cluster should start without problems.

How to reproduce it (as minimally and precisely as possible): On an Arch Linux install with kernel version 5.12.2.arch1-1 or 5.10.35-1-lts and Docker installed, download the latest version of kind and run kind create cluster.

Anything else we need to know?:
Environment:
- kind version: (use kind version): Tested both:
- Kubernetes version: (use kubectl version):
- Docker version: (use docker info):
- OS (e.g. from /etc/os-release):