What happened?
I followed this documentation to install the dev k8s environment. My master node came up quickly and runs fine, but the worker nodes are not working because of cni plugin not initialized.
Extracted from kubectl describe node debian-node-2
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Mon, 10 Jun 2024 15:46:00 +0800 Mon, 10 Jun 2024 15:46:00 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Mon, 10 Jun 2024 15:45:56 +0800 Mon, 10 Jun 2024 15:45:50 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 10 Jun 2024 15:45:56 +0800 Mon, 10 Jun 2024 15:45:50 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 10 Jun 2024 15:45:56 +0800 Mon, 10 Jun 2024 15:45:50 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Mon, 10 Jun 2024 15:45:56 +0800 Mon, 10 Jun 2024 15:45:50 +0800 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
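For this symptom, a quick check on the worker (assuming the default CNI paths; they can be overridden in the containerd config) is whether a CNI config was actually written and whether containerd has picked it up:
# is there any CNI config on the node? (canal typically writes a conflist here)
ls -l /etc/cni/net.d/
# containerd's CRI view of the network plugin (exact field names vary by containerd version)
crictl info | grep -iA3 cni
# kubelet's view of the same condition
journalctl -u kubelet --no-pager | grep -i "cni plugin" | tail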
Logs extracted from kubectl logs -n kube-system canal-ftzlb (for debian-node-2):
2024-06-10 08:58:51.323 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/profiles" error=Get "https://172.23.192.1:443/api/v1/namespaces?limit=500": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.323 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Get "https://172.23.192.1:443/api/v1/nodes?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.323 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/workloadendpoints" error=Get "https://172.23.192.1:443/api/v1/pods?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.323 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices" error=Get "https://172.23.192.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.323 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies" error=Get "https://172.23.192.1:443/apis/networking.k8s.io/v1/networkpolicies?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.325 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints" error=Get "https://172.23.192.1:443/apis/crd.projectcalico.org/v1/hostendpoints?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.560 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/bgpconfigurations"
2024-06-10 08:58:51.587 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/globalnetworksets" error=Get "https://172.23.192.1:443/apis/crd.projectcalico.org/v1/globalnetworksets?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.731 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/networkpolicies" error=Get "https://172.23.192.1:443/apis/crd.projectcalico.org/v1/networkpolicies?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:51.900 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
2024-06-10 08:58:51.926 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"
2024-06-10 08:58:51.947 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/globalnetworkpolicies" error=Get "https://172.23.192.1:443/apis/crd.projectcalico.org/v1/globalnetworkpolicies?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.160 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/felixconfigurations"
2024-06-10 08:58:52.201 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/networksets" error=Get "https://172.23.192.1:443/apis/crd.projectcalico.org/v1/networksets?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.323 [INFO][63] status-reporter/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/caliconodestatuses"
2024-06-10 08:58:52.324 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/profiles"
2024-06-10 08:58:52.324 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2024-06-10 08:58:52.324 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/workloadendpoints"
2024-06-10 08:58:52.324 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies"
2024-06-10 08:58:52.324 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesservice"
2024-06-10 08:58:52.324 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices"
2024-06-10 08:58:52.326 [INFO][61] felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
2024-06-10 08:58:52.454 [INFO][63] status-reporter/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/caliconodestatuses" error=Get "https://172.23.192.1:443/apis/crd.projectcalico.org/v1/caliconodestatuses?limit=500&resourceVersion=23696&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.454 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/profiles" error=Get "https://172.23.192.1:443/api/v1/namespaces?limit=500": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.454 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/nodes" error=Get "https://172.23.192.1:443/api/v1/nodes?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.454 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesservice" error=Get "https://172.23.192.1:443/api/v1/services?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.454 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/workloadendpoints" error=Get "https://172.23.192.1:443/api/v1/pods?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.454 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices" error=Get "https://172.23.192.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.454 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies" error=Get "https://172.23.192.1:443/apis/networking.k8s.io/v1/networkpolicies?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
2024-06-10 08:58:52.454 [INFO][61] felix/watchercache.go 194: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/bgpconfigurations" error=Get "https://172.23.192.1:443/apis/crd.projectcalico.org/v1/bgpconfigurations?limit=500&resourceVersion=0&resourceVersionMatch=NotOlderThan": dial tcp 172.23.192.1:443: connect: connection refused
I checked the logs, and it seems the connection to 172.23.192.1:443 was refused.
But when I check iptables, it shows:
I also tried curl and telnet to connect to 172.23.192.1:443 from debian-node-2, and both work.
So the connection itself seems to be fine. I have been debugging this for days, but still have no luck.
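For context, 172.23.192.1 is presumably the ClusterIP of the kubernetes service, the in-cluster address that kube-proxy translates to the real API server endpoint. Assuming kube-proxy runs in iptables mode, that can be verified and the matching NAT rules inspected with:
# confirm the service VIP and the real API server endpoint behind it
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default
# look for the rules kube-proxy programmed for that VIP
iptables -t nat -L KUBE-SERVICES -n | grep 172.23.192.1
If the VIP, endpoint, and rules all look right by the time curl works, the earlier refusal may simply have been transient, e.g. the rules were not yet programmed when canal first started.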
I also tried reinstalling several times, but the problem reappeared consistently.
In the end, I resolved it by running systemctl restart containerd.service on debian-node-2 (the worker node).
Although that fixed the problem, I still have a big question: why does simply restarting containerd fix it?
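A plausible explanation (not verified against the KubeWharf sources) is that containerd's CRI plugin only reports NetworkReady=true after it has successfully loaded a CNI config from /etc/cni/net.d; if the canal install container wrote that config at an unlucky moment, containerd may keep reporting the stale state until it is restarted and re-reads the directory. What containerd currently thinks can be checked with:
# containerd runtime status, including the network/CNI condition
crictl info | grep -iA5 cni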
At the same time, my master node (i.e., debian-node-1) would randomly reboot, which was very confusing to me.
It usually showed up as something like client_loop: send disconnect: Broken pipe, which I initially took for an ssh connection problem, but when I left the server alone overnight and connected again the next day, it had rebooted again on its own.
There were no memory issues; memory usage was not even high. journalctl -xb -p err did not indicate any problems.
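For the spontaneous reboots, two generic checks (nothing KubeWharf-specific) often narrow things down: the reboot history, and whether the previous boot's journal ends abruptly, which points at a power loss, hardware reset, or kernel panic rather than a clean shutdown:
# reboot/shutdown history from wtmp
last -x reboot shutdown | head
# tail of the previous boot's journal; a clean shutdown ends with systemd shutdown messages
journalctl -b -1 -n 100 --no-pager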
I had previously installed vanilla Kubernetes on debian-node-1, and also ran a Kind-based k8s cluster, and there was no reboot problem then. I did a full system reset before installing the kubewharf-enhanced k8s.
debian-node-2 (the worker node) has the same problem.
Below are the neofetch summaries for debian-node-1 and debian-node-2. The two machines are connected to the same router, which has a bypass route at 192.168.2.201 set up to handle the proxy.
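Since a proxy sits in the path: if kubelet or containerd inherit HTTP_PROXY/HTTPS_PROXY settings, the cluster CIDRs and API server addresses usually need to be listed in NO_PROXY, otherwise in-cluster traffic can be sent to the proxy instead. This is only a general consideration, not a confirmed cause here; whether any proxy variables are applied to those services can be checked with:
# environment actually applied to the service units
systemctl show containerd -p Environment
systemctl show kubelet -p Environment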
Software version
debian-node-1:
root@debian-node-1:~# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:56:31Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:51:02Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
debian-node-2:
root@debian-node-2:~/deploy# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:56:31Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:51:02Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
What did you expect to happen?
Worker nodes can run normally without systemctl restart containerd.service.
Master node will not auto restart.
How can we reproduce it (as minimally and precisely as possible)?
Followed this documentation.