rke2-coredns stuck on containercreating, failed to create pod sandbox #5905
Kubelet is complaining because it can't find the calico binary. Which CNI did you choose? Can you check whether the CNI agent pod is running on that node?
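For reference, something like this is what I have in mind (the namespace and label selector are assumptions for a default RKE2 Canal install, and the pod/container names are examples):

```bash
# List the Canal agent pods and the nodes they run on
# (namespace and label selector assumed for a default RKE2 Canal install).
kubectl get pods -n kube-system -l k8s-app=canal -o wide

# Check the Calico container logs in the agent pod on the affected node
# (pod and container names are examples; adjust to your cluster).
kubectl logs -n kube-system rke2-canal-nm2jh -c calico-node --tail=50
```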
Sure, the pods are running correctly.
Also, I've checked the logs; nothing seems to be failing in rke2-canal-nm2jh. I can share the logs with you, but this is what appears as a WARNING:
These are the CNI folders on the worker nodes:
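For context, these are the paths I checked (both paths are assumptions based on a default RKE2 layout):

```bash
# CNI plugin binaries (calico, flannel, portmap, ...) — path assumed for a default RKE2 install.
ls -l /opt/cni/bin/

# CNI network configuration written by Canal — path assumed for RKE2's agent data directory.
ls -l /var/lib/rancher/rke2/agent/etc/cni/net.d/
```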
Events (screenshot attached)
Sorry, I read the issue too quickly. It can indeed find the calico binary, but Calico is throwing the error.
Make sure that you've disabled any host firewalls (firewalld/ufw) or other endpoint protection products. It sounds like something is blocking the connection between calico and the apiserver.
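If it helps, a quick way to rule that out (assuming a systemd-based distro; not all of these tools will be installed on every node):

```bash
# Check whether firewalld or ufw is active on the node.
systemctl is-active firewalld
systemctl is-active ufw

# Look for iptables/nftables rules that could drop traffic to the apiserver (port 6443 by default).
iptables -S | grep -iE 'drop|reject' | head
nft list ruleset | grep -iE 'drop|reject' | head
```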
There are no host firewalls or other forms of endpoint protection on the host. However, I discovered a related issue involving the RHEL cloud provider and NetworkManager. Given that I'm using the Debian cloud-init image with NetworkManager, could something related to this be causing interference? I've already applied the proposed workaround (sketched below), and it still does not work.
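For reference, the workaround I applied is roughly the following: telling NetworkManager to ignore the interfaces that Calico/Flannel create. The file name and interface patterns are as I understood them from the docs, so treat this as a sketch:

```bash
# Tell NetworkManager not to manage the CNI-created interfaces
# (file name is an example; the patterns match Calico and Flannel default device names).
cat <<'EOF' > /etc/NetworkManager/conf.d/rke2-canal.conf
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF

systemctl reload NetworkManager
```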
Can you check containerd.log to see if there are any additional error messages? Also, can you confirm that kube-proxy is running on this node?
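A sketch of how both can be checked (the log and socket paths are assumptions for a default RKE2 agent install; under RKE2, kube-proxy runs as a static pod):

```bash
# Grep the RKE2-managed containerd log for errors (path assumed for a default install).
grep -iE 'error|fail' /var/lib/rancher/rke2/agent/containerd/containerd.log | tail -n 20

# Confirm kube-proxy is running on this node
# (socket path assumed for RKE2's bundled containerd).
crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps | grep kube-proxy
kubectl get pods -n kube-system -o wide | grep kube-proxy
```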
Kube-proxy is up and running.
There is not much more info in containerd.log.
Kube-proxy also indicates a communication failure.
Something on your node is blocking communication. Please investigate what that might be.
I just went through something similar and it turned out to be corrupted VXLAN packets (bad UDP checksum). In my case it was Flannel, but I did see some other folks have issues with Calico as well. The symptoms pointed to an OS firewall or kube-proxy, but in the end those were actually fine; it was related to VMs running on VMware. Check out these discussions: the Flannel discussion explains how to make the change persistent via udev rules.
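For anyone who lands here with the same symptoms, the change those discussions describe is roughly this (flannel.1 is Flannel's default VXLAN device; the Calico equivalent is vxlan.calico; treat the udev rule as a sketch):

```bash
# One-off: disable TX checksum offload on the VXLAN interface so packets are no
# longer sent with a bad UDP checksum (flannel.1 is Flannel's default device name).
ethtool -K flannel.1 tx-checksum-ip-generic off

# Persistent variant: re-apply the setting whenever the interface is (re)created.
# The ethtool path may differ (e.g. /sbin/ethtool on some distros).
cat <<'EOF' > /etc/udev/rules.d/90-flannel-tx-checksum-off.rules
SUBSYSTEM=="net", ACTION=="add", KERNEL=="flannel.1", RUN+="/usr/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off"
EOF
udevadm control --reload-rules
```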
Thanks for the info, @burlyunixguy. I'm not sure if it's exactly related to the discussion you mentioned.
It turns out the problem was the MTU setting of the VXLAN. You can check out more details here: MTU Considerations for VXLAN. Sorry for the misunderstanding!
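For anyone else debugging this, the mismatch is easy to spot by comparing the VXLAN interface MTU against the underlying NIC, since VXLAN encapsulation adds roughly 50 bytes of overhead (interface names below are examples):

```bash
# The VXLAN device MTU must be at least ~50 bytes smaller than the underlying NIC's MTU.
ip -d link show flannel.1 | grep -o 'mtu [0-9]*'
ip link show ens192 | grep -o 'mtu [0-9]*'
```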
Environmental Info:
RKE2 Version:
rke2 version v1.28.9+rke2r1 (07bf87f)
go version go1.21.9 X:boringcrypto
Node(s) CPU architecture, OS, and Version:
Linux rke2-lab-master-1 6.1.0-20-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
Cluster Configuration:
Describe the bug:
When attempting to deploy the CoreDNS pods on agent nodes after successfully creating an RKE2 cluster, the CoreDNS pods on the agents are stuck in ContainerCreating with the following error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e5580fe9275de080781b229bcb82c0b7dd05af5e2d9b5366af2c978f94ec842d": plugin type="calico" failed (add): unexpected error when reading response body. Please retry. Original error: http2: client connection lost
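For completeness, the stuck pods and the event above can be seen with standard kubectl commands (the pod name is a placeholder):

```bash
# CoreDNS pods on the agent nodes stay in ContainerCreating.
kubectl get pods -n kube-system -o wide | grep coredns

# The FailedCreatePodSandBox event with the error above appears under Events.
kubectl describe pod -n kube-system <rke2-coredns-pod-name>
```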
Steps To Reproduce:
Expected behavior:
CoreDNS containers should start as expected and provide DNS service to the pods on the agent nodes.
Actual behavior:
CoreDNS pods on the agent nodes remain stuck in the "ContainerCreating" state, with the error message mentioned above.
Additional context / logs:
No further logs were found.
Both Hardened-Calico and Hardened-Flannel containers appear to be functioning correctly.