CNI plugin: error getting ClusterInformation: connection is unauthorized: Unauthorized #5712
Comments
This looks similar to: #4857 |
In that linked issue, one of the users reported that they did not see the issue with k8s 1.20. That might be worth trying if that k8s version is an option for you. |
Thanks, all, for the information. We cannot downgrade K8S from v1.21 to v1.20. I will continue to debug and share my progress with you. Thanks |
k8s 1.21 added token cycling and pro-active token withdrawal when pods are deleted. As far as we can tell, there's something weird about how this interacts with Calico on some systems. I'm attempting to reproduce, but no luck so far. Do you have any clues about how to trigger this behaviour? |
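Since token cycling is the suspected trigger, one way to narrow things down is to check whether the token the CNI plugin is presenting has simply expired, or is not yet valid because of clock skew. The following is a minimal sketch, assuming the token is a standard JWT carrying `exp` and optionally `nbf` claims; the function names are hypothetical and the signature is deliberately not verified.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_claims(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def token_state(token: str, now: Optional[float] = None) -> str:
    """Classify a token as 'not-yet-valid', 'expired' or 'valid' (hypothetical helper)."""
    claims = decode_jwt_claims(token)
    now = time.time() if now is None else now
    # A node clock that runs behind makes a fresh token look not yet valid.
    if "nbf" in claims and now < claims["nbf"]:
        return "not-yet-valid"
    # A stale token (or a node clock that runs ahead) shows up as expired.
    if "exp" in claims and now >= claims["exp"]:
        return "expired"
    return "valid"
```

To use it, pass in the token string from the kubeconfig that the Calico CNI plugin reads on the affected node (assumed here to be `/etc/cni/net.d/calico-kubeconfig`; check your own install). An "expired" result would be consistent with the token not being refreshed, but it does not prove that this is the cause.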
clusterVersion: v1.23. Not sure how useful my comment will be, but I encountered this error when I accidentally rebooted one of the nodes in the cluster. The killing was triggered by a disk pressure event on the node, the reasons for which I'm not entirely sure of. I had lowered the imageGC thresholds a bit beforehand, but from my understanding they shouldn't trigger disk pressure. Maybe I'm wrong. PS: I also recall a similar situation with an API that constantly got evicted every couple of days (disk pressure), and its evicted pods were never cleaned up. I didn't really look into why the pods remained, but maybe they were also supposed to be cleaned up and never were because of this error. |
@earthlovepython: this may sound silly, but could you try restarting kubelet? I had a little bit of fun with it today. Spending hours upgrading between different k8s and Calico versions in order to reproduce the issue on a second cluster led me nowhere. I then tried to activate debug logging and restarted kubelet, and suddenly all pods (in my case calico-kube-controller and coredns) just became ready... Luckily I had a snapshot of all VMs of the "broken" cluster, so I could verify that, and I can confirm it works like a charm. I guess |
Found a way to reproduce this issue, at least on a "Calico the hard way" setup (haven't tested it with regular deployments): After this the cni logs also print these messages:
Restarting kubelet afterwards solves the issue again.
|
When do you plan on releasing 3.23.2 ? |
Aiming to cut that release this week. |
I'm encountering the same issue. How can I solve it?
Examining
Please help |
@winkee01 what version of Calico are you using? |
I installed the latest version of Calico, and kubernetes is 1.24.2 |
@winkee01 I'd recommend opening a new issue and filling it out with the exact version of Calico as well as the other platform and environment information requested by the issue template. |
This just happened to me in an older |
hit the same issue, and lbogdan's workaround fixed it for me. |
In our case, the cause was an expired certificate. |
Same issue in 1.22 with Calico Events: Normal Scheduled 3m47s default-scheduler Successfully assigned default/pod-with-cm to worker-node01 |
v3.23.2
|
Did you fix this? |
Ran into a similar issue and worked around it by synchronizing the clocks with NTP :) |
I had a slightly different issue, but restarting the calico pod on the node with the failed pod, and then the failed pod itself, helped. The pod moved to another node after the restart. MicroK8s v1.26.0 revision 4390, Calico v3.23.5 |
Hi, |
I restarted NetworkManager, containerd and kubelet, but the problem remains. |
hi @vyom-soft, Is the configuration of |
I just had to |
kubectl delete pod calico-node-xxxx -n kube-system. A new Pod was created and the problem was solved. |
I had a similar issue today and all the pods on my cluster were stuck in Unknown or Terminating status, including the calico-node-xxxx. |
@davidassigbi 's solution worked for me. Was using Microk8s, guess something weird with |
Sorry, I'm still a learner. But |
NO! Do not do this unless you want to completely remove Calico and trash your cluster; it will delete Calico's IP address database.
|
I changed the system time on the master and the nodes, and then the same error appeared: Jan 25 11:34:46 test-node01 kubelet: E0125 11:34:46.938130 56745 remote_runtime.go:269] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox "969d9ca8a8fef61b41ed6810db45e3d5765e85a62f080ae9135a1d61cf508417": plugin type="calico" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized" podSandboxID="969d9ca8a8fef61b41ed6810db45e3d5765e85a62f080ae9135a1d61cf508417" |
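When many pods are stuck, it can help to pull the affected sandbox IDs out of the kubelet log instead of reading each line by hand. This is a rough sketch that matches only the line format quoted in this comment; the function name and the regular expressions are my own assumptions, and other kubelet or container runtime versions may format the message differently.

```python
import re

# Patterns written against the kubelet line quoted in this thread (assumed format).
SANDBOX_RE = re.compile(r'podSandboxID="([0-9a-f]+)"')
PLUGIN_RE = re.compile(r'plugin type=\\?"(\w+)\\?" failed \((\w+)\)')


def parse_cni_unauthorized(line: str):
    """Return details of a CNI 'connection is unauthorized' failure, or None."""
    if "connection is unauthorized" not in line:
        return None
    sandbox = SANDBOX_RE.search(line)
    plugin = PLUGIN_RE.search(line)
    return {
        "sandbox_id": sandbox.group(1) if sandbox else None,
        "plugin": plugin.group(1) if plugin else None,
        "operation": plugin.group(2) if plugin else None,
    }
```

Feeding it the kubelet output line by line and collecting the non-`None` results gives a list of sandboxes whose network teardown or setup failed with this error.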
I have the same problem on a Windows node. |
It works! I had synced the time in my VM before, so maybe that's the reason. |
This was useful for me!!! I got the same error, rebooted the machine, and then it recovered. |
I hit this issue again. I first saw it earlier this month and resolved it by rebooting the nodes, since they are just my test environment on VMware. I pause the VMs at night.
k describe pod prometheus-k8s-1
Your Environment
calico node yaml:
|
Rebooting solved my issue (my nodes are for testing and the control plane was shut down for some days around the holidays). |
I run into this on a fairly regular cadence, maybe 3-4 times per year on a small cluster of 4 nodes. Nuking the calico-node pod on the affected node and waiting for it to restart reliably allows the terminating pod to make progress, but is anyone aware of a permanent fix that doesn't require operator intervention? |
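The manual workaround described here (deleting the calico-node pod on the affected node) can at least be scripted. The sketch below only builds the `kubectl` command; it is not a permanent fix. It assumes the calico-node pods carry the label `k8s-app=calico-node` and run in `kube-system`, which may differ in your install, and the function name is hypothetical.

```python
import shlex


def calico_node_delete_cmd(node_name: str, namespace: str = "kube-system") -> list:
    """Build a kubectl command that deletes the calico-node pod on one node."""
    return [
        "kubectl", "delete", "pod",
        "-n", namespace,
        # Assumed label; verify it with `kubectl get pods --show-labels`.
        "-l", "k8s-app=calico-node",
        # Restrict the deletion to the pod scheduled on the affected node.
        "--field-selector", f"spec.nodeName={node_name}",
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it directly.
    print(shlex.join(calico_node_delete_cmd("worker-node01")))
```

Returning an argument list rather than a shell string keeps the node name from being interpreted by a shell if the list is later passed to `subprocess.run`.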
K8S & Calico information
HostOS: RHEL 8.2
K8S: on-premise cluster; version is v1.21.1; "IPVS" mode; IP4/IP6 dual stack; installed using kubespray
Calico: version is v3.18.4; non-BGP mode; enabled "IP6" DNAT.
Our docker image is built on top of "RHEL ubi:8"
We do not set up an external ETCD cluster.
"kubectl describe" output
Expected Behavior
The POD should start successfully.
Steps to Reproduce
Sorry, the issue happened twice, on different K8S clusters in our lab, and I did not keep any logs.
I would like to know how to reproduce it too.
My initial thought(maybe wrong)
Since the "kubectl describe" output contains "connection is unauthorized", I searched the source code of K8S v1.21.1; the K8S code does NOT contain it. I then searched Calico v3.22 (I am using v3.18.4, but there should not be a big difference) and found that "connection is unauthorized" exists in "libcalico-go/lib/errors/errors.go". So it looks like the issue is caused by Calico. I then used "error getting ClusterInformation" as a keyword: it cannot be found in the K8S code, but it can be found in the Calico code. So I am confident that the issue is 100% related to Calico.
Because the "connection is unauthorized" error message comes from "type ErrorConnectionUnauthorized struct", and "ErrorConnectionUnauthorized" is related to communication with ETCD, it looks like a communication issue between Calico and ETCD.
By the way, /var/log/calico/cni/ does NOT contain anything related to "etcd" for POD start/destroy during normal operation.
What I expect:
If possible, can you please tell me:
1) Which webpage describes the control/data flow between Calico and ETCD?
2) Which log files does Calico use, and where are they located?
3) Did I miss any debug information?
Thanks