Failed to create pod sandbox: rpc - error getting ClusterInformation connection is unauthorized: Unauthorized #8379
Comments
I'd recommend using manifests from an official release, as it's possible that master is unstable for some reason. v3.27.0 is the most recent release right now.
This error suggests that the calico-cni-plugin serviceaccount doesn't have permission to get ClusterInformations. Can you share the contents of this command?
Here is the output of the command:
Seems like the CNI plugin has permissions to get ClusterInformation, which suggests this isn't an RBAC issue as much as a more general authorization issue. This thread includes a number of potential reasons why this might happen: #5712 Including:
How old is this cluster, by the way?
One thing that might be useful to check is whether restarting calico/node on the affected node improves things at all.
@caseydavenport, if you mean NTP synchronization issues between the master and the node, they show exactly the same time, no difference.
@caseydavenport YES, I rebooted the VM, the pod was created successfully, and I was able to access the mysql container. Can you please tell me what could be the root cause of this issue that stopped Calico from functioning correctly?
The fact that restarting the node helped suggests there was some temporary state in place that had expired and was refreshed on reboot. The most likely candidate is the CNI plugin's bearer token. What version of Calico do you have installed? e.g.,
Newer versions of Calico should automatically refresh the token to prevent cases like this, starting in v3.24 it seems: #5910. However, you would need to be on v3.24 or greater and also have properly updated manifests that volume-mount the necessary CNI configuration directory into calico/node so that it can provide refreshed tokens to the CNI plugin. Otherwise, I think the tokens expire after about a year.
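One way to test the expired-token theory is to decode the token's payload and compare its `exp` claim with the current time. The following is a minimal sketch, assuming the CNI plugin's bearer token is a standard JWT; the function names are hypothetical, and the signature is deliberately not verified.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    Only the payload is inspected; the signature is NOT verified.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Report whether the token's `exp` claim is in the past."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim, so nothing to compare against
    current = time.time() if now is None else now
    return current >= exp
```

Passing the token string taken from the CNI kubeconfig on the affected node to `is_expired` would show whether an expired token is a plausible cause before resorting to a restart.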
Hi, here is the output of the command:
Calico is v3.26
Interesting, looks like a dev build is being run rather than a production release?
OK, so what should I do? Should I switch to a stable production release? If yes, how? Thanks
@eliassal I'm curious how you ended up installing the newest code from GitHub (back in ~April) rather than a production release. Do you remember where you started installing from? Generally speaking, everyone should stick with a stable release unless they're testing something that hasn't been released yet. https://docs.tigera.io/calico/latest/about/ has the docs for the current release (v3.27.0).
Thanks @matthewdupre, but the link you provided does not indicate how to upgrade or whether there is any chance of breaking the current config.
There are upgrade docs in the side bar: https://docs.tigera.io/calico/latest/operations/upgrading/kubernetes-upgrade @eliassal I'm afraid I can't guarantee you won't break your config, since you're running an unreleased / unsupported version of Calico.
@caseydavenport OK, I will go through the upgrade doc, but tell me what is
@eliassal you can ignore the section about Host Endpoints - that's only for upgrades from versions older than v3.14. You can read about calicoctl in the documentation, it's a CLI tool.
I had the same problem after upgrading to 3.27.0, but a complete restart of Calico solved it:
Nope, seems 3.27 is total fubar. Had to restart Calico 4 times already
@davhdavh what's the error you get in your cluster?
I believe the "Unauthorized" error message to be distinct from the typical RBAC error. IIUC, if this was an RBAC issue, we'd see additional context along the lines of this:
(or similar, writing it out from memory) I believe the simple "Unauthorized" means that there is something more fundamental going on, i.e., the certificates in use have expired or perhaps the token itself has expired.
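The distinction can be sketched in code. The Kubernetes API server answers with HTTP 401 (Unauthorized) when credentials cannot be authenticated and HTTP 403 (Forbidden) when an authenticated identity lacks permission. Below is a rough illustration that classifies an error string by keyword; the function name is hypothetical, and this is not how Calico itself handles errors.

```python
def classify_api_error(message: str) -> str:
    """Roughly classify a Kubernetes API error string.

    Returns "authorization" for RBAC-style denials (HTTP 403),
    "authentication" for rejected credentials (HTTP 401),
    and "unknown" otherwise. Keyword matching is a heuristic only.
    """
    text = message.lower()
    # RBAC denials say the request "is forbidden" and name the user/resource.
    if "forbidden" in text:
        return "authorization"
    # A bare "Unauthorized" means the credentials themselves were rejected,
    # for example an expired token or certificate.
    if "unauthorized" in text:
        return "authentication"
    return "unknown"
```

For the error in this issue's title, `classify_api_error` returns `"authentication"`, which points at credentials rather than RBAC rules.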
Another issue with the same symptom: #7171 Relevant bit:
One thing to check here would be the calico/node pod logs from the affected node - does it contain any logs indicating that it has successfully (or unsuccessfully) refreshed the CNI plugin token? You'll want to look for logs from
Aha, yes that's important context if it's only happening on Windows nodes. Likely a bug in how the token refresh works on Windows nodes (or perhaps isn't being enabled on Windows nodes?). CC @coutinhop
Any workaround? Pretty tired of the clusters being half broken every morning
@davhdavh if I understood you correctly, you're now using the Windows operator install that came out in v3.27.0, right? Could you set
Yes. We were using the manual host-process setup in 3.26, so it really shouldn't be a very big change.
sure, will send next time it is stuck.
No, we should be running with the most basic setup there is that includes windows.
Yes, but it is long enough that I haven't figured out the timing yet.
Happens on both our dev cluster (1 main linux worker + control-plane and 2 micro control-planes and 1 windows node)
Yes.
v1.29.0
1.7.2 for dev and 1.7.11 for preprod.
@coutinhop Here are the logs...
@coutinhop any workarounds? It is getting quite annoying to have to fix this manually every single day
Here is a small workaround script to monitor the problem, and kill the pods
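The script posted in the thread is not reproduced here. As a rough sketch of the detection half of such a workaround, the function below picks out nodes whose events show the sandbox failure from this issue. The `(node, message)` tuple shape and the function name are assumptions made for illustration; fetching events and deleting the affected pods are left out.

```python
from typing import Iterable, List, Tuple


def nodes_needing_restart(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the sorted, de-duplicated nodes that hit the Unauthorized sandbox error.

    `events` is a hypothetical iterable of (node_name, event_message) pairs.
    """
    affected = set()
    for node, message in events:
        text = message.lower()
        # Match the symptom from this issue: sandbox creation failing
        # because the CNI plugin's API connection was rejected.
        if "failed to create pod sandbox" in text and "unauthorized" in text:
            affected.add(node)
    return sorted(affected)
```

A monitoring loop could call this periodically and restart the calico/node pod on each returned node, which matches the manual fix described earlier in the thread.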
@davhdavh sorry for the delay! While I could not find anything relevant in the logs you provided, that led me to look into the exact reason why I couldn't find any token refresher messages in the logs, and it turns out it doesn't run on Windows 😢
I'll get started right away on working that into the Windows scripts... In the meantime, I'm glad you found a workaround. I'll keep you posted on a fix...
@caseydavenport @matthewdupre Hi, I decided to reinstall K8s on a fresh Ubuntu install, and I am a little bit confused about the instructions at https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
@eliassal please open a new issue - sounds unrelated to the original problem here and best to keep separate concerns separated for anyone looking in the future.
I have K8s up and running and am able to deploy and run different Pods/containers. Today, I tried to deploy mysql to it with a PVC and PV.
After deploying, the container gets stuck in "ContainerCreating" status, then gets terminated and recreated.
When I describe the Pod I see this
Expected Behavior
Pod should run with persistent volume
Current Behavior
Pod gets stuck in "ContainerCreating" status
Context
Enclosed the yaml files
mysql-storage.txt
mysql-deployment.txt
for PV, PVC and mysql deployment
Your Environment
Calico version : as indicated above I used https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico.yaml
Orchestrator version: kubernetes 1.26
Operating System and version: ubuntu 22.04