Pods stuck in ContainerCreating - error getting ClusterInformation: connection is unauthorized: Unauthorized #9271

Closed
bilaloslo92 opened this issue Sep 23, 2024 · 6 comments

bilaloslo92 commented Sep 23, 2024

All pods in the cluster are stuck in ContainerCreating and cannot start because of the CNI error: error getting ClusterInformation: connection is unauthorized: Unauthorized. This happened during the migration from a manifest-based to an operator-based installation of Calico, following the guide.

Expected Behavior

All pods should be in the Running state.

Current Behavior

Pods are stuck in ContainerCreating and cannot start up due to:

  Normal   Scheduled               6s    default-scheduler  Successfully assigned calico-system/csi-node-driver-v24qb to kamparas-master-20240909-1-test-001a
  Warning  FailedCreatePodSandBox  6s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4827b3ac17192a95cf9446aeb3956b222dca057c603a199be7b933b3a5be35ac": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Normal   SandboxChanged          5s    kubelet            Pod sandbox changed, it will be killed and re-created.
$ k get po
NAME                                      READY   STATUS              RESTARTS   AGE
calico-kube-controllers-c695fcb67-fch9w   0/1     ContainerCreating   0          2d20h
calico-kube-controllers-c695fcb67-hhp5l   0/1     Terminating         0          10d
calico-node-2dx42                         1/1     Running             0          2m55s
calico-node-5r52n                         1/1     Running             0          51s
calico-node-6wvs6                         0/1     Running             0          21s
calico-node-fvg4w                         1/1     Running             0          113s
calico-node-rg28n                         1/1     Running             0          82s
calico-node-t7krm                         1/1     Running             0          2m24s
calico-typha-67444b77cc-6sdgn             1/1     Running             0          2d19h
calico-typha-67444b77cc-grdht             1/1     Running             0          2d19h
calico-typha-67444b77cc-skk2d             1/1     Running             0          2d19h
csi-node-driver-28msg                     0/2     ContainerCreating   0          23s
csi-node-driver-52dcr                     0/2     ContainerCreating   0          23s
csi-node-driver-7jpbt                     0/2     ContainerCreating   0          23s
csi-node-driver-jgtdt                     0/2     ContainerCreating   0          23s
csi-node-driver-qff5v                     0/2     ContainerCreating   0          23s
csi-node-driver-xvtm6                     0/2     ContainerCreating   0          23s

The kube-apiserver logs:

E0923 07:57:28.409062       1 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, service account calico-system/calico-cni-plugin has been deleted]"
E0923 07:57:28.415617       1 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, service account calico-system/calico-cni-plugin has been deleted]"
E0923 07:57:28.688217       1 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, service account calico-system/calico-cni-plugin has been deleted]"
E0923 07:57:28.722983       1 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, service account calico-system/calico-cni-plugin has been deleted]"

The /etc/cni/net.d/calico-kubeconfig does not seem to be authorized:

root@/etc/cni/net.d# kubectl --kubeconfig /etc/cni/net.d/calico-kubeconfig auth whoami
error: You must be logged in to the server (Unauthorized)

However, the JWT token in /etc/cni/net.d/calico-kubeconfig appears to be valid (not expired):

{
  "aud": [
    "https://kubernetes.default.svc.[REDACTED]"
  ],
  "exp": 1727164326,
  "iat": 1727077926,
  "iss": "https://kubernetes.default.svc.[REDACTED]",
  "kubernetes.io": {
    "namespace": "calico-system",
    "serviceaccount": {
      "name": "calico-cni-plugin",
      "uid": "45684d76-c78b-4de5-a337-cfc43ebf5ca0"
    }
  },
  "nbf": 1727077926,
  "sub": "system:serviceaccount:calico-system:calico-cni-plugin"
}
   iat: 1727077926 9/23/2024, 9:52:06 AM
   nbf: 1727077926 9/23/2024, 9:52:06 AM
   exp: 1727164326 9/24/2024, 9:52:06 AM
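
For anyone repeating this check, here is a minimal Python sketch of how the payload above can be decoded and its time window tested. The function names are hypothetical, and the signature is not verified. Note that a token inside its nbf/exp window can still be rejected: as the kube-apiserver log above shows, the API server also reports whether the ServiceAccount behind the token has been deleted.

```python
import base64
import json
from datetime import datetime, timezone


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def describe_validity(claims: dict, now: float) -> str:
    """Report whether `now` (Unix seconds) falls inside the nbf/exp window."""
    if now < claims.get("nbf", 0):
        return "not yet valid"
    if now >= claims["exp"]:
        return "expired"
    return "within validity window"


def to_utc(ts: int) -> str:
    """Render a Unix timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
```

With the claims shown above, `to_utc(1727077926)` gives `2024-09-23T07:52:06+00:00`, which matches the 9:52:06 AM local time listed for iat, a few minutes before the 07:57 rejections in the kube-apiserver log.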

Possible Solution

Context

Migration from manifest installation to operator.

Your Environment

  • Calico version: v3.27.4
  • Orchestrator version (e.g. kubernetes, mesos, rkt): v1.29.8
  • Operating System and version: Ubuntu 20.04 LTS
  • Link to your project (optional):
@caseydavenport
Member

service account calico-system/calico-cni-plugin has been deleted

Sounds like something has deleted the CNI service account which has rendered the token invalid.

Are you able to see if that serviceaccount exists?

@bilaloslo92
Author

bilaloslo92 commented Oct 1, 2024

Thank you for the reply @caseydavenport.

Yes, the ServiceAccount exists (it also exists in kube-system):

$ k get sa -n calico-system
NAME                      SECRETS   AGE
calico-cni-plugin         0         20d
calico-kube-controllers   0         19d
calico-node               0         12d
calico-typha              0         19d
default                   0         19d
tigera-operator           0         12d

@caseydavenport
Member

@bilaloslo92 could you check if it is in the process of terminating?

e.g.,

kubectl get serviceaccount -n calico-system calico-cni-plugin  -o yaml

Then check whether a deletionTimestamp is present. I've seen cases where the object is terminating but stuck, so Kubernetes considers it deleted even though it hasn't actually been removed yet.
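
As a rough illustration of that check, the sketch below inspects an object's metadata for the "marked for deletion but held by a finalizer" state. It assumes the object was fetched as JSON (`-o json` instead of `-o yaml`); the function name and the sample values are hypothetical.

```python
import json


def stuck_terminating(obj: dict) -> bool:
    """True if the object has a deletionTimestamp but finalizers still hold it."""
    meta = obj.get("metadata", {})
    return bool(meta.get("deletionTimestamp")) and bool(meta.get("finalizers"))


# Illustrative input only: the shape of `kubectl get serviceaccount ... -o json`,
# with made-up values for the timestamp and finalizer name.
SAMPLE = """
{
  "apiVersion": "v1",
  "kind": "ServiceAccount",
  "metadata": {
    "name": "calico-cni-plugin",
    "namespace": "calico-system",
    "deletionTimestamp": "2024-01-01T00:00:00Z",
    "finalizers": ["example.io/some-finalizer"]
  }
}
"""

print(stuck_terminating(json.loads(SAMPLE)))
```

An object in this state still shows up in `kubectl get`, which is why a plain listing can make it look healthy.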

@bilaloslo92
Author

bilaloslo92 commented Oct 1, 2024

Ah, I see. It has the deletionTimestamp present:

$ kubectl get serviceaccount -n calico-system calico-cni-plugin  -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2024-09-10T12:29:31Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-09-10T13:18:35Z"
  finalizers:
  - tigera.io/cni-protector
  name: calico-cni-plugin
  namespace: calico-system

@bilaloslo92
Author

After I deleted the finalizers, the Tigera operator recreated the ServiceAccount, which fixed my issue. All pods are now running.
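
The thread does not record the exact command used to delete the finalizers. One common way, shown here only as a sketch under that assumption, is a JSON merge patch that sets `metadata.finalizers` to null. The helper name is hypothetical; it only builds the `kubectl patch` argument list and does not run anything.

```python
import json


def clear_finalizers_command(namespace: str, name: str) -> list[str]:
    """Build a `kubectl patch` argv that removes all finalizers from a ServiceAccount.

    Assumption: a JSON merge patch is used; setting a field to null deletes it.
    """
    patch = json.dumps({"metadata": {"finalizers": None}})
    return [
        "kubectl", "patch", "serviceaccount", name,
        "-n", namespace,
        "--type=merge",
        "-p", patch,
    ]


cmd = clear_finalizers_command("calico-system", "calico-cni-plugin")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Clearing a finalizer bypasses whatever cleanup it was guarding, so it is worth confirming first that the object really is stuck.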

@caseydavenport
Member

Perfect, thanks for following up!

I am not sure how it got into that state, but FWIW I believe a fix for this has been merged into later versions of Calico, namely PR tigera/operator#3381, which should have been released in Calico v3.28.1.
