When the calico-node container restarts, it can miss the serviceaccount token update event and then fails to update /host/etc/cni/net.d/calico-kubeconfig. This leads to the following Calico CNI errors:
2023-01-16 05:06:47.002 [ERROR][48795] plugin.go 121: Final result of CNI ADD was an error. error=error getting ClusterInformation: connection is unauthorized: Unauthorized
2023-01-16 05:06:47.111 [ERROR][48860] plugin.go 518: Final result of CNI DEL was an error. error=error getting ClusterInformation: connection is unauthorized: Unauthorized
Expected Behavior
Calico CNI should be able to access the apiserver as soon as the calico-node container has restarted, without waiting for the next token refresh event.
Current Behavior
Calico CNI cannot access the apiserver until the next token refresh event, which by default is about 1 hour away.
Possible Solution
The calico-node container boot process should try to update /host/etc/cni/net.d/calico-kubeconfig as early as possible.
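A minimal sketch of that idea, not Calico's actual implementation: at startup, read the mounted serviceaccount token and rewrite the kubeconfig right away if its token differs, instead of waiting for the next refresh event. The function name `sync_kubeconfig_token` is hypothetical, and the sketch assumes the kubeconfig carries the token on a single unquoted `token:` line.

```python
import os
import re
import tempfile

# Assumption: the kubeconfig stores the token on one unquoted "token: <value>" line.
TOKEN_LINE = re.compile(r"^([ \t]*token:[ \t]*).*$", re.MULTILINE)


def sync_kubeconfig_token(token_path, kubeconfig_path):
    """Copy the mounted token into the kubeconfig if they differ.

    Returns True if the kubeconfig was rewritten, False if it was already current.
    """
    with open(token_path) as f:
        token = f.read().strip()
    with open(kubeconfig_path) as f:
        content = f.read()

    if TOKEN_LINE.search(content) is None:
        raise ValueError("no 'token:' line found in kubeconfig")

    # A callable replacement avoids any backslash interpretation of the token.
    updated = TOKEN_LINE.sub(lambda m: m.group(1) + token, content, count=1)
    if updated == content:
        return False

    # Write to a temp file in the same directory, then rename over the target,
    # so a concurrent CNI invocation never reads a half-written file.
    # mkstemp creates the file readable and writable by the owner only.
    directory = os.path.dirname(os.path.abspath(kubeconfig_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(updated)
    os.replace(tmp_path, kubeconfig_path)
    return True
```

Calling this once during boot, before the regular token watcher starts, would close the window in which a restart that missed the update event leaves a stale token in place.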
Steps to Reproduce (for bugs)
1. Scale the Typha replicas to zero.
2. The calico-node container begins restarting because of the Typha access error.
3. kubelet keeps trying to restart the calico-node container with an exponential back-off delay.
4. Wait until the serviceaccount token is updated on the host. ("inspect" the container to find the mounted host directory.)
5. Restore Typha and let calico-node restart successfully.
6. Compare /etc/cni/net.d/calico-kubeconfig with the serviceaccount token in the host directory.
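The comparison in the last step can be sketched as follows. This is an illustrative helper, not part of Calico: it assumes both tokens are JWTs (as bound serviceaccount tokens are) and decodes the `exp` claim without verifying the signature. The names `jwt_expiry` and `is_stale` are hypothetical.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the 'exp' claim (Unix seconds) of a JWT, or None if absent.

    The signature is NOT verified; this is for diagnosis only.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload = parts[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get("exp")


def is_stale(kubeconfig_token, mounted_token, now=None):
    """True if the kubeconfig token differs from the mounted one or has expired."""
    now = time.time() if now is None else now
    exp = jwt_expiry(kubeconfig_token)
    return kubeconfig_token != mounted_token or (exp is not None and exp <= now)
```

If `is_stale` returns True for the token extracted from calico-kubeconfig while the mounted token is still valid, the node is in the state described in this issue.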
Context
Our cluster depends on OpenYurt to implement node autonomy. When node autonomy is enabled, the Pods on the node are not rescheduled and keep running. If the node's network goes down, the calico-node container restarts because of the Typha access error. After the node's network recovers, calico-node returns to running, but the CNI plugin is not authorized to access the apiserver. This leads to the CNI ADD and DEL errors.
Your Environment
Calico version: v3.21.4
Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.22
Operating System and version: CentOS 8.4
The good news is that this looks very much like another issue already on our radar which is currently being worked on: #7171
The bad news is that the fix will not be ported into the next maintenance release of v3.21 since that's outside of our support window (current - 3).
Going to close this issue but please do take a look at the issue I linked and if you think what you're seeing is sufficiently different and warrants a separate look we can re-open and evaluate from there.