calico/node token is invalidated by Kubernetes when the pod is evicted, leading to CNI failures #4857
Well that's certainly bizarre. Not sure what has changed in that area that might cause this issue, especially given the RBAC seems to have the correct permissions present! Will see what I can find...
@dghubble could you confirm that the

One thing that might be relevant here is that Kubernetes recently enabled service account token projection by default: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume

You may need manifest changes so that Calico can maintain the credentials on disk as they are rotated. I see that you have the correct volume mount here: https://github.com/poseidon/terraform-render-bootstrap/blob/master/resources/calico/daemonset.yaml#L174-L177

You may also need to set CALICO_MANAGE_CNI=true in the calico/node env to enable the right logic, though.
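(For illustration only, not Typhoon's actual manifest: enabling that flag is a one-line addition to the calico-node container's env.)

```yaml
# Sketch: ask calico-node to manage the CNI config/credentials on the host
env:
  - name: CALICO_MANAGE_CNI
    value: "true"
```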
When

That seems to match what Calico's release

Setting
We're facing the same issue here. Are there any suggested solutions?

Our Environment:
The error is certainly unusual. I haven't been able to reproduce this at all in my own rigs. Generally, an RBAC issue would show something more precise - e.g., "serviceaccount X is not allowed to Y resource Z". Issues with bad TLS credentials would also show up more clearly. I'm not really sure of the root cause of this, and I think to figure it out we probably need to dig into the API server logs to see why it is rejecting the request as unauthorized.
To pass Kubernetes v1.22 conformance testing, Typhoon used flannel instead of Calico. I'll re-run with Calico during the v1.23 cycle. |
@caseydavenport we have rolled back to k8s 1.20.11; it seems to be related to the k8s version. We haven't had any problems since.
I had a go at reproing this.
That hit the same issue, I think:
Looking at calico-node logs from that node, I see that the calico-node pod has a start time that is after the end of the test. I wonder if the problem is that calico-node doesn't tolerate the taint that's being added and is killed?
If I manually add the same taint that the test uses:
If I add a "tolerate everything" toleration to the calico-node daemonset:
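(The exact snippet wasn't captured in this thread; a catch-all toleration on the DaemonSet's pod spec looks roughly like this:)

```yaml
# Sketch: tolerate every taint, whatever its key, value, or effect
tolerations:
  - operator: Exists
```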
When I then re-run the e2e test, I see the test pass:
@dghubble So I think the problem lies with the calico manifest not tolerating the taint that the test uses. Where does Typhoon get the calico manifest from? Does it vendor the manifest or get it from the calico docs?

Oh - unless there's a way to ensure that the CNI's kubeconfig doesn't get deleted when calico-node gets killed? @caseydavenport?
Just realised that my sonobuoy run doesn't use the same settings as the OP provided.
Running that with the "tolerate everything" setting, both tests pass.
Hm, I didn't think that deleting the calico/node pod would remove the CNI kubeconfig. At least I don't think anything in Calico does that. It might be that the token is invalidated and, since calico/node isn't running, it can't update the config on the host?
How often does the token get cycled? Because we hit this every time I run the test (without the toleration). For that behaviour, the token cycling time would have to be ~1 min.
From https://kubernetes.slack.com/archives/C0EN96KUY/p1647856984220609, it appears that Kubernetes actively revokes service account credentials when pods are deleted.
Ah interesting. I found the vendored calico manifests in https://github.com/poseidon/terraform-render-bootstrap/blob/master/resources/calico/daemonset.yaml. They have:
Whereas https://docs.projectcalico.org/manifests/calico.yaml has:
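(As a sketch of the kind of difference in question: the upstream manifest typically carries a blanket NoExecute toleration along these lines. This is illustrative, not the exact contents of either file:)

```yaml
tolerations:
  # Keep calico-node running on nodes that receive a NoExecute taint (e.g. during evictions)
  - effect: NoExecute
    operator: Exists
```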
@dghubble Perhaps the Typhoon manifest needs to add
Thanks for looking into this, folks! I know CNI docs often show an "allow everywhere" toleration (i.e.

Choosing on behalf of users that a Calico DaemonSet should be on ALL nodes would limit use cases. For example,
Instead, Typhoon allows
From your investigation, it sounds like having this conformance test pass will require listing what those expected taints are, and provisioning the cluster so that Calico tolerates them. I suppose the reason Cilium and flannel don't hit this is because they're not relying on credentials in the same way. |
I don't think that's true any more? @caseydavenport could probably confirm.
Yep, this one at least is no longer the case (manifests are multi-arch now).
Yeah, this would only be hit if the CNI plugin on the host needs to make API calls and is doing so using the serviceaccount token of the daemonset pod that installed it. One option here might be to use the TokenRequest API directly to provision a separate token not bound to the life of the calico/node pod. In general I agree that we can't expect "Tolerate all" to be acceptable for every single cluster in existence, but I do think it is the correct default for what we ship because it will be right for the vast majority of clusters.
I believe for use cases like this we have switched to using node affinities rather than taints/tolerations. For example, this node affinity prevents us from running on fargate nodes: https://github.com/tigera/operator/blob/master/pkg/render/node.go#L711
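(Roughly, a node affinity of that shape looks like the sketch below; the Fargate label key is assumed here and may not match the linked code exactly.)

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Keep the DaemonSet off nodes labelled as Fargate (assumed label key)
            - key: eks.amazonaws.com/compute-type
              operator: NotIn
              values:
                - fargate
```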
Awesome to see the multi-arch manifest images. I'll check those out, that'll help remove one case.

I agree, for the vast majority of clusters your example seems great. I wouldn't advocate changing it either. I may look at node affinities, but having those be conditional is a lot more tricky just with the Terraform logic available to us. And taints do seem to express the situation fairly clearly. I'm not sure affinities would have a different end result (e.g. Kubernetes E2E might have an equivalent test there that gets thrown off in the same way).

Thanks to your investigation @lwr20, this looks like a detail of how CNCF conformance tests run, you'd agree? That wouldn't affect real clusters as far as I can tell. We could just say, if we need to pass conformance, you need to tolerate
I agree. I certainly don't think that particular conformance test is intended to mandate that "CNIs must work without talking to the kubernetes apiserver" for example.
Of course if we could do this, that would be ideal, but it will need some prototyping and testing.
Adding the DaemonSet toleration for
It would be nice to not hit this if that's a reasonable thing to do. Presumably that would be a separate issue if it's desired.
@caseydavenport We have run into this issue in one of our clusters in a slightly different scenario. For bin packing reasons, we scale the resource requests of

Is there a plan to resolve this issue, for example by using the token API directly or otherwise decoupling the validity of the token used for CNI from the
As we would like to have this issue fixed properly, we would be happy to contribute a solution.
Instead of using the token of the

@caseydavenport Would you be open to such a contribution?
Hi, you can find more about this issue here.
Yes, this is the approach that I was musing on as well. I think it is worth exploring this to see what it would look like and what limitations it might have (hopefully none).
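(A rough sketch of what such a request could look like; the namespace, audience, and expiry values are assumptions, not a settled design:)

```yaml
# Sketch: a TokenRequest POSTed to the ServiceAccount's "token" subresource, e.g.
# /api/v1/namespaces/kube-system/serviceaccounts/calico-node/token
apiVersion: authentication.k8s.io/v1
kind: TokenRequest
spec:
  audiences:
    - https://kubernetes.default.svc   # assumed audience accepted by the API server
  expirationSeconds: 86400             # refresh well before expiry
  # No boundObjectRef: the token is not tied to a particular calico-node pod,
  # so evicting that pod does not invalidate the kubeconfig that embeds the token
```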
@caseydavenport Should I create a corresponding pull request or do you plan to explore it yourself?
@ScheererJ I'd be happy to review a PR for this if you have one. Otherwise I will take a look at it myself once v3.23 is out the door (so in a couple of weeks).
@caseydavenport Feel free to take a look at #5910 if you have some time to spare. I will be on vacation next week, though. Hence, there is no hurry from my side.
Expected Behavior
The Calico CNI plugin tears down Pods in a timely manner.
Current Behavior
The Calico CNI plugin shows errors when terminating Pods, so eviction takes too long. This is especially relevant in Kubernetes conformance testing.
The natural things to check are RBAC permissions, which match recommendations:
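(The relevant slice of the ClusterRole, roughly as the install docs recommend; shown here as a sketch rather than the exact rules in use:)

```yaml
# Sketch: calico-node needs read access to ClusterInformation resources
- apiGroups: ["crd.projectcalico.org"]
  resources:
    - clusterinformations
  verbs:
    - get
    - list
    - watch
```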
To be certain, we can use the actual kubeconfig Calico writes to the host's /etc/cni/net.d. It does indeed seem to have permission to get clusterinformations. The error above is unusual.

Steps to Reproduce (for bugs)
Context
This issue affects Kubernetes Conformance tests:
The test in question creates two Pods that don't tolerate a taint and expects them to be terminated within certain times. In Kubelet logs, the Calico CNI plugin complains with the errors above, and termination takes too long.
Your Environment