Taint nodes before deletion #621
This is a bit confusing to me -- Consolidation relies on a unified termination controller in Karpenter, so its cordon-and-drain logic is identical to other forms of termination (e.g. expiry). Every node we terminate goes through the same cordon, drain, and terminate sequence.
It's not clear to me which of these steps your issue falls under, but I don't believe tainting the node would solve your problem. We should be issuing an evict for any running Consul pod and allowing it to clean up within its GracefulTerminationPeriod. Have you been able to observe this in action and gather more details?
Yes, correct -- sorry, I should've been clearer about when I was referring to the Kubernetes cluster and when to the Consul cluster. Overloaded terms!
The Consul agent pods are part of a DaemonSet and therefore won't be explicitly evicted. In the specific cases I've seen, the node is considered empty because the DaemonSet pods are filtered out. But even if the node were being replaced, Karpenter won't explicitly evict DaemonSet pods. Just deleting the node might be fine? On second glance, I have Kubernetes nodes created by Karpenter that correctly and gracefully left the Consul cluster, but they were nodes that did actual work.
I was about to recommend using K8s graceful node shutdown to give your Consul pods time to deregister. However, it appears it doesn't work with the version of systemd (219) shipped with Amazon Linux 2 (kubernetes/kubernetes#107043 (comment)). Graceful node shutdown seems like the way to go for this type of issue, though, if we can get systemd updated or once AL2022 is supported. I believe it does work with Ubuntu.
Do the Consul agents tolerate the NotReady/Unreachable taints? If you remove that toleration, then things should just work. Karpenter only looks at whether pods reschedule, not at whether they're owned by a DaemonSet.
Oh, interesting! That does look like the more correct solution. Tainting is definitely a hack/workaround, and graceful shutdown would still depend on having a systemd version that actually works with it.
Yeah, DaemonSets magically tolerate the taints for unschedulable/not-ready etc. The tolerations aren't in the DaemonSet spec but are on the pods, which is why NTH can add a custom taint that isn't tolerated.
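For illustration (not from this thread), these are the `NoExecute` tolerations the DaemonSet controller adds to its pods automatically, which is why node-lifecycle taints don't evict them but a custom taint key does:

```yaml
# Tolerations added to DaemonSet pods by the DaemonSet controller
# (visible on the pods themselves, not in the DaemonSet spec):
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
# A taint with a custom key (e.g. one added by NTH) is not covered by
# these, so a NoExecute taint with a custom key will evict DaemonSet pods.
```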
It's pretty trivial to plumb the config through the AWSNodeTemplate, but we'll probably wait on that until the EKS Optimized AMI supports the configuration, which will probably be when we migrate to AL2022. I don't believe Bottlerocket supports it either. It's pretty easy to patch within your user-data in the AWSNodeTemplate if you want to try it with Ubuntu, or to build your own AMI with an updated systemd. I think this would work (although I haven't tested it yet):
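The original snippet was not captured here; as a rough, untested sketch, the relevant kubelet settings are the graceful-node-shutdown fields of the `KubeletConfiguration` (the durations below are placeholders you'd tune):

```yaml
# KubeletConfiguration fragment enabling graceful node shutdown
# (kubelet >= 1.21 with the GracefulNodeShutdown feature gate; requires a
# systemd new enough to support shutdown inhibitor locks).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: "2m"               # total delay before the node shuts down
shutdownGracePeriodCriticalPods: "30s"  # portion of that reserved for critical pods
```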
I think we'd also like to taint nodes the moment Karpenter decides they have become eligible for consolidation. This lets us quickly untaint them if we see unschedulable pods that might fit there. Otherwise, we'd leave the taint in place through the node drain, shutdown, and termination.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
/remove-lifecycle rotten
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle rotten
This didn't really get triaged one way or the other. /remove-lifecycle rotten
Tell us about your request
When removing nodes due to consolidation, I would like to be able to apply a taint to the node before it is removed.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
The reason for this is to be able to gracefully stop DaemonSet pods; see the related issues below.
I have Consul agents running on nodes via a DaemonSet; these agents join the Consul cluster.
If they are just killed, they sit around in the Consul cluster as `failed`; if the pod is given a stop signal, it will gracefully leave the cluster and then exit. When a node is just deleted, it leaves a bunch of hanging agents in my Consul cluster.
Applying a `NoExecute` taint prior to deletion will evict those pods. System DaemonSets (e.g. kube-proxy) tolerate all taints, so this won't evict those.
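As an aside, one sketch of delivering that stop signal cleanly (not from this thread; assumes the agent container has the `consul` binary on its PATH, and the names and durations are placeholders):

```yaml
# DaemonSet pod spec fragment: give the Consul agent time to leave the
# cluster before the container is killed.
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: consul-agent
      image: consul:1.15
      lifecycle:
        preStop:
          exec:
            command: ["consul", "leave"]  # gracefully deregister the agent
```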
Are you currently working around this issue?
Without Karpenter, nodes are generally only removed:
a) Manually, in which case I manually taint the node with a `NoExecute` taint
b) By the node-termination-handler, which is configured to add a taint as well
With Karpenter... well, the workaround is to manually clear out failed nodes from my Consul cluster, or to get this feature added!
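The manual workaround in (a) amounts to something like the following (the node name and taint key here are placeholders):

```shell
# Evict non-system DaemonSet pods (e.g. the Consul agent) by applying a
# NoExecute taint with a custom key, wait for them to exit gracefully,
# then delete the node.
kubectl taint nodes ip-10-0-1-23.ec2.internal decommissioning=true:NoExecute
kubectl delete node ip-10-0-1-23.ec2.internal
```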
Additional Context
aws/aws-node-termination-handler#273
kubernetes/kubernetes#75482