
Karpenter should cordon a node once it expires #4613

Closed · yangwwei opened this issue Sep 12, 2023 · 5 comments
Labels: deprovisioning (Issues related to node deprovisioning), feature (New feature or request)

Comments

@yangwwei commented Sep 12, 2023

Description

Observed Behavior:

When a node expires (reaches the age defined by ttlSecondsUntilExpired) and the node has pods with the do-not-evict annotation set, Karpenter doesn't cordon the node, so new pods keep being scheduled onto the expired node. As a result, such a node can never be terminated, because new pods are continuously allocated to it.

Expected Behavior:

Karpenter should either cordon or taint the nodes once they expire.
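For reference, "cordon" here means setting spec.unschedulable on the Node object, which tells the scheduler to place no new pods on it at all; this is what kubectl cordon does and is plain Kubernetes behavior, not something Karpenter applies to expired nodes today:

```yaml
# Effect of `kubectl cordon <node-name>`: with unschedulable set to true,
# the scheduler will not place any new pods on this node.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-0-1.example.internal   # hypothetical node name
spec:
  unschedulable: true
```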

Proposed fix:

Taint the node with karpenter.sh/nodeExpired=true:NoSchedule so that only pods that tolerate this taint can be scheduled onto an expired node. I understand there is some concern about utilization: while a node is not yet considered deprovisionable, we can still allow some pods to be allocated to it. However, I think it is reasonable to limit which kinds of pods can be scheduled onto expired nodes. We rely heavily on the node "expiration" feature to roll out AMI versions, since it lets us pick up the latest node runtime automatically. Without this, expired nodes can potentially stay around forever.
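A rough sketch of the proposed taint and a matching toleration; the key karpenter.sh/nodeExpired is the name suggested above, not a taint Karpenter applies today, and the object names are illustrative:

```yaml
# Hypothetical: Karpenter would add this taint when a node passes
# ttlSecondsUntilExpired; only pods tolerating it could still land there.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-0-1.example.internal   # hypothetical node name
spec:
  taints:
    - key: karpenter.sh/nodeExpired
      value: "true"
      effect: NoSchedule
---
# A pod that explicitly opts in to running on expired nodes.
apiVersion: v1
kind: Pod
metadata:
  name: expiry-tolerant-pod            # hypothetical name
spec:
  tolerations:
    - key: karpenter.sh/nodeExpired
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: app
      image: busybox
      command: ["sleep", "infinity"]
```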


Reproduction Steps (Please include YAML):

  • set ttlSecondsUntilExpired to a small value, e.g. 600 (10 minutes)
  • launch a pod with the karpenter.sh/do-not-evict: "true" annotation (see the sketch below)
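
A minimal reproduction along those lines might look like this (the Provisioner is the v1alpha5 API used by chart 0.27.x, with the provider/node template reference omitted for brevity; names are illustrative):

```yaml
# Provisioner with a short expiry so nodes expire quickly.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsUntilExpired: 600   # 10 minutes
  # providerRef / node template reference omitted for brevity
---
# Pod annotated do-not-evict, which blocks Karpenter from voluntarily
# evicting it and (per this issue) keeps the expired node from draining.
apiVersion: v1
kind: Pod
metadata:
  name: blocker
  annotations:
    karpenter.sh/do-not-evict: "true"
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "infinity"]
```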

Versions:

  • Chart Version: 0.27.0
  • Kubernetes Version (kubectl version): 1.24
yangwwei added the bug (Something isn't working) label on Sep 12, 2023
ellistarn added the feature (New feature or request) label and removed the bug label on Sep 12, 2023
@ellistarn (Contributor) commented:

We've considered this, or the variant where we add a PreferNoSchedule taint instead. @jonathan-innis @njtran, FYI.

@ellistarn (Contributor) commented:

Related to kubernetes-sigs/karpenter#621

@yangwwei (Author) commented:

Thanks @ellistarn. I looked at kubernetes-sigs/karpenter#621, and at similar issues like kubernetes-sigs/karpenter#622 and kubernetes-sigs/karpenter#623.
Should we add one extra step for all the deprovisioners that taints a node when the deprovisioner's ShouldDeprovision returns true? That would solve all of these problems. I think a NoSchedule taint makes more sense here because it carries stronger semantics that the scheduler will follow.
I can try to submit a PR for this if we reach a consensus; we are hitting this issue on our clusters and would like a solution soon. Thanks!

@ellistarn (Contributor) commented:

@njtran is currently working on changes in this space, so it probably makes sense to sit tight. Feel free to come discuss it at the working group, and thanks a ton for the offer to contribute!

njtran added the deprovisioning (Issues related to node deprovisioning) label on Sep 27, 2023
@njtran (Contributor) commented Sep 27, 2023

Closing this as a duplicate of kubernetes-sigs/karpenter#622.
