fix: Consider unready nodes as in flight #2224

Merged 1 commit, Aug 2, 2022

@ellistarn ellistarn commented Jul 29, 2022

Fixes #2164

Description
During large scale-ups, nodes sometimes flip-flop between ready and not ready as things come online. This change treats unready nodes as in flight to avoid overscaling.

After a hardware, networking, or other failure, a node can transition from ready to unready. Previously, Kubernetes would evict the pods on such a node after 5 minutes (the default), and the evicted pods would trigger additional scale-out. With this change, Karpenter treats the not-ready taints on the node as ephemeral and the node itself as in flight. Intervention (automated or otherwise) is required to remove these nodes.

This can also be seen as a safety feature. Previously, if some issue caused nodes to consistently transition Ready -> NotReady, Karpenter would keep scaling out until it hit provisioner limits. With this change, Karpenter waits for the nodes to recover or for an operator to intervene. Additional automation (e.g. kubernetes-sigs/karpenter#750) can address this case.
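To illustrate the idea, here is a minimal Go sketch of counting unready nodes as in flight when deciding whether pending pods need new capacity. It is not Karpenter's actual code: the `Node` and `Pod` types, the `Unplaced` function, and the first-fit CPU-only packing are all hypothetical simplifications chosen for illustration.

```go
package main

// Node is a hypothetical, simplified node. It is not a Kubernetes or
// Karpenter type; Ready stands in for the node's Ready condition.
type Node struct {
	Name         string
	Ready        bool
	FreeMilliCPU int
}

// Pod is a hypothetical pending pod with a single CPU request.
type Pod struct {
	Name     string
	MilliCPU int
}

// Unplaced returns the pods that do not fit on existing nodes using a
// simple first-fit pass. When unreadyInFlight is true, unready nodes are
// assumed to be coming (back) online and their free capacity is counted.
// When it is false, unready nodes are ignored, so pods that would have
// fit on them are reported as needing new capacity.
func Unplaced(pods []Pod, nodes []Node, unreadyInFlight bool) []Pod {
	free := make(map[string]int, len(nodes))
	var order []string // preserves node order for deterministic first-fit
	for _, n := range nodes {
		if n.Ready || unreadyInFlight {
			free[n.Name] = n.FreeMilliCPU
			order = append(order, n.Name)
		}
	}

	var left []Pod
	for _, p := range pods {
		placed := false
		for _, name := range order {
			if free[name] >= p.MilliCPU {
				free[name] -= p.MilliCPU
				placed = true
				break
			}
		}
		if !placed {
			left = append(left, p)
		}
	}
	return left
}
```

In this sketch, every pod returned by `Unplaced` would drive a new node launch. Passing `unreadyInFlight = true` keeps a temporarily unready node's capacity in the calculation, which is what prevents the extra scale-out described above.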

How was this change tested?

  • TEST_FILTER=TestUtilization make e2etests

Does this change impact docs?

WIP

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

None

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ellistarn ellistarn requested a review from a team as a code owner July 29, 2022 22:33
@ellistarn ellistarn requested a review from njtran July 29, 2022 22:33

netlify bot commented Jul 29, 2022

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit 60e225f
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/62e461fd08cac500085b9c37

@ellistarn ellistarn marked this pull request as draft July 29, 2022 22:43
@ellistarn ellistarn marked this pull request as ready for review August 1, 2022 16:53

@tzneal tzneal left a comment

lgtm

@ellistarn ellistarn merged commit 59f2315 into aws:main Aug 2, 2022
@ellistarn ellistarn deleted the raceflight branch August 3, 2022 21:33
njtran pushed a commit to njtran/karpenter-provider-aws that referenced this pull request Aug 9, 2022