fix: address disruption taint race condition #1180

Merged
merged 5 commits into from
Apr 24, 2024

Conversation

jmdeal
Member

@jmdeal jmdeal commented Apr 10, 2024

Fixes #1167

Description
Implements a short-term mitigation, requiring no API changes, for the consolidation race condition described in issues #651 and #1167. This PR does not close #651, since the exact mechanism may still change as part of the holistic taint redesign tracked in #624.
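
For readers unfamiliar with this class of race, here is a minimal Go sketch of one way a "validate, taint, validate again" sequence can close the window in which a pod annotated with `karpenter.sh/do-not-disrupt` lands on a node between the disruption decision and the taint. It is an illustration under stated assumptions, not the code in this PR: the diff is not shown in this conversation, and `Pod`, `Node`, `blocksDisruption`, and `tryDisrupt` are hypothetical stand-ins rather than Karpenter or Kubernetes API types.

```go
package main

import "fmt"

// DoNotDisruptAnnotation is the pod annotation named in the linked issue.
const DoNotDisruptAnnotation = "karpenter.sh/do-not-disrupt"

// Pod and Node are simplified, hypothetical stand-ins for the real API objects.
type Pod struct {
	Name        string
	Annotations map[string]string
}

type Node struct {
	Name    string
	Tainted bool
	Pods    []Pod
}

// blocksDisruption reports whether any pod on the node opts out of disruption.
func blocksDisruption(n *Node) bool {
	for _, p := range n.Pods {
		if p.Annotations[DoNotDisruptAnnotation] == "true" {
			return true
		}
	}
	return false
}

// tryDisrupt (hypothetical) validates the node, taints it, then validates again.
// raceWindow simulates scheduling activity between the first check and the taint.
func tryDisrupt(n *Node, raceWindow func()) bool {
	if blocksDisruption(n) {
		return false
	}
	if raceWindow != nil {
		raceWindow() // a pod may be bound to the node here
	}
	n.Tainted = true // from now on, no new pods should schedule to the node
	if blocksDisruption(n) {
		n.Tainted = false // roll back: a protected pod arrived before the taint
		return false
	}
	return true
}

func main() {
	n := &Node{Name: "node-a"}
	ok := tryDisrupt(n, func() {
		n.Pods = append(n.Pods, Pod{
			Name:        "protected",
			Annotations: map[string]string{DoNotDisruptAnnotation: "true"},
		})
	})
	fmt.Printf("disrupted=%v tainted=%v\n", ok, n.Tainted)
}
```

The second check only helps because it runs after the taint is in place: once the node is tainted, the set of pods on it can no longer grow, so a clean result at that point stays valid.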

How was this change tested?
Manual validation + karpenter-provider-aws E2E suite

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 10, 2024
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 10, 2024
@jmdeal jmdeal force-pushed the race-fix branch 2 times, most recently from 9a7d654 to 223cbfd Compare April 10, 2024 18:15
pkg/controllers/disruption/controller.go: 4 review threads (outdated, resolved)
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 11, 2024
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 11, 2024
@jmdeal jmdeal force-pushed the race-fix branch 3 times, most recently from e95d76b to 3482c66 Compare April 17, 2024 06:32
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 17, 2024
@coveralls

coveralls commented Apr 17, 2024

Pull Request Test Coverage Report for Build 8821837855

Details

  • 59 of 79 (74.68%) changed or added relevant lines in 4 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.09%) to 78.826%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
pkg/controllers/disruption/emptynodeconsolidation.go | 9 | 10 | 90.0%
pkg/controllers/disruption/multinodeconsolidation.go | 5 | 6 | 83.33%
pkg/controllers/disruption/validation.go | 39 | 57 | 68.42%

Files with Coverage Reduction | New Missed Lines | %
pkg/controllers/disruption/expiration.go | 2 | 90.91%

Totals
Change from base Build 8796364550: 0.09%
Covered Lines: 8350
Relevant Lines: 10593

💛 - Coveralls

pkg/controllers/disruption/controller.go: 7 review threads (outdated, resolved)
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 23, 2024
pkg/controllers/disruption/consolidation_test.go: 1 review thread (outdated, resolved)
pkg/controllers/disruption/emptynodeconsolidation.go: 2 review threads (outdated, resolved)
pkg/controllers/disruption/validation.go: 1 review thread (outdated, resolved)
@jmdeal jmdeal force-pushed the race-fix branch 2 times, most recently from d348441 to 888c16a Compare April 23, 2024 18:54
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 23, 2024
@jmdeal jmdeal force-pushed the race-fix branch 3 times, most recently from 859021b to d0a73fa Compare April 24, 2024 17:09
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 24, 2024
@jonathan-innis
Member

/hold Wait for @njtran to review as well

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2024
Contributor

@njtran njtran left a comment

/lgtm
/approve

pkg/controllers/disruption/validation.go: 1 review thread (outdated, resolved)
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2024
@njtran
Contributor

njtran commented Apr 24, 2024

/unhold
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 24, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmdeal, jonathan-innis, njtran

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jonathan-innis,njtran]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pod gets evicted still even with "karpenter.sh/do-not-disrupt" annotation
6 participants