
chore: cherry-pick scale-down-delay-* per nodegroup to 1.29 #6484

Conversation

vadasambar
Member

What type of PR is this?

/kind cherry-pick

What this PR does / why we need it:

cherry-picks #5729 to 1.29

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added: `--scale-down-delay-type-local` flag. It specifies whether the `--scale-down-delay-after-*` flags should be applied locally per nodegroup or globally across all nodegroups.
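
For illustration, an invocation using the new flag together with the existing delay flags might look like the following (flag values are examples only; `--scale-down-delay-type-local` is the only flag added by this change):

```
cluster-autoscaler \
  --scale-down-delay-type-local=true \
  --scale-down-delay-after-add=10m \
  --scale-down-delay-after-delete=0s \
  --scale-down-delay-after-failure=3m
```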

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: vadasambar <[email protected]>

feat: update scale down status after every scale up
- move scaledown delay status to cluster state/registry
- enable scale down when `ScaleDownDelayTypeLocal` is enabled
- add new funcs on cluster state to get and update scale down delay status
- use timestamp instead of booleans to track scale down delay status
Signed-off-by: vadasambar <[email protected]>
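
As a rough sketch of the timestamp-based tracking described in this commit (type and field names here are illustrative, not the actual ones from the PR), storing the last scale-up time per node group lets the caller compute how much of the delay window remains instead of flipping a boolean:

```go
package example

import "time"

// scaleDownDelayStatus records, per node group, when the last scale-up
// happened. Storing timestamps (rather than booleans) lets the caller compute
// how much of the per-nodegroup delay is still remaining.
type scaleDownDelayStatus struct {
	lastScaleUp map[string]time.Time // node group name -> time of last scale-up
}

// inCooldown reports whether scale-down for the given node group is still
// blocked by the --scale-down-delay-after-add window.
func (s *scaleDownDelayStatus) inCooldown(nodeGroup string, delayAfterAdd time.Duration, now time.Time) bool {
	last, ok := s.lastScaleUp[nodeGroup]
	if !ok {
		return false
	}
	return now.Sub(last) < delayAfterAdd
}
```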

refactor: use existing fields on clusterstate
- uses `scaleUpRequests`, `scaleDownRequests` and `scaleUpFailures` instead of `ScaleUpDelayStatus`
- changed the above existing fields a little to make them more convenient for use
- moved initializing scale down delay processor to static autoscaler (because clusterstate is not available in main.go)
Signed-off-by: vadasambar <[email protected]>

refactor: remove note saying only `scale-down-delay-after-add` is supported
- because we are supporting all the flags
Signed-off-by: vadasambar <[email protected]>

fix: evaluate `scaleDownInCooldown` the old way only if `ScaleDownDelayTypeLocal` is set to `false`
Signed-off-by: vadasambar <[email protected]>

refactor: remove line saying `--scale-down-delay-type-local` is only supported for `--scale-down-delay-after-add`
- because it is not true anymore
- we are supporting all `--scale-down-delay-after-*` flags per nodegroup
Signed-off-by: vadasambar <[email protected]>

test: fix clusterstate tests failing
Signed-off-by: vadasambar <[email protected]>

refactor: move initializing processors logic back from static autoscaler to main
- we don't want to initialize processors in static autoscaler because anyone implementing an alternative to static_autoscaler would then have to initialize the processors as well
- and initializing specific processors makes static autoscaler aware of an implementation detail, which might not be the best practice
Signed-off-by: vadasambar <[email protected]>

refactor: revert changes related to `clusterstate`
- since I am going with observer pattern
Signed-off-by: vadasambar <[email protected]>

feat: add observer interface for state of scaling
- to implement observer pattern for tracking state of scale up/downs (as opposed to using clusterstate to do the same)
- refactor `ScaleDownCandidatesDelayProcessor` to use fields from the new observer
Signed-off-by: vadasambar <[email protected]>
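
A minimal sketch of what such a scale-state observer interface could look like (method names echo later commits in this PR, but the exact signatures here are illustrative):

```go
package example

import "time"

// NodeGroupChangeObserver is notified whenever a scale-up, a scale-down, or a
// failure of either is registered for a node group, so that processors such as
// ScaleDownCandidatesDelayProcessor can react without depending on clusterstate.
type NodeGroupChangeObserver interface {
	RegisterScaleUp(nodeGroupName string, delta int, currentTime time.Time)
	RegisterScaleDown(nodeGroupName string, nodeName string, currentTime time.Time)
	RegisterFailedScaleUp(nodeGroupName string, reason string, currentTime time.Time)
	RegisterFailedScaleDown(nodeGroupName string, reason string, currentTime time.Time)
}
```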

refactor: remove params passed to `clearScaleUpFailures`
- not needed anymore
Signed-off-by: vadasambar <[email protected]>

refactor: revert clusterstate tests
- approach has changed
- I am not making any changes in clusterstate now
Signed-off-by: vadasambar <[email protected]>

refactor: add accidentally deleted lines for clusterstate test
Signed-off-by: vadasambar <[email protected]>

feat: implement `Add` fn for scale state observer
- to easily add new observers
- re-word comments
- remove redundant params from `NewDefaultScaleDownCandidatesProcessor`
Signed-off-by: vadasambar <[email protected]>

fix: CI complaining because no comments on fn definitions
Signed-off-by: vadasambar <[email protected]>

feat: initialize parent `ScaleDownCandidatesProcessor`
- instead of `ScaleDownCandidatesSortingProcessor` and `ScaleDownCandidatesDelayProcessor` separately
Signed-off-by: vadasambar <[email protected]>

refactor: add scale state notifier to list of default processors
- initialize processors for `NewDefaultScaleDownCandidatesProcessor` outside and pass them to the fn
- this allows more flexibility
Signed-off-by: vadasambar <[email protected]>

refactor: add observer interface
- create a separate observer directory
- implement `RegisterScaleUp` function in the clusterstate
- TODO: resolve syntax errors
Signed-off-by: vadasambar <[email protected]>

feat: use `scaleStateNotifier` in place of `clusterstate`
- delete leftover `scale_stateA_observer.go` (new one is already present in `observers` directory)
- register `clusterstate` with `scaleStateNotifier`
- use `Register` instead of `Add` function in `scaleStateNotifier`
- fix `go build`
- wip: fixing tests
Signed-off-by: vadasambar <[email protected]>

test: fix syntax errors
- add utils package `pointers` for converting `time` to pointer (without having to initialize a new variable)
Signed-off-by: vadasambar <[email protected]>

feat: wip track scale down failures along with scale up failures
- I was tracking scale up failures but not scale down failures
- fix copyright year 2017 -> 2023 for the new `pointers` package
Signed-off-by: vadasambar <[email protected]>

feat: register failed scale down with scale state notifier
- wip writing tests for `scale_down_candidates_delay_processor`
- fix CI lint errors
- remove test file for `scale_down_candidates_processor` (there is not much to test as of now)
Signed-off-by: vadasambar <[email protected]>

test: wip tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <[email protected]>

test: add unit tests for `ScaleDownCandidatesDelayProcessor`
Signed-off-by: vadasambar <[email protected]>

refactor: don't track scale up failures in `ScaleDownCandidatesDelayProcessor`
- not needed
Signed-off-by: vadasambar <[email protected]>

test: better doc comments for `TestGetScaleDownCandidates`
Signed-off-by: vadasambar <[email protected]>

refactor: don't ignore error in `NGChangeObserver`
- return it instead and let the caller decide what to do with it
Signed-off-by: vadasambar <[email protected]>

refactor: change pointers to values in `NGChangeObserver` interface
- easier to work with
- remove `expectedAddTime` param from `RegisterScaleUp` (not needed for now)
- add tests for clusterstate's `RegisterScaleUp`
Signed-off-by: vadasambar <[email protected]>

refactor: conditions in `GetScaleDownCandidates`
- set scale down in cool down if the number of scale down candidates is 0
Signed-off-by: vadasambar <[email protected]>

test: use `ng1` instead of `ng2` in existing test
Signed-off-by: vadasambar <[email protected]>

feat: wip static autoscaler tests
Signed-off-by: vadasambar <[email protected]>

refactor: assign directly instead of using `sdProcessor` variable
- variable is not needed
Signed-off-by: vadasambar <[email protected]>

test: first working test for static autoscaler
Signed-off-by: vadasambar <[email protected]>

test: continue working on static autoscaler tests
Signed-off-by: vadasambar <[email protected]>

test: wip second static autoscaler test
Signed-off-by: vadasambar <[email protected]>

refactor: remove `Println` used for debugging
Signed-off-by: vadasambar <[email protected]>

test: add static_autoscaler tests for scale down delay per nodegroup flags
Signed-off-by: vadasambar <[email protected]>

chore: rebase off the latest `master`
- change scale state observer interface's `RegisterFailedScaleup` to reflect latest changes around clusterstate's `RegisterFailedScaleup` in `master`
Signed-off-by: vadasambar <[email protected]>

test: fix clusterstate test failing
Signed-off-by: vadasambar <[email protected]>

test: fix failing orchestrator test
Signed-off-by: vadasambar <[email protected]>

refactor: rename `defaultScaleDownCandidatesProcessor` -> `combinedScaleDownCandidatesProcessor`
- describes the processor better
Signed-off-by: vadasambar <[email protected]>

refactor: replace `NGChangeObserver` -> `NodeGroupChangeObserver`
- makes it easier to understand for someone not familiar with the codebase
Signed-off-by: vadasambar <[email protected]>

docs: reword code comment `after` -> `for which`
Signed-off-by: vadasambar <[email protected]>

refactor: don't return error from `RegisterScaleDown`
- not needed as of now (no implementation of this function returns a non-nil error)
Signed-off-by: vadasambar <[email protected]>

refactor: address review comments around ng change observer interface
- change dir structure of nodegroup change observer package
- stop returning errors wherever it is not needed in the nodegroup change observer interface
- rename `NGChangeObserver` -> `NodeGroupChangeObserver` interface (makes it easier to understand)
Signed-off-by: vadasambar <[email protected]>

refactor: make nodegroupchange observer thread-safe
Signed-off-by: vadasambar <[email protected]>

docs: add TODO to consider using multiple mutexes in nodegroupchange observer
Signed-off-by: vadasambar <[email protected]>
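
A minimal sketch of a thread-safe notifier along these lines, with a single mutex guarding both registration and fan-out (names and signatures are illustrative):

```go
package example

import (
	"sync"
	"time"
)

// NodeGroupChangeObserver is the illustrative observer interface from the
// earlier sketch, reduced to a single method here for brevity.
type NodeGroupChangeObserver interface {
	RegisterScaleUp(nodeGroupName string, delta int, currentTime time.Time)
}

// NodeGroupChangeObserversList fans events out to every registered observer.
// A single mutex guards both registration and notification, which is what the
// TODO above about using multiple mutexes refers to.
type NodeGroupChangeObserversList struct {
	mutex     sync.Mutex
	observers []NodeGroupChangeObserver
}

// RegisterForNotifications adds an observer that will receive future events.
func (l *NodeGroupChangeObserversList) RegisterForNotifications(o NodeGroupChangeObserver) {
	l.mutex.Lock()
	defer l.mutex.Unlock()
	l.observers = append(l.observers, o)
}

// RegisterScaleUp forwards a scale-up event to every registered observer.
func (l *NodeGroupChangeObserversList) RegisterScaleUp(nodeGroupName string, delta int, currentTime time.Time) {
	l.mutex.Lock()
	defer l.mutex.Unlock()
	for _, o := range l.observers {
		o.RegisterScaleUp(nodeGroupName, delta, currentTime)
	}
}
```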

refactor: use `time.Now()` directly instead of assigning a variable to it
Signed-off-by: vadasambar <[email protected]>

refactor: share code for checking if there was a recent scale-up/down/failure
Signed-off-by: vadasambar <[email protected]>

test: convert `ScaleDownCandidatesDelayProcessor` into table tests
Signed-off-by: vadasambar <[email protected]>

refactor: change scale state notifier's `Register()` -> `RegisterForNotifications()`
- makes it easier to understand what the function does
Signed-off-by: vadasambar <[email protected]>

test: replace scale state notifier `Register` -> `RegisterForNotifications` in test
- to fix syntax errors since it is already renamed in the actual code
Signed-off-by: vadasambar <[email protected]>

refactor: remove `clusterStateRegistry` from `delete_in_batch` tests
- not needed anymore since we have `scaleStateNotifier`
Signed-off-by: vadasambar <[email protected]>

refactor: address PR review comments
Signed-off-by: vadasambar <[email protected]>

fix: add empty `RegisterFailedScaleDown` for clusterstate
- fix syntax error in static autoscaler test
Signed-off-by: vadasambar <[email protected]>
(cherry picked from commit 5de49a1)
@k8s-ci-robot
Contributor

@vadasambar: The label(s) kind/cherry-pick cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?

/kind cherry-pick

What this PR does / why we need it:

cherry-picks #5729 to 1.29

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added: `--scale-down-delay-type-local` flag. It specifies whether the `--scale-down-delay-after-*` flags should be applied locally per nodegroup or globally across all nodegroups.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2024

linux-foundation-easycla bot commented Jan 30, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: vadasambar / name: Suraj Banakar(बानकर) | スラジ (e05d34e)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 30, 2024
@vadasambar vadasambar changed the base branch from master to cluster-autoscaler-release-1.29 January 30, 2024 18:01
@k8s-ci-robot k8s-ci-robot requested review from feiskyer and x13n January 30, 2024 18:01
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 30, 2024
@vadasambar vadasambar marked this pull request as ready for review January 30, 2024 18:10
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2024
@Shubham82
Contributor

Thanks @vadasambar
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2024
@vadasambar
Member Author

/assign @MaciekPytel

because Maciek is the maintainer doing releases this month

(screenshot of the release schedule: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#schedule)

Contributor

@mwielgus mwielgus left a comment


/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mwielgus, vadasambar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit aec9e1e into kubernetes:cluster-autoscaler-release-1.29 Mar 18, 2024
6 checks passed
Labels
- `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
- `area/cluster-autoscaler`
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
- `size/XL`: Denotes a PR that changes 500-999 lines, ignoring generated files.