Adds Oldest and Newest delete policies and annotation-based Simple delete policy #726
Conversation
Hi @erstaples. Thanks for your PR. I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/assign @maisem
@roberthbailey: GitHub didn't allow me to assign the following users: maisem. Note that only kubernetes-sigs members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. In response to this:

> /assign @maisem
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thank you for the contribution! Looking promising and definitely a welcome feature. Left some comments throughout the code.
case clusterv1alpha1.OldestMachineSetDeletePolicy:
	deletePriorityFunc = oldestDeletePriority
default:
	klog.Errorf("Unsupported delete policy %s. Defaulting to Simple delete policy.", msdp)
I'm not sure defaulting to the Simple policy is the way to go. Users might expect a different behavior and would need to debug through the manager logs to find this out. I'd rather return an error here, make the Set not scale at all, and add some basic validation to the field.
A few pointers:

- This seems to introduce a configuration dependency, e.g. the autoscaler works only if the `Simple` policy is set. We could try to avoid this if possible.
- Probable solution while scaling down (see the sketch at the end of this comment):
  - `default-policy`: prefers unhealthy machines first, then random.
  - `oldest-policy`: prefers the oldest machines first.
  - `newest-policy`: prefers the newest machines first.
  - Machines with a certain annotation (`some-keyword`) get deleted first in any case, irrespective of the deletion policy configured.
- The concern of atomicity: the autoscaler annotates a machine and dies before it actually attempts to scale down the MachineSet/Deployment. By the time it comes back, the state of the workload has changed slightly and the machine no longer needs to be deleted.
- Probable solutions:
  - Live with it; it's a rare case. If we agree on the philosophy that VMs are ephemeral resources that can go down and come back again unless explicitly instructed otherwise through autoscaler means, the controller logic will reconcile the desired state later anyway.
  - Feature request to the autoscaler: the DeleteNodes interface cross-checks/updates the current state before every call, so that only the intended set of machines carries the `some-keyword` annotation. This can also have the side effect of being authoritative when updating annotations.

Anyway, let's discuss this in the next meeting.
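For concreteness, here is a minimal, runnable sketch of the precedence proposed above: annotation first, then unhealthy machines, then the configured policy. The reduced `machine` type, the `some-keyword` annotation key, the policy names, and the priority constants are illustrative assumptions, not the PR's actual identifiers.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// machine is a stand-in for the cluster-api Machine type, reduced to the
// fields this sketch needs.
type machine struct {
	name        string
	annotations map[string]string
	unhealthy   bool
	created     time.Time
}

const (
	mustDelete   = 100.0 // highest delete priority
	betterDelete = 50.0  // preferred, but not forced
)

// proposedDeletePriority scores a machine under the proposed precedence:
// annotated machines first, then unhealthy ones, then the policy order.
func proposedDeletePriority(policy string, m machine) float64 {
	if m.annotations["some-keyword"] != "" { // annotation wins in any case
		return mustDelete
	}
	if m.unhealthy { // unhealthy machines are preferred next
		return betterDelete
	}
	ageDays := time.Since(m.created).Hours() / 24
	switch policy {
	case "oldest-policy":
		return betterDelete * math.Min(ageDays/365, 1.0) // older => higher
	case "newest-policy":
		return betterDelete * (1.0 - math.Min(ageDays/365, 1.0)) // newer => higher
	default: // "default-policy": random among healthy, unannotated machines
		return rand.Float64()
	}
}

func main() {
	m := machine{name: "node-a", created: time.Now().Add(-30 * 24 * time.Hour)}
	fmt.Printf("%s => %.2f\n", m.name, proposedDeletePriority("oldest-policy", m))
}
```

Under this precedence the annotation always wins, so an external actor such as the autoscaler can pin specific machines for deletion regardless of the policy in the spec.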
Could we please get a status update on this PR? Is there consensus to proceed with it for v1alpha1?
Yes, I am on it. I am not sure about keeping it in v1alpha1, but we wanted to align with a few folks from the autoscaler project before merging it.
/milestone v1alpha1
@erstaples it looks like a rebase is still needed for this PR
get registered as Kubernetes nodes. With cluster-api as a
generic out-of-tree provider for autoscaler, this field is
required by autoscaler to be able to have a provider view
of the list of machines. Another list of nodes is queries
queries -> queried?
- Add validation to deletePolicy field on MachineSet CRD
- Misc cleanup and comments
- Renamed Simple delete policy to Random
- Updated Newest and Oldest policies to prioritize annotations and unhealthy nodes
- Removed unnecessary validation
- Renamed delete annotation
@erstaples do you think you'll have time to rebase + address comments soon? Thanks!
- Added godoc to MachineSetDeletePolicy type
Hey @ncdc, just did!
if d.Seconds() < 0 {
	return mustNotDelete
}
return deletePriority(float64(mustDelete) * (1.0 - math.Exp(-d.Seconds()/secondsPerTenDays)))
I'm curious - why seconds per 10 days?
The function maps [0, Inf) -> [0, 1). With exponential decay, the parameter puts the half-life at just about a week, i.e. 1 week ~= 0.5 priority, 2 weeks ~= 0.75 priority etc. Arbitrary but reasonable?
You mean 0-100, right?
Scaled by `float64(mustDelete)` to the appropriate range, yeah.
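To make the half-life claim concrete, here is a small sketch that evaluates the curve from the diff above. The constants are assumptions consistent with the 0-100 range discussed here: `mustDelete` = 100 and `secondsPerTenDays` = 864000 (10 * 24 * 60 * 60).

```go
package main

import (
	"fmt"
	"math"
)

const (
	mustDelete        = 100.0    // top of the 0-100 priority range
	secondsPerTenDays = 864000.0 // 10 days in seconds
)

// deletePriority maps a machine's age in seconds onto [0, 100) via
// exponential decay: mustDelete * (1 - e^(-age/secondsPerTenDays)).
func deletePriority(ageSeconds float64) float64 {
	if ageSeconds < 0 {
		return 0
	}
	return mustDelete * (1.0 - math.Exp(-ageSeconds/secondsPerTenDays))
}

func main() {
	week := 7.0 * 24 * 3600
	fmt.Printf("1 week:  %.1f\n", deletePriority(week))   // ~50.3
	fmt.Printf("2 weeks: %.1f\n", deletePriority(2*week)) // ~75.3
	fmt.Printf("4 weeks: %.1f\n", deletePriority(4*week)) // ~93.9
}
```

So a machine's priority crosses 50 at roughly one week and 75 at two weeks, matching the numbers in the exchange above.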
case v1alpha1.OldestMachineSetDeletePolicy:
	return oldestDeletePriority, nil
default:
	return nil, errors.Errorf("Unsupported delete policy %s. Must be one of 'Random', 'Newest', or 'Oldest'", msdp)
late comment, but we have to default to a policy if the value is not set, i.e.,
case "":
return randomDeletePolicy, nil
@ntfrnzn (not sure if this change went in after your comment, but) if you look a couple of lines up, there is a default for unset/`""`. The `default` case is for when the user specifies an invalid policy name such as `abc123`.
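Putting the pieces of this thread together, the selector probably reads roughly like the sketch below. It is reconstructed from the snippets quoted in this conversation and assumes the PR's surrounding package; the function name `getDeletePriorityFunc` and the `newestDeletePriority` identifier are guesses, not confirmed code.

```go
// Sketch only: maps spec.deletePolicy to a priority function. An unset
// field defaults to Random; unknown values are rejected with an error.
func getDeletePriorityFunc(ms *v1alpha1.MachineSet) (deletePriorityFunc, error) {
	switch msdp := v1alpha1.MachineSetDeletePolicy(ms.Spec.DeletePolicy); msdp {
	case v1alpha1.RandomMachineSetDeletePolicy, "":
		// "" handles the unset field; Random is the default policy.
		return randomDeletePolicy, nil
	case v1alpha1.NewestMachineSetDeletePolicy:
		return newestDeletePriority, nil
	case v1alpha1.OldestMachineSetDeletePolicy:
		return oldestDeletePriority, nil
	default:
		return nil, errors.Errorf("Unsupported delete policy %s. Must be one of 'Random', 'Newest', or 'Oldest'", msdp)
	}
}
```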
@vincepri any other comments on this?
Great work, thank you for the contribution!
/approve
I noticed that some BAZEL.build files are being deleted, could you run `make generate` and make sure that no updates are shown in `git status`?
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: erstaples, vincepri

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@erstaples did you get a chance to check out Vince's comment about the Bazel files?
@ncdc @vincepri Sorry about that, GitHub failed to notify me about these comments, or they got lost in the noise. I re-ran `make generate`.
Please do! :)
@vincepri Done!
/lgtm
…lete policy (kubernetes-sigs#726)

* add oldest and newest delete policy implementations
* delete policy checkpoint
* wire up spec.deletePolicy to sort delegate
* gofmt
* Refactor deletePriorityFunc selector to separate function; add validation to deletePolicy field on MachineSet CRD; misc cleanup and comments
* Address PR feedback: renamed Simple delete policy to Random; updated Newest and Oldest policies to prioritize annotations and unhealthy nodes; removed unnecessary validation; renamed delete annotation
* Cleaned up grammar on providerID field comments; added godoc to MachineSetDeletePolicy type
* Add empty string case
* Add bazel files
What this PR does / why we need it:
This PR adds new delete policies to the MachineSet spec: "Newest" and "Oldest," and it modifies the "Simple" delete policy to add an annotation check, which addresses the issue of defining specific machines when scaling down (#75).
The Simple delete policy has been modified to select a Machine for scale-down when it has the annotation `machineset.clusters.k8s.io/delete-me=yes`. The Oldest and Newest delete policies map the CreationTimestamp of the Machine to a 0-100 delete priority score.
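As an illustration of how a 0-100 score could drive scale-down selection, here is a minimal, self-contained sketch (all names are hypothetical, not the PR's code) that sorts machines by priority and returns the excess ones:

```go
package main

import (
	"fmt"
	"sort"
)

// machine is a hypothetical stand-in carrying only a name and its
// precomputed 0-100 delete priority score.
type machine struct {
	name     string
	priority float64
}

// machinesToDelete returns the diff machines with the highest delete
// priority, i.e. the preferred scale-down victims.
func machinesToDelete(machines []machine, diff int) []machine {
	sorted := append([]machine(nil), machines...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].priority > sorted[j].priority // highest first
	})
	if diff > len(sorted) {
		diff = len(sorted)
	}
	return sorted[:diff]
}

func main() {
	ms := []machine{
		{name: "two-weeks-old", priority: 75.3}, // Oldest-policy score at 2 weeks
		{name: "one-day-old", priority: 9.5},
		{name: "annotated", priority: 100}, // carries the delete annotation
	}
	for _, m := range machinesToDelete(ms, 2) {
		fmt.Println("delete:", m.name)
	}
	// Prints: delete: annotated, then delete: two-weeks-old
}
```

The annotated machine is selected first even though an old, high-scoring machine exists, which is exactly the override behavior the modified Simple policy provides.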
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #75
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note: