Adds Oldest and Newest delete policies and annotation-based Simple delete policy #726

Merged
merged 9 commits into from
Mar 27, 2019

Conversation

erstaples
Contributor

What this PR does / why we need it:
This PR adds new delete policies to the MachineSet spec: "Newest" and "Oldest," and it modifies the "Simple" delete policy to add an annotation check, which addresses the issue of defining specific machines when scaling down (#75).

  • Simple delete policy has been modified to select a Machine for scale down with the annotation machineset.clusters.k8s.io/delete-me=yes.

  • Oldest and Newest delete policies map the CreationTimestamp of the Machine to a 0-100 delete priority score.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #75

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:

Added "Oldest" and "Newest" delete policies on MachineSet spec and added "machineset.clusters.k8s.io/delete-me=yes" annotation check on nodes to delete specific machines when scaling down machine sets

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 4, 2019
@k8s-ci-robot
Contributor

Hi @erstaples. Thanks for your PR.

I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 4, 2019
@vincepri
Member

vincepri commented Feb 4, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 4, 2019
@roberthbailey
Contributor

/assign @maisem

@k8s-ci-robot
Contributor

@roberthbailey: GitHub didn't allow me to assign the following users: maisem.

Note that only kubernetes-sigs members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @maisem

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member

@vincepri vincepri left a comment

Thank you for the contribution! Looking promising and definitely a welcome feature. Left some comments throughout the code.

pkg/apis/cluster/v1alpha1/machineset_types.go Outdated Show resolved Hide resolved
pkg/apis/cluster/v1alpha1/machineset_types.go Show resolved Hide resolved
pkg/controller/machineset/delete_policy.go Outdated Show resolved Hide resolved
pkg/controller/machineset/delete_policy.go Outdated Show resolved Hide resolved
pkg/controller/machineset/controller.go Show resolved Hide resolved
	case clusterv1alpha1.OldestMachineSetDeletePolicy:
		deletePriorityFunc = oldestDeletePriority
	default:
		klog.Errorf("Unsupported delete policy %s. Defaulting to Simple delete policy.", msdp)
Member

I'm not sure if defaulting to the Simple policy might be the way to go. Users might expect a different behavior and need to debug through the manager logs to find this out. I'd rather return an error here and make the Set not scale at all and add some basic validation to the field.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 6, 2019
Member

@hardikdr hardikdr left a comment

A few pointers:

  • This seems to introduce a configuration dependency, e.g. the autoscaler works only if the Simple policy is set. We should avoid this if possible.
    • Probable solution while scaling down:
      • default policy: prefers unhealthy machines first, then random.
      • oldest policy: prefers the oldest machines first.
      • newest policy: prefers the newest machines first.
      • machines with a certain annotation [some-keyword] get deleted first in any case, irrespective of the delete policy specified.
  • The atomicity concern: the autoscaler annotates the machine and dies before it actually attempts to scale down the machine set/deployment. By the time it comes back, the state of the workload has changed slightly and the machine no longer needs to be deleted.
    • Probable solutions:
      • Live with it; it's a rare case. If we agree on the philosophy that VMs are ephemeral resources that can go down and come back again unless explicitly instructed otherwise through the autoscaler, the controller logic will reconcile the desired state later anyway.
      • A feature request to the autoscaler, where the DeleteNodes interface cross-checks/updates the current state before every call so that only the intended set of machines carries the some-keyword annotation. This can have the side effect of the autoscaler being authoritative when updating annotations.

Anyways, let's discuss this in next meeting.
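
The ordering proposed in this comment (annotated machines first, then unhealthy ones, then whatever the policy prefers) could be sketched roughly as follows. This is only an illustration: the `machine` type, the `tier` helper, and the annotation key `example.invalid/delete-machine` are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// deleteAnnotation is a hypothetical key standing in for "some-keyword".
const deleteAnnotation = "example.invalid/delete-machine"

// machine is a simplified, hypothetical model of a Machine.
type machine struct {
	name        string
	ageDays     float64
	healthy     bool
	annotations map[string]string
}

// tier ranks machines independently of the delete policy:
// 2 = explicitly annotated for deletion, 1 = unhealthy, 0 = everything else.
func tier(m machine) int {
	if _, ok := m.annotations[deleteAnnotation]; ok {
		return 2
	}
	if !m.healthy {
		return 1
	}
	return 0
}

// pickForDeletion returns the names of the n machines to delete first.
// Machines are ordered by tier, and the policy score only breaks ties
// within a tier.
func pickForDeletion(machines []machine, n int, score func(machine) float64) []string {
	sorted := append([]machine(nil), machines...) // leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		ti, tj := tier(sorted[i]), tier(sorted[j])
		if ti != tj {
			return ti > tj
		}
		return score(sorted[i]) > score(sorted[j])
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	names := make([]string, 0, n)
	for _, m := range sorted[:n] {
		names = append(names, m.name)
	}
	return names
}

func main() {
	oldest := func(m machine) float64 { return m.ageDays }
	machines := []machine{
		{name: "a", ageDays: 30, healthy: true},
		{name: "b", ageDays: 1, healthy: false},
		{name: "c", ageDays: 2, healthy: true,
			annotations: map[string]string{deleteAnnotation: "yes"}},
		{name: "d", ageDays: 5, healthy: true},
	}
	fmt.Println(pickForDeletion(machines, 3, oldest))
}
```

Keeping the tier separate from the policy score means an annotated machine is always chosen before an unannotated one, which removes the dependency on a particular policy being configured.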

@ncdc
Contributor

ncdc commented Mar 7, 2019

Could we please get a status update on this PR? Is there consensus to proceed with it for v1alpha1?

@hardikdr
Member

hardikdr commented Mar 7, 2019

Yes, I am on it. I am not sure about keeping it in v1alpha1, but we wanted to align with a few folks from the autoscaler side before merging it.

@ncdc
Contributor

ncdc commented Mar 13, 2019

/milestone v1alpha1

@k8s-ci-robot k8s-ci-robot added this to the v1alpha1 milestone Mar 13, 2019
@detiber
Member

detiber commented Mar 15, 2019

@erstaples it looks like a rebase is still needed for this PR

config/crds/cluster_v1alpha1_machineset.yaml Outdated Show resolved Hide resolved
config/crds/cluster_v1alpha1_machineset.yaml Outdated Show resolved Hide resolved
get registered as Kubernetes nodes. With cluster-api as a
generic out-of-tree provider for autoscaler, this field is
required by autoscaler to be able to have a provider view
of the list of machines. Another list of nodes is queries
Contributor

queries -> queried?

config/crds/cluster_v1alpha1_machineset.yaml Outdated Show resolved Hide resolved
pkg/apis/cluster/v1alpha1/machineset_types.go Show resolved Hide resolved
pkg/controller/machineset/delete_policy.go Outdated Show resolved Hide resolved
- Renamed Simple deletepolicy to Random
- Updated Newest and Oldest policies to prioritize annotations and unhealthy nodes
- Removed unnecessary validation
- Renamed delete annotation
@ncdc
Contributor

ncdc commented Mar 18, 2019

@erstaples do you think you'll have time to rebase + address comments soon? Thanks!

- Added godoc to MachineSetDeletePolicy type
@erstaples
Contributor Author

Hey @ncdc, just did!

	if d.Seconds() < 0 {
		return mustNotDelete
	}
	return deletePriority(float64(mustDelete) * (1.0 - math.Exp(-d.Seconds()/secondsPerTenDays)))
Contributor

I'm curious - why seconds per 10 days?

Contributor

The function maps [0, Inf) -> [0, 1). With exponential decay, the parameter puts the half-life at just about a week, i.e. 1 week ~= 0.5 priority, 2 weeks ~= 0.75 priority etc. Arbitrary but reasonable?

Contributor

You mean 0-100, right?

Contributor

scaled by float64(mustDelete) to the appropriate range, yeah.

@vincepri vincepri self-assigned this Mar 20, 2019
	case v1alpha1.OldestMachineSetDeletePolicy:
		return oldestDeletePriority, nil
	default:
		return nil, errors.Errorf("Unsupported delete policy %s. Must be one of 'Random', 'Newest', or 'Oldest'", msdp)
Contributor

late comment, but we have to default to a policy if the value is not set, i.e.,

	case "":
		return randomDeletePolicy, nil

Contributor

@ntfrnzn (not sure if this change went in after your comment, but) if you look a couple of lines up, there is a default for unset/"". The default case is for when the user specifies an invalid policy name such as abc123.
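
For readers following this thread, the behaviour being described (an unset policy falls back to a default, while an unknown policy name is rejected) could look roughly like the sketch below. The type and function names here are hypothetical stand-ins, and the scoring functions are placeholders rather than the PR's implementations.

```go
package main

import (
	"fmt"
)

// deletePolicy is a hypothetical stand-in for the MachineSet delete policy type.
type deletePolicy string

const (
	randomPolicy deletePolicy = "Random"
	newestPolicy deletePolicy = "Newest"
	oldestPolicy deletePolicy = "Oldest"
)

// priorityFunc scores a machine by its age in seconds; higher is deleted first.
type priorityFunc func(ageSeconds float64) float64

// Placeholder scoring functions, for illustration only.
func randomPriority(float64) float64     { return 0 }
func oldestPriority(age float64) float64 { return age }
func newestPriority(age float64) float64 { return -age }

// priorityFuncFor selects the scoring function for a policy. An empty
// (unset) policy falls back to Random; any other unknown value is an error,
// so a typo in the spec is surfaced instead of silently changing behaviour.
func priorityFuncFor(p deletePolicy) (priorityFunc, error) {
	switch p {
	case "", randomPolicy:
		return randomPriority, nil
	case newestPolicy:
		return newestPriority, nil
	case oldestPolicy:
		return oldestPriority, nil
	default:
		return nil, fmt.Errorf("unsupported delete policy %q: must be one of 'Random', 'Newest', or 'Oldest'", p)
	}
}

func main() {
	for _, p := range []deletePolicy{"", "Oldest", "abc123"} {
		_, err := priorityFuncFor(p)
		fmt.Printf("%q -> err=%v\n", p, err)
	}
}
```

Returning an error for an unrecognised value follows the earlier review suggestion to stop scaling rather than quietly fall back to another policy.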

@ncdc
Contributor

ncdc commented Mar 25, 2019

@vincepri any other comments on this?

Member

@vincepri vincepri left a comment

Great work, thank you for the contribution!

/approve

I noticed that some BUILD.bazel files are being deleted. Could you run make generate and make sure that no updates are shown in git status?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: erstaples, vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2019
@ncdc
Contributor

ncdc commented Mar 26, 2019

@erstaples did you get a chance to check out Vince's comment about the Bazel files?

@erstaples
Contributor Author

@ncdc @vincepri Sorry about that, GitHub failed to notify me about these comments, or they got lost in the noise. I re-ran make generate and there are quite a few changes. Want me to push them?

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   pkg/controller/machineset/BUILD.bazel

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	pkg/client/clientset_generated/clientset/BUILD.bazel
	pkg/client/clientset_generated/clientset/fake/BUILD.bazel
	pkg/client/clientset_generated/clientset/scheme/BUILD.bazel
	pkg/client/clientset_generated/clientset/typed/cluster/v1alpha1/BUILD.bazel
	pkg/client/clientset_generated/clientset/typed/cluster/v1alpha1/fake/BUILD.bazel
	pkg/client/informers_generated/externalversions/BUILD.bazel
	pkg/client/informers_generated/externalversions/cluster/BUILD.bazel
	pkg/client/informers_generated/externalversions/cluster/v1alpha1/BUILD.bazel
	pkg/client/informers_generated/externalversions/internalinterfaces/BUILD.bazel
	pkg/client/listers_generated/cluster/v1alpha1/BUILD.bazel

@vincepri
Member

Please do! :)

@erstaples
Contributor Author

@vincepri Done!

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 27, 2019
@vincepri
Member

@detiber @ncdc leaving final LGTM to you

@detiber
Member

detiber commented Mar 27, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 27, 2019
@k8s-ci-robot k8s-ci-robot merged commit f6aa237 into kubernetes-sigs:master Mar 27, 2019
serbrech pushed a commit to serbrech/cluster-api that referenced this pull request Apr 8, 2019
…lete policy (kubernetes-sigs#726)

* add oldest and newest delete policy implementations

* delete policy checkpoint

* wire up spec.deletePolicy to sort delegate

* gofmt

* - Refactor deletePriorityFunc selector to separate function
- Add validation to deletePolicy field on MachineSet CRD
- Misc cleanup and comments

* Address PR feedback:
- Renamed Simple deletepolicy to Random
- Updated Newest and Oldest policies to prioritize annotations and unhealthy nodes
- Removed unnecessary validation
- Renamed delete annotation

* - Cleaned up grammar on providerID field comments
- Added godoc to MachineSetDeletePolicy type

* Add empty string case

* Add bazel files
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

MachineSet: support for defining specific machines when scaling down
9 participants