Implement cluster autoscaler as bootstrap addon #9787

Merged
merged 1 commit into from
Sep 5, 2020

Conversation

olemarkus
Member

We see that when managing a large number of clusters, keeping the cluster autoscaler (CAS) version correct is a hassle. We also need to remember to update CAS after cluster upgrades. Adding CAS as a configurable bootstrap addon should make this more manageable.

If no one objects to this, I will finish this and add documentation and validation.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 20, 2020
@olemarkus
Member Author

/cc @hakman

@k8s-ci-robot k8s-ci-robot requested a review from hakman August 20, 2020 11:06
@moshevayner
Member

That's a great idea, thanks for adding that @olemarkus !!

@k8s-ci-robot
Contributor

@olemarkus: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@olemarkus
Member Author

/retest

@moshevayner
Member

BTW, I guess it would also be worth adding some documentation for it? Or will that be done in a separate PR?

@olemarkus
Member Author

Docs will come as part of this PR. Just want to ensure there are no objections before starting on that.

@olemarkus olemarkus changed the title WIP Implement cluster autoscaler as bootstrap addon Implement cluster autoscaler as bootstrap addon Aug 21, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 21, 2020
@olemarkus olemarkus requested a review from johngmyers August 21, 2020 11:15
docs/cluster_spec.md Outdated Show resolved Hide resolved
pkg/apis/kops/cluster.go Outdated Show resolved Hide resolved
pkg/apis/kops/componentconfig.go Outdated Show resolved Hide resolved
pkg/apis/kops/componentconfig.go Outdated Show resolved Hide resolved
pkg/apis/kops/componentconfig.go Outdated Show resolved Hide resolved
pkg/apis/kops/v1alpha2/cluster.go Outdated Show resolved Hide resolved
pkg/apis/kops/componentconfig.go Outdated Show resolved Hide resolved
pkg/apis/kops/v1alpha2/componentconfig.go Outdated Show resolved Hide resolved
upup/pkg/fi/cloudup/bootstrapchannelbuilder.go Outdated Show resolved Hide resolved
upup/pkg/fi/cloudup/template_functions.go Outdated Show resolved Hide resolved
@olemarkus olemarkus force-pushed the cas branch 3 times, most recently from b1b88e1 to d598472 Compare August 27, 2020 07:55
@olemarkus
Member Author

/retest

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2020
@olemarkus
Member Author

/retest

@olemarkus
Member Author

/test pull-kops-e2e-cni-cilium

@olemarkus
Member Author

/retest

pkg/apis/kops/validation/validation.go Outdated Show resolved Hide resolved
pkg/apis/kops/validation/validation.go Show resolved Hide resolved
pkg/model/components/clusterautoscaler.go Outdated Show resolved Hide resolved
upup/pkg/fi/cloudup/template_functions.go Outdated Show resolved Hide resolved
upup/pkg/fi/cloudup/template_functions.go Outdated Show resolved Hide resolved
upup/pkg/fi/cloudup/template_functions.go Outdated Show resolved Hide resolved
@olemarkus olemarkus force-pushed the cas branch 3 times, most recently from e00a2d4 to feacaf1 Compare September 2, 2020 19:44
@olemarkus olemarkus force-pushed the cas branch 2 times, most recently from 38f5fdf to ce5a2cf Compare September 3, 2020 07:48
Use provider-agnostic node definition for cas instead of aws auto-discovery

Validate clusterAutoscalerSpec

Add spec documentation

Add cas docs

Make CRDs

Apply suggestions from code review

Co-authored-by: John Gardiner Myers <[email protected]>

Add enabled flag to cas config

Apply suggestions from code review

Co-authored-by: Guy Templeton <[email protected]>

Add support for custom cas image

Support more k8s versions

Use full image names
@hakman
Member

hakman commented Sep 3, 2020

@johngmyers leaving the approval to you, in case there is anything left from your point of view.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 3, 2020
Member

@johngmyers johngmyers left a comment

Neither of the review comments is blocking. Holding to give opportunity to address.

/approve
/hold


if cas.Image == nil {

image := "k8s.gcr.io/autoscaling/cluster-autoscaler:latest"
Member

I'm not a big fan of mutable tags such as "latest"; I'd prefer using "v1.19.0" (or whatever the highest known version is) for newer Kubernetes versions.

Member

I don't have any preference here.
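
For illustration, the suggestion to avoid the mutable "latest" tag could be sketched roughly as below. This is only a sketch under stated assumptions, not the code that was merged: `defaultCASImage`, `pinnedTags` and `highestKnownMinor` are hypothetical names, and the tags in the table are placeholders rather than a verified list of cluster-autoscaler releases. Kubernetes versions newer than the highest known one fall back to the highest known tag, as proposed in the review comment.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// casRepo is the image repository quoted in the review thread.
const casRepo = "k8s.gcr.io/autoscaling/cluster-autoscaler"

// pinnedTags maps a Kubernetes minor version to a cluster-autoscaler tag.
// ASSUMPTION: these entries are illustrative placeholders, not the table kops ships.
var pinnedTags = map[int]string{
	17: "v1.17.0",
	18: "v1.18.0",
	19: "v1.19.0",
}

// highestKnownMinor is the newest minor version with an entry in pinnedTags.
const highestKnownMinor = 19

// defaultCASImage (hypothetical helper) returns a pinned image for the given
// Kubernetes version. Versions newer than highestKnownMinor reuse the highest
// known tag instead of a mutable tag such as "latest".
func defaultCASImage(k8sVersion string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(k8sVersion, "v"), ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("cannot parse Kubernetes version %q", k8sVersion)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("cannot parse minor version in %q: %v", k8sVersion, err)
	}
	if minor > highestKnownMinor {
		minor = highestKnownMinor
	}
	tag, ok := pinnedTags[minor]
	if !ok {
		return "", fmt.Errorf("no pinned cluster-autoscaler tag for Kubernetes 1.%d", minor)
	}
	return casRepo + ":" + tag, nil
}

func main() {
	for _, v := range []string{"1.17.9", "1.19.1", "1.21.0", "1.12.0"} {
		image, err := defaultCASImage(v)
		if err != nil {
			fmt.Println(v, "->", err)
			continue
		}
		fmt.Println(v, "->", image)
	}
}
```

The trade-off is that a pinned fallback needs a code change whenever a new cluster-autoscaler release should be picked up, whereas "latest" moves on its own but can change underneath a running cluster.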

@@ -521,6 +521,27 @@ func (b *BootstrapChannelBuilder) buildAddons() *channelsapi.Addons {
}
}

if b.Cluster.Spec.ClusterAutoscaler != nil && fi.BoolValue(b.Cluster.Spec.ClusterAutoscaler.Enabled) {
Member

Perhaps we should default Enabled to true if ClusterAutoscaler is non-nil

Member

We agreed to keep it disabled by default at the last office hours.
Maybe in 1.20 we can materialise it into the cluster spec so that people know it's enabled.

Member

But the default is to have a nil ClusterAutoscaler field. Why would someone specify a ClusterAutoscaler field yet want Enabled to default to false?

Member

I agree that it may be a good idea in general, I just don't know how widely this pattern is used in kops.
Sometimes when I test or just want to disable something, I comment out the enabled: true line but keep all the other settings, which makes it easy to re-enable.

One of the reasons for starting the discussion on #9661 was to establish some guidelines for enabled/disabled and managed/unmanaged, but it doesn't seem to be a very popular topic. Maybe I should open a more generic one.
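
To make the two positions in this thread concrete, here is a minimal sketch of both behaviours. It assumes that `fi.BoolValue` in the diff treats a nil pointer as false; `ClusterAutoscalerConfig` is a cut-down stand-in for the real spec type, and `boolValue`, `enabledAsMerged` and `enabledDefaultTrue` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// ClusterAutoscalerConfig is a cut-down stand-in for the real spec type.
// ASSUMPTION: only the Enabled field matters for this comparison.
type ClusterAutoscalerConfig struct {
	Enabled *bool
}

// boolValue dereferences b, treating nil as false (the behaviour assumed
// here for fi.BoolValue).
func boolValue(b *bool) bool {
	if b == nil {
		return false
	}
	return *b
}

// enabledAsMerged mirrors the check in the diff: the addon is only built
// when the section exists and Enabled is explicitly true.
func enabledAsMerged(cas *ClusterAutoscalerConfig) bool {
	return cas != nil && boolValue(cas.Enabled)
}

// enabledDefaultTrue is the alternative raised in review: a non-nil section
// with Enabled unset counts as enabled.
func enabledDefaultTrue(cas *ClusterAutoscalerConfig) bool {
	if cas == nil {
		return false
	}
	if cas.Enabled == nil {
		return true
	}
	return *cas.Enabled
}

func main() {
	yes, no := true, false
	cases := []struct {
		name string
		cas  *ClusterAutoscalerConfig
	}{
		{"section absent", nil},
		{"section present, enabled unset", &ClusterAutoscalerConfig{}},
		{"enabled: true", &ClusterAutoscalerConfig{Enabled: &yes}},
		{"enabled: false", &ClusterAutoscalerConfig{Enabled: &no}},
	}
	for _, c := range cases {
		fmt.Printf("%-32s merged=%v defaultTrue=%v\n",
			c.name, enabledAsMerged(c.cas), enabledDefaultTrue(c.cas))
	}
}
```

The two functions differ only in the second case: with the merged check, commenting out the enabled line disables the addon while keeping the rest of the section, which is the workflow described above; with the default-true variant the same edit would leave the addon enabled.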

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 4, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johngmyers, olemarkus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 4, 2020
@olemarkus
Member Author

I keep going back and forth on which of the options above I prefer, so I think I'll just let this one go in and then do a follow-up, especially if cluster-autoscaler proves slower to support newer k8s versions in the future.
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 5, 2020
@k8s-ci-robot k8s-ci-robot merged commit d8b7310 into kubernetes:master Sep 5, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Sep 5, 2020
@paalkr

paalkr commented Sep 21, 2020

Thanks for contributing this addition; it makes deploying CAS easier. We have a use case that mixes different cluster autoscaling methods (external and CAS) across node pools. I think it would be wise to add a keyword to the instance group spec to either include the group in CAS or exclude it.

@olemarkus olemarkus deleted the cas branch September 21, 2020 11:06
@olemarkus
Member Author

Thanks. I'd add this as a separate issue. I may not be able to implement this in the near future, but it should be a pretty easy "good first issue".

@avdhoot
Contributor

avdhoot commented Oct 2, 2020

I might be late to the party. Thanks @olemarkus for this feature; it removes the extra step of managing the autoscaler version separately. It would be good to also add support for the priority expander. When you have multiple instance groups mixing on-demand and spot instances, the priority expander lets you give priority to the spot instance groups.
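
As a rough illustration of the idea behind the priority expander, the sketch below picks the candidate groups that match the highest-priority tier of name patterns. It is a simplified model under stated assumptions, not cluster-autoscaler's implementation: `pickByPriority`, the group names and the priority values are hypothetical, higher numbers are assumed to win, and when nothing matches the sketch simply returns all candidates.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// pickByPriority (hypothetical helper) returns the groups matched by the
// highest-priority tier that matches at least one group. priorities maps a
// priority value to a list of regular expressions over group names.
// ASSUMPTION: higher numbers win, and all candidates are returned when no
// tier matches.
func pickByPriority(priorities map[int][]string, groups []string) ([]string, error) {
	// Visit tiers from the highest priority value to the lowest.
	tiers := make([]int, 0, len(priorities))
	for p := range priorities {
		tiers = append(tiers, p)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(tiers)))

	for _, p := range tiers {
		var matched []string
		for _, g := range groups {
			for _, pattern := range priorities[p] {
				re, err := regexp.Compile(pattern)
				if err != nil {
					return nil, fmt.Errorf("invalid pattern %q: %v", pattern, err)
				}
				if re.MatchString(g) {
					matched = append(matched, g)
					break // one matching pattern per group is enough
				}
			}
		}
		if len(matched) > 0 {
			return matched, nil
		}
	}
	return groups, nil
}

func main() {
	// Hypothetical setup: prefer spot groups, fall back to anything else.
	priorities := map[int][]string{
		50: {".*spot.*"},
		10: {".*"},
	}
	groups := []string{"nodes-ondemand", "nodes-spot"}

	chosen, err := pickByPriority(priorities, groups)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("scale up candidates:", chosen)
}
```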

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons area/api area/documentation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
8 participants