Use annotations to set labels and taints for clusterapi nodegroups #5382

cnmcavoy
Contributor

Which component this PR applies to?

cluster-autoscaler

What type of PR is this?

/kind feature

What this PR does / why we need it:

Updates the clusterapi provider to use label or taint capacity annotations on MachineDeployments to supply information about the nodegroup shape when the nodegroup replicas are scaled to zero.
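
Here is a minimal Go sketch of how annotation values like the ones this PR introduces could be parsed into labels and taints. The labels format (`key1=value1,key2=value2`) comes from the annotation example in this PR. The taints annotation key and its `key=value:Effect` format are assumptions, and the `Taint` type is a hypothetical stand-in for `corev1.Taint`. This is an illustration, not the provider's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation keys. The labels key appears in this PR; the taints key is an assumption.
const (
	labelsAnnotation = "capacity.cluster-autoscaler.kubernetes.io/labels"
	taintsAnnotation = "capacity.cluster-autoscaler.kubernetes.io/taints" // assumed
)

// Taint is a simplified, hypothetical stand-in for corev1.Taint.
type Taint struct {
	Key, Value, Effect string
}

// parseLabels turns "key1=value1,key2=value2" into a label map.
func parseLabels(s string) (map[string]string, error) {
	labels := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return labels, nil
	}
	for _, pair := range strings.Split(s, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid label %q", pair)
		}
		labels[key] = value
	}
	return labels, nil
}

// parseTaints turns "key1=value1:NoSchedule,key2=value2:NoExecute" into taints.
// The "=value" part is optional, e.g. "dedicated:NoSchedule".
func parseTaints(s string) ([]Taint, error) {
	var taints []Taint
	if strings.TrimSpace(s) == "" {
		return taints, nil
	}
	for _, item := range strings.Split(s, ",") {
		item = strings.TrimSpace(item)
		colon := strings.LastIndex(item, ":")
		if colon < 0 {
			return nil, fmt.Errorf("taint %q has no effect", item)
		}
		effect := item[colon+1:]
		switch effect {
		case "NoSchedule", "PreferNoSchedule", "NoExecute":
		default:
			return nil, fmt.Errorf("taint %q has unknown effect %q", item, effect)
		}
		key, value, _ := strings.Cut(item[:colon], "=")
		if key == "" {
			return nil, fmt.Errorf("taint %q has an empty key", item)
		}
		taints = append(taints, Taint{Key: key, Value: value, Effect: effect})
	}
	return taints, nil
}

func main() {
	annotations := map[string]string{
		labelsAnnotation: "key1=value1,key2=value2",
		taintsAnnotation: "key1=value1:NoSchedule,key2=value2:NoExecute",
	}
	labels, err := parseLabels(annotations[labelsAnnotation])
	if err != nil {
		panic(err)
	}
	taints, err := parseTaints(annotations[taintsAnnotation])
	if err != nil {
		panic(err)
	}
	fmt.Println(labels)
	fmt.Println(taints)
}
```

Rejecting unknown effects early keeps a typo in an annotation from silently producing a node template that no pod can match.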

Currently, the clusterapi provider has no way to describe the labels or taints of a nodegroup that has been scaled to zero replicas, so such nodegroups can be skipped during evaluation and fail to scale up. This happens when a pod tolerates the nodegroup's taints and has a node selector that only those nodegroups can satisfy: without the taint and label information, the cluster-autoscaler cannot match the pod to the nodegroup.
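
To see why the labels and taints matter, here is a simplified Go sketch of the matching problem described above. A pod fits a nodegroup only if every `nodeSelector` entry is present in the group's labels and every hard taint is tolerated. With no label information for a zero-replica group (the third case in `main`), the selector cannot match and the group is skipped. The types and function names are hypothetical, and the real scheduling simulation in cluster-autoscaler checks much more (resources, affinity, and so on).

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for corev1.Taint and corev1.Toleration.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key      string
	Operator string // "Equal" (default) or "Exists"
	Value    string
	Effect   string // empty means all effects
}

// tolerates applies the usual toleration-matching rules in simplified form.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key == "" {
		// An empty key is only meaningful with Exists, and then matches every taint.
		return t.Operator == "Exists"
	}
	if t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return t.Value == taint.Value
	default:
		return false
	}
}

// podFitsNodeGroup reports whether a pod's nodeSelector and tolerations are
// satisfied by a nodegroup's labels and taints. PreferNoSchedule taints are
// soft, so they are ignored here.
func podFitsNodeGroup(nodeSelector map[string]string, tolerations []Toleration,
	labels map[string]string, taints []Taint) bool {
	for k, v := range nodeSelector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, t := range tolerations {
			if tolerates(t, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	labels := map[string]string{"nodegroup": "gpu"}
	taints := []Taint{{Key: "gpu", Value: "true", Effect: "NoSchedule"}}
	selector := map[string]string{"nodegroup": "gpu"}
	tols := []Toleration{{Key: "gpu", Operator: "Equal", Value: "true", Effect: "NoSchedule"}}

	fmt.Println(podFitsNodeGroup(selector, tols, labels, taints)) // true
	fmt.Println(podFitsNodeGroup(selector, nil, labels, taints))  // false: taint not tolerated
	fmt.Println(podFitsNodeGroup(selector, tols, nil, taints))    // false: no labels known for the group
}
```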

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added support for supplying labels and taints when scaling to and from zero nodes in the cluster autoscaler's Cluster API provider. Enabling this feature requires changes by the user; for instructions, please see the Cluster API (clusterapi) provider README file in the autoscaler repository.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


k8s-ci-robot added the kind/feature and cncf-cla: yes labels Dec 22, 2022
@k8s-ci-robot
Contributor

Welcome @cnmcavoy!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot added the size/L label Dec 22, 2022
k8s-ci-robot requested a review from elmiko December 22, 2022 18:20
@elmiko
Contributor

elmiko commented Dec 22, 2022

thanks @cnmcavoy , i think this is a novel and simple solution to the issue. i'd like to get a wider consensus from the cluster-api community about adding these annotations. assuming the community agrees, i don't see any reason why we can't use these.

related to kubernetes-sigs/cluster-api#7685

i'm putting a hold here just to ensure we get wider discussion before merging
/hold for wider cluster api community review

k8s-ci-robot added the do-not-merge/hold label Dec 22, 2022
Contributor

elmiko left a comment

code generally looks good to me, thanks for adding the unit test =)

@sbueringer
Member

sbueringer commented Jan 2, 2023

The idea looks good to me as well (didn't review the implementation).

As noted in kubernetes-sigs/cluster-api#7685 (comment), an upcoming Cluster API release should provide a way to get the labels from the MachineDeployment/MachineSet (MD/MS), but those labels might be incomplete (the bootstrap provider can add more labels), and that approach doesn't cover taints.

Even if we eventually solve this in Cluster API for labels and taints, including bootstrap providers, I think it's fine to support this with autoscaler-specific annotations today. We can still deprecate and remove them eventually once a solution in Cluster API has been implemented.

Also the annotation approach will be compatible with all Cluster API versions already.

@elmiko
Contributor

elmiko commented Jan 3, 2023

Even if we eventually solve this in Cluster API for labels and taints, including bootstrap providers, I think it's fine to support this with autoscaler-specific annotations today. We can still deprecate and remove them eventually once a solution in Cluster API has been implemented.

Also the annotation approach will be compatible with all Cluster API versions already.

+1, given that we already have the annotations as a way to override the values from the capi objects, i think having these label/taint annotations is a great addition.

@cnmcavoy i am going to bring this PR up as a topic at tomorrow's cluster api meeting, if there are no objections to the approach here i will remove the hold.

cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
capacity.cluster-autoscaler.kubernetes.io/memory: "128G"
capacity.cluster-autoscaler.kubernetes.io/cpu: "16"
capacity.cluster-autoscaler.kubernetes.io/labels: "key1=value1,key2=value2"
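
The annotation excerpt above sets a min size plus CPU and memory capacity on a MachineDeployment. As an illustration of how those values could be read, here is a Go sketch with a deliberately small quantity parser. A real implementation would more likely use `resource.ParseQuantity` from `k8s.io/apimachinery`, which also handles forms such as millicores (`500m`) that this sketch rejects. The `capacity` type and helper names are hypothetical. In Kubernetes quantity syntax `128G` is decimal (128 × 10^9 bytes), unlike `128Gi`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Annotation keys as shown in the excerpt above.
const (
	minSizeAnnotation = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"
	cpuAnnotation     = "capacity.cluster-autoscaler.kubernetes.io/cpu"
	memoryAnnotation  = "capacity.cluster-autoscaler.kubernetes.io/memory"
)

// capacity is a hypothetical summary of a zero-replica nodegroup's shape.
type capacity struct {
	MinSize     int
	CPU         int64
	MemoryBytes int64
}

// parseQuantity handles only a small subset of Kubernetes quantity syntax:
// integers with an optional decimal (k, M, G, T) or binary (Ki, Mi, Gi, Ti) suffix.
func parseQuantity(s string) (int64, error) {
	suffixes := []struct {
		suffix string
		mult   int64
	}{
		{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
		{"k", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12},
	}
	num, mult := s, int64(1)
	for _, sfx := range suffixes {
		if strings.HasSuffix(s, sfx.suffix) {
			num, mult = strings.TrimSuffix(s, sfx.suffix), sfx.mult
			break
		}
	}
	n, err := strconv.ParseInt(num, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("unsupported quantity %q: %w", s, err)
	}
	return n * mult, nil
}

// capacityFromAnnotations reads the min size and capacity annotations, if present.
func capacityFromAnnotations(ann map[string]string) (capacity, error) {
	var c capacity
	var err error
	if v, ok := ann[minSizeAnnotation]; ok {
		if c.MinSize, err = strconv.Atoi(v); err != nil {
			return c, fmt.Errorf("min size: %w", err)
		}
	}
	if v, ok := ann[cpuAnnotation]; ok {
		if c.CPU, err = parseQuantity(v); err != nil {
			return c, fmt.Errorf("cpu: %w", err)
		}
	}
	if v, ok := ann[memoryAnnotation]; ok {
		if c.MemoryBytes, err = parseQuantity(v); err != nil {
			return c, fmt.Errorf("memory: %w", err)
		}
	}
	return c, nil
}

func main() {
	ann := map[string]string{
		minSizeAnnotation: "0",
		memoryAnnotation:  "128G",
		cpuAnnotation:     "16",
	}
	c, err := capacityFromAnnotations(ann)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c) // {MinSize:0 CPU:16 MemoryBytes:128000000000}
}
```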
Member

I'm wondering if we can avoid asking the users to specify the list of labels by inferring it from the MachineDeployment itself.

According to what is defined in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220927-label-sync-between-machine-and-nodes.md and in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20221003-In-place-propagation-of-Kubernetes-objects-only-changes.md:

  • Labels that will be propagated to nodes derive from labels defined in MachineDeployment.spec.template.metadata.labels
  • Among those labels, only node-role.kubernetes.io/* labels and labels in the node-restriction.kubernetes.io and node.cluster.x-k8s.io domains are going to be propagated

(NOTE: this will apply to CAPI >=1.4 if we manage to complete the design proposal implementation during the current release cycle)
(NOTE: this is only the subset of labels that CAPI will apply; it doesn't include the labels the kubelet adds automatically)
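
The propagation rule quoted above can be sketched as a filter over `MachineDeployment.spec.template.metadata.labels`. This Go sketch assumes that "domain" also covers subdomains (e.g. `example.node-restriction.kubernetes.io`) and that only prefixed label keys can qualify. The function names are hypothetical, and the linked proposals remain the authoritative description of the rules.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// inDomain reports whether a label-key prefix equals domain or is a subdomain of it.
func inDomain(prefix, domain string) bool {
	return prefix == domain || strings.HasSuffix(prefix, "."+domain)
}

// propagatedLabels keeps only the template labels that Cluster API would
// propagate to Nodes under the rules quoted above (assumed interpretation).
func propagatedLabels(templateLabels map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range templateLabels {
		prefix, _, hasPrefix := strings.Cut(k, "/")
		if !hasPrefix {
			continue // unprefixed keys are not in any of the allowed domains
		}
		if prefix == "node-role.kubernetes.io" ||
			inDomain(prefix, "node-restriction.kubernetes.io") ||
			inDomain(prefix, "node.cluster.x-k8s.io") {
			out[k] = v
		}
	}
	return out
}

func main() {
	template := map[string]string{
		"node-role.kubernetes.io/worker":              "",
		"example.node-restriction.kubernetes.io/zone": "a",
		"node.cluster.x-k8s.io/pool":                  "gpu",
		"app":                                         "web",
		"example.com/team":                            "infra",
	}
	got := propagatedLabels(template)
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	fmt.Println(keys)
}
```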

Contributor

I'm wondering if we can avoid asking the users to specify the list of labels by inferring it from the MachineDeployment itself.

i think we definitely want to do this once that implementation is released, but i think we could still keep these annotations as a continuation of the overrides we currently have (eg for cpu, memory, etc).

so, ultimately, the cluster autoscaler would inspect the labels from MachineDeployment.spec.template.metadata.labels, use some of those labels (based on the rules in the enhancement), and then inspect the annotations on the MachineDeployment to see if any labels should be overridden.
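
The precedence described here (labels inferred from the template first, annotation overrides on top) amounts to a map merge. This is a minimal Go sketch that assumes the inferred labels have already been filtered by the propagation rules and the annotation has already been parsed into a map. `effectiveLabels` is a hypothetical name.

```go
package main

import "fmt"

// effectiveLabels starts from the labels inferred from the MachineDeployment
// template and lets labels from the override annotation replace or extend them.
// Neither input map is modified.
func effectiveLabels(inferred, overrides map[string]string) map[string]string {
	out := make(map[string]string, len(inferred)+len(overrides))
	for k, v := range inferred {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v // annotation values win on conflict
	}
	return out
}

func main() {
	inferred := map[string]string{
		"node-role.kubernetes.io/worker": "",
		"node.cluster.x-k8s.io/pool":     "general",
	}
	overrides := map[string]string{
		"node.cluster.x-k8s.io/pool": "gpu",
		"key1":                       "value1",
	}
	fmt.Println(effectiveLabels(inferred, overrides))
}
```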

Contributor

@fabriziopandini just a follow up here, is there an objection to adding these override annotations in addition to whatever we do with reading the labels from the MachineDeployment?

@elmiko
Contributor

elmiko commented Jan 4, 2023

@cnmcavoy just as a followup, we discussed this PR at the meeting today (recording), there were no major objections, but some questions about interactions with cluster api controllers were brought up. i'd like to give folks a few days to discuss here, and then if there are no objections we could move forward with this PR.

as a followup, i would like to make sure that we update the autoscaling from zero enhancement to include the changes here. i am happy to help update that document.

@x13n
Member

x13n commented Jan 9, 2023

I'm looking at unassigned PRs; looks like this one is already taken care of:

/assign @elmiko

@elmiko
Contributor

elmiko commented Jan 31, 2023

i'm removing the hold here, i think the capi community has discussed this and there were no objections to moving forward on this.

/hold cancel

k8s-ci-robot removed the do-not-merge/hold label Jan 31, 2023
@elmiko
Contributor

elmiko commented Jan 31, 2023

i've created kubernetes-sigs/cluster-api#8036 to cover the docs update in cluster-api

k8s-ci-robot added the needs-rebase label Feb 2, 2023
cnmcavoy force-pushed the cmcavoy/scale-from-zero-with-labels-taints branch from ef99e32 to 75ba088 February 3, 2023 17:14
cnmcavoy force-pushed the cmcavoy/scale-from-zero-with-labels-taints branch from 75ba088 to e63244a February 3, 2023 17:18
k8s-ci-robot removed the needs-rebase label Feb 3, 2023
@elmiko
Contributor

elmiko commented Feb 3, 2023

@cnmcavoy thanks for the update, just waiting to make sure there are no objections before merging this. i don't think there will be but i want to make sure.

@cnmcavoy
Contributor Author

Any updates on this @elmiko? We've been using this in our clusters since January and would love to get off our fork.

@elmiko
Contributor

elmiko commented Feb 21, 2023

@cnmcavoy apologies for the delay, i'll poke @fabriziopandini to see if he had any further requests here

@elmiko
Contributor

elmiko commented Feb 22, 2023

discussed in the meeting today, no objections to merging. thanks again @cnmcavoy !

/approve
/lgtm

cloud-team-bot bot pushed a commit to openshift-cloud-team/kubernetes-autoscaler that referenced this pull request Aug 29, 2024
This change re-adds the machine api support for labels and taints on
node groups. The code was removed upstream as it is openshift specific,
see this pull request[0].

It also adds in the functionality of the upstream override annotation
for labels and taints[1] to support
https://issues.redhat.com/browse/MIXEDARCH-259

[0]: kubernetes#5249
[1]: kubernetes#5382
Labels
approved, area/cluster-autoscaler, cncf-cla: yes, kind/feature, lgtm, size/L

6 participants