Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow balancing by labels exclusively #4174

Merged

Conversation

jsravn
Copy link
Contributor

@jsravn jsravn commented Jun 29, 2021

Adds a new flag --balancing-label which allows users to balance between node groups exclusively via labels for when --balance-similar-node-groups is being used.

This gives users the flexibility to specify the similarity logic themselves rather than trying to work within the existing heuristics.

This is a POC based on my idea in #4165.

An example use case is a heavily diversified AWS deployment with per-AZ node groups to accommodate zone affinity restrictions of EBS volumes. The current heuristic logic makes this difficult to accomplish in combination with the least-waste expander, as the template node info for empty node groups rarely matches existing nodes (for many reasons, as discussed in the linked issue and elsewhere). This change makes it very simple - just specify the labels you want to compare on, such as node.kubernetes.io/instance-type, and that's it.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 29, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @jsravn!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 29, 2021
Copy link
Contributor

@bpineau bpineau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea!
Unpredictable kernel memory reservations on AWS instances is a real issue for balance-similar-node-groups. We unsuccessfully tried to other approaches here, here, here and a few more times.


import (
klog "k8s.io/klog/v2"
schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

master/HEAD uses k8s.io/kubernetes/pkg/scheduler/framework (not v1alpha1 anymore, since 0fb897b)

val1 := n1.Node().ObjectMeta.Labels[label]
val2 := n2.Node().ObjectMeta.Labels[label]
if val1 != val2 {
klog.V(4).Infof("%s label did not match. %s: %s, %s: %s", label, n1.Node().Name, val1, n2.Node().Name, val2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need that log emitted (at that low log level)? that will be very verbose on large clusters

Copy link
Contributor Author

@jsravn jsravn Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be helpful to debug why things are not being balanced. Mostly from my struggles trying to debug the existing balancing code :). But it may not be necessary given you control the labels used here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What level would be good? Or should I remove it completely?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually like to see this loop through all labels and log all mismatches. I really like to see all the reasons, not just the first one. This often saves time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After spending half a day debugging label mismatches, this log output would be great.

@jsravn
Copy link
Contributor Author

jsravn commented Jul 5, 2021

I've been testing this in my dev clusters (100+ nodes) and it is working really well. I'm getting perfect balancing across node groups with same instance types.

@tsunamishaun
Copy link

tsunamishaun commented Jul 16, 2021

I've had to disable balance similar with priority expander as it ignores the priority list. I have two node groups one for spot, one for on demand, same taint and workergroup labels. Only difference is market option label which I dont use for affinity etc. I'm pretty sure this would fix my problem letting me re-enable balance similar and use -balance-label set to my workergroup label/topology?

@jsravn
Copy link
Contributor Author

jsravn commented Jul 20, 2021

I've had to disable balance similar with priority expander as it ignores the priority list. I have two node groups one for spot, one for on demand, same taint and workergroup labels. Only difference is market option label which I dont use for affinity etc. I'm pretty sure this would fix my problem letting me re-enable balance similar and use -balance-label set to my workergroup label/topology?

I haven't tested it, but assuming balancing works w/ the priority expander, then yes this should help your usecase.

@syscod3
Copy link

syscod3 commented Aug 23, 2021

Hello. I was wondering if this PR can be merged now? It would solve a lot of people's issues.

@jsravn
Copy link
Contributor Author

jsravn commented Aug 24, 2021

I pushed changes for the review comments. I suppose it needs to be discussed at the SIG meeting?

@artificial-aidan
Copy link

This is working well so far in my testing. Any updates?

Copy link
Contributor

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsravn I support this addition. I think it's a simple solution to the problem you describe in #4165 and simple == good, at least for me!

Couple things I'd like to see, however:

  1. Please add at least a unit test for the new functionality.
  2. A discussion, either in the areLabelsSame code comment or in the AWS FAQ section about multiple instance type node groups about how this technique will not work with ASGs with a mixed instance policy because the NodeInfo object will be returned for the first instance in the InstanceGroup and that instance will obviously only have a single instance-type label and therefore depending on which instance type different NodeGroups happened to launch first, the instance-type label may be different for them, which will cause the areLabelsSame call to return false.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2021
@jsravn jsravn force-pushed the allow-balancing-by-labels-exclusively branch from 15d07a3 to 756d9d8 Compare November 23, 2021 15:23
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 23, 2021
@jsravn
Copy link
Contributor Author

jsravn commented Nov 23, 2021

@jaypipes I've addressed your comment and rebased. Let me know what you think.

Copy link
Contributor

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, @jsravn, much appreciated!

@@ -304,6 +304,10 @@ spec:
- i3.2xlarge
```

Similarly, if using the `balancing-label` flag, you should only choose labels which have the same value for all nodes in
the node group. Otherwise you may get unexpected results, as the flag values will vary based on the nodes created by
the ASG.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for adding this note, @jsravn

checkNodesSimilar(t, node1, node2, comparator, tc.isSimilar)
})
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ nice tests.

@jsravn
Copy link
Contributor Author

jsravn commented Nov 23, 2021

/assign @towca

@TBBle
Copy link

TBBle commented Jan 26, 2022

To avoid any confusion for users, perhaps there should there be some kind of failure or complaint if both balancing-ignore-label and balancing-label are provided, even if it's just a log warning that the former will be ignored. A start-up failure with error seems reasonable to me too. The semantic of both flags provided is pretty clear (since 'ignored labels' includes all labels not listed in balancing-labels by default, so the ignore is redundant) except if the user provides the same label in both lists for some reason, which implies a misunderstanding of the intended behaviour.

As a stray thought, since this is actually replacing the behaviour of --balance-similar-node-groups to bypass the check for similarity, does it contrast with that option instead, e.g., should the flag be something like --balance-labelled-node-groups?

The current implementation wouldn't need to change, i.e., something like

BalanceSimilarNodeGroups:           (*balanceSimilarNodeGroupsFlag || len(*balancingLabelsFlag) > 0)

would achieve this by making --balancing-label automatically activate --balance-similar-node-groups, and the rest of the implementation continues to work.

Comment on lines 33 to 34
val1 := n1.Node().ObjectMeta.Labels[label]
val2 := n2.Node().ObjectMeta.Labels[label]
Copy link

@TBBle TBBle Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the specified label is not present on either NodeInfo? Wouldn't this panic? (Edit: sorry, brainfart. It'll return "", so behaves as "absence is a value" below.)

This seems like an easy use-case to hit if I'm using a custom example.com/balance-group-name label for when I have duplicated a nodegroup across AZs, but am not using it for nodegroups which are not set up that way, e.g., they are AZ-local for reasons other than EBS-selection such as locality to other resources. Such nodegroupsets would still be fed into this comparison function if they otherwise matched the pod-to-be-created, if the pod didn't require that it be deployed on a balanced nodegroup.

And once this isn't panicking, should the behaviour of 'balancing label is not present' be:

  • "Don't use this nodeset", i.e. the balancing should not even be attempted if the pod-match-chosen n1 doesn't have the desired label, and if the potential-match n2 lacks the label, it always returns false; or
  • "Absence is also a value", i.e. only match if both NodeInfos lack the label, or if both have the label and the same value for that label; or
  • something else?

Copy link

@TBBle TBBle Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., to implement the "missing label excludes from balancing completely", I guess it'd be:

Suggested change
val1 := n1.Node().ObjectMeta.Labels[label]
val2 := n2.Node().ObjectMeta.Labels[label]
val1 := n1.Node().ObjectMeta.Labels[label]
val2 := n2.Node().ObjectMeta.Labels[label]
if val1 == "" {
klog.V(8).Infof("%s label not present on %s.", label, n1.Node().Name)
return false
}
if val2 == "" {
klog.V(8).Infof("%s label not present on %s.", label, n2.Node().Name)
return false
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, it'd be good to see the behaviour in the case of one, the other, or both node groups lacking the relevant label in the tests, so we know what was intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! My initial thinking is that if the label is missing completely, it wouldn't be be considered similar to another node group with also a missing label. The absence of the label would indicate "never consider this similar".

Copy link
Contributor Author

@jsravn jsravn Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added tests and your suggestion. I made a slight change to use an existence check instead of comparing on the empty string - so if a user does explicitly set an empty label value, it will be considered, allowing users to use label keys alone as ways to group nodes.

@artificial-aidan
Copy link

@jsravn are you still working on getting this PR in? Anything I can do to help with the comments from TBBle?

@jsravn
Copy link
Contributor Author

jsravn commented Apr 19, 2022

@artificial-aidan I've yet to have a single OWNER review this. I'm hesitant to keep working on this PR until I get some official feedback first. ping @towca

@njablonski
Copy link

This, as written, would solve for some challenges I have with predicting and controlling the existing similarity logic.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 4, 2022
@jsravn jsravn force-pushed the allow-balancing-by-labels-exclusively branch from ab04299 to 6710b4c Compare July 6, 2022 08:27
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 6, 2022
@jsravn
Copy link
Contributor Author

jsravn commented Jul 6, 2022

As a stray thought, since this is actually replacing the behaviour of --balance-similar-node-groups to bypass the check for similarity, does it contrast with that option instead, e.g., should the flag be something like --balance-labelled-node-groups?

@TBBle interesting point. In my original thinking, we are still balancing similar node groups. It's just we're replacing the decision logic about what makes node groups similar. If we were to bubble this up into the top level option, I'd rather rename the existing option something like "balance-similar-node-groups-by-size" or something to indicate how it works (although the actual heuristics are more complex than that, and compare many things including labels). Then the user would choose either "balance-similar-node-groups-by-size" or "balance-similar-node-groups-by-labels". Given that breaks backwards compatibility though, not sure it's a great idea. I'm okay with introducing this as a new flag "balance-similar-nodes-by-labels" I think as you suggest. It is pretty straightforward to change it if the maintainers agree.

Adds a new flag `--balance-label` which allows users to balance between
node groups exclusively via labels.

This gives users the flexibility to specify the similarity logic
themselves when --balance-similar-node-groups is in use.
@jsravn jsravn force-pushed the allow-balancing-by-labels-exclusively branch from 6710b4c to 1b98b38 Compare July 6, 2022 09:34
@jsravn
Copy link
Contributor Author

jsravn commented Jul 6, 2022

I've rebased and addressed the edge cases @tbbie pointed out. @mwielgus would you or another reviewer be able to have a look?

@jsravn
Copy link
Contributor Author

jsravn commented Jul 6, 2022

/assign @mwielgus

(just noticed towca is no longer in the OWNERS file)

@grosser
Copy link
Contributor

grosser commented Jul 19, 2022

FYI used this patch and it works as expected, splitting a 8 node scale up into 3,3,2 🎉

{"level":"INFO","message":"Estimated 8 nodes needed in usw2a-AutoscalingGroup-a"}
{"level":"INFO","message":"Splitting scale-up between 3 similar node groups: {usw2a-AutoscalingGroup-a, usw2b-AutoscalingGroup-b, usw2c-AutoscalingGroup-c}"}
{"level":"INFO","message":"Final scale-up plan: [{usw2a-AutoscalingGroup-a 1->4 (max: 10)} {usw2b-AutoscalingGroup-b 1->4 (max: 10)} {usw2c-AutoscalingGroup-c 1->3 (max: 10)}]"}

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a nice feature, we should probably discuss it at a SIG meeting if we can't get more review here.

@elmiko
Copy link
Contributor

elmiko commented Jul 25, 2022

we talked about this PR at today's meeting, hopefully we'll get a few more reviews. i think it's a nice feature, code looks good to me.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2022
@towca
Copy link
Collaborator

towca commented Aug 1, 2022

Everything looks good to me here, thanks for a very useful contribution!

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jaypipes, jsravn, towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2022
@k8s-ci-robot k8s-ci-robot merged commit 8ba1853 into kubernetes:master Aug 1, 2022
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022
…els-exclusively

Allow balancing by labels exclusively
@jbilliau-rcd
Copy link

I just switched to this, since my existing use of --balance-ignore-labels apparently broke at some point (it just spun up 11 nodes in one node group, RIP ). Using balancing-label: eks.amazonaws.com/nodegroup-image, I am not getting proper balancing across 3 node groups, 3 AZ's again. Thanks @jsravn !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.