Allow balancing by labels exclusively #4174
Conversation
Welcome @jsravn!
import (
	klog "k8s.io/klog/v2"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
master/HEAD uses `k8s.io/kubernetes/pkg/scheduler/framework` (not `v1alpha1` anymore, since 0fb897b).
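For reference, a minimal sketch of the updated import block with the post-0fb897b package path, assuming nothing else in the snippet changes:

```go
import (
	klog "k8s.io/klog/v2"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)
```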
val1 := n1.Node().ObjectMeta.Labels[label]
val2 := n2.Node().ObjectMeta.Labels[label]
if val1 != val2 {
	klog.V(4).Infof("%s label did not match. %s: %s, %s: %s", label, n1.Node().Name, val1, n2.Node().Name, val2)
Do we need that log emitted (at that low log level)? That will be very verbose on large clusters.
I thought it would be helpful to debug why things are not being balanced. Mostly from my struggles trying to debug the existing balancing code :). But it may not be necessary given you control the labels used here.
What level would be good? Or should I remove it completely?
I would actually like to see this loop through all labels and log all mismatches. I really like to see all the reasons, not just the first one. This often saves time.
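For illustration, a rough sketch of a comparator that checks every configured label and logs every mismatch before returning, rather than stopping at the first one. The `areLabelsSame` name and signature are assumed from the rest of this review, and the package name is assumed; this is not necessarily the exact code in the PR:

```go
package nodegroupset

import (
	klog "k8s.io/klog/v2"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// areLabelsSame reports whether the two nodes carry identical values for all
// of the configured balancing labels, logging every mismatch it encounters.
func areLabelsSame(n1, n2 *schedulerframework.NodeInfo, labels []string) bool {
	same := true
	for _, label := range labels {
		val1 := n1.Node().ObjectMeta.Labels[label]
		val2 := n2.Node().ObjectMeta.Labels[label]
		if val1 != val2 {
			klog.V(4).Infof("%s label did not match. %s: %s, %s: %s",
				label, n1.Node().Name, val1, n2.Node().Name, val2)
			same = false
		}
	}
	return same
}
```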
After spending half a day debugging label mismatches, this log output would be great.
I've been testing this in my dev clusters (100+ nodes) and it is working really well. I'm getting perfect balancing across node groups with the same instance types.
I've had to disable balance-similar with the priority expander, as it ignores the priority list. I have two node groups, one for spot and one for on-demand, with the same taint and workergroup labels. The only difference is the market option label, which I don't use for affinity etc. I'm pretty sure this would fix my problem, letting me re-enable balance-similar and use `-balance-label` set to my workergroup label/topology?
I haven't tested it, but assuming balancing works with the priority expander, then yes, this should help your use case.
Hello. I was wondering if this PR can be merged now? It would solve a lot of people's issues.
I pushed changes for the review comments. I suppose it needs to be discussed at the SIG meeting?
This is working well so far in my testing. Any updates?
@jsravn I support this addition. I think it's a simple solution to the problem you describe in #4165 and simple == good, at least for me!
Couple things I'd like to see, however:
- Please add at least a unit test for the new functionality.
- A discussion, either in the `areLabelsSame` code comment or in the AWS FAQ section about multiple instance type node groups, about how this technique will not work with ASGs with a mixed instance policy: the NodeInfo object will be returned for the first instance in the InstanceGroup, and that instance will obviously only have a single `instance-type` label. Therefore, depending on which instance type different NodeGroups happened to launch first, the `instance-type` label may be different for them, which will cause the `areLabelsSame` call to return `false`.
Force-pushed from 15d07a3 to 756d9d8.
@jaypipes I've addressed your comment and rebased. Let me know what you think.
Great stuff, @jsravn, much appreciated!
@@ -304,6 +304,10 @@ spec:
  - i3.2xlarge
```

Similarly, if using the `balancing-label` flag, you should only choose labels which have the same value for all nodes in
the node group. Otherwise you may get unexpected results, as the flag values will vary based on the nodes created by
the ASG.
👍 thanks for adding this note, @jsravn
			checkNodesSimilar(t, node1, node2, comparator, tc.isSimilar)
		})
	}
}
++ nice tests.
/assign @towca
To avoid any confusion for users, perhaps there should be some kind of failure or complaint if both `--balancing-label` and `--balance-similar-node-groups` are involved but only one of them is set? As a stray thought, since this is actually replacing the behaviour of `--balance-similar-node-groups`, maybe it should be its own top-level option. The current implementation wouldn't need to change, i.e., something like `BalanceSimilarNodeGroups: (*balanceSimilarNodeGroupsFlag || len(*balancingLabelsFlag) > 0)` would achieve this by making `--balancing-label` imply `--balance-similar-node-groups`.
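A self-contained sketch of the implication described above, under the assumption that the flags are wired roughly like this; the `multiFlag` type here is only a stand-in for the autoscaler's own multi-value flag helper:

```go
package main

import (
	"flag"
	"fmt"
)

// multiFlag is a stand-in for the autoscaler's multi-value string flag type.
type multiFlag []string

func (m *multiFlag) String() string     { return fmt.Sprint(*m) }
func (m *multiFlag) Set(v string) error { *m = append(*m, v); return nil }

func main() {
	balanceSimilarNodeGroupsFlag := flag.Bool("balance-similar-node-groups", false,
		"Detect similar node groups and balance the number of nodes between them")
	balancingLabelsFlag := &multiFlag{}
	flag.Var(balancingLabelsFlag, "balancing-label",
		"Label to use when comparing if two node groups are similar")
	flag.Parse()

	// Passing --balancing-label would imply --balance-similar-node-groups.
	balanceSimilarNodeGroups := *balanceSimilarNodeGroupsFlag || len(*balancingLabelsFlag) > 0
	fmt.Println("BalanceSimilarNodeGroups:", balanceSimilarNodeGroups)
}
```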
val1 := n1.Node().ObjectMeta.Labels[label]
val2 := n2.Node().ObjectMeta.Labels[label]
What happens if the specified label is not present on either NodeInfo? Wouldn't this panic? (Edit: sorry, brainfart. It'll return `""`, so it behaves as "absence is a value" below.)
This seems like an easy use case to hit if I'm using a custom `example.com/balance-group-name` label for when I have duplicated a nodegroup across AZs, but am not using it for nodegroups which are not set up that way, e.g., they are AZ-local for reasons other than EBS-selection, such as locality to other resources. Such nodegroupsets would still be fed into this comparison function if they otherwise matched the pod-to-be-created, if the pod didn't require that it be deployed on a balanced nodegroup.
And once this isn't panicking, should the behaviour of "balancing label is not present" be:
- "Don't use this nodeset", i.e. the balancing should not even be attempted if the pod-match-chosen `n1` doesn't have the desired label, and if the potential-match `n2` lacks the label, it always returns false; or
- "Absence is also a value", i.e. only match if both NodeInfos lack the label, or if both have the label and the same value for that label; or
- something else?
e.g., to implement the "missing label excludes from balancing completely" behaviour, I guess it'd be:

val1 := n1.Node().ObjectMeta.Labels[label]
val2 := n2.Node().ObjectMeta.Labels[label]
if val1 == "" {
	klog.V(8).Infof("%s label not present on %s.", label, n1.Node().Name)
	return false
}
if val2 == "" {
	klog.V(8).Infof("%s label not present on %s.", label, n2.Node().Name)
	return false
}
Either way, it'd be good to see the behaviour in the case of one, the other, or both node groups lacking the relevant label in the tests, so we know what was intended.
Good point! My initial thinking is that if the label is missing completely, it wouldn't be considered similar to another node group that also has the label missing. The absence of the label would indicate "never consider this similar".
I've added tests and your suggestion. I made a slight change to use an existence check instead of comparing on the empty string - so if a user does explicitly set an empty label value, it will be considered, allowing users to use label keys alone as ways to group nodes.
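A sketch of what that existence-check variant could look like, assuming the same signature and imports as the earlier sketch; not necessarily the exact code that was pushed:

```go
// areLabelsSame treats a missing label key as "never similar", while a label
// explicitly set to "" still participates in the comparison, so a bare label
// key can be used on its own to group node groups.
func areLabelsSame(n1, n2 *schedulerframework.NodeInfo, labels []string) bool {
	for _, label := range labels {
		val1, ok1 := n1.Node().ObjectMeta.Labels[label]
		val2, ok2 := n2.Node().ObjectMeta.Labels[label]
		if !ok1 || !ok2 || val1 != val2 {
			return false
		}
	}
	return true
}
```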
@jsravn are you still working on getting this PR in? Anything I can do to help with the comments from TBBle?
@artificial-aidan I've yet to have a single OWNER review this. I'm hesitant to keep working on this PR until I get some official feedback first. ping @towca
This, as written, would solve some challenges I have with predicting and controlling the existing similarity logic.
Force-pushed from ab04299 to 6710b4c.
@TBBle interesting point. In my original thinking, we are still balancing similar node groups. It's just we're replacing the decision logic about what makes node groups similar. If we were to bubble this up into the top level option, I'd rather rename the existing option something like "balance-similar-node-groups-by-size" or something to indicate how it works (although the actual heuristics are more complex than that, and compare many things including labels). Then the user would choose either "balance-similar-node-groups-by-size" or "balance-similar-node-groups-by-labels". Given that breaks backwards compatibility though, not sure it's a great idea. I'm okay with introducing this as a new flag "balance-similar-nodes-by-labels" I think, as you suggest. It is pretty straightforward to change it if the maintainers agree.
Adds a new flag `--balance-label` which allows users to balance between node groups exclusively via labels. This gives users the flexibility to specify the similarity logic themselves when --balance-similar-node-groups is in use.
Force-pushed from 6710b4c to 1b98b38.
/assign @mwielgus (just noticed towca is no longer in the OWNERS file)
FYI I used this patch and it works as expected, splitting an 8-node scale-up into 3,3,2 🎉
this seems like a nice feature, we should probably discuss it at a SIG meeting if we can't get more review here.
we talked about this PR at today's meeting, hopefully we'll get a few more reviews. i think it's a nice feature, code looks good to me. /lgtm
Everything looks good to me here, thanks for a very useful contribution! /lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jaypipes, jsravn, towca
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
…els-exclusively Allow balancing by labels exclusively
I just switched to this, since my existing use of …
Adds a new flag `--balancing-label` which allows users to balance between node groups exclusively via labels for when `--balance-similar-node-groups` is being used. This gives users the flexibility to specify the similarity logic themselves rather than trying to work within the existing heuristics.

This is a POC based on my idea in #4165.

An example use case is a heavily diversified AWS deployment with per-AZ node groups to accommodate zone affinity restrictions of EBS volumes. The current heuristic logic makes this difficult to accomplish in combination with the least-waste expander, as the template node info for empty node groups rarely matches existing nodes (for many reasons, as discussed in the linked issue and elsewhere). This change makes it very simple: just specify the labels you want to compare on, such as `node.kubernetes.io/instance-type`, and that's it.