Allow small tolerance on memory capacity when comparing nodegroups #3124
Conversation
var (
	// MaxMemoryDifference describes how much memory capacity can differ but still be considered equal.
	MaxMemoryDifference = resource.MustParse("256Mi")
Moving discussion from openshift#152 (comment).
Wouldn't this MaxMemoryDifference also apply to much smaller instances, to the point of making the check lose its value?
i.e. if the possible diff range increases with the instance size, should we maybe make our tolerance window a percentage of the given total size?
Yes it would. For this example the difference was just over 1%, so perhaps making it percentage-based would be better. I think we will need to investigate a few more real-world examples to come up with a sensible way to do this.
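To illustrate the concern with a quick standalone sketch (the node sizes below are assumed examples, not values from this PR): a fixed 256Mi tolerance is a very different fraction of capacity depending on node size, whereas a ratio scales with it.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Fixed absolute tolerance from the diff above.
	fixedTolerance := resource.MustParse("256Mi")
	// Assumed example node sizes, small to large.
	for _, capacity := range []string{"2Gi", "16Gi", "256Gi"} {
		total := resource.MustParse(capacity)
		fraction := float64(fixedTolerance.Value()) / float64(total.Value())
		fmt.Printf("%s node: 256Mi is %.2f%% of capacity\n", capacity, fraction*100)
	}
	// Prints roughly 12.50%, 1.56% and 0.10% respectively, which is why a
	// percentage-based tolerance behaves more consistently across sizes.
}
```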
This lgtm. Can we drop the first commit and include the reasoning about choosing the 0.015 ratio in the final commit description?
Force-pushed from 2595cee to 25cf22c.
/lgtm
this lgtm, thanks Joel!
/approve
hehe, guess i'm not an approver on this part. so i'll just add =)
We encountered unbalanced scaling on AWS EKS due to memory differences in the underlying nodes. I backported the changes in this PR to
I am thinking about the potential diff for large instances. Could you explain the magic number you used here? I think it's more for a specific case? Is there a way to make sure it covers most cases? Sorry, I don't have a lot of data points right now. @JoelSpeed
@Jeffwan I posted some details in a comment on a related issue about some observations I had made. In particular, the main culprits for memory differences are m5, c5 and r5 instances, based on my experimenting and conversations with others. The differences I have managed to observe are approximately 1%, independent of the size of the instances (eg I saw a 1.05% difference on an instance with 256Gi of memory). I chose 1.5% to allow a small tolerance over the differences I was seeing, but I didn't want to make it the 5% difference that the other values allow, as this seemed massively overkill for the real-world results.

Having spoken to an engineer from AWS, they told me they suspect the issue comes from the fact that some 4th and 5th gen instance types are actually a mixture of hardware specs with slight differences between them. This, coupled with the Nitro hypervisor on the 5th gen instances, can accentuate the apparent differences in the hardware to the OS (it gives a more realistic picture, I think). Unfortunately, due to the way that AWS is working with these instance types, you can see this approximate 1% difference in memory across instances of the same type, both within and across availability zones.

A potential further fix would be to not compare memory/cpu etc for nodegroups backed by the same instance type, but that feels like it will lead to other issues.
@JoelSpeed I think it makes sense to me and also looks good to me on AWS. However, this logic is shared and used by others as well. Not sure if it's acceptable to everyone else? Another option is to rewrite this as a comparator in https://github.com/kubernetes/autoscaler/blob/972e30a5d9eece175a54fa5dfc0ed902b34f02b1/cluster-autoscaler/processors/nodegroupset/aws_nodegroups.go. @losipiuk @aleksandra-malinowska @vivekbagade opinions?
That is an alternative, but I'm actually coming at this from the perspective of the clusterapi provider. We would need to maintain an independent comparator for CAPI that is basically the same as the standard one, but with this one minor difference to cater for the AWS case. If that's what we must do then I can do that, but I'd prefer not to have a separate CAPI comparator.

Also, I'm not sure how that would work, since the comparator presently just changes the labels; we would have to change a lot to allow this difference. Plumbing through tolerances for different providers may work?
Yeah, I don't see data points from other cloud providers. If there's a big concern, we might need to move to an AWS comparator. If not, let's keep the current implementation; I think it makes sense.
I mentioned this PR on the community call earlier today, and it was suggested that I ping approvers from the other providers to see if they have any strong opposition to this. Starting a lazy consensus: if there is no opposition to this by end of day on the 15th, then we should seek approval and get this merged.

Ping @ringtail @feiskyer @nilo19 @marwanad @hello2mao @andrewsykim @tghartland, do you have concerns about this?
I agree. But I think there should also be a magic number for GPU.
So should we set up a better way to solve such a problem? A magic number may not be the best choice.
@ringtail Why would there be a magic number for GPU? I haven't got much experience with them personally, but I assumed the number of GPUs would always be a small discrete number. Are you suggesting there can be 1.05 GPUs attached to a system?
		}
		larger := math.Max(float64(qtyList[0].MilliValue()), float64(qtyList[1].MilliValue()))
		smaller := math.Min(float64(qtyList[0].MilliValue()), float64(qtyList[1].MilliValue()))
		if larger-smaller > larger*maxDifferenceRatio {
nit: `return larger-smaller <= larger*maxDifferenceRatio`?
			return false
		}
	}
	return true
}
func compareResourceListWithTolerance(qtyList []resource.Quantity, maxDifferenceRatio float64) bool {
It's a minor detail, but for a function that returns a bool I'd prefer a name that makes it obvious what true/false means.
I was trying to keep consistent with `compareResourceMapsWithTolerance`, though I agree, it's not a great name. If I rename one, should I rename the other too? How about `resourceMapsWithinTolerance` and `resourceListWithinTolerance`?
Fair point on consistency. Renaming both sgtm.
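Folding the two suggestions together (the rename and the earlier nit about returning the comparison directly), the helpers might end up looking roughly like the sketch below. The map function's signature is assumed from context; this is not necessarily the exact merged code.

```go
package nodegroupset

import (
	"math"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// resourceMapsWithinTolerance reports whether every resource in the map is
// within the allowed ratio between the two node groups being compared.
func resourceMapsWithinTolerance(resources map[apiv1.ResourceName][]resource.Quantity,
	maxDifferenceRatio float64) bool {
	for _, qtyList := range resources {
		if !resourceListWithinTolerance(qtyList, maxDifferenceRatio) {
			return false
		}
	}
	return true
}

// resourceListWithinTolerance expects exactly two quantities (one per node
// group) and checks that their relative difference does not exceed the ratio.
func resourceListWithinTolerance(qtyList []resource.Quantity, maxDifferenceRatio float64) bool {
	if len(qtyList) != 2 {
		return false
	}
	larger := math.Max(float64(qtyList[0].MilliValue()), float64(qtyList[1].MilliValue()))
	smaller := math.Min(float64(qtyList[0].MilliValue()), float64(qtyList[1].MilliValue()))
	return larger-smaller <= larger*maxDifferenceRatio
}
```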
As discussed on the sig meeting, I'm fine with this change and I will approve it once lazy consensus is reached.

Re: GPUs - I also thought the GPU number is discrete?

Also, in general this part of the logic is designed to be easily replaceable if a provider has specific requirements. There are already implementations of NodeInfoComparator for AWS and Azure. They both call IsCloudProviderNodeInfoSimilar with different parameters, but that is not a hard requirement. I don't see a strong reason why we couldn't have more customized comparators if needed.
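To make the "easily replaceable" point concrete, here is a purely illustrative, self-contained sketch of the pattern. Everything in it is hypothetical (lower-cased stand-ins for NodeInfoComparator and the shared similarity check mentioned above), not the repository's actual API.

```go
package main

import "fmt"

// nodeInfo is a stand-in for the scheduler NodeInfo type that the real
// comparators operate on; it only exists to keep this sketch self-contained.
type nodeInfo struct {
	memoryBytes int64
}

// nodeInfoComparator mirrors the idea of NodeInfoComparator: it reports
// whether two node groups should be treated as similar.
type nodeInfoComparator func(n1, n2 *nodeInfo) bool

// similarWithTolerance is a hypothetical shared check, parameterized by the
// maximum allowed capacity difference ratio (label handling omitted).
func similarWithTolerance(n1, n2 *nodeInfo, maxDifferenceRatio float64) bool {
	larger, smaller := n1.memoryBytes, n2.memoryBytes
	if smaller > larger {
		larger, smaller = smaller, larger
	}
	return float64(larger-smaller) <= float64(larger)*maxDifferenceRatio
}

// newProviderComparator shows the pattern: a provider plugs in its own
// tolerance (e.g. a looser memory tolerance for AWS) without touching the
// shared default used by everyone else.
func newProviderComparator(maxDifferenceRatio float64) nodeInfoComparator {
	return func(n1, n2 *nodeInfo) bool {
		return similarWithTolerance(n1, n2, maxDifferenceRatio)
	}
}

func main() {
	awsLike := newProviderComparator(0.015)
	strict := newProviderComparator(0.0)
	a := &nodeInfo{memoryBytes: 16116152 * 1024}
	b := &nodeInfo{memoryBytes: 15944120 * 1024}
	fmt.Println("1.5% tolerance:", awsLike(a, b)) // true
	fmt.Println("zero tolerance:", strict(a, b))  // false
}
```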
In testing, AWS M5 instances can on occasion display approximately a 1% difference in memory capacity between availability zones, even when deployed with the same launch configuration and the same AMI. Allow a 1.5% tolerance to give some buffer over the actual amount of memory discrepancy, since in testing some examples were just over 1% (eg 1.05%, 1.1%). Tests are included with capacity values taken from real instances to prevent future regression.
Force-pushed from 25cf22c to be1d9cb.
@MaciekPytel I resolved your two comments, PTAL
Will this be backported to
@MaciekPytel @elmiko Can we get this merged now? There have been no objections in the last two weeks.
@JoelSpeed thanks for the reminder, i believe we decided at the meeting 2 weeks ago to give this 1 week lazy consensus, so yeah we should move forward.
oh, oops. forgot i can't approve here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, MaciekPytel

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cherry-pick release-1.18

This did not seem to work, guess we have to manually do it.
[CA-1.17] Cherry-pick #3124: Allow small tolerance on memory capacity when comparing nodegroups
[CA-1.18] Cherry-pick #3124: Allow small tolerance on memory capacity when comparing nodegroups
Cherry-pick #3124: Allow small tolerance on memory capacity when comparing nodegroups
i think this problem might be rearing its head again, i am seeing balance failures when using the clusterapi / azure provider combination, it's rejecting due to memory differences.
This allows a small tolerance in the memory capacity of nodes to allow better matching of similar node groups. There are differences in the memory values that Kubernetes interprets due to variances in the instances that a cloud provider provides.

Also adds tests that match real values from a real set of nodes that would be expected to be the same (the same instance type across multiple availability zones within a given region).

Eg. In testing I saw AWS m5.xlarge nodes with capacities such as `16116152Ki` and `15944120Ki`, not only across availability zones, but within the same availability zone after a few cycles through machines. This is a difference of `168Mi`, which is much larger than the original tolerance of 128000 bytes and was therefore preventing `BalanceSimilarNodeGroups` from balancing across these availability zones.
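As a quick check of those numbers, here is a standalone sketch; only the two Ki capacities above come from the PR description, everything else is illustrative.

```go
package main

import (
	"fmt"
	"math"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Real capacities observed on m5.xlarge nodes, quoted above.
	a := resource.MustParse("16116152Ki")
	b := resource.MustParse("15944120Ki")

	larger := math.Max(float64(a.Value()), float64(b.Value()))
	smaller := math.Min(float64(a.Value()), float64(b.Value()))
	diff := larger - smaller

	// Prints: difference 168Mi, ratio ~1.07%.
	fmt.Printf("difference: %.0fMi, ratio: %.2f%%\n", diff/(1024*1024), diff/larger*100)

	// The old fixed tolerance of 128000 bytes rejects this pair, while the
	// new 1.5% ratio accepts it.
	fmt.Println("within old 128000-byte tolerance:", diff <= 128000)
	fmt.Println("within new 1.5% ratio:", diff <= larger*0.015)
}
```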