Imbalance with balance similar node groups #2892

Closed
JulienBalestra opened this issue Mar 4, 2020 · 5 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@JulienBalestra
Contributor

JulienBalestra commented Mar 4, 2020

In the current AWS implementation, for example, comparing node templates against real-world nodes produces imbalanced scale-ups when the --balance-similar-node-groups option is used.

For example, with 3 ASGs (1 per zone) and one ASG set to 0, the autoscaler always scales up the ASGs that have real-world nodes instead of balancing toward the under-provisioned ASG (the one at 0).

I suppose this is due to how we build the node template and compare it to the real-world ones.
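
To make that concrete, here is a minimal sketch (made-up capacity numbers, not autoscaler code) of why a nodeInfo built from an ASG template rarely matches one built from a live node: the template reports raw instance capacity, while a real node's allocatable has kubelet and system reservations subtracted.

```go
package main

import "fmt"

// Hypothetical, simplified nodeInfo: the real autoscaler compares many more
// attributes (labels, allocatable resources within small tolerances, etc.).
type nodeInfo struct {
	allocatableMilliCPU int64
	allocatableMemMiB   int64
}

// similar stands in for the node group comparator; an exact-match check is
// enough to show the failure mode.
func similar(a, b nodeInfo) bool {
	return a.allocatableMilliCPU == b.allocatableMilliCPU &&
		a.allocatableMemMiB == b.allocatableMemMiB
}

func main() {
	template := nodeInfo{4000, 16384} // synthetic, from the ASG's instance type
	realNode := nodeInfo{3920, 15246} // after kubelet/system reservations
	// Prints false: the empty node group is judged dissimilar, so balancing skips it.
	fmt.Println(similar(template, realNode))
}
```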

To avoid this issue, we've been running a set of experimental changes.

These changes have been running smoothly for a few months and have given us the expected results.
I'd like to open a conversation about why we initially chose a mixed comparison of real and templated nodes.

This is also a potential issue when users change the instance type of a running ASG (#2840).

Let me know if you need any details I might have forgotten.

@MaciekPytel
Contributor

At a glance, this basically always uses TemplateNodeInfo, right? This was discussed at length in #1021.

tl;dr TemplateNodeInfo is always wrong. Even in a controlled and uniform env like GKE it's impossible to get it 100% right (and, believe me, I tried). If you ever have a pod that barely fits or barely doesn't fit on a node (within TemplateNodeInfo's margin of error) and you use a patch that always relies on TemplateNodeInfo, the effects are devastating (including infinite scale-up and/or pods remaining pending forever).

I think a possibly better solution to the original problem is to add logic where, if you have multiple ~identical NodeGroups, you use an existing node in one of the groups as a template for all of them (overriding only the zone label).
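
A minimal sketch of that suggestion, assuming standard Kubernetes node types (the function name is made up, not autoscaler code):

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
)

// templateFromSibling copies a real node from a ~identical node group and
// overrides only the zone labels, so the copy can stand in as the template
// for an empty group in another zone.
func templateFromSibling(sibling *apiv1.Node, targetZone string) *apiv1.Node {
	tpl := sibling.DeepCopy()
	if tpl.Labels == nil {
		tpl.Labels = map[string]string{}
	}
	tpl.Labels["topology.kubernetes.io/zone"] = targetZone
	// Older clusters may still carry the deprecated beta zone label.
	tpl.Labels["failure-domain.beta.kubernetes.io/zone"] = targetZone
	return tpl
}
```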

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 4, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bpineau added a commit to DataDog/autoscaler that referenced this issue Oct 13, 2020
Use real-world nodeInfo from similar groups when BalanceSimilarNodeGroups.

Where available, the CA evaluates node groups' capacities and labels using
real-world nodes, for optimal accuracy (accounting for kubelet- and
system-reserved resources, for instance).

It falls back to synthetic nodeInfos inferred from ASG/MIG/VMSS templates
when no real nodes are available yet (i.e. upscaling from a zero/empty node
group).

That asymmetric evaluation can prevent `BalanceSimilarNodeGroups` from
working reliably when upscaling from zero:
* The first upscaled nodeGroup will get a node (its capacity initially evaluated from the ASG/MIG/VMSS template, then from the first real node and its runtime labels)
* Other empty nodeGroups will likely be considered dissimilar to the first one: we're comparing ASG templates with real-world nodeInfos
* With the `least-waste` expander, for instance, the CA might favor the nodegroup evaluated from a real-world node (whose accounted reserved resources make it score as the best usage) over the others

This change set implements [Maciek Pytel's suggestion (from a previous attempt at fixing this issue)](kubernetes#2892 (comment)):
we try to use real-world node examples from ~similar node groups before a
last-resort fallback to synthetic nodeInfos built from ASG templates.

We compare nodeInfos using the `FindSimilarNodeGroups` processor primitives
in order to leverage per-cloud-provider comparators (e.g. label exclusions).

We look for similar nodegroups using synthetic nodeInfos from ASG/MIG
templates (where available): comparing real-world nodeInfos to a synthetic
nodeInfo from the empty nodegroup wouldn't work (that's the root cause of
this issue). When ~similar nodegroups are found that way, we use their
nodeInfos built from real-world nodes as models for the empty nodegroups
(but keep the original location-related labels).

Tested with AWS ASGs, GCP MIGs and Azure VMSS.
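
A minimal sketch of that flow, with stand-in types and helper logic (not the actual patch or the autoscaler's API):

```go
package sketch

// nodeInfo and nodeGroup are simplified stand-ins for the autoscaler's types.
type nodeInfo struct {
	labels      map[string]string
	allocatable map[string]int64
}

type nodeGroup struct {
	id       string
	template nodeInfo  // synthetic, inferred from the ASG/MIG/VMSS
	realNode *nodeInfo // nil while the group is empty
}

// similarTemplates stands in for the FindSimilarNodeGroups comparators.
func similarTemplates(a, b nodeInfo) bool {
	for k, v := range a.allocatable {
		if b.allocatable[k] != v {
			return false
		}
	}
	return true
}

// nodeInfoForEmptyGroup compares template to template (so real-node
// reservations can't skew the similarity check), then borrows a real node's
// info from a ~similar group, keeping the empty group's own location labels.
// It falls back to the synthetic template as a last resort.
func nodeInfoForEmptyGroup(empty nodeGroup, all []nodeGroup) nodeInfo {
	for _, other := range all {
		if other.id == empty.id || other.realNode == nil {
			continue
		}
		if !similarTemplates(empty.template, other.template) {
			continue
		}
		borrowed := nodeInfo{
			labels:      map[string]string{},
			allocatable: other.realNode.allocatable,
		}
		for k, v := range other.realNode.labels {
			borrowed.labels[k] = v
		}
		// Keep location-related labels from the empty group's own template.
		for _, k := range []string{
			"topology.kubernetes.io/zone",
			"topology.kubernetes.io/region",
		} {
			if v, ok := empty.template.labels[k]; ok {
				borrowed.labels[k] = v
			}
		}
		return borrowed
	}
	return empty.template
}
```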
bpineau added a commit to DataDog/autoscaler that referenced this issue Oct 13, 2020
This is a second attempt at kubernetes#1021 (it might also provide an
alternate solution for kubernetes#2892 and kubernetes#3608).

Per the kubernetes#1021 discussion, a flag might be acceptable if it defaults
to false and documents the limitations (not supported by all cloud providers)
and the risks of using it (loss of accuracy, upscaling unusable nodes, or
leaving pods pending).

Some use cases include:
* Balancing similar node groups when upscaling from zero (but I believe kubernetes#3608 is a better approach)
* Edited/updated ASG/MIG taints and labels
* Updated instance types

Per previous discussion, the latter two cases could be covered in the long
term by custom node-shape discovery Processors (open to discussing that
option too).
bpineau added a commit to DataDog/autoscaler that referenced this issue Dec 14, 2020
Comparing synthetic NodeInfos obtained from a nodegroup's TemplateInfo()
to NodeInfos obtained from real-world nodes is bound to fail, even with
kube reservations provided through nodegroup labels/annotations (for
instance, kernel memory reservation is hard to predict).

This makes `balance-similar-node-groups` likely to misbehave when
`scale-up-from-zero` is enabled (and a first nodegroup gets a real node),
for instance.

Following [Maciek Pytel's suggestion](kubernetes#3608 (comment)) (from
discussions on a previous attempt at solving this), we can implement a
NodeInfo processor that improves template-generated NodeInfos whenever a
node was created from a similar nodegroup.

We're storing the node's virtual origin through its machineID, which works
fine but is a bit ugly (suggestions welcome).

Tested that this solves balance-similar-node-groups + scale-up-from-zero
with various instance types on AWS and GCP.

Previous attempts to solve that issue/discussions:
* kubernetes#2892 (comment)
* kubernetes#3608 (comment)
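
A minimal sketch of the "virtual origin through machineID" detail mentioned above; the marker value and helper names are made up, not the actual patch:

```go
package sketch

import apiv1 "k8s.io/api/core/v1"

// copiedNodeMachineID is a hypothetical marker value; the actual patch may
// use a different scheme.
const copiedNodeMachineID = "copied-from-similar-nodegroup"

// markCopied tags a node built from a similar group's real node so later
// iterations can tell it apart from an observed node.
func markCopied(n *apiv1.Node) {
	n.Status.NodeInfo.MachineID = copiedNodeMachineID
}

func isCopied(n *apiv1.Node) bool {
	return n.Status.NodeInfo.MachineID == copiedNodeMachineID
}
```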
bpineau added a commit to DataDog/autoscaler that referenced this issue Apr 8, 2021
This is a third attempt at kubernetes#1021 (it might also provide an alternate solution for kubernetes#2892, kubernetes#3608, and kubernetes#3609).
Some use cases include: balancing similar node groups when upscaling from zero, edited/updated ASG/MIG taints and labels, and updated instance types.

Per the kubernetes#1021 discussion, a flag might be acceptable if it defaults to false and documents the limitations (not supported by all cloud providers) and the risks of using it (loss of accuracy, upscaling unusable nodes, or leaving pods pending).

Per the kubernetes#3609 discussion, using a NodeInfo processor is preferred.

The fate of `GetNodeInfosForGroups()` (and the opportunity to split NodeInfoProcessor into NodeInfoProcessor and NodeInfoDecoratorProcessor) is left open for discussion, as I'd like to collect feedback on refactoring plans here.
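
For illustration, an opt-in flag of that shape could look like the sketch below; the flag name and wording are hypothetical, not the autoscaler's actual CLI.

```go
package main

import "flag"

// Hypothetical opt-in flag: defaults to false and spells out the risks, per
// the kubernetes#1021 discussion.
var useTemplateNodeInfo = flag.Bool(
	"scale-up-from-template", false,
	"If true, build nodeInfos from node group templates instead of real-world nodes. "+
		"Not supported by all cloud providers; may lose accuracy, scale up unusable "+
		"nodes, or leave pods pending.")

func main() {
	flag.Parse()
	_ = *useTemplateNodeInfo // consumed by the (omitted) autoscaling logic
}
```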