AWS - CA does not tolerate balance-similar-node-groups when ASG min and desired capacity is 0 #2503

Closed
lkoniecz opened this issue Oct 31, 2019 · 29 comments
Assignees
Labels
area/cluster-autoscaler, area/provider/aws, lifecycle/rotten

Comments

@lkoniecz

There are 3 autoscaling groups tagged with k8s.io/cluster-autoscaler/SandboxEksCluster and k8s.io/cluster-autoscaler/enabled. SandboxEksCluster is the name of my cluster.

[Screenshot from 2019-10-31 10-33-09: the three tagged autoscaling groups]
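
For anyone reproducing this setup, the tagging can be done with the AWS CLI along these lines (a sketch; the ASG name is a placeholder, and as far as I know auto-discovery matches on the tag keys only, so the values are arbitrary):

# Tag an ASG so that Cluster Autoscaler auto-discovery picks it up (repeat for each ASG)
ASG_NAME="my-asg-name"   # placeholder ASG name
aws autoscaling create-or-update-tags --tags \
  "ResourceId=${ASG_NAME},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
  "ResourceId=${ASG_NAME},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/SandboxEksCluster,Value=owned,PropagateAtLaunch=false"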

The autoscaler is started with the balance flag enabled:
I1031 08:41:01.352107 1 flags.go:52] FLAG: --balance-similar-node-groups="true"
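
Roughly, the relevant part of the cluster-autoscaler command line looks like this (a sketch rather than the exact manifest; the auto-discovery flag assumes the tag-based setup above, and the expander is confirmed further down in the thread):

# Relevant cluster-autoscaler flags for this report (sketch, not the full command line)
cluster-autoscaler \
  --cloud-provider=aws \
  --expander=least-waste \
  --balance-similar-node-groups=true \
  --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/SandboxEksCluster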

I am operating on a simple nginx deployment:
kubectl run nginx --image=nginx --replicas=10

and scaling it up so that new worker nodes need to be added:
kubectl scale deployment/nginx --replicas xx

Each time, CA picks a node from the most loaded ASG:

I1031 09:00:36.572074       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-8zhq2 is unschedulable
I1031 09:00:36.572081       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-7kvwn is unschedulable
I1031 09:00:36.572089       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-skm9v is unschedulable
I1031 09:00:36.572095       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-c6hjx is unschedulable
I1031 09:00:36.572132       1 scale_up.go:300] Upcoming 0 nodes
I1031 09:00:36.572607       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 09:00:36.572624       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 09:00:36.572632       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 09:00:36.572644       1 scale_up.go:423] Best option to resize: sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0
I1031 09:00:36.572656       1 scale_up.go:427] Estimated 1 nodes needed in sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0
I1031 09:00:36.572698       1 scale_up.go:529] Final scale-up plan: [{sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 2->3 (max: 5)}] 
I1031 09:00:36.572722       1 scale_up.go:694] Scale-up: setting group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 size to 3 
I1031 09:00:36.572751       1 auto_scaling_groups.go:211] Setting asg sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 size to 3 

The same happens two more times, until the ASG reaches its maximum size of 5.

I resized the deployment once again, and this time CA picked a node from the remaining ASGs, as the previous one was already full:

I1031 10:05:37.255691       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-2mhvj is unschedulable
I1031 10:05:37.255696       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-lk7gx is unschedulable
I1031 10:05:37.255737       1 scale_up.go:300] Upcoming 0 nodes
I1031 10:05:37.255751       1 scale_up.go:338] Skipping node group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 - max size reached
I1031 10:05:37.256404       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:05:37.256420       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:05:37.256432       1 scale_up.go:423] Best option to resize: sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:05:37.256439       1 scale_up.go:427] Estimated 1 nodes needed in sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:05:37.256492       1 scale_up.go:521] Splitting scale-up between 2 similar node groups: {sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K}
I1031 10:05:37.256503       1 scale_up.go:529] Final scale-up plan: [{sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW 0->1 (max: 5)}]
I1031 10:05:37.256518       1 scale_up.go:694] Scale-up: setting group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 1
I1031 10:05:37.256547       1 auto_scaling_groups.go:211] Setting asg sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 1

Repeating the scenario from the beginning, this time with all 3 ASGs having their min size set to 1:

I1031 10:23:12.752831       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-wb9p8 is unschedulable
I1031 10:23:12.752847       1 scale_up.go:263] Pod lky/nginx-7cdbd8cdc9-g87s8 is unschedulable
I1031 10:23:12.752914       1 scale_up.go:300] Upcoming 0 nodes
I1031 10:23:12.753948       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0 would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:23:12.754003       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:23:12.754026       1 waste.go:57] Expanding Node Group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K would waste 100.00% CPU, 100.00% Memory, 100.00% Blended
I1031 10:23:12.754048       1 scale_up.go:423] Best option to resize: sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:23:12.754080       1 scale_up.go:427] Estimated 1 nodes needed in sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW
I1031 10:23:12.754185       1 scale_up.go:521] Splitting scale-up between 3 similar node groups: {sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K}
I1031 10:23:12.754221       1 scale_up.go:529] Final scale-up plan: [{sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW 1->2 (max: 5)}]
I1031 10:23:12.754249       1 scale_up.go:694] Scale-up: setting group sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 2
I1031 10:23:12.754298       1 auto_scaling_groups.go:211] Setting asg sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW size to 2

Please note the log entry:
I1031 10:23:12.754185 1 scale_up.go:521] Splitting scale-up between 3 similar node groups: {sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1BASG4E9ED460-PGHSCBA70FJW, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1AASGA5AFF92F-1RZHULCJIW9D0, sandbox-eks-cluster-SandboxEksClusterEksClusterAsgUsEast1CASG93B5A949-1V77ZIQFJJY7K}

which indicates that CA does respect the --balance-similar-node-groups property once the ASGs have a non-zero minimum size.

@Jeffwan
Contributor

Jeffwan commented Nov 7, 2019

/area provider/aws

@k8s-ci-robot added the area/provider/aws label on Nov 7, 2019
@joekohlsdorf

What expander do you have configured?

@lkoniecz
Author

@joekohlsdorf least-waste

@joekohlsdorf

We had the same problem; changing the expander to random solved it.
That might not be what you want, but at least it's a starting point.
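
Concretely, the change is just the expander flag on the cluster-autoscaler deployment, something like this (assuming the usual deployment name and namespace, which may differ in your cluster):

# Workaround: switch the expander from least-waste to random
kubectl -n kube-system edit deployment cluster-autoscaler
# then change the container argument:
#   --expander=least-waste  ->  --expander=random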

@Jeffwan
Contributor

Jeffwan commented Nov 18, 2019

/assign @Jeffwan

@chinthakagodawita

@joekohlsdorf Cheers for that; we were facing the same issue, and setting it to random (from least-waste) has sorted it out for us.

@alexmbird

I've just run into this with our clusters. Even with the random expander, all nodes were placed within a single AZ rather than distributed across the ASGs in all three. Setting asg_min_size / asg_desired_capacity to 1 makes CA distribute further nodes across the three ASGs correctly, but it means we need to keep a baseline of 3 instances running when at times we could go lower.
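
For reference, the equivalent of that workaround applied directly to the ASGs looks something like the following (AWS CLI sketch; the ASG names are placeholders, and it does mean paying for one idle node per AZ):

# Keep a floor of one instance in each ASG so CA sees a real, comparable node in every group
for asg in asg-us-east-1a asg-us-east-1b asg-us-east-1c; do   # placeholder ASG names
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$asg" \
    --min-size 1 \
    --desired-capacity 1
done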

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 20, 2020
@mtb-xt

mtb-xt commented Apr 20, 2020

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Apr 20, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 19, 2020
@mtb-xt

mtb-xt commented Jul 19, 2020

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jul 19, 2020
@lkoniecz
Author

Hello,

any updates on it?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 11, 2020
@mtb-xt

mtb-xt commented Dec 13, 2020

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Dec 13, 2020
@Jeffwan
Contributor

Jeffwan commented Jan 20, 2021

/unassign @Jeffwan
/assign @jaypipes @ellistarn

@k8s-ci-robot assigned ellistarn and jaypipes and unassigned Jeffwan on Jan 20, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 20, 2021
@mtb-xt

mtb-xt commented Apr 20, 2021

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Apr 20, 2021
@jsravn
Contributor

jsravn commented Jun 25, 2021

It seems like a fundamental bug in CA: the templated node infos never match the actual node infos, as far as I can tell. #1676 (comment) describes all the reasons why. To me, the main problem is the capacity/allocatable fields. I think this could be fixed by ignoring capacity fields that don't exist when comparing two node groups.

@jsravn
Contributor

jsravn commented Jun 25, 2021

^ More than that, the actual memory capacities are way off the template values for the AWS provider. I see there was a PR to tolerate up to 1.5%, but in my testing the variance can be much greater. For instance, spawning an m5.4xlarge in eu-west-1 with Amazon Linux 2 gives 62Gi of memory, which is 2Gi (roughly 3%) below the template value and therefore more than the 1.5% toleration. For an m5.large I saw around a 6% divergence. So the toleration needs to be much larger than it currently is, in addition to my earlier suggestion to ignore mismatched capacity/allocatable fields.
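
An easy way to see the divergence on a running node (plain kubectl; the node name is a placeholder) is to compare what the kubelet actually registers against the instance type's nominal memory that the template is derived from:

# Print the memory the node actually registers (capacity) and what is left for pods (allocatable);
# for the m5.4xlarge above, both come in below the nominal 64Gi the template assumes.
NODE=ip-10-0-0-1.ec2.internal   # placeholder node name
kubectl get node "$NODE" \
  -o jsonpath='{.status.capacity.memory}{"\t"}{.status.allocatable.memory}{"\n"}'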

@awprice
Contributor

awprice commented Oct 28, 2021

Seeing this issue as well. I've dug into the logic that determines whether a node group is "similar" and am experiencing what @jsravn is seeing: it ignores a node group because the node's memory is too far off the template's memory.

@jaypipes
Contributor

@jsravn @awprice You both seem to be describing a different issue than the one originally described in this GH issue.

The original issue is about spreading nodes properly across ASGs with similar sizing. The solution to the original issue was to use the random expander instead of the least-waste expander. @alexmbird then noted a different issue related to spreading across node groups in multiple AZs.

However, @jsravn and @awprice, you seem to be describing a different problem altogether: the allocatable resources calculated for a Kubernetes Node (which account for the memory reserved for the kubelet on the EC2 worker node instance) differ from the Cluster Autoscaler's TemplateNodeInfo, which for the AWS cloud provider inside CA is created here and here.

You will note this in the aws_manager.go file:

// TODO: use proper allocatable!!
node.Status.Allocatable = node.Status.Capacity

So it's clear that someone, sometime in the past, realized that this allocatable versus capacity calculation was going to be problematic...

@jsravn @awprice if I have correctly summarized your issue (and how it's different from the original poster's issue), would you mind creating a new GH issue describing your specific problem, so we can track it separately? I'm thinking we should close this particular issue out, because the solution to the original problem is to use the random expander and not the least-waste expander.

@jsravn
Contributor

jsravn commented Oct 29, 2021

@jaypipes I created #4165 a while ago along with a suggested PR.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 27, 2022
@jsravn
Contributor

jsravn commented Jan 27, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jan 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 27, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 27, 2022
@ellistarn removed their assignment on May 27, 2022
@mtb-xt

mtb-xt commented Jun 6, 2022

Seriously, for everyone who uses the K8s autoscaler on AWS: try Karpenter (https://karpenter.sh/).

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
