Scale from 0, unwanted nodes #2165

Closed
okgolove opened this issue Jul 5, 2019 · 12 comments
Labels
area/provider/aws: Issues or PRs related to aws provider
lifecycle/stale: Denotes an issue or PR that has remained open with no activity and has become stale.

Comments

@okgolove

okgolove commented Jul 5, 2019

Hello. I have three ASGs:
main [min: 1, max: 1]
spots [min: 1, max: 10]
test-asg [min: 0, max: 0, tainted]

The taint is specified via the ASG and instance tags (the tag format is sketched after the log output below).

CA keeps creating a new node in the test-asg group, even though no pods are scheduled on the test ASG nodes. Then it deletes the node (after the unneeded period) and creates it again, in a loop.

How can I fix this?

I0705 09:32:44.591122       1 auto_scaling_groups.go:245] Regenerating instance to ASG map for ASGs: [spots test-asg]
W0705 09:32:44.802636       1 clusterstate.go:539] Readiness for node group test-asg not found
W0705 09:32:44.804225       1 clusterstate.go:321] Failed to find readiness information for test-asg
W0705 09:32:44.804240       1 clusterstate.go:377] Failed to find readiness information for test-asg
W0705 09:32:44.804245       1 clusterstate.go:321] Failed to find readiness information for test-asg
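
For context: on the AWS provider, CA learns about taints for a group scaled from 0 through node-template tags on the ASG itself. A minimal sketch of such a tag, assuming a hypothetical taint key "dedicated" with value "test" and effect NoSchedule:

    Key:   k8s.io/cluster-autoscaler/node-template/taint/dedicated
    Value: test:NoSchedule
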
@okgolove
Author

relates #2008

@Jeffwan
Contributor

Jeffwan commented Jul 16, 2019

What expander strategy are you using?

@Jeffwan
Contributor

Jeffwan commented Jul 16, 2019

Sorry for the late response, just came back from vacation :D

@okgolove
Author

I'm using the default expander (i.e. random).

@exdx
Contributor

exdx commented Aug 10, 2019

What settings (flags) are you using when running the autoscaler? And which version? It could be something in the configuration causing this.

@okgolove
Author

cluster-autoscaler:v1.3.3
      --cloud-provider=aws
      --namespace=kube-system
      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/prod.test.com
      --balance-similar-node-groups=true
      --logtostderr=true
      --skip-nodes-with-local-storage=false
      --skip-nodes-with-system-pods=false
      --stderrthreshold=info
      --v=4

@exdx
Contributor

exdx commented Aug 12, 2019

Maybe something to do with balance-similar-node-groups=true? By default it's false. This flag attempts to balance similar node groups, which is somewhat like the behavior you're seeing. I would try with it set to false, just to see.
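
For reference, testing that would just mean flipping the flag in the args posted above (everything else unchanged), e.g.:

      --balance-similar-node-groups=false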

@MaciekPytel
Contributor

My guess would be the scale-from-0 logic incorrectly guessing what the node would look like. CA sees a template node that would help the pending pods, so it scales up. Once the node is created, it turns out to look different than expected and it doesn't really fit the pods, so CA deletes it. Once there are 0 nodes, CA goes back to using the scale-from-0 template and the situation repeats.
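
One way to reduce that mismatch on AWS (a sketch, not something confirmed in this thread): in addition to the taint tag shown earlier, the AWS provider also reads node-template tags for labels and extra resources when it builds the scale-from-0 template, so tagging the ASG with the values the real nodes will actually have keeps the template accurate. The label name and size below are hypothetical:

    k8s.io/cluster-autoscaler/node-template/label/node-role = test
    k8s.io/cluster-autoscaler/node-template/resources/ephemeral-storage = 100Gi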

@okgolove
Author

okgolove commented Aug 12, 2019

I don't like the message:
Failed to find readiness information for test-asg

It seems something is going wrong.
As I wrote, it looks like #2008

@Jeffwan
Contributor

Jeffwan commented Oct 11, 2019

/area provider/aws

@k8s-ci-robot k8s-ci-robot added the area/provider/aws label Oct 11, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jan 9, 2020
@okgolove
Author

okgolove commented Jan 9, 2020

It seems it got fixed.

@okgolove okgolove closed this as completed Jan 9, 2020