cluster autoscaler crashes when master api is unavailable #4776

Closed
matti opened this issue Mar 30, 2022 · 5 comments
Labels: area/cluster-autoscaler, kind/bug, lifecycle/rotten

Comments


matti commented Mar 30, 2022

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.23.0

What k8s version are you using (kubectl version)?:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"b631974d68ac5045e076c86a5c66fba6f128dc72", GitTreeState:"clean", BuildDate:"2022-01-19T17:51:12Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

aws eks 1.21

What did you expect to happen?:

cluster autoscaler not to crash

What happened instead?:

I0330 09:57:35.062929       1 flags.go:57] FLAG: --write-status-configmap="true"
I0330 09:57:35.062935       1 main.go:401] Cluster Autoscaler 1.23.0
F0330 09:58:05.064761       1 main.go:430] Failed to get nodes from apiserver: Get "https://[fd9f:3e81:ae5b::1]:443/api/v1/nodes": dial tcp [fd9f:3e81:ae5b::1]:443: i/o timeout
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:1038 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x611e4e0, 0x3, 0x0, 0xc0005bb570, 0x0, {0x4d57ec2, 0x1}, 0xc0005f2350, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:987 +0x5fd
k8s.io/klog/v2.(*loggingT).printf(0x0, 0x0, 0x0, {0x0, 0x0}, {0x3cb5678, 0x26}, {0xc0005f2350, 0x1, 0x1})
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:753 +0x1c5
k8s.io/klog/v2.Fatalf(...)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:1532
main.main()
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:430 +0x4ce

goroutine 18 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:1181 +0x6a
created by k8s.io/klog/v2.init.0
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/klog/v2/klog.go:420 +0xfb

goroutine 29 [chan receive]:
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/exoscale/internal/k8s.io/klog.(*loggingT).flushDaemon(0xc00007f3e0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/exoscale/internal/k8s.io/klog/klog.go:1026 +0x6a
created by k8s.io/autoscaler/cluster-autoscaler/cloudprovider/exoscale/internal/k8s.io/klog.init.0
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/exoscale/internal/k8s.io/klog/klog.go:427 +0xf4

goroutine 30 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc0001c2000)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/go.opencensus.io/stats/view/worker.go:276 +0xb9
created by go.opencensus.io/stats/view.init.0
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/go.opencensus.io/stats/view/worker.go:34 +0x92

goroutine 56 [IO wait]:
internal/poll.runtime_pollWait(0x7f91522bb4f0, 0x72)
	/usr/local/go/src/runtime/netpoll.go:234 +0x89
internal/poll.(*pollDesc).wait(0xc000309c00, 0xc00005e000, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000309c00)
	/usr/local/go/src/internal/poll/fd_unix.go:402 +0x22c
net.(*netFD).accept(0xc000309c00)
	/usr/local/go/src/net/fd_unix.go:173 +0x35
net.(*TCPListener).accept(0xc0004a9f20)
	/usr/local/go/src/net/tcpsock_posix.go:140 +0x28
net.(*TCPListener).Accept(0xc0004a9f20)
	/usr/local/go/src/net/tcpsock.go:262 +0x3d
net/http.(*Server).Serve(0xc0001c8620, {0x421e1c0, 0xc0004a9f20})
	/usr/local/go/src/net/http/server.go:3002 +0x394
net/http.(*Server).ListenAndServe(0xc0001c8620)
	/usr/local/go/src/net/http/server.go:2931 +0x7d
net/http.ListenAndServe(...)
	/usr/local/go/src/net/http/server.go:3185
main.main.func1()
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:413 +0x1fa
created by main.main
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2d7
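
For context, the trace shows main() calling klog.Fatalf as soon as a single node listing against the apiserver fails. A hedged reconstruction of that startup check, inferred only from the stack trace above (the scaffolding around the List call is an assumption, not the actual main.go source):

```go
// Hedged reconstruction of the failing startup check, inferred from the
// stack trace above; not the actual cluster-autoscaler main.go source.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	klog "k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatalf("Failed to build kubeconfig: %v", err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A single List call with no retry: if the apiserver is unreachable
	// (here the dial timed out after ~30s), Fatalf kills the process and
	// kubelet puts the pod into CrashLoopBackOff.
	if _, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{}); err != nil {
		klog.Fatalf("Failed to get nodes from apiserver: %v", err)
	}
}
```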

How to reproduce it (as minimally and precisely as possible):

Related to #2556, #4464 and #518

@matti added the kind/bug label Mar 30, 2022

matti commented Mar 30, 2022

The problem is that the restarting pod goes into CrashLoopBackOff, so when the master API becomes available again the cluster keeps running without an autoscaler (which would otherwise work) until the backoff has elapsed.
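
One possible mitigation (a sketch only, not the upstream fix): retry the initial listing with capped exponential backoff inside the process instead of exiting on the first failure. waitForNodes and the backoff values here are illustrative assumptions:

```go
// Sketch: poll the apiserver with capped exponential backoff instead of
// calling klog.Fatalf on the first failure. Illustrative only.
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	klog "k8s.io/klog/v2"
)

func waitForNodes(ctx context.Context, client kubernetes.Interface) error {
	delay := 5 * time.Second
	for {
		_, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
		if err == nil {
			return nil
		}
		klog.Warningf("apiserver unreachable, retrying in %v: %v", delay, err)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		if delay < 80*time.Second {
			delay *= 2 // cap the backoff so recovery stays quick
		}
	}
}
```

With a loop like this the pod keeps running through a transient apiserver outage, so kubelet's CrashLoopBackOff never enters the picture and scaling resumes as soon as the master API is back.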

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 2, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot

@k8s-triage-robot: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
