CA 1.15.3 frequent SIGSEGV on cluster state refresh after scaling event #2491

Closed
bpinske opened this issue Oct 28, 2019 · 2 comments

bpinske commented Oct 28, 2019

The issue occurs frequently after an ASG scaling event, in both the scale-up and scale-down directions.

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x22248a5]

goroutine 3526 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc000d40dc0, 0xbf64c89e313d3909, 0x231b1ce4443, 0x4cb6be0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc000d40dc0, 0xc0009c3a80, 0x8, 0x8, 0xc00094e870, 0xbf64c89e313d3909, 0x231b1ce4443, 0x4cb6be0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc0011efc20, 0xc0009c3a80, 0x8, 0x8, 0xc00094e870, 0xbf64c89e313d3909, 0x231b1ce4443, 0x4cb6be0, 0xc0009c3ac0, 0x6)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:569 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc0011efc20, 0xbf64c89e313d3909, 0x231b1ce4443, 0x4cb6be0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc0002c8050)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff78e0, 0xc00032cb80)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec
```
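For context on the trace: `addr=0x50` is the classic Go signature of reading a field through a nil pointer at a small offset. The sketch below is a hypothetical, minimal reproduction of the likely failure mode, not cluster-autoscaler source: a readiness pass looks up the node group for each node, and a node added or removed by a concurrent ASG scaling event yields a nil group that is then dereferenced. All names here (`registry`, `nodeGroup`, `groupForNode`) are illustrative.

```go
// sketch.go: hypothetical reproduction of the crash pattern above.
// This is NOT cluster-autoscaler code; names and structure are illustrative.
package main

import "fmt"

// nodeGroup stands in for a cloud-provider node group (e.g. an AWS ASG).
type nodeGroup struct {
	id string
}

// registry stands in for a cluster-state registry mapping node names to
// the group they belong to.
type registry struct {
	groupForNode map[string]*nodeGroup
}

// updateReadinessStats tallies readiness per node group. If a node was
// added or removed by a scaling event after the node list was snapshotted
// but before the group map was refreshed, the lookup returns nil and this
// unguarded version dereferences it, producing exactly the
// "invalid memory address or nil pointer dereference" panic above.
func (r *registry) updateReadinessStats(nodes []string) map[string]int {
	ready := map[string]int{}
	for _, n := range nodes {
		g := r.groupForNode[n] // nil for a node the registry no longer knows
		ready[g.id]++          // SIGSEGV here when g == nil
	}
	return ready
}

func main() {
	r := &registry{groupForNode: map[string]*nodeGroup{
		"node-a": {id: "asg-1"},
		// "node-b" was just added by a scale-up and is missing here.
	}}
	fmt.Println(r.updateReadinessStats([]string{"node-a", "node-b"}))
}
```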

Flags

```
I1028 16:16:10.010017       1 flags.go:52] FLAG: --alsologtostderr="false"
I1028 16:16:10.010028       1 flags.go:52] FLAG: --balance-similar-node-groups="true"
I1028 16:16:10.010032       1 flags.go:52] FLAG: --cloud-config=""
I1028 16:16:10.010035       1 flags.go:52] FLAG: --cloud-provider="aws"
I1028 16:16:10.010040       1 flags.go:52] FLAG: --cloud-provider-gce-lb-src-cidrs="130.211.0.0/22,209.85.152.0/22,209.85.204.0/22,35.191.0.0/16"
I1028 16:16:10.010049       1 flags.go:52] FLAG: --cluster-name=""
I1028 16:16:10.010054       1 flags.go:52] FLAG: --cores-total="0:320000"
I1028 16:16:10.010059       1 flags.go:52] FLAG: --estimator="binpacking"
I1028 16:16:10.010063       1 flags.go:52] FLAG: --expander="least-waste"
I1028 16:16:10.010067       1 flags.go:52] FLAG: --expendable-pods-priority-cutoff="-10"
I1028 16:16:10.010071       1 flags.go:52] FLAG: --filter-out-schedulable-pods-uses-packing="true"
I1028 16:16:10.010075       1 flags.go:52] FLAG: --gpu-total="[]"
I1028 16:16:10.010080       1 flags.go:52] FLAG: --ignore-daemonsets-utilization="false"
I1028 16:16:10.010084       1 flags.go:52] FLAG: --ignore-mirror-pods-utilization="false"
I1028 16:16:10.010089       1 flags.go:52] FLAG: --ignore-taint="[]"
I1028 16:16:10.010094       1 flags.go:52] FLAG: --kubeconfig=""
I1028 16:16:10.010099       1 flags.go:52] FLAG: --kubernetes=""
I1028 16:16:10.010103       1 flags.go:52] FLAG: --leader-elect="true"
I1028 16:16:10.010111       1 flags.go:52] FLAG: --leader-elect-lease-duration="15s"
I1028 16:16:10.010117       1 flags.go:52] FLAG: --leader-elect-renew-deadline="10s"
I1028 16:16:10.010122       1 flags.go:52] FLAG: --leader-elect-resource-lock="endpoints"
I1028 16:16:10.010128       1 flags.go:52] FLAG: --leader-elect-retry-period="2s"
I1028 16:16:10.010133       1 flags.go:52] FLAG: --log-backtrace-at=":0"
I1028 16:16:10.010423       1 flags.go:52] FLAG: --log-dir=""
I1028 16:16:10.010429       1 flags.go:52] FLAG: --log-file=""
I1028 16:16:10.010447       1 flags.go:52] FLAG: --log-file-max-size="1800"
I1028 16:16:10.010453       1 flags.go:52] FLAG: --logtostderr="true"
I1028 16:16:10.010458       1 flags.go:52] FLAG: --max-autoprovisioned-node-group-count="15"
I1028 16:16:10.010471       1 flags.go:52] FLAG: --max-bulk-soft-taint-count="10"
I1028 16:16:10.010476       1 flags.go:52] FLAG: --max-bulk-soft-taint-time="3s"
I1028 16:16:10.010481       1 flags.go:52] FLAG: --max-empty-bulk-delete="10"
I1028 16:16:10.010486       1 flags.go:52] FLAG: --max-failing-time="15m0s"
I1028 16:16:10.010492       1 flags.go:52] FLAG: --max-graceful-termination-sec="600"
I1028 16:16:10.010497       1 flags.go:52] FLAG: --max-inactivity="10m0s"
I1028 16:16:10.010507       1 flags.go:52] FLAG: --max-node-provision-time="15m0s"
I1028 16:16:10.010512       1 flags.go:52] FLAG: --max-nodes-total="0"
I1028 16:16:10.010517       1 flags.go:52] FLAG: --max-total-unready-percentage="45"
I1028 16:16:10.010523       1 flags.go:52] FLAG: --memory-total="0:6400000"
I1028 16:16:10.010529       1 flags.go:52] FLAG: --min-replica-count="2"
I1028 16:16:10.010533       1 flags.go:52] FLAG: --namespace="kube-system"
I1028 16:16:10.010544       1 flags.go:52] FLAG: --new-pod-scale-up-delay="0s"
I1028 16:16:10.010549       1 flags.go:52] FLAG: --node-autoprovisioning-enabled="false"
I1028 16:16:10.010554       1 flags.go:52] FLAG: --node-deletion-delay-timeout="2m0s"
I1028 16:16:10.010559       1 flags.go:52] FLAG: --node-group-auto-discovery="[asg:tag=k8s.io/cluster-autoscaler/dev-demo-main-p-devcia2,k8s.io/cluster-autoscaler/enabled]"
I1028 16:16:10.010584       1 flags.go:52] FLAG: --nodes="[]"
I1028 16:16:10.010595       1 flags.go:52] FLAG: --ok-total-unready-count="3"
I1028 16:16:10.010600       1 flags.go:52] FLAG: --regional="false"
I1028 16:16:10.010605       1 flags.go:52] FLAG: --scale-down-candidates-pool-min-count="50"
I1028 16:16:10.010613       1 flags.go:52] FLAG: --scale-down-candidates-pool-ratio="0.1"
I1028 16:16:10.010619       1 flags.go:52] FLAG: --scale-down-delay-after-add="1m0s"
I1028 16:16:10.010626       1 flags.go:52] FLAG: --scale-down-delay-after-delete="0s"
I1028 16:16:10.010637       1 flags.go:52] FLAG: --scale-down-delay-after-failure="3m0s"
I1028 16:16:10.010642       1 flags.go:52] FLAG: --scale-down-enabled="true"
I1028 16:16:10.010647       1 flags.go:52] FLAG: --scale-down-gpu-utilization-threshold="0.5"
I1028 16:16:10.010652       1 flags.go:52] FLAG: --scale-down-non-empty-candidates-count="1"
I1028 16:16:10.010657       1 flags.go:52] FLAG: --scale-down-unneeded-time="30s"
I1028 16:16:10.010662       1 flags.go:52] FLAG: --scale-down-unready-time="20m0s"
I1028 16:16:10.010671       1 flags.go:52] FLAG: --scale-down-utilization-threshold="0.5"
I1028 16:16:10.010676       1 flags.go:52] FLAG: --scan-interval="10s"
I1028 16:16:10.010681       1 flags.go:52] FLAG: --skip-headers="false"
I1028 16:16:10.010686       1 flags.go:52] FLAG: --skip-log-headers="false"
I1028 16:16:10.010691       1 flags.go:52] FLAG: --skip-nodes-with-local-storage="false"
I1028 16:16:10.010696       1 flags.go:52] FLAG: --skip-nodes-with-system-pods="true"
I1028 16:16:10.010705       1 flags.go:52] FLAG: --stderrthreshold="0"
I1028 16:16:10.010711       1 flags.go:52] FLAG: --test.bench=""
I1028 16:16:10.010715       1 flags.go:52] FLAG: --test.benchmem="false"
I1028 16:16:10.010720       1 flags.go:52] FLAG: --test.benchtime="1s"
I1028 16:16:10.010725       1 flags.go:52] FLAG: --test.blockprofile=""
I1028 16:16:10.010730       1 flags.go:52] FLAG: --test.blockprofilerate="1"
I1028 16:16:10.010740       1 flags.go:52] FLAG: --test.count="1"
I1028 16:16:10.010745       1 flags.go:52] FLAG: --test.coverprofile=""
I1028 16:16:10.010764       1 flags.go:52] FLAG: --test.cpu=""
I1028 16:16:10.010769       1 flags.go:52] FLAG: --test.cpuprofile=""
I1028 16:16:10.010774       1 flags.go:52] FLAG: --test.failfast="false"
I1028 16:16:10.010785       1 flags.go:52] FLAG: --test.list=""
I1028 16:16:10.010790       1 flags.go:52] FLAG: --test.memprofile=""
I1028 16:16:10.010795       1 flags.go:52] FLAG: --test.memprofilerate="0"
I1028 16:16:10.010800       1 flags.go:52] FLAG: --test.mutexprofile=""
I1028 16:16:10.010805       1 flags.go:52] FLAG: --test.mutexprofilefraction="1"
I1028 16:16:10.010810       1 flags.go:52] FLAG: --test.outputdir=""
I1028 16:16:10.010820       1 flags.go:52] FLAG: --test.parallel="4"
I1028 16:16:10.010825       1 flags.go:52] FLAG: --test.run=""
I1028 16:16:10.010830       1 flags.go:52] FLAG: --test.short="false"
I1028 16:16:10.010835       1 flags.go:52] FLAG: --test.testlogfile=""
I1028 16:16:10.010839       1 flags.go:52] FLAG: --test.timeout="0s"
I1028 16:16:10.010844       1 flags.go:52] FLAG: --test.trace=""
I1028 16:16:10.010853       1 flags.go:52] FLAG: --test.v="false"
I1028 16:16:10.010858       1 flags.go:52] FLAG: --unremovable-node-recheck-timeout="5m0s"
I1028 16:16:10.010863       1 flags.go:52] FLAG: --v="4"
I1028 16:16:10.010868       1 flags.go:52] FLAG: --vmodule=""
I1028 16:16:10.010874       1 flags.go:52] FLAG: --write-status-configmap="true"
I1028 16:16:10.010887       1 main.go:354] Cluster Autoscaler 1.15.3
```
frobware (Contributor) commented Oct 28, 2019

I believe this is fixed by #2096
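For anyone pinned to 1.15.x until a patched release lands: the usual remedy for this class of panic is a nil guard at the lookup site. Below is a minimal sketch against the hypothetical registry from the sketch above, and deliberately not the actual diff in #2096.

```go
// Guarded variant of the hypothetical sketch above (not the actual #2096
// diff): a nil result from the group lookup is skipped instead of
// dereferenced, so a node that appears or vanishes mid-refresh no longer
// crashes the refresh loop.
func (r *registry) updateReadinessStatsSafe(nodes []string) map[string]int {
	ready := map[string]int{}
	for _, n := range nodes {
		g := r.groupForNode[n]
		if g == nil {
			// Unregistered node: leave it for the next refresh
			// cycle rather than panicking here.
			continue
		}
		ready[g.id]++
	}
	return ready
}
```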

bpinske (Author) commented Oct 28, 2019

I agree that it is the same issue. The fix must not have been backported to the 1.15 branch yet.

bpinske closed this as completed Oct 28, 2019