Nil pointer panic in CA 1.15.1 #2214

Closed
mboersma opened this issue Jul 25, 2019 · 6 comments
Labels: area/cluster-autoscaler, kind/bug

Comments

@mboersma
Contributor

I was running some tests to see if I could eventually trigger #2154, and I got a nil pointer panic:

I0724 23:38:52.424553       1 scale_up.go:420] No expansion options
I0724 23:38:52.425447       1 scale_down.go:428] Scale-down calculation: ignoring 10 nodes unremovable in the last 5m0s
I0724 23:39:02.590388       1 static_autoscaler.go:255] 1 unregistered nodes present
I0724 23:39:02.598031       1 utils.go:619] No pod using affinity / antiaffinity found in cluster, disabling affinity predicate for this loop
I0724 23:39:02.599269       1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-qtvzm is unschedulable
I0724 23:39:02.599297       1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-hz4vh is unschedulable
I0724 23:39:02.599305       1 scale_up.go:265] Pod kube-system/keyvault-flexvolume-bh55h is unschedulable
I0724 23:39:02.599311       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9kvvr is unschedulable
I0724 23:39:02.599346       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-78rlf is unschedulable
I0724 23:39:02.599386       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-d9c7p is unschedulable
I0724 23:39:02.599395       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-2g9jm is unschedulable
I0724 23:39:02.599434       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-jwhvw is unschedulable
I0724 23:39:02.599458       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9r9d6 is unschedulable
I0724 23:39:02.599472       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-w84zk is unschedulable
I0724 23:39:02.599483       1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-jdrqq is unschedulable
I0724 23:39:02.599525       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-t68dc is unschedulable
I0724 23:39:02.599548       1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-5p7rp is unschedulable
I0724 23:39:02.599556       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-vnx44 is unschedulable
I0724 23:39:02.599566       1 scale_up.go:265] Pod kube-system/blobfuse-flexvol-installer-2rwxf is unschedulable
I0724 23:39:02.599573       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-9mjn5 is unschedulable
I0724 23:39:02.599622       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-l28rl is unschedulable
I0724 23:39:02.599644       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-4648h is unschedulable
I0724 23:39:02.599651       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-8d4bg is unschedulable
I0724 23:39:02.599659       1 scale_up.go:265] Pod default/autoscaler-demo-6d8dbf8b76-bvjbr is unschedulable
I0724 23:39:02.599713       1 scale_up.go:268] 7 other pods are also unschedulable
I0724 23:39:02.606144       1 scale_up.go:420] No expansion options
I0724 23:39:02.702743       1 scale_down.go:428] Scale-down calculation: ignoring 10 nodes unremovable in the last 5m0s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]

goroutine 50 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc000197a20, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc000197a20, 0xc000863280, 0xb, 0x10, 0xc001b1fda0, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc000ea1360, 0xc000863280, 0xb, 0x10, 0xc001b1fda0, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0xc001fc28c0, 0x5)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000ea1360, 0xbf4658e82af9bedd, 0x6901bb56857, 0x4cb5be0, 0x0, 0x0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc0002c80a0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc000458f00)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec

This was cluster-autoscaler 1.15.1 on a Kubernetes 1.16.0-alpha.1 cluster on Azure, provisioned with aks-engine.

This may not be reproducible easily, but I'm posting the panic here in case there's an obvious if foo == nil check that we can add.
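For reference, a guard of that shape would look roughly like the sketch below. This is a hypothetical, self-contained example, not the actual clusterstate.go code: the names lookup and nodeGroup are made up. It only illustrates checking a possibly-nil lookup result before calling methods on it, which is the kind of check being suggested.

// Hypothetical sketch (not the real cluster-autoscaler code): guard a lookup
// that can return nil for nodes the cloud provider does not recognize.
package main

import "fmt"

// nodeGroup stands in for a cloud-provider node group; nil means "unknown".
type nodeGroup struct{ id string }

// lookup simulates a node-to-node-group lookup that can come back empty.
func lookup(node string) *nodeGroup {
	groups := map[string]*nodeGroup{"node-a": {id: "pool-1"}}
	return groups[node] // nil for unregistered nodes
}

func main() {
	for _, node := range []string{"node-a", "node-b"} {
		ng := lookup(node)
		if ng == nil {
			// Guarding here avoids the nil dereference: skip (or count)
			// the node instead of calling methods on a nil pointer.
			fmt.Printf("skipping %s: no node group found\n", node)
			continue
		}
		fmt.Printf("%s belongs to %s\n", node, ng.id)
	}
}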

@bskiba
Member

bskiba commented Jul 26, 2019

/kind bug

@pdbogen

pdbogen commented Aug 20, 2019

I also encountered this error:

I0820 20:07:42.849315       1 static_autoscaler.go:398] ip-10-3-198-184.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849333       1 static_autoscaler.go:398] ip-10-3-220-20.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849347       1 static_autoscaler.go:398] ip-10-3-75-152.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849360       1 static_autoscaler.go:398] ip-10-3-190-161.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849374       1 static_autoscaler.go:398] ip-10-3-161-213.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849388       1 static_autoscaler.go:398] ip-10-3-153-189.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849426       1 static_autoscaler.go:398] ip-10-3-71-239.ec2.internal is unneeded since 2019-08-20 20:04:31.122629411 +0000 UTC m=+105.408520828 duration 3m11.71988856s
I0820 20:07:42.849443       1 static_autoscaler.go:398] ip-10-3-178-147.ec2.internal is unneeded since 2019-08-20 20:03:00.034518139 +0000 UTC m=+14.320409566 duration 4m42.807999822s
I0820 20:07:42.849458       1 static_autoscaler.go:409] Scale down status: unneededOnly=true lastScaleUpTime=2019-08-20 20:02:50.03392415 +0000 UTC m=+4.319815527 lastScaleDownDeleteTime=2019-08-20 20:02:50.03392427 +0000 UTC m=+4.319815647 lastScaleDownFailTime=2019-08-20 20:02:50.0339244 +0000 UTC m=+4.319815777 scaleDownForbidden=false isDeleteInProgress=false
I0820 20:07:52.892635       1 static_autoscaler.go:187] Starting main loop
I0820 20:07:52.893537       1 auto_scaling_groups.go:320] Regenerating instance to ASG map for ASGs: [kube-test-dev-iad-nodes-mixed kube-test-dev-iad-nodes-mixed-1804]
I0820 20:07:53.064726       1 auto_scaling_groups.go:102] Updating ASG kube-test-dev-iad-nodes-mixed
I0820 20:07:53.064892       1 aws_manager.go:255] Refreshed ASG list, next refresh after 2019-08-20 20:08:53.064886765 +0000 UTC m=+367.350778202
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]

goroutine 42 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc00100af20, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc00100af20, 0xc00167d400, 0x11, 0x20, 0xc000572d50, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc0007d8c80, 0xc00167d400, 0x11, 0x20, 0xc000572d50, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0xc000f5a2a0, 0x3)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc0007d8c80, 0xbf4f32a635342384, 0x478543f305, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc00012e140)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc0001ab100)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec

@SleepyBrett

SleepyBrett commented Aug 21, 2019

I see the exact same SIGSEGV on a Kubernetes 1.15.3 cluster running cluster-autoscaler 1.15.1:

I0821 18:11:35.888036       1 node_instances_cache.go:155] Start refreshing cloud provider node instances cache
I0821 18:11:35.888097       1 node_instances_cache.go:167] Refresh cloud provider node instances cache finished, refresh took 40.059µs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x2223f75]

goroutine 2518 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc0002ee580, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x9a5
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc0002ee580, 0xc00bd9ee00, 0x2a, 0x40, 0xc00c76a330, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x227
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc0045901e0, 0xc00bd9ee00, 0x2a, 0x40, 0xc00c76a330, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0xc00d14a900, 0x12)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc0045901e0, 0xbf4f8036139c1a66, 0x1f479a2e920, 0x4cb5be0, 0x0, 0x0)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x5e6
main.run(0xc000176140)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:331 +0x296
main.main.func2(0x2ff6580, 0xc0003e0240)
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:403 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
        /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec

@pdbogen

pdbogen commented Aug 21, 2019

I believe this is fixed by #2096.

@lossanarch

lossanarch commented Aug 29, 2019

@pdbogen You're probably right, but #2096 was merged to master back in mid-June and hasn't been merged into a cluster-autoscaler release branch yet.
Is it possible to get it merged?
Or perhaps I should be tagging @bskiba instead; apologies.

@losipiuk losipiuk closed this as completed Sep 2, 2019
@losipiuk
Contributor

losipiuk commented Sep 2, 2019

Fixed with #2096
