Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential SEGV in updateReadinessStats #2096

Merged

Conversation

frobware
Copy link
Contributor

@frobware frobware commented Jun 5, 2019

Calling cloudprovider.NodeGroupForNode(unregistered.Node) can result
in a nil result for the nodegroup - handle that case.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 5, 2019
@k8s-ci-robot k8s-ci-robot requested review from feiskyer and losipiuk June 5, 2019 17:45
@frobware frobware changed the title Fix SEGV in updateReadinessStats Fix potential SEGV in updateReadinessStats Jun 5, 2019
@@ -573,6 +573,10 @@ func (csr *ClusterStateRegistry) updateReadinessStats(currentTime time.Time) {
klog.Warningf("Failed to get nodegroup for %s: %v", unregistered.Node.Name, errNg)
continue
}
if nodeGroup == nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or may be extend the check at L572, if errNg != nil || nodeGroup == nil { ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is simpler as two cases; if the check is extended you need an additional check when logging for the case when ngErr is nil. And semantically they are different. One path is an error, the second is the node simply doesn't exist.

@ingvagabund
Copy link
Contributor

I can confirm this PR fixes the following panic case:

I0607 17:21:00.438397       1 utils.go:576] Skipping 76db30c3-c0c0-477e-8cfb-a3870653c8cf - node group min size reached
I0607 17:21:00.438424       1 utils.go:576] Skipping d93c74b8-d6c9-4ef3-9424-608f3134e403 - node group min size reached
I0607 17:21:00.438443       1 utils.go:576] Skipping 5ed4d1e4-360e-4769-a016-4fa01de2eb0d - node group min size reached
I0607 17:21:00.438462       1 utils.go:576] Skipping 98e29900-4dc6-47a1-8819-9e58eed1c5c8 - node group min size reached
I0607 17:21:00.438605       1 scale_down.go:771] No candidates for scale down
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x1fbb227]

goroutine 46 [running]:
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).updateReadinessStats(0xc420b0e000, 0xbf36c5219b97a036, 0x2e404783c9, 0x3f94f20)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:576 +0x7d7
k8s.io/autoscaler/cluster-autoscaler/clusterstate.(*ClusterStateRegistry).UpdateNodes(0xc420b0e000, 0xc420494e00, 0x5, 0x8, 0xc4207fc630, 0xbf36c5219b97a036, 0x2e404783c9, 0x3f94f20, 0x0, 0x0)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/clusterstate/clusterstate.go:310 +0x23f
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).updateClusterState(0xc420168b40, 0xc420494e00, 0x5, 0x8, 0xc4207fc630, 0xbf36c5219b97a036, 0x2e404783c9, 0x3f94f20, 0xc420796230, 0x1)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:568 +0x94
k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc420168b40, 0xbf36c5219b97a036, 0x2e404783c9, 0x3f94f20, 0x0, 0x0)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:217 +0x613
main.run(0xc420450050)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:332 +0x25f
main.main.func2(0x2b6f180, 0xc4202e73c0)
    /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:404 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
    /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:182 +0xee

@@ -573,6 +573,10 @@ func (csr *ClusterStateRegistry) updateReadinessStats(currentTime time.Time) {
klog.Warningf("Failed to get nodegroup for %s: %v", unregistered.Node.Name, errNg)
continue
}
if nodeGroup == nil {
klog.Warningf("nodegroup is nil for %s", unregistered.Node.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it uppercase :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it uppercase :)

Done in 91016a6

Calling cloudprovider.NodeGroupForNode(unregistered.Node) can result
in a nil result for the nodegroup - handle that case.
@frobware frobware force-pushed the fix-segv-in-updateReadinessStats branch from 6b45a62 to 91016a6 Compare June 11, 2019 09:42
@losipiuk
Copy link
Contributor

/lgtm
/approve
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: losipiuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2019
@k8s-ci-robot k8s-ci-robot merged commit dd89fb1 into kubernetes:master Jun 11, 2019
@SleepyBrett
Copy link

can this get cherry picked back into 1.15.x?

@losipiuk
Copy link
Contributor

can this get cherry picked back into 1.15.x?

Sure. Can you prepare a cherry-pick?

@mazzy89
Copy link

mazzy89 commented Nov 27, 2019

Hi guys. I've seen that this has been cherry-picked but still waiting for a release. Do you have any ETA here when the 1.15.4 will land with this fix? Thank you

@losipiuk
Copy link
Contributor

Hi guys. I've seen that this has been cherry-picked but still waiting for a release. Do you have any ETA here when the 1.15.4 will land with this fix? Thank you

I will be cutting patch releases today

@mazzy89
Copy link

mazzy89 commented Nov 28, 2019

@losipiuk thanks a lot for the fast reaction here and the transparency

k8s-ci-robot added a commit that referenced this pull request Mar 27, 2020
Cherry-pick #2096: Fix potential SEGV in updateReadinessStat
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants