🐛 CAPD: fix panic in DockerMachinePool reconciliation #5167

sbueringer · 2021-08-27T07:06:30Z

Signed-off-by: Stefan Büringer [email protected]

What this PR does / why we need it:
Since yesterday our e2e tests have been panicking (https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main). The panic only occurs when MachinePools are used with CAPD, so it doesn't affect the quickstart.

I have no idea why it started to occur only yesterday.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

sbueringer · 2021-08-27T07:07:14Z

/test pull-cluster-api-e2e-workload-upgrade-1-22-latest-main
/test pull-cluster-api-e2e-full-main


sbueringer · 2021-08-27T07:11:50Z

test/infrastructure/docker/exp/controllers/dockermachinepool_controller.go

@@ -55,7 +55,7 @@ type DockerMachinePoolReconciler struct {
 // +kubebuilder:rbac:groups="",resources=secrets;,verbs=get;list;watch

 func (r *DockerMachinePoolReconciler) Reconcile(ctx context.Context, req ctrl.Request) (res ctrl.Result, rerr error) {
-	log := ctrl.LoggerFrom(ctx, "docker-machine-pool", req.NamespacedName)
+	log := ctrl.LoggerFrom(ctx)


I dropped it to make it consistent with the DockerMachine controller. We add the MachinePool name a few lines below.

Why was this throwing a panic?

klogr only accepts strings as keys:

panic: key is not a string: {"Namespace":"machine-pool-fzbffp","Name":"machine-pool-omo8mt-dmp-0"}

goroutine 420 [running]:
k8s.io/klog/v2/klogr.flatten(0xc0039f0640, 0xa, 0xa, 0xc0037a86f0, 0x2)
	/go/pkg/mod/k8s.io/klog/v2@v2.9.0/klogr/klogr.go:158 +0x62e
k8s.io/klog/v2/klogr.klogger.Info(0x0, 0x0, 0xc0002f4840, 0x37, 0xc00349aea0, 0xb, 0x12, 0x1a2a357, 0x9, 0x1a8728d, ...)
	/go/pkg/mod/k8s.io/klog/v2@v2.9.0/klogr/klogr.go:200 +0x5c8
sigs.k8s.io/cluster-api/test/infrastructure/docker/exp/controllers.(*DockerMachinePoolReconciler)
	/workspace/test/infrastructure/docker/exp/controllers/dockermachinepool_controller.go:75 +0xc25

https://github.com/kubernetes/klog/blob/v2.9.0/klogr/klogr.go#L158

The panic is not thrown on this line, but further down at line 75, where the logger is first used.

Not sure why klog thinks it's a key.

o.O The key/value count is already uneven before this call. Will try to figure out what's going on.
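To illustrate how an uneven key/value count could end up with the NamespacedName in a key position, here is a minimal stdlib-only sketch. It is a simplified model of the check, not klogr's actual code; `namespacedName` and `firstNonStringKey` are hypothetical names made up for this example, and the exact key/value lists are assumptions.

```go
package main

import "fmt"

// namespacedName is a hypothetical stand-in for types.NamespacedName.
type namespacedName struct {
	Namespace string
	Name      string
}

// firstNonStringKey treats kv as alternating key/value entries and returns
// the index of the first key that is not a string, or -1 if all keys are
// strings. Simplified model only (assumption), not klogr's implementation.
func firstNonStringKey(kv []interface{}) int {
	for i := 0; i < len(kv); i += 2 {
		if _, ok := kv[i].(string); !ok {
			return i
		}
	}
	return -1
}

func main() {
	nn := namespacedName{Namespace: "machine-pool-fzbffp", Name: "machine-pool-omo8mt-dmp-0"}

	// Even count before our pair: nn lines up as a value.
	even := []interface{}{"reconciler", "dockermachinepool", "docker-machine-pool", nn}
	fmt.Println(firstNonStringKey(even)) // -1

	// Odd count before our pair: everything shifts by one and nn becomes a key.
	odd := []interface{}{"reconciler", "docker-machine-pool", nn}
	fmt.Println(firstNonStringKey(odd)) // 2
}
```

In the second list the intended key "docker-machine-pool" is consumed as the value of the dangling "reconciler" entry, so the struct is the next thing read as a key. That would match the "key is not a string" message above.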

Found at least one other bug, maybe two.

sbueringer · 2021-08-27T07:12:30Z

/test pull-cluster-api-e2e-workload-upgrade-1-22-latest-main
/test pull-cluster-api-e2e-full-main

sbueringer · 2021-08-27T07:51:07Z

/retest

fabriziopandini

/lgtm
/approve

k8s-ci-robot · 2021-08-27T09:22:16Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [fabriziopandini]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sbueringer · 2021-08-27T09:39:22Z

I took another look at why it started to fail. The corresponding code has been in CAPI for almost a year, and in klog for over 3 years.

We only log under certain circumstances in the MachinePool controller (e.g. when the OwnerRef is not yet set on the MachinePool). I think either we changed code somewhere else so that it now takes a bit longer to set the OwnerRef, or the controllers are running slower (e.g. because of oversubscription in the Prow cluster).

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 27, 2021

k8s-ci-robot requested review from enxebre and JoelSpeed August 27, 2021 07:06

sbueringer changed the title ~~CAPD: fix panic in DockerMachinePool reconciliation~~ 🐛 CAPD: fix panic in DockerMachinePool reconciliation Aug 27, 2021

sbueringer mentioned this pull request Aug 27, 2021

🌱 KubeadmControlPlane internal/proxy should use pointer structs #5161

Merged

CAPD: fix panic in DockerMachinePool reconcilation

fcf96a5

Signed-off-by: Stefan Büringer [email protected]

sbueringer force-pushed the pr-fix-capd-panic branch from 2b7402a to fcf96a5 Compare August 27, 2021 07:11

sbueringer commented Aug 27, 2021

fabriziopandini reviewed Aug 27, 2021

k8s-ci-robot assigned fabriziopandini Aug 27, 2021

k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 27, 2021

k8s-ci-robot merged commit 4213fda into kubernetes-sigs:master Aug 27, 2021

k8s-ci-robot added this to the v0.4 milestone Aug 27, 2021

sbueringer deleted the pr-fix-capd-panic branch August 27, 2021 09:25

sbueringer mentioned this pull request Aug 27, 2021

🐛 fix reconciler kv in our reconciler loggers #5170

Merged
