🐛 CAPD: fix panic in DockerMachinePool reconciliation #5167
Conversation
/test pull-cluster-api-e2e-workload-upgrade-1-22-latest-main
Signed-off-by: Stefan Büringer [email protected]
Force-pushed from 2b7402a to fcf96a5
@@ -55,7 +55,7 @@ type DockerMachinePoolReconciler struct {
 // +kubebuilder:rbac:groups="",resources=secrets;,verbs=get;list;watch

 func (r *DockerMachinePoolReconciler) Reconcile(ctx context.Context, req ctrl.Request) (res ctrl.Result, rerr error) {
-	log := ctrl.LoggerFrom(ctx, "docker-machine-pool", req.NamespacedName)
+	log := ctrl.LoggerFrom(ctx)
I dropped it to make it consistent with the DockerMachine controller. We add the MachinePool name a few lines below.
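For reference, a minimal sketch of the pattern described here (illustrative only, not the exact diff; `exampleReconciler` and the key names are made up): derive the logger from the context without extra key/value pairs, then attach identifiers a few lines below using plain string keys.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// exampleReconciler is a stand-in for the DockerMachinePool reconciler.
type exampleReconciler struct{}

func (r *exampleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Consistent with the DockerMachine controller: no extra key/value pairs here.
	log := ctrl.LoggerFrom(ctx)

	// Identifiers are added a few lines below with string keys and string values,
	// which klogr handles fine. Passing req.NamespacedName as part of LoggerFrom's
	// key/value list is what ended up panicking (see the klogr discussion below).
	log = log.WithValues("machine-pool", req.Name, "namespace", req.Namespace)
	log.Info("Reconciling DockerMachinePool")

	return ctrl.Result{}, nil
}
```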
Why was this throwing a panic?
klog only accepts string keys:
panic: key is not a string: {"Namespace":"machine-pool-fzbffp","Name":"machine-pool-omo8mt-dmp-0"}
goroutine 420 [running]:
k8s.io/klog/v2/klogr.flatten(0xc0039f0640, 0xa, 0xa, 0xc0037a86f0, 0x2)
/go/pkg/mod/k8s.io/klog/[email protected]/klogr/klogr.go:158 +0x62e
k8s.io/klog/v2/klogr.klogger.Info(0x0, 0x0, 0xc0002f4840, 0x37, 0xc00349aea0, 0xb, 0x12, 0x1a2a357, 0x9, 0x1a8728d, ...)
/go/pkg/mod/k8s.io/klog/[email protected]/klogr/klogr.go:200 +0x5c8
sigs.k8s.io/cluster-api/test/infrastructure/docker/exp/controllers.(*DockerMachinePoolReconciler)
/workspace/test/infrastructure/docker/exp/controllers/dockermachinepool_controller.go:75 +0xc25
https://github.com/kubernetes/klog/blob/v2.9.0/klogr/klogr.go#L158
The panic was not thrown here, but below at line 75, when we use the logger.
Not sure why klog thinks it's a key.
o.O There's an uneven k/v count on the logger already before this call. Will try to figure out what's going on.
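To make the failure mode concrete, here is a minimal, hedged reproduction (assumes klog v2.9.x, the version in the stack trace above; the local struct just mirrors types.NamespacedName and the key names are made up). klogr's flatten() requires every key to be a string, so once an uneven key/value count shifts a struct value into a key position, it panics.

```go
package main

import "k8s.io/klog/v2/klogr"

// namespacedName mirrors k8s.io/apimachinery/pkg/types.NamespacedName for the example.
type namespacedName struct {
	Namespace string
	Name      string
}

func main() {
	log := klogr.New()
	nn := namespacedName{Namespace: "machine-pool-fzbffp", Name: "machine-pool-omo8mt-dmp-0"}

	// Fine: string key, struct value.
	log.Info("reconciling", "docker-machine-pool", nn)

	// Panics with `key is not a string: {...}`: the key/value list has an odd
	// length ("cluster" is missing its value), so nn gets shifted into a key
	// position and klogr's flatten() rejects it.
	log.Info("reconciling", "cluster", "docker-machine-pool", nn)
}
```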
Found at least one other bug, maybe two.
/test pull-cluster-api-e2e-workload-upgrade-1-22-latest-main
/retest
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: fabriziopandini
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
I took another look at why it started to fail. The corresponding code in CAPI has been there for almost a year, and in klog for over 3 years. We only log under certain circumstances in the MachinePool controller (e.g. when the OwnerRef is not yet set on the MachinePool). I think we either changed code somewhere else so it might take a bit longer to set the OwnerRef, or the controllers are running slower (e.g. because of oversubscription in the Prow cluster).
Signed-off-by: Stefan Büringer [email protected]
What this PR does / why we need it:
Since yesterday (https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main) our e2e tests have been panicking. This only occurs when MachinePools are used with CAPD (so it doesn't affect the quickstart).
I have no idea why it started to occur only yesterday.
Which issue(s) this PR fixes (optional, in fixes #<issue_number> format, will close the issue(s) when PR gets merged):
Fixes #