-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 fix: KubeadmControlPlane controller crash if infra template is not found #2755
🐛 fix: KubeadmControlPlane controller crash if infra template is not found #2755
Conversation
Signed-off-by: Fan Shang Xiang <[email protected]>
Welcome @MartinForReal! |
Hi @MartinForReal. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: MartinForReal The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -150,6 +154,11 @@ func (r *KubeadmControlPlaneReconciler) Reconcile(req ctrl.Request) (res ctrl.Re | |||
return ctrl.Result{}, nil | |||
} | |||
|
|||
// Make sure to reconcile the external infrastructure reference. | |||
if err := r.reconcileExternalReference(ctx, cluster, kcp.Spec.InfrastructureTemplate); err != nil { | |||
return ctrl.Result{RequeueAfter: DependentInfraTemplateRequeueAfter}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the error is not nil, the reconciler will go in exponential backoff. What's the reasoning around moving the call outside of the reconcile
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we will get following errors:
- InfrastructureTemplate is not found, will be created in the future.
- failed to set ownerReference to InfrastructureTemplate
- connection issue
under such situations, we should retry reconciling InfrastructureTemplate.
cluster-api/controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go
Lines 98 to 106 in 4fc4bee
For(&controlplanev1.KubeadmControlPlane{}). | |
Owns(&clusterv1.Machine{}). | |
Watches( | |
&source.Kind{Type: &clusterv1.Cluster{}}, | |
&handler.EnqueueRequestsFromMapFunc{ToRequests: handler.ToRequestsFunc(r.ClusterToKubeadmControlPlane)}, | |
). | |
WithOptions(options). | |
Build(r) | |
From the code above we can see that InfrastructureTemplate is not watched. And we will never know template is created
I think it is a good idea to make sure InfrastructureTemplate exists before reconcile is called
because we can make sure that the dependency of machine object is ready as long as reconcile is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be ok, the err
is returned and the reconciler will go in exponential backoff. From the trace posted in your issue, it seems like this should be fixed in #2754
/lifecycle active |
/assign @neolit123 |
/assign vincepri |
@@ -119,7 +119,6 @@ func Convert_v1alpha3_KubeadmConfigStatus_To_v1alpha2_KubeadmConfigStatus(in *ku | |||
return nil | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whitespace change is not needed. was it generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was removed after I executed gofmt -w .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I believe it was not generated by codegen because generated code starts with "// +build !ignore_autogenerated"
@MartinForReal are you able to test with the fix that went in #2757 ? |
Yes, it is fixed. Thanks~ |
What this PR does / why we need it:
KubeadmControlPlane controller should not crash if infra template is not found
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #2751