🏃 Wire up kubeadm control plane Ready status #2488
Conversation
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chuckha

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
controlplane/kubeadm/controllers/kubeadm_control_plane_controller_test.go
nit: there's a typo in the commit message, should be "control plane"

🦅 👀
Agreed about the CNI piece: In my experience MachineDeployments also get stuck in weird ways if there's no CNI (so there's no chance of a Ready Node).
/lgtm
controlplane/kubeadm/controllers/kubeadm_control_plane_controller_test.go
	}
}
r.Log.Info("unable to get workload cluster", "err", fmt.Sprintf("%s", err))
return nil
Should this be a capierrors.RequeueAfterError, do you think?
You are right that this change set would modify behavior in a subtle way. I think this should handle the error cases exactly as they were being handled before and so not change any behavior of this function.
Do you think a capierrors.RequeueAfterError is more appropriate than returning the errors.Wrap?
I think my worry with not using RequeueAfterError here is that we could hit the 10 retries during the normal error path before the remote APIServer is available. That would mean waiting for the resync interval to continue updating the status (or really any other operations after the creation of the initial control plane instance) while waiting for that initial machine to finish booting/initializing.
The way the defer is structured, there is no way to send up a RequeueAfterError from updateStatus. If you think this should definitely use requeueAfter, then we'll need a new issue / PR to refactor how the defer works.
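For readers following along, a minimal sketch of the two options being discussed, assuming the capierrors package from sigs.k8s.io/cluster-api/errors and a hypothetical getWorkloadClusterStatus helper; the function names are illustrative, not the actual controller code:

```go
package controllers

import (
	"time"

	"github.com/pkg/errors"
	capierrors "sigs.k8s.io/cluster-api/errors"
)

// getWorkloadClusterStatus stands in for the call that can fail while the
// remote API server is still coming up.
func getWorkloadClusterStatus() error { return errors.New("connection refused") }

// Option A: wrap and return the error. The controller retries with backoff,
// so repeated failures during control plane bootstrap drift toward the
// resync interval before the status is updated again.
func updateStatusWrapped() error {
	if err := getWorkloadClusterStatus(); err != nil {
		return errors.Wrap(err, "failed to get workload cluster status")
	}
	return nil
}

// Option B: return a RequeueAfterError so the reconcile is retried on a
// fixed interval while the initial machine finishes booting/initializing.
func updateStatusRequeue() error {
	if err := getWorkloadClusterStatus(); err != nil {
		return &capierrors.RequeueAfterError{RequeueAfter: 20 * time.Second}
	}
	return nil
}
```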
Yeah, machine deployments require CNI to get nodes into a ready state for scaling to work appropriately.

/hold cancel

This is good for review.
controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go
7aef971 to 4f0d8d7
Eventually(func() (bool, error) {
	machineList := &clusterv1.MachineList{}
	if err := input.Lister.List(ctx, machineList, inClustersNamespaceListOption, matchClusterListOption); err != nil {
		fmt.Println(err)
Leftover?
🙈
Actually no, it's not leftover; this is the pattern used throughout. These fmt.Printlns should be replaced with a writer to ginkgo's log. I'm going to keep this and file an issue for improvement.
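A minimal sketch of what that follow-up could look like, assuming Ginkgo v1; logTestError and the package name are hypothetical, only GinkgoWriter comes from the library:

```go
package e2e

import (
	"fmt"

	"github.com/onsi/ginkgo"
)

// logTestError is a hypothetical helper that would replace the bare
// fmt.Println(err) calls in the polling helpers. GinkgoWriter buffers output
// per spec and, under Ginkgo's default settings, only replays it when a spec
// fails, so passing runs stay quiet while failures keep the diagnostics.
func logTestError(err error) {
	fmt.Fprintln(ginkgo.GinkgoWriter, err)
}
```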
/lgtm

Just a lovely piece of code.
Signed-off-by: Chuck Ha <[email protected]>
/lgtm
Signed-off-by: Chuck Ha [email protected]
What this PR does / why we need it:

This PR aligns the KCP Initialized field with the comment on the field in the API type, as well as sets the Ready status when it is appropriate to set.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #2480
I'm holding this because I might have to make changes to the CAPD e2es now that we're actually setting Ready based on NodeReady, which requires CNI to exist on the cluster. CAPD may now have to install a CNI to pass, which it really should do anyway in order to scale machine deployments.

In general, my experience with the tests for the reconciler is that we are testing many things throughout the whole Reconcile function. This is good in some ways and bad in others. There must be some way to make these tests more readable/understandable, because it takes me forever to understand what a test is doing and why an assertion is being made.
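As background on why a missing CNI keeps Ready false, here is a minimal sketch (not the PR's actual code) of how node readiness is typically derived from the workload cluster's Node conditions, assuming only the k8s.io/api/core/v1 types:

```go
package controllers

import (
	corev1 "k8s.io/api/core/v1"
)

// countReadyNodes counts workload cluster nodes whose Ready condition is True.
// A node only reports Ready once a CNI is installed, which is why a Ready
// status derived from NodeReady (and any e2e assertions built on it) depends
// on the cluster having a CNI.
func countReadyNodes(nodes []corev1.Node) int {
	ready := 0
	for _, node := range nodes {
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
				ready++
			}
		}
	}
	return ready
}
```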