🌱Add Machine and KCP conditions to KCP controller #3674
Conversation
Skipping CI for Draft Pull Request.

/assign

Going to take a deeper look later today, if you're ready for a full review @sedefsavas.
Force-pushed from ba2d784 to 701fdae.
As per the discussion, implementing a global HealthTrackerInstance would introduce a critical component, mostly because of the shared memory management it requires.
Force-pushed from 11c613b to 5d07335.

Force-pushed from 5d07335 to e7226fb.
Addressed the review comments, PTAL.

Rebased onto release-0.3. We will forward-port it to master.
/test pull-cluster-api-e2e-release-0-3
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
This PR is ready, PTAL.
// Check etcd cluster alarms.
etcdClient, err := w.etcdClientGenerator.forNodes(ctx, controlPlaneNodes.Items)
if err != nil {
	conditions.MarkFalse(controlPlane.KCP, controlplanev1.EtcdClusterHealthy, controlplanev1.EtcdClusterUnhealthyReason, clusterv1.ConditionSeverityWarning, "failed to get etcd client")
	return response, err
}
defer etcdClient.Close()

alarmList, err := etcdClient.Alarms(ctx)
if len(alarmList) > 0 || err != nil {
	conditions.MarkFalse(controlPlane.KCP, controlplanev1.EtcdClusterHealthy, controlplanev1.EtcdClusterUnhealthyReason, clusterv1.ConditionSeverityWarning, "etcd cluster has alarms")
	return response, errors.Errorf("etcd cluster has %d alarms", len(alarmList))
}

members, err := etcdClient.Members(ctx)
if err != nil {
	conditions.MarkFalse(controlPlane.KCP, controlplanev1.EtcdClusterHealthy, controlplanev1.EtcdClusterUnhealthyReason, clusterv1.ConditionSeverityWarning, "failed to get etcd members")
	return response, err
}

healthyMembers := 0
for _, m := range members {
	if val, ok := response[m.Name]; ok {
		if val == nil {
			healthyMembers++
		}
	} else {
		// There is a member in the etcd cluster that is not part of the control plane nodes.
		conditions.MarkFalse(controlPlane.KCP, controlplanev1.EtcdClusterHealthy, controlplanev1.EtcdClusterUnhealthyReason, clusterv1.ConditionSeverityWarning, "unknown etcd member that is not part of the control plane nodes")
		return response, errors.Errorf("etcd member %q is not part of the control plane nodes", m.Name)
	}
}
Is this something that should go in a different PR?
@@ -372,13 +377,42 @@ func (r *KubeadmControlPlaneReconciler) reconcile(ctx context.Context, cluster *
	return ctrl.Result{}, nil
}

func (r *KubeadmControlPlaneReconciler) createControlPlane(ctx context.Context, cluster *clusterv1.Cluster, kcp *controlplanev1.KubeadmControlPlane) (*internal.ControlPlane, error) {
This method is a little misleading: `createControlPlane` made me think that we were creating (read: initializing) a new control plane, rather than the struct. If the internal control plane struct requires more information, can we enrich `internal.NewControlPlane` to do it for us instead?
This gets the owned machines and then creates the ControlPlane; it was added because we do the same thing in both reconcile and reconcileDelete. Although the ControlPlane struct is enough as it is, ownedMachines are also calculated here. I think this is a good abstraction, but maybe badly named?
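The refactor the reviewer suggests is to fold the owned-machine lookup into the constructor itself, so both reconcile paths build the struct the same way. The sketch below illustrates that shape with deliberately simplified stand-in types; `Machine`, `ControlPlane`, and the `Owner` field are hypothetical placeholders for the real clusterv1/internal types, not the actual cluster-api API:

```go
package main

import "fmt"

// Machine is a simplified stand-in for clusterv1.Machine; only the
// ownership information relevant to the sketch is modeled.
type Machine struct {
	Name  string
	Owner string
}

// ControlPlane is a simplified stand-in for internal.ControlPlane.
type ControlPlane struct {
	KCPName  string
	Machines []Machine
}

// NewControlPlane filters the owned machines itself, so callers do not
// need a separate createControlPlane wrapper to enrich the struct.
func NewControlPlane(kcpName string, all []Machine) *ControlPlane {
	owned := make([]Machine, 0, len(all))
	for _, m := range all {
		if m.Owner == kcpName {
			owned = append(owned, m)
		}
	}
	return &ControlPlane{KCPName: kcpName, Machines: owned}
}

func main() {
	machines := []Machine{
		{Name: "m1", Owner: "kcp-1"},
		{Name: "m2", Owner: "other"},
	}
	cp := NewControlPlane("kcp-1", machines)
	fmt.Println(len(cp.Machines)) // 1
}
```

With this shape, reconcile and reconcileDelete both call the one constructor, and the "createControlPlane" naming question goes away.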
@sedefsavas Let's try to find a way to split this PR into multiple ones; it seems there might be lots of unrelated or semi-related changes that could go in separately.

@vincepri There were some additions to the KCP logic to surface the new conditions; I will create a PR that adds only those changes.

/test pull-cluster-api-test-release-0-3
@sedefsavas: The following tests failed.

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Divided this PR into 3 PRs.
Will open the Conditions PR after these are merged.
What this PR does / why we need it:
This PR adds Pod-related conditions to control plane machines via the KubeadmControlPlane controller.
It also adds an EtcdClusterHealthy condition to the KubeadmControlPlane object, which reflects the overall health of the etcd cluster.
This PR is WIP; tests will be added after we agree on the conditions.
Some details about the conditions are here (some of them have changed): https://docs.google.com/document/d/1pEMf8AuPgUF_ClTM4uCEiDkb-iyUgwcxinwArtpa45c/edit
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #3138 with #3670