🌱 Refine v1beta2 KCP available condition #11451

fabriziopandini
Member

What this PR does / why we need it:
Improve the KCP Available condition so that:

  • It uses the etcd member list as the source of truth when computing etcd availability (this prevents issues when the list of machines and the list of members are temporarily not aligned while provisioning or deleting a CP machine)
  • It takes etcd members into account when computing quorum
  • Messages about partial unavailability properly distinguish between components reported as unhealthy and components whose health has not been reported yet
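
The three points above can be illustrated with a minimal Go sketch. It is not the code from this PR: the `health` and `etcdMember` types, the `etcdAvailability` function, and the message wording are all hypothetical. It also assumes etcd is managed and that quorum is a simple majority of the listed members.

```go
package main

import (
	"fmt"
	"strings"
)

// health is a hypothetical tri-state mirroring a condition status:
// reported healthy, reported unhealthy, or not reported yet.
type health int

const (
	healthUnknown health = iota // health not reported yet
	healthTrue                  // reported healthy
	healthFalse                 // reported unhealthy
)

// etcdMember is a simplified, hypothetical stand-in for an entry in the etcd member list.
type etcdMember struct {
	Name   string
	Health health
}

// etcdAvailability derives quorum from the member list itself (not from the
// list of machines) and returns whether quorum is met, plus a message that
// separates "reported unhealthy" from "not reported yet".
func etcdAvailability(members []etcdMember) (bool, string) {
	quorum := len(members)/2 + 1 // assumption: simple majority of listed members
	healthy, unhealthy, unknown := 0, 0, 0
	for _, m := range members {
		switch m.Health {
		case healthTrue:
			healthy++
		case healthFalse:
			unhealthy++
		default:
			unknown++
		}
	}
	var parts []string
	if unhealthy > 0 {
		parts = append(parts, fmt.Sprintf("%d of %d etcd members reported unhealthy", unhealthy, len(members)))
	}
	if unknown > 0 {
		parts = append(parts, fmt.Sprintf("%d of %d etcd members not reporting health yet", unknown, len(members)))
	}
	return healthy >= quorum, strings.Join(parts, "; ")
}

func main() {
	members := []etcdMember{
		{Name: "m1", Health: healthTrue},
		{Name: "m2", Health: healthTrue},
		{Name: "m3", Health: healthUnknown}, // e.g. a member still being provisioned
	}
	ok, msg := etcdAvailability(members)
	fmt.Println(ok, msg)
}
```

Counting against the member list rather than the machine list means a machine that is being created or deleted does not change the quorum until etcd itself reflects the change.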

Part of #11105

/area provider/control-plane-kubeadm

@k8s-ci-robot k8s-ci-robot added area/provider/control-plane-kubeadm Issues or PRs related to KCP cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 20, 2024
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 20, 2024
controlplane/kubeadm/internal/controllers/status.go (outdated)
}
}
}

Member

What about if etcdIsManaged?

Member

🤔 Should we even report something then? Or should that come from whatever manages etcd?

Member Author

When etcd is managed, we report status for both etcd and the control plane components.
When etcd is NOT managed, we report only on the control plane components.

Member Author

I will make this more explicit, but the TL;DR is that everything just works because the etcd member list is empty when etcd is not managed.
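
One way the "more explicit" handling could look is sketched below. This is only an illustration, not the code that was merged: the `available` helper, its parameters, and the rule that one healthy set of control plane components is sufficient are all assumptions made for the example.

```go
package main

import "fmt"

// available is a hypothetical helper that guards the etcd quorum check
// explicitly instead of relying on an empty member list.
func available(etcdIsManaged bool, etcdMembersHealthy, etcdMembersTotal, controlPlanesHealthy int) bool {
	if etcdIsManaged {
		// Quorum is only meaningful when KCP manages etcd.
		quorum := etcdMembersTotal/2 + 1
		if etcdMembersHealthy < quorum {
			return false
		}
	}
	// Assumption: at least one healthy set of control plane components is enough.
	return controlPlanesHealthy >= 1
}

func main() {
	fmt.Println(available(true, 1, 3, 3))  // managed etcd, quorum not met
	fmt.Println(available(false, 0, 0, 1)) // etcd not managed: only control plane components count
}
```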

controlplane/kubeadm/internal/controllers/status.go (outdated)
ObjectMeta: metav1.ObjectMeta{Name: "m4"},
Status: clusterv1.MachineStatus{
NodeRef: nil,
V1Beta2: &clusterv1.MachineV1Beta2Status{Conditions: []metav1.Condition{apiServerPodHealthy, controllerManagerPodHealthy, schedulerPodHealthy, etcdPodHealthy, etcdMemberHealthy}},
Member

We can cover this case if you want, but we will never set these conditions to true without a NodeRef. Usually these conditions will be unknown then (if I see correctly)

(at least a disclaimer here in the test that this will never happen would be good though to not mislead folks reading this test case)

We should probably consider adding a test case that covers a Machine without a NodeRef and with unknown conditions.

Member Author

Added a note that also explains why adding a case with unknown conditions does not provide a good signal.

Member

@sbueringer sbueringer Nov 20, 2024

All good for this test case.

Would it be good to have an additional test case with the realistic case of a Machine without nodeRef and unknown conditions? (because this is something that actually happens, so it would be good to cover and validate what the resulting condition will be)

controlplane/kubeadm/internal/controllers/status_test.go (outdated)
@fabriziopandini fabriziopandini force-pushed the refine-v1beta2-kcp-available-condition2 branch from 287ceb6 to 29afd3e Compare November 20, 2024 10:05
@fabriziopandini fabriziopandini force-pushed the refine-v1beta2-kcp-available-condition2 branch from 29afd3e to cf4145a Compare November 20, 2024 11:27
@chrischdi
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 20, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: e6f7fc8852972725872d609e7a6e35d99c1cd24f

@sbueringer
Member

Thx!

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 20, 2024
@k8s-ci-robot k8s-ci-robot merged commit 781d1e4 into kubernetes-sigs:main Nov 20, 2024
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Nov 20, 2024
@fabriziopandini fabriziopandini deleted the refine-v1beta2-kcp-available-condition2 branch December 2, 2024 11:06