
🐛 Change the number of expected etcd members #2696

Merged
merged 1 commit into kubernetes-sigs:master from the etcd-health-check branch on Mar 18, 2020

Conversation

chuckha
Contributor

@chuckha chuckha commented Mar 17, 2020

Signed-off-by: Chuck Ha [email protected]

What this PR does / why we need it:
This PR allows the control plane to upgrade through an expected state where there are 4 control planes and 3 etcd members. This happens when the control plane controller dies after removing a member but not marking it as having been removed. It's one step closer to removing the annotations that manage state.
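For illustration, a minimal sketch of the tolerance being described; the function name and exact comparison are assumptions, not the code in this PR:

```go
// etcdMemberCountAcceptable is a hypothetical helper illustrating the idea:
// during an upgrade the controller may observe one fewer etcd member than
// control plane machines, because a member can be removed before the
// corresponding machine is deleted.
func etcdMemberCountAcceptable(controlPlaneMachines, etcdMembers int) bool {
	return etcdMembers == controlPlaneMachines || etcdMembers == controlPlaneMachines-1
}
```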

Some slightly larger changes due to fixing a stale fake etcd client that went unused in tests.

Adds a test for the particular case we are trying to fix.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2651

/cc @sethp-nr @rudoi
/assign @vincepri

@k8s-ci-robot k8s-ci-robot requested review from rudoi and sethp-nr March 17, 2020 18:33
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 17, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chuckha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 17, 2020
@sethp-nr
Contributor

I'm checking this out locally to test now.

@chuckha chuckha force-pushed the etcd-health-check branch 2 times, most recently from 2e5f9f6 to fa4574d on March 17, 2020 18:40
@vincepri
Member

/milestone v0.3.2

@k8s-ci-robot k8s-ci-robot added this to the v0.3.2 milestone Mar 17, 2020
@sethp-nr
Contributor

sethp-nr commented Mar 17, 2020

I commented out the "etcd member removed" annotation apply step to simulate the failure we experienced, and wanted to see it loop forever on re-removing the same etcd member. What I saw instead was a different health check failure:

E0317 19:40:17.837043 39 kubeadm_control_plane_controller.go:537] controllers/KubeadmControlPlane "msg"="waiting for control plane to pass etcd health check before removing a control plane machine" "error"="node "ip-10-0-122-121.ec2.internal": failed to create etcd client: unable to create etcd client: context deadline exceeded" "cluster"="test" "kubeadmControlPlane"="test-controlplane" "namespace"="test"

That node happens to be the one that got member remove'd:

etcd-ip-10-0-122-121.ec2.internal                      0/1     CrashLoopBackOff   6          16m
etcd-ip-10-0-23-80.ec2.internal                        1/1     Running            0          6m29s
etcd-ip-10-0-36-60.ec2.internal                        1/1     Running            0          11m
etcd-ip-10-0-8-103.ec2.internal                        1/1     Running            0          19m

The good news is that it's no longer complaining about there being a mismatch in the number, though!

How should we have the for _, node := range controlPlaneNodes.Items { loop skip the remove'd member in this case?

@sethp-nr
Contributor

I'm tempted to suggest we check if the pod is ready in that loop and skip over it if not. It doesn't look like we're really waiting for etcd to be healthy again before deleting the control plane machine:

I0317 19:37:52.065804 39 kubeadm_control_plane_controller.go:253] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="test" "kubeadmControlPlane"="test-controlplane" "namespace"="test"
E0317 19:37:57.127608 39 kubeadm_control_plane_controller.go:537] controllers/KubeadmControlPlane "msg"="waiting for control plane to pass etcd health check before removing a control plane machine" "error"="control plane machine test/test-controlplane-pq76m has no status.nodeRef" "cluster"="test" "kubeadmControlPlane"="test-controlplane" "namespace"="test"
I0317 19:38:07.514075 39 kubeadm_control_plane_controller.go:253] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="test" "kubeadmControlPlane"="test-controlplane" "namespace"="test"
E0317 19:38:18.326999 39 kubeadm_control_plane_controller.go:552] controllers/KubeadmControlPlane "msg"="waiting for control plane to pass control plane health check before removing a control plane machine" "error"="node "ip-10-0-23-80.ec2.internal": pods "kube-apiserver-ip-10-0-23-80.ec2.internal" not found" "cluster"="test" "kubeadmControlPlane"="test-controlplane" "namespace"="test"

It looks like a log line "Upgrading Control Plane" followed by "waiting for control plane to pass control plane health check before removing a control plane machine" is how the KCP reconciler spells "successfully removed an etcd member."

Also, looking more carefully now, I would expect this health check change to also remove the need for the annotation at all – it looks like we're counting on early returns from healthcheck failures to sequence the upgrade steps.
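A rough sketch of the pod-readiness skip suggested above, assuming standard corev1 pod types; the helper names are illustrative, not the actual KCP code:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// podIsReady reports whether a pod's Ready condition is True.
func podIsReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

// readyEtcdPods filters out etcd pods that are not Ready, so a member that has
// already been removed from the ring (and is crash-looping) does not fail the
// health check for every remaining member.
func readyEtcdPods(pods []corev1.Pod) []corev1.Pod {
	ready := make([]corev1.Pod, 0, len(pods))
	for i := range pods {
		if podIsReady(&pods[i]) {
			ready = append(ready, pods[i])
		}
	}
	return ready
}
```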

@chuckha
Contributor Author

chuckha commented Mar 17, 2020

I'm tempted to suggest we check if the pod is ready in that loop and skip over it if not. It doesn't look like we're really waiting for etcd to be healthy again before deleting the control plane machine:

This was also my first instinct. I just need to read a little more code to convince myself.

@sethp-nr
Contributor

It makes me a little nervous in general, because I'm worried that means we'll give a pass to an etcd ring that has e.g. a single member who just got oomkilled. But hopefully we'll catch that by comparing against the knownMemberIDSet.
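For example, a hedged sketch of that kind of member-ID cross-check; the names here are assumptions about how such a comparison could look, not the controller's actual bookkeeping:

```go
// unexpectedMembers is an illustrative helper: it returns the member IDs
// reported by etcd that are not in the set the controller already knows about,
// flagging a ring that has drifted from the expected shape.
func unexpectedMembers(reported []uint64, known map[uint64]struct{}) []uint64 {
	var unexpected []uint64
	for _, id := range reported {
		if _, ok := known[id]; !ok {
			unexpected = append(unexpected, id)
		}
	}
	return unexpected
}
```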

@vincepri
Member

vincepri commented Mar 17, 2020

Reading through the code a little more, a few observations:

  • The Workload.EtcdIsHealthy method should probably only return a HealthCheckResult without making assumptions about whether a node is healthy or not; this method seems more geared towards "check the health of each etcd member and report everything back".
  • Callers of this method can use the result to make their own decisions as to whether to proceed or not. For example, in the case of scale down, we can ignore the member currently in the process of being deleted, as was suggested above.
  • The HealthCheckResult type can offer helper methods with pre-defined checks that callers can decide upon.
  • Optional: it might be worth adding a poller when getting the etcd client, to give etcd some headroom to recover on its own before we declare it unhealthy.

wdyt?
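A rough sketch of the shape this suggests, including the optional polling idea; the field names and helpers below are assumptions, not the eventual implementation:

```go
package sketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// HealthCheckResult maps a node name to the error (if any) hit while checking
// the etcd member on that node; a nil value means the member looked healthy.
type HealthCheckResult map[string]error

// HealthyIgnoring reports whether every member is healthy, skipping the named
// node (for example, a member that is mid-removal during a scale down).
func (r HealthCheckResult) HealthyIgnoring(skip string) bool {
	for name, err := range r {
		if name == skip {
			continue
		}
		if err != nil {
			return false
		}
	}
	return true
}

// dialWithRetry polls the given connect function for a short window, giving
// etcd some headroom to recover on its own before we declare it unhealthy.
func dialWithRetry(connect func() error) error {
	return wait.PollImmediate(2*time.Second, 20*time.Second, func() (bool, error) {
		if err := connect(); err != nil {
			return false, nil // not reachable yet; keep polling
		}
		return true, nil
	})
}
```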

@chuckha
Contributor Author

chuckha commented Mar 17, 2020

Reading through the code a little more, a few observations:

  • The Workload.EtcdIsHealthy method should probably only return a HealthCheckResult without making assumptions about whether a node is healthy or not; this method seems more geared towards "check the health of each etcd member and report everything back".
  • Callers of this method can use the result to make their own decisions as to whether to proceed or not. For example, in the case of scale down, we can ignore the member currently in the process of being deleted, as was suggested above.
  • The HealthCheckResult type can offer helper methods with pre-defined checks that callers can decide upon.

wdyt?

This requires more restructuring elsewhere. I think this could be a nice improvement but it's something a little more invasive than what is going on here. Want to open an issue with your thoughts on the matter?

@vincepri
Member

Sure yeah, let's wrap this up for now. Is there any other change that needs to happen in this PR?

@chuckha
Contributor Author

chuckha commented Mar 17, 2020

Sure yeah, let's wrap this up for now. Is there any other change that needs to happen in this PR?

Yep. I'm adding a test for this and have some changes locally that need to be made.

@chuckha chuckha force-pushed the etcd-health-check branch from fa4574d to 2772355 on March 17, 2020 21:19
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 17, 2020
@@ -18,173 +18,41 @@ package fake

Contributor Author


This file was out of date since it was entirely unused. I updated it so I could fake etcd responses to simulate the situations necessary for the test.
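A minimal sketch of what such a fake can look like, assuming the etcd clientv3 response types; the struct and field names are illustrative, not the exact ones in this file:

```go
package fake

import (
	"context"

	"go.etcd.io/etcd/clientv3"
)

// FakeEtcdClient returns canned responses so tests can simulate etcd rings in
// arbitrary states (for example, a ring with one member already removed).
type FakeEtcdClient struct {
	MemberListResponse *clientv3.MemberListResponse
	ErrorResponse      error
}

// MemberList returns the canned member list (or error) configured by the test.
func (f *FakeEtcdClient) MemberList(_ context.Context) (*clientv3.MemberListResponse, error) {
	return f.MemberListResponse, f.ErrorResponse
}
```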

@chuckha
Contributor Author

chuckha commented Mar 17, 2020

/test pull-cluster-api-make

o.o....compiles on my machine.....?

},
},
},
etcdClientGenerator: &fakeEtcdClientGenerator{
Contributor Author


This isn't what I want it to look like, but I'm working with the existing structs in order to finish this up.

@chuckha
Contributor Author

chuckha commented Mar 17, 2020

oh. issue with rebase on master, a bunch of stuff changed 🙃

@chuckha chuckha force-pushed the etcd-health-check branch from 2772355 to 85efa5c on March 17, 2020 21:32
@chuckha
Contributor Author

chuckha commented Mar 17, 2020

/retest

@sethp-nr
Contributor

/retest

o7

@sethp-nr
Contributor

sethp-nr test bot reports that with these two additional PRs:

I was able to successfully upgrade a KCP even without letting the reconciler add the "etcd member removed" annotation!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2020
@k8s-ci-robot k8s-ci-robot merged commit 7157b85 into kubernetes-sigs:master Mar 18, 2020
@chuckha chuckha deleted the etcd-health-check branch March 18, 2020 13:21
Successfully merging this pull request may close these issues: KCP: upgrade gets stuck after removing etcd member (#2651)