✨Forward etcd leadership from machine that is being deleted #2525

alexander-demicev · 2020-03-04T15:07:24Z

During scale-down, we need to be sure that the control plane machine about to be deleted is not the etcd leader. This PR always moves leadership to the first follower. It also makes a few minor changes to the etcd client, which lacked the ability to get the leader ID.

Closes #2398
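
For readers following along, here is a minimal sketch of the "move leadership to the first follower" idea described above. It is not the code from this PR: the `leaderMover` interface, the `fakeEtcd` in-memory client, and the `forwardLeadership` helper are all hypothetical names made up for illustration, and the real change goes through the project's own etcd client wrapper.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// member is a simplified stand-in for an etcd cluster member (hypothetical type).
type member struct {
	ID   uint64
	Name string
}

// leaderMover is a hypothetical, minimal view of an etcd client:
// list members, look up the leader ID, and transfer leadership.
type leaderMover interface {
	Members(ctx context.Context) ([]member, error)
	LeaderID(ctx context.Context) (uint64, error)
	MoveLeader(ctx context.Context, id uint64) error
}

// fakeEtcd is an in-memory implementation used only for this sketch.
type fakeEtcd struct {
	members []member
	leader  uint64
}

func (f *fakeEtcd) Members(context.Context) ([]member, error) { return f.members, nil }

func (f *fakeEtcd) LeaderID(context.Context) (uint64, error) { return f.leader, nil }

func (f *fakeEtcd) MoveLeader(_ context.Context, id uint64) error {
	for _, m := range f.members {
		if m.ID == id {
			f.leader = id
			return nil
		}
	}
	return fmt.Errorf("member %d not found", id)
}

// forwardLeadership moves leadership away from the member named leavingName,
// but only if that member is currently the leader. The first other member in
// the list becomes the new leader.
func forwardLeadership(ctx context.Context, c leaderMover, leavingName string) error {
	members, err := c.Members(ctx)
	if err != nil {
		return err
	}
	leaderID, err := c.LeaderID(ctx)
	if err != nil {
		return err
	}
	var leaving *member
	for i := range members {
		if members[i].Name == leavingName {
			leaving = &members[i]
			break
		}
	}
	if leaving == nil {
		return fmt.Errorf("no etcd member named %q", leavingName)
	}
	if leaving.ID != leaderID {
		return nil // not the leader, nothing to do
	}
	for _, m := range members {
		if m.ID != leaving.ID {
			return c.MoveLeader(ctx, m.ID)
		}
	}
	return errors.New("no follower available to take over leadership")
}

func main() {
	etcd := &fakeEtcd{
		members: []member{{ID: 1, Name: "cp-0"}, {ID: 2, Name: "cp-1"}, {ID: 3, Name: "cp-2"}},
		leader:  1,
	}
	if err := forwardLeadership(context.Background(), etcd, "cp-0"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("leader is now member", etcd.leader)
}
```

The sketch returns early when the leaving member is not the leader, so calling it for every machine being removed should be harmless in the common case.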

k8s-ci-robot · 2020-03-04T15:07:33Z

Hi @alexander-demichev. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

chuckha

A few points to address, but I like this approach!

controlplane/kubeadm/internal/etcd/etcd.go

controlplane/kubeadm/internal/workload_cluster.go

controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go

chuckha · 2020-03-04T15:41:00Z

/ok-to-test

chuckha · 2020-03-04T15:41:33Z

/assign

chuckha · 2020-03-04T16:05:27Z

/milestone v0.3.0

michaelgugino · 2020-03-05T15:23:49Z

controlplane/kubeadm/controllers/kubeadm_control_plane_controller.go

@@ -755,6 +761,12 @@ func (r *KubeadmControlPlaneReconciler) reconcileDelete(ctx context.Context, clu
 	for i := range machinesToDelete {
 		m := machinesToDelete[i]
 		logger := logger.WithValues("machine", m)
+		// If etcd leadership is on machine that is about to be deleted, move it to first follower
+		if err := workloadCluster.ForwardEtcdLeadership(ctx, m); err != nil {


The advantages of doing this aren't clear to me.

This operation appears to be removing a cluster. All the other machines are gone by the time we get to this section, where we delete the remaining control plane machines. Since we cordon and drain before a machine is deleted, it looks like we'll just pass etcd leadership around in a circle (1->2->3->1) by the time this for loop completes. It's also not clear why we need to do this at all, since the cluster is going away.

Nice catch! The original intention was to do this for control plane scale down operations rather than control plane deletion.

@detiber that makes a lot more sense. I thought it was something like that, but the linked issue was not very specific.

vincepri · 2020-03-09T20:36:15Z

Moving this out of v0.3.0 given that it looks like an improvement that can go in later

/milestone v0.3.x

vincepri · 2020-03-11T23:50:11Z

@alexander-demichev are you still interested in working on this change?

vincepri · 2020-03-11T23:54:44Z

Per @michaelgugino's comment, we need to make sure the change targets only upgrades, rather than also running when we're deleting the whole cluster. Ideally, we'd move etcd leadership to the first newly created machine during an upgrade; this would probably keep the leader stable during the upgrade and minimize possible disruptions.
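
As a rough illustration of "move leadership to the newly created machine", the selection step might look like the sketch below. The `machine` type and the `newestMachine` helper are hypothetical and only model a name plus a creation timestamp; they are not types from this repository.

```go
package main

import (
	"fmt"
	"time"
)

// machine is a hypothetical, trimmed-down view of a control plane Machine.
type machine struct {
	Name    string
	Created time.Time
}

// newestMachine returns the most recently created machine.
// The boolean is false when the slice is empty.
func newestMachine(machines []machine) (machine, bool) {
	if len(machines) == 0 {
		return machine{}, false
	}
	newest := machines[0]
	for _, m := range machines[1:] {
		if m.Created.After(newest.Created) {
			newest = m
		}
	}
	return newest, true
}

func main() {
	base := time.Date(2020, time.March, 11, 0, 0, 0, 0, time.UTC)
	machines := []machine{
		{Name: "cp-old-0", Created: base},
		{Name: "cp-new-0", Created: base.Add(2 * time.Hour)},
		{Name: "cp-old-1", Created: base.Add(time.Hour)},
	}
	if m, ok := newestMachine(machines); ok {
		fmt.Println("transfer etcd leadership to", m.Name)
	}
}
```

During an upgrade the newest machine is presumably the one that will outlive the others, which is why it makes a reasonable leadership target.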

alexander-demicev · 2020-03-12T08:15:02Z

@vincepri Yes, I'm still interested in this :)

chuckha

Please keep private identifiers private unless they need to become public

alexander-demicev · 2020-03-13T14:06:00Z

@chuckha fixed

chuckha · 2020-03-13T14:26:20Z

I think some basic unit tests in the workload_cluster_test.go file would be really good to add. We won't need to add anything to the reconciler, as that logic doesn't need testing there. It might be nice to add an e2e test for scale down, but I think that is outside the scope of this PR and can be done in parallel with this work.

/approve
/assign @vincepri

k8s-ci-robot · 2020-03-13T14:26:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexander-demichev, chuckha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [chuckha]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vincepri · 2020-03-16T18:36:16Z

controlplane/kubeadm/internal/workload_cluster.go

+		if member.ID != currentMember.ID {
+			err := etcdClient.MoveLeader(ctx, member.ID)


This should make sure that the new member isn't a machine that's going to be deleted. A (potentially) simple solution would be to always pick the last-created machine/etcd member.

@alexander-demichev do you have time to tackle the above? Otherwise, we can do it in a follow-up PR.

If you are fine with this PR, then feel free to merge. A follow-up sounds good; if it's not urgent, I can try to put something together during the week.
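
For the follow-up discussed in this thread, candidate selection could be sketched roughly as follows. This assumes the member list is ordered oldest first and that the caller knows which machines are pending deletion; `etcdMember`, `pickTransferee`, and the `deleting` set are hypothetical names used only for illustration.

```go
package main

import "fmt"

// etcdMember is a simplified stand-in for an etcd member backed by a machine.
type etcdMember struct {
	ID   uint64
	Name string
}

// pickTransferee walks the members from last to first (assumed to be newest
// to oldest) and returns the first one that is neither the current leader nor
// scheduled for deletion. The boolean is false when no such member exists.
func pickTransferee(members []etcdMember, leaderID uint64, deleting map[string]bool) (etcdMember, bool) {
	for i := len(members) - 1; i >= 0; i-- {
		m := members[i]
		if m.ID == leaderID || deleting[m.Name] {
			continue
		}
		return m, true
	}
	return etcdMember{}, false
}

func main() {
	members := []etcdMember{
		{ID: 1, Name: "cp-0"},
		{ID: 2, Name: "cp-1"},
		{ID: 3, Name: "cp-2"},
		{ID: 4, Name: "cp-3"},
	}
	// Hypothetical set of machines that are about to be removed.
	deleting := map[string]bool{"cp-0": true, "cp-3": true}

	if m, ok := pickTransferee(members, 1, deleting); ok {
		fmt.Println("move leader to", m.Name)
	} else {
		fmt.Println("no safe transferee found")
	}
}
```

Returning a boolean instead of an error leaves it to the caller to decide whether a missing candidate should block the deletion or just be logged.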

vincepri

/lgtm

vincepri · 2020-03-17T16:37:05Z

/milestone v0.3.2

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 4, 2020

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 4, 2020

k8s-ci-robot requested review from chuckha and vincepri March 4, 2020 15:08

chuckha reviewed Mar 4, 2020

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 4, 2020

k8s-ci-robot assigned chuckha Mar 4, 2020

k8s-ci-robot added this to the v0.3.0 milestone Mar 4, 2020

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 4, 2020

alexander-demicev changed the title ~~✨[WIP] Forward etcd leadship from machine that is being deleted~~ ✨Forward etcd leadship from machine that is being deleted Mar 5, 2020

alexander-demicev changed the title ~~✨Forward etcd leadship from machine that is being deleted~~ ✨Forward etcd leadership from machine that is being deleted Mar 5, 2020

michaelgugino suggested changes Mar 5, 2020

chuckha mentioned this pull request Mar 5, 2020

🏃 continue improving management/workload abstraction #2547

Merged

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 5, 2020

k8s-ci-robot modified the milestones: v0.3.0, v0.3.x Mar 9, 2020

sethp-nr mentioned this pull request Mar 12, 2020

KCP: upgrades are disruptive to api server clients #2652

Closed

k8s-ci-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 13, 2020

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 13, 2020

chuckha reviewed Mar 13, 2020

Forward etcd leadship from machine that is being deleted

64d5067

k8s-ci-robot assigned vincepri Mar 13, 2020

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 13, 2020

vincepri reviewed Mar 16, 2020

vincepri approved these changes Mar 17, 2020

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2020

k8s-ci-robot modified the milestones: v0.3.x, v0.3.2 Mar 17, 2020

k8s-ci-robot merged commit 4d87796 into kubernetes-sigs:master Mar 17, 2020

vincepri mentioned this pull request Mar 17, 2020

✨ KCP: Move etcd leadership to newest Machine when upgrading #2695

Merged

alexander-demicev deleted the etcd branch March 8, 2021 17:20

alexander-demicev mentioned this pull request Mar 8, 2021

REQUEST: New membership for alexander-demichev kubernetes/org#2547

Closed

