KCP does not reconcileEtcdMembers for deleted machines (release-0.3 branch) #3860

Closed
fabriziopandini opened this issue Oct 23, 2020 · 6 comments · Fixed by #3900
Labels: kind/bug, lifecycle/active, priority/critical-urgent

Comments

@fabriziopandini
Member

What steps did you take and what happened:

  • Created a cluster with 3 control plane (CP) machines using CAPI from the release-0.3 branch
  • Deleted a machine (using KCP remediation)
  • KCP blocked while trying to restore the 3rd CP machine, reporting:

failed to pass etcd health check: there are 2 healthy etcd pods, but 3 etcd members

What did you expect to happen:
KCP to re-create the 3rd CP machine

Anything else you would like to add:
Most probably this is a regression introduced by #3806, and more specifically by the nested if introduced in that change:
https://github.com/kubernetes-sigs/cluster-api/blob/release-0.3/controlplane/kubeadm/controllers/controller.go#L502-L516

After a machine is deleted, EtcdIsHealthy returns an error because of this check:

if expectedMembers != len(knownMemberIDSet) {
    return response, errors.Errorf("there are %d healthy etcd pods, but %d etcd members", expectedMembers, len(knownMemberIDSet))
}

As a consequence, we enter the error branch

errList = append(errList, errors.Wrap(err, "failed to pass etcd health check"))

and ReconcileEtcdMembers is never called, so the etcd member of the deleted machine is presumably never removed and the check keeps failing.
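
To make the control flow concrete, here is a minimal, self-contained Go sketch. It is a toy model and not the code in controller.go: the cluster type, its fields, newDegradedCluster, reconcileNested and reconcileFirst are all hypothetical names made up for illustration. The "reconcile before checking" ordering is only one possible way around the problem, and is not necessarily what the eventual fix does.

```go
package main

import "fmt"

// cluster is a hypothetical in-memory stand-in for the workload cluster state.
type cluster struct {
	healthyEtcdPods int             // etcd pods running on the remaining CP nodes
	etcdMembers     []string        // members still registered in etcd
	nodes           map[string]bool // nodes that still exist
}

// newDegradedCluster models a 3 CP cluster right after one machine was deleted:
// two nodes and pods are left, but etcd still lists three members.
func newDegradedCluster() *cluster {
	return &cluster{
		healthyEtcdPods: 2,
		etcdMembers:     []string{"cp-1", "cp-2", "cp-3"},
		nodes:           map[string]bool{"cp-1": true, "cp-2": true},
	}
}

// etcdIsHealthy mimics (in simplified form) the count check quoted in the issue.
func (c *cluster) etcdIsHealthy() error {
	if c.healthyEtcdPods != len(c.etcdMembers) {
		return fmt.Errorf("there are %d healthy etcd pods, but %d etcd members",
			c.healthyEtcdPods, len(c.etcdMembers))
	}
	return nil
}

// reconcileEtcdMembers drops every member whose node no longer exists.
func (c *cluster) reconcileEtcdMembers() {
	kept := make([]string, 0, len(c.etcdMembers))
	for _, m := range c.etcdMembers {
		if c.nodes[m] {
			kept = append(kept, m)
		}
	}
	c.etcdMembers = kept
}

// reconcileNested models the reported behavior: members are reconciled only
// when the health check passes, so a stale member blocks its own cleanup.
func reconcileNested(c *cluster) error {
	if err := c.etcdIsHealthy(); err != nil {
		return fmt.Errorf("failed to pass etcd health check: %w", err)
	}
	c.reconcileEtcdMembers()
	return nil
}

// reconcileFirst is an assumed alternative ordering: clean up stale members
// first, then run the health check.
func reconcileFirst(c *cluster) error {
	c.reconcileEtcdMembers()
	if err := c.etcdIsHealthy(); err != nil {
		return fmt.Errorf("failed to pass etcd health check: %w", err)
	}
	return nil
}

func main() {
	a := newDegradedCluster()
	fmt.Println("nested:", reconcileNested(a), "members:", a.etcdMembers)

	b := newDegradedCluster()
	fmt.Println("reconcile first:", reconcileFirst(b), "members:", b.etcdMembers)
}
```

In this model the nested variant returns the same error message reported above and leaves all three members in place, while the alternative ordering removes cp-3 and then passes the check.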

Environment:

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 23, 2020
@fabriziopandini
Member Author

/milestone v0.3.11
/priority critical

@k8s-ci-robot
Contributor

@fabriziopandini: The label(s) priority/critical cannot be applied, because the repository doesn't have them

In response to this:

/milestone v0.3.11
/priority critical

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added this to the v0.3.11 milestone Oct 23, 2020
@fabriziopandini
Member Author

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Oct 23, 2020
@fabriziopandini
Member Author

/assign
/lifecycle active

@fabriziopandini
Member Author

#3900 is merged
/close

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

#3900 is merged
/close

