Check new controllers against etcd member-list to detect replaced hosts #714

Merged on Jun 10, 2024 (3 commits)


kke
Contributor

@kke kke commented May 15, 2024

Addresses #603

Before installing new controllers, k0sctl now checks whether each new controller's address already appears in the output of k0s etcd member-list. If a member with the same address is found, the apply process is aborted. When --force is used, k0sctl can try to perform the etcd leave for the vanished address automatically. The check is skipped when kine or an external etcd is used.

This protects against the case where a controller host in k0sctl.yaml is completely wiped or replaced with a fresh one between two runs. Previously, k0sctl would have treated it as a regular new controller, which could leave etcd confused or split-brained.
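
For illustration, a minimal sketch of the address check described above. It is not the code from this PR. It assumes that k0s etcd member-list prints JSON of the form {"members":{"<name>":"<peer URL>"}}; the type and function names (memberList, parseMemberHosts, checkNewController) are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// memberList mirrors the ASSUMED JSON shape of `k0s etcd member-list`:
// {"members":{"controller0":"https://172.17.0.2:2380"}}
type memberList struct {
	Members map[string]string `json:"members"`
}

// parseMemberHosts decodes the member list and returns a map from the
// host part of each peer URL to the etcd member name.
func parseMemberHosts(output []byte) (map[string]string, error) {
	var ml memberList
	if err := json.Unmarshal(output, &ml); err != nil {
		return nil, fmt.Errorf("decode etcd member list: %w", err)
	}
	hosts := make(map[string]string, len(ml.Members))
	for name, peerURL := range ml.Members {
		u, err := url.Parse(peerURL)
		if err != nil {
			return nil, fmt.Errorf("parse peer URL %q: %w", peerURL, err)
		}
		hosts[u.Hostname()] = name
	}
	return hosts, nil
}

// checkNewController returns an error when the address of a controller that
// is about to be installed as new is already registered as an etcd member.
func checkNewController(addr string, members map[string]string) error {
	if name, ok := members[addr]; ok {
		return fmt.Errorf("address %s is already etcd member %q; the host may have been wiped or replaced", addr, name)
	}
	return nil
}
```

A caller would run the check once per new controller and abort the apply on the first error, or attempt the etcd leave first when --force is set.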

There was also a bug when adding new controllers in general: writing the config on previously existing clusters could fail with an "empty content on file write" error.

@kke kke added the enhancement New feature or request label May 15, 2024
@kke kke force-pushed the list-etcd-members branch 7 times, most recently from fa1aa89 to 7b922ea Compare May 16, 2024 12:14
Review thread on phase/gather_k0s_facts.go (outdated, resolved)
@kke kke force-pushed the list-etcd-members branch 2 times, most recently from 805a0d1 to 3debfb6 Compare May 20, 2024 08:04
@kke
Contributor Author

kke commented Jun 5, 2024

This was confirmed to help with the problem in some conditions.

Could we also use the node list as another way to see if the node is previously known maybe?

Review thread on phase/gather_k0s_facts.go (outdated, resolved)
@twz123
Member

twz123 commented Jun 5, 2024

> Could we also use the node list as another way to see if the node is previously known maybe?

By node list, you mean, figuratively speaking, kubectl get nodes? That will only show k0s nodes that run the worker components. Controller-only k0s nodes won't show up.

The best way of checking etcd members would be to use the EtcdMember CRDs. You probably want to try that first. This only works for newer k0s versions, but if the CRDs aren't registered, k0sctl could gracefully degrade and use k0s etcd member-list. Another possibility would be to try the Autopilot ControlNode CRDs, although they're not perfect either, and there may be 🐉 🐉 🐉.

@kke kke merged commit 68b97cd into main Jun 10, 2024
39 checks passed
@kke kke deleted the list-etcd-members branch June 10, 2024 08:36