[Enhancement] New condition to ensure all etcd's join a single cluster #595

Open
Tracked by #206
aaronfern opened this issue May 5, 2023 · 2 comments
Labels
kind/enhancement: Enhancement, improvement, extension
lifecycle/stale: Nobody worked on this for 6 months (will further age)

Comments

@aaronfern
Contributor

Enhancement (What you would like to be added):
As of today, all etcd-druid conditions rely on all pods running and the etcd cluster being reachable. We log the etcd cluster as successful as long as this is true and all etcd members are running.
It is rare, but if old PVCs exist, the etcd members may not all join the same cluster and may instead form multiple clusters, all connected to the same service. In this case, etcd-druid sees that all pods are running and assumes the cluster is successful.

We need a way for etcd-druid to ensure that all etcd members join the same cluster and to log the result of this check.

Motivation (Why is this needed?):
This is needed because all pods are reachable via the same service. If there are multiple clusters, data will be split between them, leading to data inconsistencies.

Approach/Hint to the implement solution (optional):
My current proposal is to add a new condition to the etcd status. We would check all renewed leases and ensure that there is only one leader. The condition is logged so that it comes to an operator's attention and can be fixed.
When we introduce member state, this functionality can be moved there.

@aaronfern aaronfern added the kind/enhancement Enhancement, improvement, extension label May 5, 2023
@aaronfern
Contributor Author

/assign

@aaronfern aaronfern transferred this issue from gardener/etcd-backup-restore May 5, 2023
@ishan16696
Member

Approach/Hint to the implement solution (optional):
My current proposal is to add a new condition to the etcd status. We would check all renewed leases and ensure that there is only one leader. The condition is logged so that it comes to an operator's attention and can be fixed.
When we introduce member state, this functionality can be moved there.

Things will change going forward, as we will be using the etcd-member custom resource (#658).
As @aaronfern is not working on this issue, I'm unassigning @aaronfern from it.

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Aug 19, 2024
3 participants