
☂️ [Feature] Add EtcdMember resource #206

Closed
1 task
timuthy opened this issue Jul 20, 2021 · 9 comments · Fixed by #214 or #658
Assignees
Labels
kind/enhancement Enhancement, improvement, extension priority/1 Priority (lower number equals higher priority) status/closed Issue is closed (either delivered or triaged)

Comments

@timuthy
Member

timuthy commented Jul 20, 2021

Feature (What you would like to be added):
A new resource EtcdMember should be added to the druid.gardener.cloud/v1alpha1 API group.

Example:

apiVersion: druid.gardener.cloud/v1alpha1
kind: EtcdMember
metadata:
  labels:
    gardener.cloud/owned-by: etcd-test
  name: etcd-test-0 # pod name
  namespace: default
  ownerReferences:
  - apiVersion: druid.gardener.cloud/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Etcd
    name: etcd-test
    uid: <UID>
status:
  id: "1"
  lastTransitionTime: "2021-07-20T10:34:04Z"
  lastUpdateTime: "2021-07-20T10:34:04Z"
  name: member1
  reason: up and running
  role: Member
  status: Ready

Every etcd member in a cluster should have a corresponding EtcdMember resource which contains the status information shown above. The EtcdMember resource ought to be created and maintained by the backup-restore sidecar. etcd-druid may set status: Unknown after heartbeatGracePeriod has elapsed (ref).
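
If the sidecar stops renewing its heartbeat, the member's status might then end up looking roughly like this (a minimal sketch; the timestamps and reason string are illustrative only, nothing here is an agreed-upon format):

status:
  id: "1"
  lastTransitionTime: "2021-07-20T10:40:04Z"
  lastUpdateTime: "2021-07-20T10:34:04Z"
  name: member1
  reason: heartbeat not renewed within heartbeatGracePeriod
  role: Member
  status: Unknown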

Motivation (Why is this needed?):
The original proposal intended the status information of each etcd member to be part of a []members list in the status of the Etcd resource. However, this leads to update conflicts when multiple clients try to update the same resource at nearly the same time, and we cannot use any adequate patch technique to prevent that (SSA failed for K8s versions <= 1.21, and strategic-merge patches are not supported for CRDs).

Subtasks

/cc @shreyas-s-rao @amshuman-kr

@timuthy timuthy added the kind/enhancement Enhancement, improvement, extension label Jul 20, 2021
@shreyas-s-rao
Contributor

Thanks @timuthy for summarizing the issue.

@timuthy
Member Author

timuthy commented Jul 20, 2021

/status in-progress

@gardener-robot gardener-robot added the status/in-progress Issue is in progress/work label Jul 20, 2021
@timuthy
Member Author

timuthy commented Jul 21, 2021

@amshuman-kr, @shreyas-s-rao and I iterated over the approach again and agreed that we find the EtcdMember design confusing, because the resource consists only of a status and is not meant to be reconciled by any controller.
Instead, the backup-restore sidecar can contribute to the corresponding pod.status.conditions with a custom condition type. Extending pod conditions is a supported use case and enables us to maintain extra status information about each etcd member. Druid can then consult these custom conditions to aggregate the status of its AllMembersReady and Ready conditions in etcd.status.conditions.
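
For illustration, such a custom pod condition could look like this (a sketch only; the condition type druid.gardener.cloud/etcd-member-ready is a hypothetical placeholder, not an agreed name):

status:
  conditions:
  - type: druid.gardener.cloud/etcd-member-ready  # hypothetical custom condition type
    status: "True"
    lastProbeTime: "2021-07-21T09:00:00Z"
    lastTransitionTime: "2021-07-21T09:00:00Z"
    reason: MemberReady
    message: etcd member is up and running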

@amshuman-kr
Collaborator

#207 describes an alternative: using Leases instead of a new EtcdMember resource.
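
For illustration, a per-member Lease could carry this state roughly as follows (a minimal sketch; encoding the member ID and role in holderIdentity is just one conceivable scheme, the details belong to #207):

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: etcd-test-0  # one Lease per etcd member, named after the pod
  namespace: default
spec:
  holderIdentity: "1:Member"  # illustrative encoding of member ID and role
  leaseDurationSeconds: 40
  renewTime: "2021-07-21T09:00:00.000000Z"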

@timuthy
Member Author

timuthy commented Sep 7, 2022

After working in this problem space we realized that the Lease approach is not ideal, as an etcd member has various pieces of status information that should be exposed to etcd-druid so that it can take appropriate action, e.g.:

  • Is peer-tls enabled for the member? -> lets Druid trigger an appropriate rollout if it changed from disabled -> enabled (cc @unmarshall)
  • Is the disk corrupted? -> lets Druid know to take adequate action, e.g. by triggering quorum restoration (cc @abdasgupta)

After a discussion with @unmarshall, we think it's best to revive the idea of the EtcdMember resource instead of continuing to maintain this information in Leases via annotations and other hacks. The design of the EtcdMember API can of course be revisited and revised, e.g. it need not carry only a status but could expose another field.

However, there are also arguments for having both .spec and .status fields. For instance, via a .spec it would be possible for Druid to trigger data archiving/movement before quorum restoration (cc @vlerenc).
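
To make that concrete, a revised EtcdMember with both sections could look roughly like this (all field names below are hypothetical and only illustrate the kind of information discussed above):

apiVersion: druid.gardener.cloud/v1alpha1
kind: EtcdMember
metadata:
  name: etcd-test-0
  namespace: default
spec:
  # hypothetical: druid requests data archiving before quorum restoration
  archiveDataBeforeRestoration: true
status:
  peerTLSEnabled: true  # hypothetical: lets druid detect a disabled -> enabled transition
  diskCorrupted: false  # hypothetical: lets druid trigger quorum restoration
  role: Member
  status: Ready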

@timuthy timuthy reopened this Sep 7, 2022
@unmarshall
Contributor

For etcd-druid to perform deterministic recovery/reconciliation, it is important that individual etcd members publish their up-to-date information. Outside of Gardener, if etcd-druid is not used, any other external actor can be the consumer of this information, giving it a holistic view of the state of all members in the etcd cluster.

@unmarshall
Contributor

/assign

@ishan16696
Member

Is the disk corrupted -> lets Druid know to take adequate action, e.g. by triggering quorum restoration (cc @abdasgupta)

Just mentioning a few points:

  • When the disk gets corrupted for one member of a multi-node etcd cluster, druid does not get involved. Please refer to this doc.
  • When there is a disk failure for the majority of members in a multi-node etcd cluster, manual intervention is required, as this is a case of permanent quorum loss, which is unlikely to occur.

@shreyas-s-rao shreyas-s-rao added priority/1 Priority (lower number equals higher priority) and removed priority/2 Priority (lower number equals higher priority) labels May 3, 2023
@unmarshall unmarshall changed the title [Feature] Add EtcdMember resource ☂️ [Feature] Add EtcdMember resource May 5, 2023
@shreyas-s-rao
Contributor

The state of each etcd member is beneficial not only for deciding the order in which members are rolled during an update, but also for deciding the order in which volumes are rolled (due to changes in volume size, storage class, etc.). The exact details of how the StatefulSet can be prepared and the volumes then rolled without downtime for a multi-node etcd cluster are captured in this comment from @unmarshall -> gardener/gardener-extension-provider-aws#646
