diff --git a/docs/proposals/20210524-etcdadm-bootstrap-provider.md b/docs/proposals/20210524-managed-external-etcd.md
similarity index 92%
rename from docs/proposals/20210524-etcdadm-bootstrap-provider.md
rename to docs/proposals/20210524-managed-external-etcd.md
index 3eff9612335f..97927b39a013 100644
--- a/docs/proposals/20210524-etcdadm-bootstrap-provider.md
+++ b/docs/proposals/20210524-managed-external-etcd.md
@@ -3,7 +3,7 @@ title: Add support for managed external etcd clusters in CAPI
 authors:
 - "@mrajashree"
 creation-date: 2021-05-24
-last-updated: 2021-06-01
+last-updated: 2021-06-14
 status: provisional
 ---

@@ -86,6 +86,7 @@ So it would be good to add support for provisioning the etcd cluster too, so the
 - The first iteration will use IP addresses/hostnames as etcd cluster endpoints. It can not configure static endpoint(s) till the Load Balancer provider is available.
 - API changes such as adding new fields within existing control plane providers. For instance, if a user chooses KubeadmControlPlane provider as the control plane provider, we will utilize the existing external etcd endpoints field from the KubeadmConfigSpec for KCP instead of adding new fields.
 - This is not to be used for etcd-as-a-service. The etcd clusters created by an etcd provider within CAPI will only be used by the kube-apiserver of a target workload cluster.
+- Etcd cluster snapshots will need their own separate KEP. That KEP should cover how snapshots are taken (for example, if the `etcdctl snapshot save` command is used, how the snapshots are collected from the etcd machines) and how an integration with a storage provider saves them. This first iteration will not implement a snapshot mechanism. Users can still take snapshots by manually running the `etcdctl snapshot save` command, and we can document those steps.

 ## Proposal

@@ -263,8 +264,10 @@ Each etcd provider should add a new API type to manage the etcd cluster lifecycl
 ###### Contract between etcd provider and control plane provider

 - Control plane providers should check for the presence of the paused annotation (`cluster.x-k8s.io/paused`), and not continue provisioning if it is set. Out of the existing control plane provider, the KubeadmControlPlane controller does that.
-- Control plane providers should check for the presence of the `Cluster.Spec.ManagedExternalEtcdRef` field. This check should happen after the control plane is no longer paused, and before the provider starts provisioning the control plane.
-- The control plane provider should "Get" the CR referred by the cluster.spec.ManagedExternalEtcdRef and check its status.Endpoints field. It will parse the endpoints from this field and use these endpoints as the external etcd endpoints where applicable. For instance, the KCP controller will read the status.Endpoints field and parse it into a string slice, and then use it as the [external etcd endpoints](https://github.com/kubernetes-sigs/cluster-api/blob/master/bootstrap/kubeadm/api/v1alpha4/kubeadm_types.go#L269).
+- Control plane providers should check for the presence of the `Cluster.Spec.ManagedExternalEtcdRef` field. This check should happen after the control plane is no longer paused, and before the provider starts provisioning the control plane.
+- The control plane provider should "Get" the CR referred to by the cluster.spec.ManagedExternalEtcdRef and check its status.Endpoints field.
+  - This field will be a comma-separated list of etcd endpoints.
+  - It will parse the endpoints from this field and use them as the external etcd endpoints where applicable. For instance, the KCP controller will read the status.Endpoints field, parse it into a string slice, and then use it as the [external etcd endpoints](https://github.com/kubernetes-sigs/cluster-api/blob/master/bootstrap/kubeadm/api/v1alpha4/kubeadm_types.go#L269); see the sketch after this list.
 - The control plane provider will read the etcd CA cert, and the client cert-key pair from the two Kubernetes Secrets named `{cluster.Name}-apiserver-etcd-client` and `{cluster.Name}-etcd`. The KCP controller currently does that.
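+
+A minimal sketch of the consuming side of this contract, assuming the existing KubeadmConfigSpec external etcd types. The helper name and the certificate file paths are illustrative only, not part of any controller today:
+```go
+import (
+	"strings"
+
+	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1alpha4"
+)
+
+// applyExternalEtcdEndpoints parses the comma-separated status.Endpoints value
+// from the etcd cluster CR and sets it as the external etcd endpoints in the
+// ClusterConfiguration rendered by the control plane provider. The file paths
+// point at kubeadm's conventional locations for the certs stored in the
+// Secrets listed above.
+func applyExternalEtcdEndpoints(cfg *bootstrapv1.ClusterConfiguration, statusEndpoints string) {
+	endpoints := strings.Split(statusEndpoints, ",")
+	cfg.Etcd.External = &bootstrapv1.ExternalEtcd{
+		Endpoints: endpoints,
+		CAFile:    "/etc/kubernetes/pki/etcd/ca.crt",
+		CertFile:  "/etc/kubernetes/pki/apiserver-etcd-client.crt",
+		KeyFile:   "/etc/kubernetes/pki/apiserver-etcd-client.key",
+	}
+}
+```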

 ###### Contract between etcd provider and infrastructure provider
@@ -421,7 +424,7 @@ spec:
 - Member is found unhealthy
 - During an upgrade
 - While replacing an unhealthy etcd member, it is important to first remove the member and only then replace it by adding a new one. This ensures the cluster stays stable. [This Etcd doc](https://etcd.io/docs/v3.4/faq/#should-i-add-a-member-before-removing-an-unhealthy-member) explains why removing an unhealthy member is important in depth. Etcd's latest versions allow adding new members as learners so it doesn't affect the quorum size, but that feature is in beta. Also, adding a learner is a two-step process, where you first add a member as learner and then promote it, and etcdadm currently doesn't support it. So we can include this in a later version once the learner feature becomes GA and etcdadm incorporates the required changes.
-- We can reuse this same "scale down" function when removing a healthy member during an etcd upgrade. So even upgrades can follow the same pattern of removing the member first and then adding a new one with the latest spec.
+- We can reuse this same "scale down" function when removing a healthy member during an etcd upgrade. So even upgrades will follow the same pattern of removing the member first and then adding a new one with the latest spec.
 - Once the upgrade/machine replacement is completed, the etcd controller will again update the EtcdadmCluster.Status.Endpoints field. The control plane controller will rollout new Machines that use the updated etcd endpoints for the apiserver.

 ###### Periodic etcd member healthchecks
@@ -433,16 +436,29 @@ spec:
 - Infrastructure providers like CAPA and CAPV set this provider ID based on the instance metadata.
 - CAPD on the other hand first calls `kubectl patch` to add the provider ID on the node. And only then it sets the provider ID value on the DockerMachine resource. But the problem is that in case of an etcd only cluster, the machine is not registered as a Kubernetes node. Cluster provisioning does not progress until this provider ID is set on the Machine. As a solution, CAPD will check if the bootstrap provider is etcdadm, and skip the kubectl patch process and set the provider ID on DockerMachine directly. These changes are required in the [docker machine controller](https://github.com/mrajashree/cluster-api/pull/2/files#diff-1923dd8291a9406e3c144763f526bd9797a2449a030f5768178b8d06d13c795bR307) for CAPD.
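+
+As a rough illustration of that check (the helper below is hypothetical; the actual change is in the linked PR), the DockerMachine controller can detect etcd-only machines from the Machine's bootstrap config reference:
+```go
+import (
+	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha4"
+)
+
+// isEtcdOnlyMachine reports whether the Machine is bootstrapped by the etcdadm
+// bootstrap provider. Such a machine never registers as a Kubernetes node, so
+// CAPD skips the `kubectl patch` step and sets Spec.ProviderID on the
+// DockerMachine directly.
+func isEtcdOnlyMachine(machine *clusterv1.Machine) bool {
+	ref := machine.Spec.Bootstrap.ConfigRef
+	return ref != nil && ref.Kind == "EtcdadmConfig"
+}
+```
+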
+#### Clusterctl changes
+- External etcd topology is optional, so it won't be enabled by default. Users should be able to install it through `clusterctl init` when needed. This requires the following changes in clusterctl:
+- New ProviderTypes:
+```go
+EtcdBootstrapProviderType = ProviderType("EtcdBootstrapProvider")
+EtcdProvider = ProviderType("EtcdProvider")
+```
+The Etcdadm-based provider requires two separate provider types; other etcd providers added later might not, and can use just the EtcdProvider type.
+With these changes, users should be able to install the etcd provider by running:
+```
+clusterctl init --etcdbootstrap etcdadm-bootstrap-provider --etcdprovider etcdadm-controller
+```
+- CAPI's default `Providers` list defined [here](https://github.com/kubernetes-sigs/cluster-api/blob/master/cmd/clusterctl/client/config/providers_client.go#L95) will be updated to add manifests for the Etcdadm bootstrap provider and controller.
+
 #### Future work

 ##### Static Etcd endpoints
-- If any VMs running an etcd member get replaced, the kube-apiserver will have to be reconfigured to use the new etcd cluster endpoints. Using static endpoints for the external etcd cluster can avoid this. There are two ways of configuring static endpoints, using a load balancer or configuring DNS records.
+- If any VMs running an etcd member get replaced, the kube-apiserver will have to be reconfigured to use the new etcd cluster endpoints. We can avoid this by using static endpoints for the external etcd cluster. There are two ways of configuring static endpoints: using a load balancer or configuring DNS records.
 - Load balancer can add latency because of the hops associated with routing requests, whereas DNS records will directly resolve to the etcd members.
-- The DNS records can be configured such that each etcd member gets a separate sub-domain, under the domain associated with the etcd cluster. An example of this when using ec2 would be:
-  - User creates a hosted zone called `external.etcd.cluster` and gives that as input to `EtcdadmCluster.Spec.HostedZoneName`.
+- Instead of configuring a single DNS record for all etcd members, we can configure multiple DNS records, one per etcd member, to leverage the [etcd client's balancer](https://etcd.io/docs/v3.4/learning/design-client/#clientv3-grpc10-balancer-overview). The DNS records can be configured such that each etcd member gets a separate sub-domain under the domain associated with the etcd cluster. An example of this when using EC2 would be (see the naming sketch at the end of this section):
+  - User creates a hosted zone called `external.etcd.cluster` and gives that as input to a new field, `EtcdadmCluster.Spec.HostedZoneName`.
   - The EtcdadmCluster controller creates a separate A record name within that hosted zone for each EtcdadmConfig it creates. The DNS controller will create the route53 A record once the corresponding Machine's IP address is set
-- A suggestion from a CAPI meeting is to add a DNS configuration implementation in the [load balancer proposal](https://github.com/kubernetes-sigs/cluster-api/pull/4389)
-- In today's CAPI meeting (April 28th) we decided to implement phase 1 with IP addresses in the absence of the load balancer provider.
+- A suggestion from a CAPI meeting is to add a DNS configuration implementation in the [load balancer proposal](https://github.com/kubernetes-sigs/cluster-api/pull/4389). In the CAPI meeting on April 28th, we decided to implement phase 1 with IP addresses in the absence of the load balancer provider.
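+
+The per-member record naming could look roughly like the sketch below; the naming scheme and helper are illustrative only, with `HostedZoneName` being the field proposed above:
+```go
+import (
+	"fmt"
+	"strings"
+)
+
+// memberRecordName builds one A record name per EtcdadmConfig, nested under
+// the user-provided hosted zone. The record would resolve to the corresponding
+// Machine's IP address once that address is set.
+func memberRecordName(hostedZoneName, etcdadmConfigName string) string {
+	// e.g. ("external.etcd.cluster", "cluster1-etcd-abcde") ->
+	//      "cluster1-etcd-abcde.external.etcd.cluster"
+	return fmt.Sprintf("%s.%s", etcdadmConfigName, strings.TrimSuffix(hostedZoneName, "."))
+}
+```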

 ##### Member replacement by adding a new member as learner
 - Etcd allows adding a new member as a [learner](https://etcd.io/docs/v3.3/learning/learner/). This is useful when adding a new etcd member to the cluster, so that it doesn't increase the quorum size, and the learner gets enough time to catch up with the data from the leader.
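+
+For reference, the two-step learner flow with etcd's Go client looks roughly like the sketch below; the peer URL and the catch-up wait are placeholders, and this is not something the first iteration implements:
+```go
+import (
+	"context"
+
+	clientv3 "go.etcd.io/etcd/client/v3"
+)
+
+// addMemberAsLearner adds a new member as a non-voting learner, so the quorum
+// size is unchanged, and promotes it to a voting member once it has caught up
+// with the leader.
+func addMemberAsLearner(ctx context.Context, c *clientv3.Client, peerURL string) error {
+	resp, err := c.MemberAddAsLearner(ctx, []string{peerURL})
+	if err != nil {
+		return err
+	}
+	// ...wait until the learner has caught up with the leader's log...
+	_, err = c.MemberPromote(ctx, resp.Member.ID)
+	return err
+}
+```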