diff --git a/keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md b/keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md
new file mode 100644
index 00000000000..d2682fb3f7e
--- /dev/null
+++ b/keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md
@@ -0,0 +1,211 @@
+---
+kep-number: 31
+title: etcdadm
+authors:
+  - "@justinsb"
+owning-sig: sig-cluster-lifecycle
+#participating-sigs:
+#- sig-apimachinery
+reviewers:
+  - "@roberthbailey"
+  - "@timothysc"
+approvers:
+  - "@roberthbailey"
+  - "@timothysc"
+editor: TBD
+creation-date: 2018-10-22
+last-updated: 2018-10-22
+status: provisional
+#see-also:
+# - KEP-1
+# - KEP-2
+#replaces:
+# - KEP-3
+#superseded-by:
+# - KEP-100
+---
+
+# etcdadm - automation for etcd clusters
+
+## Table of Contents
+
+* [Table of Contents](#table-of-contents)
+* [Summary](#summary)
+* [Motivation](#motivation)
+  * [Goals](#goals)
+  * [Non-Goals](#non-goals)
+* [Proposal](#proposal)
+  * [User Stories](#user-stories)
+    * [Manual Cluster Creation](#manual-cluster-creation)
+    * [Automatic Cluster Creation](#automatic-cluster-creation)
+    * [Automatic Cluster Creation with EBS volumes](#automatic-cluster-creation-with-ebs-volumes)
+  * [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
+  * [Risks and Mitigations](#risks-and-mitigations)
+* [Graduation Criteria](#graduation-criteria)
+* [Implementation History](#implementation-history)
+* [Infrastructure Needed](#infrastructure-needed)
+
+## Summary
+
+etcdadm makes operation of etcd for the Kubernetes control plane easy, on clouds
+and on bare-metal, including both single-node and HA configurations.
+
+It is able to perform cluster reconfigurations, upgrades / downgrades, and
+backups / restores.
+
+## Motivation
+
+Today each installation tool must reimplement etcd operation, and this is
+difficult. It also leads to ecosystem fragmentation - e.g. etcd backups from
+one tool are not necessarily compatible with the backups from other tools. The
+failure modes are subtle and rare, and thus the Kubernetes project benefits from
+having more collaboration.
+
+
+### Goals
+
+The following key tasks are in scope:
+
+* Cluster creation
+* Cluster teardown
+* Cluster resizing / membership changes
+* Cluster backups
+* Disaster recovery or restore from backup
+* Cluster upgrades
+* Cluster downgrades
+* PKI management
+
+We will implement this functionality both as a base layer of imperative (manual
+CLI) operation, and a self-management layer which should enable automated
+operation in "safe" scenarios (with fallback to manual operation).
+
+We'll also optionally support limited interaction with cloud infrastructure, for
+example for mounting volumes and peer-discovery. This is primarily for the
+self-management layer, but we'll expose it via etcdadm for consistency and for
+power-users. The tasks are limited today to listing & mounting a persistent
+volume, and listing instances to find peers. A full solution for management of
+machines or networks (for example) is out of scope, though we might share some
+example configurations for exposition. We expect Kubernetes installation
+tooling to configure the majority of the cloud infrastructure here, because both
+the configurations and the configuration tooling vary widely.
+
+The big reason that volume mounting is in scope is that volume mounting acts as
+a simple mutex on most clouds - it is a cheap way to boost the safety of our
+leader/gossip algorithms, because we have an external source of truth.
+
+We'll also support reading & writing backups to S3 / GCS etc.
+
+### Non-Goals
+
+* The project is not targeted at operation of an etcd cluster for use other than
+  by the Kubernetes apiserver. We are not building a general-purpose etcd operation
+  toolkit. Likely it will work well for other use-cases, but other tools may be
+  more suitable.
+* As described above, we aren't building a full "turn up an etcd cluster on a
+  cloud" solution; we expect this to be a building block for use by Kubernetes
+  installation tooling (e.g. cluster API solutions).
+
+## Proposal
+
+We will combine [etcdadm](https://github.com/platform9/etcdadm) from
+Platform9 with the [etcd-manager](https://github.com/kopeio/etcd-manager)
+project from kopeio / @justinsb.
+
+etcdadm gives us easy-to-use CLI commands, which will form the base layer of
+operation. Automation should ideally describe what it is doing in terms of
+etcdadm commands, though we will also expose etcdadm as a Go library for easier
+consumption, following the kubectl pattern of a `cmd/` layer calling into a
+`pkg/` layer. This means the end-user can understand the operation of the
+tooling, and advanced users can feel confident that they can use the CLI tooling
+for advanced operations.
+
+etcd-manager provides automation of the common scenarios, particularly when
+running on a cloud. It will be rebased to work in terms of etcdadm CLI
+operations (which will likely require some functionality to be added to etcdadm
+itself). Where automation is not known to be safe, etcd-manager can stop and
+allow for manual intervention using the CLI.
+
+kops is currently using etcd-manager, and we aim to switch to the (new) etcdadm as soon as possible.
+
+We expect other tooling (e.g. cluster-api implementations) to adopt this project
+for etcd management going forward, and we will do a first integration or two if
+that hasn't happened already.
+
+### User Stories
+
+#### Manual Cluster Creation
+
+A cluster operator setting up a cluster manually will be able to do so using etcdadm and kubeadm.
+
+The basic flow looks like:
+
+* On a master machine, run `etcdadm init`, making note of the `etcdadm join
+  ` command.
+* On each other master machine, copy the CA certificate and key from one of the
+  other masters, then run the `etcdadm join ` command.
+* Run kubeadm following the [external etcd procedure](https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd)
+
+This results in a multi-node ("HA") etcd cluster.
+
+#### Automatic Cluster Creation
+
+etcd-manager works by coordinating via a shared filesystem-like store (e.g. S3
+or GCS) and/or via cloud APIs (e.g. EC2 or GCE). In doing so it is able to
+automate the manual commands, which is very handy for running in a cloud
+environment like AWS or GCE.
+
+The basic flow would look like:
+
+* The user writes a configuration file to GCS using `etcdadm seed
+  gs://mybucket/cluster1/etcd1 version=3.2.12 nodes=3`
+* On each master machine, run `etcdadm auto gs://mybucket/cluster1/etcd1`.
+  (Likely the user will have to run that persistently, either as a systemd
+  service or a static pod.)
+
+`etcdadm auto` downloads the target configuration from GCS, discovers other
+peers also running etcdadm, and gossips with them to do basic leader election. When
+sufficient nodes are available to form a quorum, it starts etcd.
+
+#### Automatic Cluster Creation with EBS volumes
+
+etcdadm can also automatically mount EBS volumes. The workflow looks like this:
+
+* As before, write a configuration file using `etcdadm seed ...`, but this time
+  passing the additional argument `--volume-tag cluster=mycluster`
+* Create EBS volumes with the matching tags
+* On each master machine, run `etcdadm auto ...` as before. Now etcdadm will
+  try to mount a volume with the correct tags before acting as a member of the
+  cluster.
+
+### Implementation Details/Notes/Constraints
+
+* There will be some changes needed to both platform9/etcdadm (e.g. etcd2
+  support) and kopeio/etcd-manager (to rebase on top of etcdadm).
+* It is unlikely that e.g. GKE / EKS will use etcdadm (at least initially),
+  which limits the pool of contributors.
+
+### Risks and Mitigations
+
+* Automatic mode may make incorrect decisions and break a cluster. Mitigation:
+  automated backups, and a willingness to stop and wait for a fix / operator
+  intervention (CLI mode).
+* Automatic mode relies on peer-to-peer discovery and gossiping, which is less
+  reliable than Raft. Mitigation: rely on Raft as much as possible, be very
+  conservative in automated operations (favor correctness over availability or
+  speed). etcd non-voting members will make this much more reliable.
+
+## Graduation Criteria
+
+etcdadm will be considered successful when it is used by the majority of OSS
+cluster installations.
+
+## Implementation History
+
+* Much SIG discussion
+* Initial proposal to SIG 2018-10-09
+* Initial KEP draft 2018-10-22
+* Added clarification of cloud interaction 2018-10-23
+
+## Infrastructure Needed
+
+* etcdadm will be a subproject under sig-cluster-lifecycle