Merge pull request #2835 from justinsb/kep_etcdadm

Add KEP for etcdadm
kubernetes · Nov 22, 2018 · c5f3779
2 parents bb7c1c7 + 6db2ddc
commit c5f3779
Showing 1 changed file with 211 additions and 0 deletions.
diff --git a/keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md b/keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md
@@ -0,0 +1,211 @@
+---
+kep-number: 31
+title: etcdadm
+authors:
+  - "@justinsb"
+owning-sig: sig-cluster-lifecycle
+#participating-sigs:
+#- sig-apimachinery
+reviewers:
+  - "@roberthbailey"
+  - "@timothysc"
+approvers:
+  - "@roberthbailey"
+  - "@timothysc"
+editor: TBD
+creation-date: 2018-10-22
+last-updated: 2018-10-22
+status: provisional
+#see-also:
+#  - KEP-1
+#  - KEP-2
+#replaces:
+#  - KEP-3
+#superseded-by:
+#  - KEP-100
+---
+
+# etcdadm - automation for etcd clusters
+
+## Table of Contents
+
+* [Table of Contents](#table-of-contents)
+* [Summary](#summary)
+* [Motivation](#motivation)
+    * [Goals](#goals)
+    * [Non-Goals](#non-goals)
+* [Proposal](#proposal)
+    * [User Stories](#user-stories)
+      * [Manual Cluster Creation](#manual-cluster-creation)
+      * [Automatic Cluster Creation](#automatic-cluster-creation)
+      * [Automatic Cluster Creation with EBS volumes](#automatic-cluster-creation-with-ebs-volumes)
+    * [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
+    * [Risks and Mitigations](#risks-and-mitigations)
+* [Graduation Criteria](#graduation-criteria)
+* [Implementation History](#implementation-history)
+* [Infrastructure Needed](#infrastructure-needed)
+
+## Summary
+
+etcdadm makes operation of etcd for the Kubernetes control plane easy, on clouds
+and on bare-metal, including both single-node and HA configurations.
+
+It is able to perform cluster reconfigurations, upgrades / downgrades, and
+backups / restores.
+
+## Motivation
+
+Today each installation tool must reimplement etcd operation, and this is
+difficult.  It also leads to ecosystem fragmentation - e.g. etcd backups from
+one tool are not necessarily compatible with the backups from other tools.  The
+failure modes are subtle and rare, so the Kubernetes project benefits from
+collaborating on a single shared implementation.
+
+
+### Goals
+
+The following key tasks are in scope:
+
+* Cluster creation
+* Cluster teardown
+* Cluster resizing / membership changes
+* Cluster backups
+* Disaster recovery or restore from backup
+* Cluster upgrades
+* Cluster downgrades
+* PKI management
+
+We will implement this functionality both as a base layer of imperative (manual
+CLI) operation, and as a self-management layer that enables automated operation
+in "safe" scenarios (with fallback to manual operation).
+
+We'll also optionally support limited interaction with cloud infrastructure, for
+example for mounting volumes and peer-discovery.  This is primarily for the
+self-management layer, but we'll expose it via etcdadm for consistency and for
+power-users.  The tasks are limited today to listing & mounting a persistent
+volume, and listing instances to find peers.  A full solution for management of
+machines or networks (for example) is out of scope, though we might share some
+example configurations for exposition.  We expect Kubernetes installation
+tooling to configure the majority of the cloud infrastructure here, because both
+the configurations and the configuration tooling vary widely.
+
+The big reason that volume mounting is in scope is that volume mounting acts as
+a simple mutex on most clouds - it is a cheap way to boost the safety of our
+leader/gossip algorithms, because we have an external source of truth.
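
To illustrate the mutex property described above, here is a minimal Python sketch. It assumes a cloud where a volume can be attached to at most one instance at a time. `InMemoryCloud`, `attach_volume`, `try_become_member`, and the volume and instance names are hypothetical stand-ins for illustration, not a real cloud SDK and not etcdadm code.

```python
import threading


class InMemoryCloud:
    """Hypothetical stand-in for a cloud API with exclusive volume attachment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attachments = {}  # volume id -> instance id

    def attach_volume(self, volume_id, instance_id):
        """Attach the volume if it is free; return True only for the holder."""
        with self._lock:
            owner = self._attachments.get(volume_id)
            if owner is None:
                self._attachments[volume_id] = instance_id
                return True
            # Re-attaching by the current owner is treated as success.
            return owner == instance_id

    def owner_of(self, volume_id):
        with self._lock:
            return self._attachments.get(volume_id)


def try_become_member(cloud, volume_id, instance_id):
    """Only the instance holding the volume may act as that etcd member."""
    return cloud.attach_volume(volume_id, instance_id)


cloud = InMemoryCloud()
first = try_become_member(cloud, "vol-etcd-1", "instance-a")
second = try_become_member(cloud, "vol-etcd-1", "instance-b")
print(first, second, cloud.owner_of("vol-etcd-1"))
```

The point of the sketch is that the attachment record lives outside the gossiping peers, so two nodes that disagree about leadership still cannot both hold the same member's data volume.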
+
+We'll also support reading & writing backups to S3 / GCS etc.
+
+### Non-Goals
+
+* The project is not targeted at operation of an etcd cluster for use other than
+  by Kubernetes apiserver.  We are not building a general-purpose etcd operation
+  toolkit.  Likely it will work well for other use-cases, but other tools may be
+  more suitable.
+* As described above, we aren't building a full "turn up an etcd cluster on a
+  cloud" solution; we expect this to be a building block for use by Kubernetes
+  installation tooling (e.g. cluster API solutions).
+
+## Proposal
+
+We will combine the [etcdadm](https://github.com/platform9/etcdadm) from
+Platform9 with the [etcd-manager](https://github.com/kopeio/etcd-manager)
+project from kopeio / @justinsb.
+
+etcdadm gives us easy-to-use CLI commands, which will form the base layer of
+operation.  Automation should ideally describe what it is doing in terms of
+etcdadm commands, though we will also expose etcdadm as a Go library for easier
+consumption, following the kubectl pattern of a `cmd/` layer calling into a
+`pkg/` layer.  This means the end-user can understand the operation of the
+tooling, and advanced users can feel confident that they can use the CLI tooling
+for advanced operations.
+
+etcd-manager provides automation of the common scenarios, particularly when
+running on a cloud.  It will be rebased to work in terms of etcdadm CLI
+operations (which will likely require some functionality to be added to etcdadm
+itself).  Where automation is not known to be safe, etcd-manager can stop and
+allow for manual intervention using the CLI.
+
+kops is currently using etcd-manager, and we aim to switch to the (new) etcdadm
+as soon as possible.
+
+We expect other tooling (e.g. cluster-api implementations) to adopt this project
+for etcd management going forward, and we will do a first integration or two
+ourselves if that hasn't happened already.
+
+### User Stories
+
+#### Manual Cluster Creation
+
+A cluster operator setting up a cluster manually will be able to do so using etcdadm and kubeadm.
+
+The basic flow looks like:
+
+* On a master machine, run `etcdadm init`, making note of the `etcdadm join
+  <endpoint>` command
+* On each other master machine, copy the CA certificate and key from one of the
+  other masters, then run the `etcdadm join <endpoint>` command.
+* Run kubeadm following the [external etcd procedure](https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd)
+
+This results in a multi-node ("HA") etcd cluster.
+
+#### Automatic Cluster Creation
+
+etcd-manager works by coordinating via a shared filesystem-like store (e.g. S3
+or GCS) and/or via cloud APIs (e.g. EC2 or GCE).  In doing so it is able to
+automate the manual commands, which is very handy for running in a cloud
+environment like AWS or GCE.
+
+The basic flow would look like:
+
+* The user writes a configuration file to GCS using `etcdadm seed
+  gs://mybucket/cluster1/etcd1 version=3.2.12 nodes=3`
+* On each master machine, run `etcdadm auto gs://mybucket/cluster1/etcd1`.
+  (Likely the user will have to run that persistently, either as a systemd
+  service or a static pod.)
+
+`etcdadm auto` downloads the target configuration from GCS, discovers other
+peers also running etcdadm, and gossips with them to do basic leader election.
+When sufficient nodes are available to form a quorum, it starts etcd.
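
The quorum condition in this flow can be sketched in Python as follows. The KEP does not say how `etcdadm auto` counts peers, so this assumes the usual majority rule used by Raft-based systems such as etcd (more than half of the configured members). The function names `quorum_size` and `can_start_etcd` are hypothetical.

```python
def quorum_size(cluster_size):
    """Smallest majority of a cluster with cluster_size voting members."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def can_start_etcd(cluster_size, reachable_peers):
    """True when this node plus its reachable peers form a majority.

    reachable_peers counts the other nodes, not including this one.
    """
    return reachable_peers + 1 >= quorum_size(cluster_size)


# With nodes=3, as in the seed example, two nodes are enough.
print(quorum_size(3))
print(can_start_etcd(3, reachable_peers=0))
print(can_start_etcd(3, reachable_peers=1))
```

Under this assumption a three-node cluster tolerates one missing node at startup, while a single node that sees no peers keeps waiting.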
+
+#### Automatic Cluster Creation with EBS volumes
+
+etcdadm can also automatically mount EBS volumes.  The workflow looks like this:
+
+* As before, write a configuration file using `etcdadm seed ...`, but this time
+  passing the additional arguments `--volume-tag cluster=mycluster`
+* Create EBS volumes with the matching tags
+* On each master machine, run `etcdadm auto ...` as before.  Now etcdadm will
+  try to mount a volume with the correct tags before acting as a member of the
+  cluster.
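
As a rough illustration of the tag-matching step, the Python sketch below picks candidate volumes for a `key=value` tag such as `cluster=mycluster`. The volume records, their field names, and the helpers `parse_tag` and `matching_volumes` are assumptions made for this example; a real implementation would list volumes through the cloud provider's API.

```python
def parse_tag(spec):
    """Split a 'key=value' tag spec such as 'cluster=mycluster'."""
    key, sep, value = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {spec!r}")
    return key, value


def matching_volumes(volumes, tag_spec):
    """Return ids of unattached volumes carrying the requested tag."""
    key, value = parse_tag(tag_spec)
    return [
        v["id"]
        for v in volumes
        if v["tags"].get(key) == value and v["attached_to"] is None
    ]


# Hypothetical listing, shaped loosely like a cloud "describe volumes" result.
volumes = [
    {"id": "vol-1", "tags": {"cluster": "mycluster"}, "attached_to": None},
    {"id": "vol-2", "tags": {"cluster": "mycluster"}, "attached_to": "instance-a"},
    {"id": "vol-3", "tags": {"cluster": "other"}, "attached_to": None},
]
print(matching_volumes(volumes, "cluster=mycluster"))
```

Filtering out already-attached volumes is a design choice of this sketch: a node would then try to mount one of the remaining candidates and only act as a cluster member if the mount succeeds.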
+
+### Implementation Details/Notes/Constraints
+
+* There will be some changes needed to both platform9/etcdadm (e.g. etcd2
+  support) and kopeio/etcd-manager (to rebase on top of etcdadm).
+* It is unlikely that e.g. GKE / EKS will use etcdadm (at least initially),
+  which limits the pool of contributors.
+
+### Risks and Mitigations
+
+* Automatic mode may make incorrect decisions and break a cluster.  Mitigation:
+  automated backups, and a willingness to stop and wait for a fix / operator
+  intervention (CLI mode).
+* Automatic mode relies on peer-to-peer discovery and gossiping, which is less
+  reliable than Raft.  Mitigation: rely on Raft as much as possible, be very
+  conservative in automated operations (favor correctness over availability or
+  speed).  etcd non-voting members will make this much more reliable.
+
+## Graduation Criteria
+
+etcdadm will be considered successful when it is used by the majority of OSS
+cluster installations.
+
+## Implementation History
+
+* Much SIG discussion
+* Initial proposal to SIG 2018-10-09
+* Initial KEP draft 2018-10-22
+* Added clarification of cloud interaction 2018-10-23
+
+## Infrastructure Needed
+
+* etcdadm will be a subproject under sig-cluster-lifecycle