Add KEP for etcdadm #2835

Merged
merged 2 commits into from
Nov 22, 2018
2 changes: 1 addition & 1 deletion keps/NEXT_KEP_NUMBER
@@ -1 +1 @@
-31
+32
211 changes: 211 additions & 0 deletions keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md
@@ -0,0 +1,211 @@
---
kep-number: 31
title: etcdadm
authors:
- "@justinsb"
owning-sig: sig-cluster-lifecycle
#participating-sigs:
#- sig-apimachinery
reviewers:
- "@roberthbailey"
- "@timothysc"
approvers:
- "@roberthbailey"
- "@timothysc"
editor: TBD
creation-date: 2018-10-22
last-updated: 2018-10-22
status: provisional
#see-also:
# - KEP-1
# - KEP-2
#replaces:
# - KEP-3
#superseded-by:
# - KEP-100
---

# etcdadm - automation for etcd clusters

## Table of Contents

* [Table of Contents](#table-of-contents)
* [Summary](#summary)
* [Motivation](#motivation)
* [Goals](#goals)
* [Non-Goals](#non-goals)
* [Proposal](#proposal)
* [User Stories](#user-stories)
* [Manual Cluster Creation](#manual-cluster-creation)
* [Automatic Cluster Creation](#automatic-cluster-creation)
* [Automatic Cluster Creation with EBS volumes](#automatic-cluster-creation-with-ebs-volumes)
* [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
* [Risks and Mitigations](#risks-and-mitigations)
* [Graduation Criteria](#graduation-criteria)
* [Implementation History](#implementation-history)
* [Infrastructure Needed](#infrastructure-needed)

## Summary

etcdadm makes operation of etcd for the Kubernetes control plane easy, on clouds
and on bare-metal, including both single-node and HA configurations.

It is able to perform cluster reconfigurations, upgrades / downgrades, and
backups / restores.

## Motivation

Today each installation tool must reimplement etcd operation, and this is
difficult. It also leads to ecosystem fragmentation - e.g. etcd backups from
one tool are not necessarily compatible with the backups from other tools. The
failure modes are subtle and rare, and thus the kubernetes project benefits from
having more collaboration.


### Goals

The following key tasks are in scope:

* Cluster creation
* Cluster teardown
* Cluster resizing / membership changes
* Cluster backups
* Disaster recovery or restore from backup
* Cluster upgrades
* Cluster downgrades
Member
Currently downgrade is not supported by etcd, but it is being added in v3.4 (etcd-io/etcd#7308). Please ping us if you need any help. Thanks!

Member Author
It will be great if etcd can support it natively! Currently kopeio/etcd-manager implements it via a backup/restore, with a key-by-key copy. We can obviously be smarter about that, but it should work anywhere (it seems to even work for etcd3 -> etcd2)

Member
> It will be great if etcd can support it natively!

Yes, @wenjiaswe is working on it :)

Contributor @wenjiaswe, Oct 22, 2018
@justinsb yes, I am working on this, and here is the "etcd downgrade design" documentation. Here are the items planned. I haven't tried kopeio/etcd-manager but I will try it out. I think it's good that it works the way it does, and etcdadm could use it until etcd downgrade is supported natively. Meanwhile, shall we sync on the feasibility of integrating etcd's native downgrade with etcdadm?

Member Author
@wenjiaswe thanks & absolutely. I don't think there's any question that when etcd supports downgrade natively we should prefer that option :-) (For expediency in kopeio/etcd-manager all upgrades involve a key/value copy today, but I'll fix that for the upgrades that etcd does support - it's easier to have one code path, but it is very sub-optimal).

But we should definitely sync - for example, today we put etcd into "read-only" mode by switching ports. That lets an HA cluster stay up, but means we know that apiserver won't be writing to it. But ... it's not the cleanest solution, and this is another thing that it would be wonderful to have native support for. But again: not a real blocker.

The real wishlist is for non-voting cluster members - that would make automatic management much safer. But I understand that is coming to etcd as well 🎉

* PKI management

Member
IMO a really useful task will be "pivoting" from kubeadm local etcd to etcdadm managed etcd, thus providing the user a way forward from simplest etcd clusters to something more complex

We will implement this functionality both as a base layer of imperative (manual
CLI) operation, and a self-management layer which should enable automation
in "safe" scenarios (with fallback to manual operation).

We'll also optionally support limited interaction with cloud infrastructure, for
example for mounting volumes and peer-discovery. This is primarily for the
self-management layer, but we'll expose it via etcdadm for consistency and for
power-users. The tasks are limited today to listing & mounting a persistent
volume, and listing instances to find peers. A full solution for management of
machines or networks (for example) is out of scope, though we might share some
example configurations for exposition. We expect kubernetes installation
tooling to configure the majority of the cloud infrastructure here, because both
the configurations and the configuration tooling vary widely.

The big reason that volume mounting is in scope is that volume mounting acts as
a simple mutex on most clouds - it is a cheap way to boost the safety of our
leader/gossip algorithms, because we have an external source of truth.
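
To illustrate why an exclusive volume attachment behaves like a mutex, here is a minimal sketch. It uses an in-memory stand-in for a cloud API: `FakeCloud`, `attach`, and `claim_volume` are hypothetical names chosen for illustration and are not part of etcdadm, etcd-manager, or any cloud SDK.

```python
class AlreadyAttached(Exception):
    """Raised when a volume is already attached to another instance."""


class FakeCloud:
    """In-memory stand-in for a cloud volume API (hypothetical)."""

    def __init__(self, volume_ids):
        # Maps volume id -> id of the instance holding it (None if free).
        self.attachments = {v: None for v in volume_ids}

    def attach(self, volume_id, instance_id):
        # Assumption: the cloud lets a volume be attached to only one
        # instance at a time, so a second attach attempt fails.
        holder = self.attachments[volume_id]
        if holder is not None and holder != instance_id:
            raise AlreadyAttached(volume_id)
        self.attachments[volume_id] = instance_id


def claim_volume(cloud, instance_id):
    """Try each volume in turn; return the one this instance holds, or None."""
    for volume_id in sorted(cloud.attachments):
        try:
            cloud.attach(volume_id, instance_id)
            return volume_id
        except AlreadyAttached:
            continue
    return None
```

With two volumes, at most two instances can ever hold a claim, regardless of what the gossip layer believes; a third instance gets `None` and would not act as a member.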

We'll also support reading & writing backups to S3 / GCS etc.

### Non-Goals

* The project is not targeted at operation of an etcd cluster for use other than
by Kubernetes apiserver. We are not building a general-purpose etcd operation
toolkit. Likely it will work well for other use-cases, but other tools may be
more suitable.
* As described above, we aren't building a full "turn up an etcd cluster on a
cloud" solution; we expect this to be a building block for use by kubernetes
installation tooling (e.g. cluster API solutions).

## Proposal

We will combine the [etcdadm](https://github.com/platform9/etcdadm) from
Platform9 with the [etcd-manager](https://github.com/kopeio/etcd-manager)
project from kopeio / @justinsb.

etcdadm gives us easy to use CLI commands, which will form the base layer of
operation. Automation should ideally describe what it is doing in terms of
etcdadm commands, though we will also expose etcdadm as a go-library for easier
Member
+1

consumption, following the kubectl pattern of a `cmd/` layer calling into a
`pkg/` layer. This means the end-user can understand the operation of the
tooling, and advanced users can feel confident that they can use the CLI tooling
for advanced operations.

etcd-manager provides automation of the common scenarios, particularly when
running on a cloud. It will be rebased to work in terms of etcdadm CLI
operations (which will likely require some functionality to be added to etcdadm
itself). Where automation is not known to be safe, etcd-manager can stop and
allow for manual intervention using the CLI.

kops is currently using etcd-manager, and we aim to switch to the (new) etcdadm as soon as possible.

We expect other tooling (e.g. cluster-api implementations) to adopt this project
for etcd management going forward, and we will do a first integration or two
ourselves if that hasn't happened already.

### User Stories

#### Manual Cluster Creation

A cluster operator setting up a cluster manually will be able to do so using etcdadm and kubeadm.

The basic flow looks like:

* On a master machine, run `etcdadm init`, making note of the `etcdadm join
<endpoint>` command
* On each other master machine, copy the CA certificate and key from one of the
other masters, then run the `etcdadm join <endpoint>` command.
* Run kubeadm following the [external etcd procedure](https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd)

Is it possible for this tool to integrate with kubeadm in such a way that it produces "local" etcd clusters? There is something nice about etcd running as pods in the cluster as it allows reuse of k8s based tooling for monitoring, logging, metrics, etc

Member Author
Yes, I'd imagine kubeadm could easily replace its built-in etcd management with a call-out to etcdadm.

And yes, I agree that pods in the cluster is the only configuration that we test today and so it's the one I personally feel most comfortable with. Hopefully we can add more e2e configurations going forward though!


This results in a multi-node ("HA") etcd cluster.

#### Automatic Cluster Creation

I'm interested in hearing where the line is drawn between this KEP and the current functionality provided by kops. For instance kops has a very specific architecture (1 ASG per control plane node AZ with a min=max=desired=1, EBS volumes per control plane node AZ tagged in a specific way, etc). This architecture is optimized for etcd fault tolerance and DR. Does this KEP offer the option to have similar fault tolerance and DR functionality?

Member Author
So kopeio/etcd-manager was a reimplementation of the kops etcd management functionality. The intention is that this is a clean implementation that any installation tool can use, not just kops.

I'll clarify though that we're assuming that an external installation tool sets up the infrastructure itself if we're using EBS volumes - i.e. I don't think etcdadm should set up the volumes or the AWS ASGs or GCE MIGs that will likely provide the machines on which this runs. (Or that should be a separate KEP if so!) It will make it very easy to set up those ASGs though, as they can all run the same command. I'll spell this out in the KEP, as I don't think I covered it sufficiently... etcdadm should (optionally) support auto-mounting of volumes IMO, but I think setting them up is best done externally.

We could bring this into scope, but I think it's better just to clearly document the requirements as opinions vary so widely here! (e.g. "if you're using volumes, pass the tags using this flag, you probably want to put them in separate AZs, and you probably want to run in separate ASGs to guarantee equal zonal coverage")


etcd-manager works by coordinating via a shared filesystem-like store (e.g. S3
or GCS) and/or via cloud APIs (e.g. EC2 or GCE). In doing so it is able to
automate the manual commands, which is very handy for running in a cloud
environment like AWS or GCE.
Member
I still think using the bootstrap token to encrypt and store the certs as secrets also makes a lot of sense. It keeps it local to the cluster without adding the dependencies and also expires after a period of 24-hours by default to eliminate several of the security concerns.


The basic flow would look like:

* The user writes a configuration file to GCS using `etcdadm seed
gs://mybucket/cluster1/etcd1 version=3.2.12 nodes=3`
* On each master machine, run `etcdadm auto gs://mybucket/cluster1/etcd1`.
(Likely the user will have to run that persistently, either as a systemd
service or a static pod.)

`etcdadm auto` downloads the target configuration from GCS, discovers other
peers also running etcdadm, and gossips with them to do basic leader election.
When sufficient nodes are available to form a quorum, it starts etcd.
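
The "sufficient nodes to form a quorum" condition can be sketched as follows. This is only an illustration of the majority rule used by etcd (Raft), applied to the `nodes=3` value from the seed command above; the function names `parse_spec`, `quorum_size`, and `can_start` are hypothetical, and the real decision logic in etcd-manager may differ.

```python
def parse_spec(args):
    """Turn ['version=3.2.12', 'nodes=3'] into {'version': '3.2.12', 'nodes': 3}."""
    spec = dict(arg.split("=", 1) for arg in args)
    spec["nodes"] = int(spec["nodes"])
    return spec


def quorum_size(cluster_size):
    """Smallest majority of a cluster with `cluster_size` voting members."""
    return cluster_size // 2 + 1


def can_start(target_nodes, discovered_peers):
    """True once this node plus its discovered peers reach a majority."""
    # De-duplicate peers; the +1 counts the local node itself.
    available = len(set(discovered_peers)) + 1
    return available >= quorum_size(target_nodes)
```

For a three-node target, a lone node would keep waiting, while a node that has discovered one peer (two of three present) could proceed.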

#### Automatic Cluster Creation with EBS volumes

etcdadm can also automatically mount EBS volumes. The workflow looks like this:

* As before, write a configuration file using `etcdadm seed ...`, but this time
passing the additional arguments `--volume-tag cluster=mycluster`
* Create EBS volumes with the matching tags
* On each master machine, run `etcdadm auto ...` as before. Now etcdadm will
try to mount a volume with the correct tags before acting as a member of the
cluster.
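
As a rough sketch of the tag-matching step, the snippet below selects candidate volumes whose tags contain the key/value given in `--volume-tag cluster=mycluster`. The data layout (a dict of volume id to tag dict) and the helper names `parse_tag` and `matching_volumes` are assumptions made for illustration; they do not reflect the actual etcdadm implementation or the EC2 API.

```python
def parse_tag(arg):
    """Split a 'key=value' tag argument into a (key, value) pair."""
    key, sep, value = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {arg!r}")
    return key, value


def matching_volumes(volumes, tag_arg):
    """Return the sorted ids of volumes whose tags include the requested pair."""
    key, value = parse_tag(tag_arg)
    return sorted(vid for vid, tags in volumes.items() if tags.get(key) == value)
```

Only volumes carrying the exact tag are considered for mounting; untagged volumes or volumes tagged for another cluster are ignored.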

### Implementation Details/Notes/Constraints

* There will be some changes needed to both platform9/etcdadm (e.g. etcd2
Contributor
My original sketch of etcdadm included an upgrade verb. It would be limited to making changes on the host where etcdadm is run. Should we add this as a note here?

support) and kopeio/etcd-manager (to rebase on top of etcdadm).
Member
If this is targeting net-new usage why support etcd2 here? It seems like etcd3+ would be sufficient for new and future usage.

Member Author
For users that are still on etcd2. We're going to strand them in 1.13 otherwise.

* It is unlikely that e.g. GKE / EKS will use etcdadm (at least initially),
which limits the pool of contributors.

### Risks and Mitigations

* Automatic mode may make incorrect decisions and break a cluster. Mitigation:
automated backups, and a willingness to stop and wait for a fix / operator
intervention (CLI mode).
* Automatic mode relies on peer-to-peer discovery and gossiping, which is less
reliable than Raft. Mitigation: rely on Raft as much as possible, be very
conservative in automated operations (favor correctness over availability or
speed). etcd non-voting members will make this much more reliable.

## Graduation Criteria

etcdadm will be considered successful when it is used by the majority of OSS
cluster installations.

## Implementation History

* Much SIG discussion
* Initial proposal to SIG 2018-10-09
* Initial KEP draft 2018-10-22
* Added clarification of cloud interaction 2018-10-23

## Infrastructure Needed

* etcdadm will be a subproject under sig-cluster-lifecycle