From 1c1dbae926cc58df03c76237b0d75197e39e5991 Mon Sep 17 00:00:00 2001
From: Justin SB
Date: Mon, 28 Jan 2019 18:52:27 -0500
Subject: [PATCH 1/3] KEP: [sig-cluster-lifecycle] addons via operators

---
 .../0035-20190128-addons-via-operators.md | 181 ++++++++++++++++++
 1 file changed, 181 insertions(+)
 create mode 100644 keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md

diff --git a/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md b/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
new file mode 100644
index 00000000000..8920ec785a0
--- /dev/null
+++ b/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
@@ -0,0 +1,181 @@
+KEP: Addons via Operators
+
+---
+kep-number: 35
+title: Addons via Operators
+authors:
+  - "@justinsb"
+owning-sig: sig-cluster-lifecycle
+reviewers:
+  - TBD
+approvers:
+  - TBD
+editor: TBD
+creation-date: 2019-01-28
+last-updated: 2019-01-28
+status: provisional
+---
+
+# Addons via Operators
+
+## Table of Contents
+
+* [Table of Contents](#table-of-contents)
+* [Summary](#summary)
+* [Motivation](#motivation)
+    * [Goals](#goals)
+    * [Non-Goals](#non-goals)
+* [Proposal](#proposal)
+    * [Risks and Mitigations](#risks-and-mitigations)
+* [Graduation Criteria](#graduation-criteria)
+* [Implementation History](#implementation-history)
+* [Infrastructure Needed](#infrastructure-needed)
+
+
+## Summary
+
+We propose to use operators for managing cluster addons. Each addon will have
+its own CRD, and users will be able to perform limited tailoring of the addon
+(install/don’t install, choose version, primary feature selection) by modifying
+the CR. The operator encodes any special logic (e.g. dependencies) needed to
+install the addon.
+
+We will create tooling to make it easy to build addon operators that follow the
+best practices we identify as part of this work. For example, we expect that
+most addons will be declarative, and likely be specified as part of a “cluster
+bundle”, so we will make it easy to build basic addon operators that follow
+these patterns.
+
+We hope that components will choose to maintain their own operators, encoding
+their knowledge of how best to operate their addon.
+
+
+## Motivation
+
+Addons are components that are managed alongside the lifecycle of the cluster.
+They are often tied to or dependent on the configuration of other cluster
+components. Management of these components has proved complicated. Our
+existing solution in the form of the bash addon-manager has many known
+shortcomings and is not widely adopted. As we focus more development outside of
+the kubernetes/kubernetes repo, we expect more addon components of greater
+complexity. This is one of the long-standing backlog items for
+sig-cluster-lifecycle.
+
+Use of operators is now generally accepted, and the benefits to other
+applications are generally recognized. We aim to bring the benefits of
+operators to addons also.
+
+### Goals
+
+* Explore the use of operators for managing addons
+* Create patterns, libraries & tooling so that addons are of high quality,
+  consistent in their API surface (common fields on CRDs, use of Application
+  CRD, consistent labeling of created resources), yet are easy to build.
+* Build addons for the basic set of components, acting as a quality reference
+  implementation suitable for production use. We aim also to demonstrate the
+  utility and explore any challenges, and to verify that the tooling does make
+  addon-development easy.
+
+
+### Non-Goals
+
+* We do not intend to mandate that all installation tools use addon operators;
+  installation tools are free to choose their own path.
+* Management of non-addons is out of scope (for example installation of end-user
+  applications, or of packaged software that is not an addon)
+
+
+## Proposal
+
+This is the current plan of action; it is based on experience gathered and work
+done for Google’s GKE-on-prem product. However, we don’t expect this will
+necessarily be directly applicable in the OSS world and we are open to change as
+we discover new requirements.
+
+* Extend kubebuilder & controller-runtime to make it easy to build operators for
+  addons
+* Build addons for the primary addons currently in the cluster/ directory
+* Plug in those addon operators into kube-up / cluster-api / kubeadm / kops /
+  others (subject to those projects being interested)
+* Develop at least one addon operator outside of kubernetes/kubernetes
+  (LocalDNS-Cache?) and figure out how it can be used despite being out-of-tree
+* Investigate use of webhooks to prevent accidental mutation of child objects
+* Investigate the RBAC story for addons - currently the operator must itself
+  have all the permissions that the addon needs, which is not really
+  least-privilege. But it is not clear how to side-step this, nor that any of
+  the alternatives would be better or more secure.
+* Investigate use of patching mechanisms (as seen in `kubectl patch` and
+  `kustomize`) to support advanced tailoring of addons. The goal here is to
+  make sure that everyone can use the addon operators, even if they “love it but
+  just need to change one thing”. This ensures that the addon operators
+  themselves can remain bounded in scope and complexity.
+
+
+We expect the following functionality to be common to all operators for addons:
+
+* A CRD per addon
+* Common fields in spec that define the version and/or channel
+* Common fields in status that expose the current health & version information
+  of the addon
+* Addons follow a common structure, with the CR as root object, an Application
+  CR, consistent labels of all objects
+* Some form of protection or rapid reconciliation to prevent accidental
+  modification of child objects
+* Operators are declaratively driven, and can source manifests via https
+  (including mirrors), or from data stored in the cluster itself
+  (e.g. configmaps or cluster-bundle CRD, useful for airgapped)
+* Operators are able to expose different update behaviours: automatic immediate
+  updates; notification of update-available in status; purely manual updates
+* Operators are able to observe other CRs to perform basic sequencing
+* Addon manifests are able to express an operator minimum version requirement, so
+  that an addon with new requirements can require that the operator be updated
+  first
+
+
+### Risks and Mitigations
+
+This will involve running a large number of new controllers. This will require
+more resources; we can mitigate this by combining them into a single binary
+(similar to kube-controller-manager).
+
+Automatically updating addons could result in new SPOFs; we can mitigate this
+through mirroring (including support for air-gapped mirrors).
+
+Providing a good set of addons could result in a monoculture where mistakes
+affect most/all kubernetes clusters (even if we don’t mandate adoption, if we
+succeed we hope for widespread adoption). We can continue with our strategies
+that we use for core components such as kube-apiserver: primarily we must keep
+the notion of stable vs less-stable releases, to stagger the risk of a bad
+rollout. We must also consider this a trade-off against the risk that without
+coordination each piece of tooling must reinvent the wheel; we expect more
+mistakes (even measured per cluster) in that scenario.
+
+## Graduation Criteria
+
+We will succeed if addon operators are:
+
+* Used: addon operators are adopted by the majority of cluster installation
+tooling
+* Useful: users are generally satisfied with the functionality of addon
+operators and are not trying to work around them, or making lots of proposals /
+PRs to extend them
+* Ubiquitous: the majority of components include an operator
+* Federated: the components maintain their own operators, encoding their
+knowledge of how best to run their addon.
+
+
+## Implementation History
+
+Addon Operator session given by jrjohnson & justinsb at Kubecon NA - Dec 2018
+KEP created - Jan 29 2019
+
+## Infrastructure Needed
+
+Initial development of the tooling can probably take place as part of
+kubebuilder
+
+We should likely create a repo for holding the operators themselves. Eventually
+we would hope these would migrate to the various addon components, so we could
+also just store these under e.g. cluster-api.
+
+Unclear whether this should be a subproject?

From 48d85ea3fe7e1a8ee4c146459b5d27b5e2bec4a3 Mon Sep 17 00:00:00 2001
From: Justin SB
Date: Mon, 11 Mar 2019 14:34:58 -0400
Subject: [PATCH 2/3] Update addon-operator KEP per feedback

---
 .../0035-20190128-addons-via-operators.md | 65 +++++++++++++++----
 1 file changed, 53 insertions(+), 12 deletions(-)

diff --git a/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md b/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
index 8920ec785a0..93dbfa620ce 100644
--- a/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
+++ b/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
@@ -1,5 +1,3 @@
-KEP: Addons via Operators
-
 ---
 kep-number: 35
 title: Addons via Operators
@@ -7,12 +5,14 @@ authors:
   - "@justinsb"
 owning-sig: sig-cluster-lifecycle
 reviewers:
-  - TBD
+  - "@luxas"
+  - "@roberthbailey"
+  - "@timothysc"
 approvers:
-  - TBD
+  - "@timothysc"
 editor: TBD
 creation-date: 2019-01-28
-last-updated: 2019-01-28
+last-updated: 2019-03-11
 status: provisional
 ---
 
@@ -40,10 +40,13 @@ its own CRD, and users will be able to perform limited tailoring of the addon
 the CR. The operator encodes any special logic (e.g. dependencies) needed to
 install the addon.
 
+By creating a CRD per addon, we are able to make use of the kubernetes API
+machinery for those per-addon options.
+
 We will create tooling to make it easy to build addon operators that follow the
 best practices we identify as part of this work. For example, we expect that
-most addons will be declarative, and likely be specified as part of a “cluster
-bundle”, so we will make it easy to build basic addon operators that follow
+most addons will be declarative, and likely be specified as part of a "cluster
+bundle", so we will make it easy to build basic addon operators that follow
 these patterns.
 
 We hope that components will choose to maintain their own operators, encoding
@@ -94,7 +97,9 @@ we discover new requirements.
 
 * Extend kubebuilder & controller-runtime to make it easy to build operators for
   addons
-* Build addons for the primary addons currently in the cluster/ directory
+* Build addons for the primary addons currently in the cluster/ directory, at
+  least including those required to bring up a conformant cluster. Proposed
+  list: CoreDNS, kube-proxy, dashboard, metrics-server, localdns-agent.
 * Plug in those addon operators into kube-up / cluster-api / kubeadm / kops /
   others (subject to those projects being interested)
 * Develop at least one addon operator outside of kubernetes/kubernetes
@@ -108,8 +113,13 @@ we discover new requirements.
   `kustomize`) to support advanced tailoring of addons. The goal here is to
   make sure that everyone can use the addon operators, even if they “love it but
   just need to change one thing”. This ensures that the addon operators
-  themselves can remain bounded in scope and complexity.
-
+  themselves can remain bounded in scope and complexity. Patching will fail if
+  the underlying addon changes dramatically, so we'll likely have a "patch
+  incompatible with new version" error - and generally addons should avoid
+  gratuitously changing their structure.
+* We should develop a convention (labels / owner-refs) so that we are able to
+  discover which CRDs are cluster-addons, and there is no confusion with
+  application operators.
 
 We expect the following functionality to be common to all operators for addons:
 
@@ -130,7 +140,28 @@ We expect the following functionality to be common to all operators for addons:
 * Addon manifests are able to express an operator minimum version requirement, so
   that an addon with new requirements can require that the operator be updated
   first
+* Airgapped operation should be possible by combining a registry mirror and
+  storage of the underlying manifest in the cluster itself.
+
+
+An example can make this easier to understand; here is what a CRD instance for
+kube-proxy might look like:
+```yaml
+apiVersion: addons.sigs.k8s.io/v1alpha1
+kind: KubeProxy
+metadata:
+  name: default
+  namespace: kube-system
+spec:
+  clusterCidr: 100.64.0.0/10
+  version: 1.14.4
+status:
+  healthy: true
+```
+
+This particular manifest is pinned to `version: 1.14.4`. We could also
+subscribe to a stream of updates with a field like `channel: stable`.
 
 
 ### Risks and Mitigations
 
@@ -150,7 +181,7 @@ rollout. We must also consider this a trade-off against the risk that without
 coordination each piece of tooling must reinvent the wheel; we expect more
 mistakes (even measured per cluster) in that scenario.
 
-## Graduation Criteria
+## Success Criteria
 
 We will succeed if addon operators are:
 
@@ -163,10 +194,20 @@ PRs to extend them
 * Federated: the components maintain their own operators, encoding their
 knowledge of how best to run their addon.
 
+## Graduation Criteria
+
+alpha: addon-operators are used to manage kube-proxy & CoreDNS:
+* in kube-up
+* in kubeadm (at least as an option)
+* in kops
+* in cluster-api-provider-aws & cluster-api-provider-gcp
+* adoption is documented for use by other tooling / self-managed clusters
+
+(post-alpha criteria will be added post-alpha)
 
 ## Implementation History
 
-Addon Operator session given by jrjohnson & justinsb at Kubecon NA - Dec 2018
+[Addon Operator session](https://www.youtube.com/watch?v=LPejvfBR5_w) given by jrjohnson & justinsb at Kubecon NA - Dec 2018
 KEP created - Jan 29 2019
 
 ## Infrastructure Needed

From d487676b28a7575441af1c7cba1abb49105a637c Mon Sep 17 00:00:00 2001
From: Justin SB
Date: Mon, 18 Mar 2019 08:29:26 -0700
Subject: [PATCH 3/3] Updated per review feedback

---
 .../0035-20190128-addons-via-operators.md | 21 ++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md b/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
index 93dbfa620ce..d3652da8bb7 100644
--- a/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
+++ b/keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
@@ -37,7 +37,7 @@ status: provisional
 We propose to use operators for managing cluster addons. Each addon will have
 its own CRD, and users will be able to perform limited tailoring of the addon
 (install/don’t install, choose version, primary feature selection) by modifying
-the CR. The operator encodes any special logic (e.g. dependencies) needed to
+the instance of the CRD. The operator encodes any special logic (e.g. dependencies) needed to
 install the addon.
 
 By creating a CRD per addon, we are able to make use of the kubernetes API
@@ -127,8 +127,8 @@ We expect the following functionality to be common to all operators for addons:
 * Common fields in spec that define the version and/or channel
 * Common fields in status that expose the current health & version information
   of the addon
-* Addons follow a common structure, with the CR as root object, an Application
-  CR, consistent labels of all objects
+* Addons follow a common structure, with an instance of the CRD as root object,
+  an Application CRD instance, consistent labels of all objects
 * Some form of protection or rapid reconciliation to prevent accidental
   modification of child objects
 * Operators are declaratively driven, and can source manifests via https
@@ -136,7 +136,7 @@ We expect the following functionality to be common to all operators for addons:
   (e.g. configmaps or cluster-bundle CRD, useful for airgapped)
 * Operators are able to expose different update behaviours: automatic immediate
   updates; notification of update-available in status; purely manual updates
-* Operators are able to observe other CRs to perform basic sequencing
+* Operators are able to observe other CRDs to perform basic sequencing
 * Addon manifests are able to express an operator minimum version requirement, so
   that an addon with new requirements can require that the operator be updated
   first
@@ -181,6 +181,17 @@ rollout. We must also consider this a trade-off against the risk that without
 coordination each piece of tooling must reinvent the wheel; we expect more
 mistakes (even measured per cluster) in that scenario.
 
+Test and release may become more complicated because of fragmentation across
+repos. Mitigation: be disciplined about versioning of operators and addons and
+encourage installation tooling to pin to a particular version of both for a
+particular release. We need to set up automated builds (with CI) for rapid
+releases so installation tooling is not blocked waiting for operator releases.
+We need to set up a mechanism so that addons can be updated without requiring an
+operator update. With this, if tooling is able to pin to particular addon
+versions, that should be at parity with the "embedded manifest" approach that is
+widely used currently. (We hope to enable usage that is less lock-step, but
+that itself will likely require new approaches for testing and release)
+
 ## Success Criteria
 
 We will succeed if addon operators are:
@@ -219,4 +230,4 @@ We should likely create a repo for holding the operators themselves. Eventually
 we would hope these would migrate to the various addon components, so we could
 also just store these under e.g. cluster-api.
 
-Unclear whether this should be a subproject?
+We are requesting to be a subproject under sig-cluster-lifecycle.
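
The KEP describes two common behaviours without spelling out their logic: a spec that is either pinned to a `version` or subscribed to a `channel`, and addon manifests that can require a minimum operator version. The sketch below is one possible reading, not the KEP's design. Every name in it (`CommonSpec`, `resolve_version`, `operator_satisfies`, the `channels` mapping) is hypothetical, and it assumes plain dotted numeric versions such as `1.14.4`. A real addon operator would more likely be written in Go on top of kubebuilder; Python is used here only to keep the illustration short.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CommonSpec:
    """Hypothetical common spec fields; the names are assumptions, not the KEP's API."""
    version: Optional[str] = None  # pin to an exact addon version, e.g. "1.14.4"
    channel: Optional[str] = None  # or follow a stream of updates, e.g. "stable"


def resolve_version(spec: CommonSpec, channels: Dict[str, str]) -> str:
    """Return the addon version the operator should install.

    An explicit pin wins over a channel. `channels` maps a channel name to the
    version it currently points at; a real operator would load this from its
    manifest source (an https mirror or data stored in the cluster).
    """
    if spec.version:
        return spec.version
    if spec.channel:
        if spec.channel not in channels:
            raise ValueError(f"unknown channel: {spec.channel!r}")
        return channels[spec.channel]
    raise ValueError("spec must set version or channel")


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version (assumed format) into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def operator_satisfies(operator_version: str, min_operator_version: str) -> bool:
    """True if the running operator meets the minimum version a manifest requires."""
    return parse_version(operator_version) >= parse_version(min_operator_version)
```

Letting the pin take precedence over the channel is a design choice of this sketch; the KEP only says the common fields define "the version and/or channel". Comparing versions as integer tuples avoids the string-comparison trap where `"1.10.0"` sorts before `"1.9.3"`.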