Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP: [sig-cluster-lifecycle] addons via operators #746

Merged
merged 3 commits into from
Mar 26, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
233 changes: 233 additions & 0 deletions keps/sig-cluster-lifecycle/0035-20190128-addons-via-operators.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,233 @@
---
kep-number: 35
title: Addons via Operators
authors:
- "@justinsb"
owning-sig: sig-cluster-lifecycle
reviewers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some reviewers / approvers before we commit this. Maybe something to discuss tomorrow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this has not been discussed in the SIG already it should be. The SIG should look at who the approvers are (the chairs?)

The related sigs field should also be filled in. I would suggest SIG Architecture as one of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[This was raised to SIG Cluster Lifecycle in today's meeting]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can someone post a summary of where the SIG is at on this? There's no harm in merging and iterating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can chat about it during our next call.
/cc @justinsb

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @mattfarina - I'd like to bring it to sig-architecture when we have something concrete to show them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIG Arch is working to decentralize decision-making:
https://docs.google.com/document/d/1BlmHq5uPyBUDlppYqAAzslVbAO8hilgjqZUTaNXUhKM/edit#bookmark=id.7lj9uyhho0mu

I suggest discussing in SIG Cluster Lifecycle for now, and invite interested parties to attend.

- @luxas
- @roberthbailey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is @roberthbailey still going to be a reviewer for this considering his role change?

- @timothysc
approvers:
- @timothysc
editor: TBD
creation-date: 2019-01-28
last-updated: 2019-03-11
status: provisional
---

# Addons via Operators

## Table of Contents

* [Table of Contents](#table-of-contents)
* [Summary](#summary)
* [Motivation](#motivation)
* [Goals](#goals)
* [Non-Goals](#non-goals)
* [Proposal](#proposal)
* [Risks and Mitigations](#risks-and-mitigations)
* [Graduation Criteria](#graduation-criteria)
* [Implementation History](#implementation-history)
* [Infrastructure Needed](#infrastructure-needed)


## Summary

We propose to use operators for managing cluster addons. Each addon will have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I struggle with is: do we need a whole CRD for every addon. Or is kustomize + default bundle of YAML sufficient? I'm guessing this would depend on the addon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRDs are supposed to be cheap, so I'd say we should have one for every addon - we do expect there will be per-addon-settings, and so a CRD-per-addon lets us actually use all this machinery we've invested so much in.

The hope is that we can make developing the controllers as easy as kustomize + default bundle of YAML though!

I'll add a comment to this effect

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we'll have e.g. /apis/addons.k8s.io/v1/kubeproxies and /apis/addons.k8s.io/v1/corednses and so on?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (modulo naming and pluralization rules)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it seem a bit weird to do that for singleton addons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really to my mind - can you clarify why it's weird?

I do think we should establish patterns for singletons (e.g. "name the instance default"). But I love the progress made on CRDs and think we should make use of them!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have followed a convention similar to what @justinsb described. the operator basically only respects a single instance of a named cluster scoped CR for certain types of operators (not all).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A naming pattern like default works for me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something here, but having a different CRD type for each oddon makes more difficult to create tools for generic operations, like listing all addons with a given status, or watch for the installation of addos? What approaches should be used in that cases?

its own CRD, and users will be able to perform limited tailoring of the addon
(install/don’t install, choose version, primary feature selection) by modifying
the instance of the CRD. The operator encodes any special logic (e.g. dependencies) needed to
install the addon.

By creating a CRD per addon, we are able make use of the kubernetes API
machinery for those per-addon options.

We will create tooling to make it easy to build addon operators that follow the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just upstream this work into controller runtime so that both kubebuilder and OperatorKit can leverage it? Or do you have more specific work in mind i.e something like AddOn(builder/kit)?

best practices we identify as part of this work. For example, we expect that
most addons will be declarative, and likely be specified as part of a "cluster
bundle", so we will make it easy to build basic addon operators that follow
these patterns.

We hope that components will choose to maintain their own operators, encoding
their knowledge of how best to operate their addon.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the scope to also have all these managed k8s providers like gke/eks also move to this addon model ? Its currently a pain to not be able to modify easily the addons provided by default on the various providers . But at the same time, will CRD provide enough configurability or just need a way to expose all these addons to be modifiable by consumers. Every one has their unique requirements on what part of each addon they want to modify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a requirement for anyone to move to the model, but we would like to create something that tooling chooses to move to. Figuring out ways to support a balance of modification and control to enable that is in scope.



## Motivation

Addons are components that are managed alongside the lifecycle of the cluster.
They are often tied to or dependent on the configuration of other cluster
components. Management of these components has proved complicated. Our
existing solution in the form of the bash addon-manager has many known
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I like the idea of having add on manager replace the bash scripts. I think it generally improves our cluster turn up story for users of OSS kubernetes that do not have the benefit of leveraging a cloud provider service for their cluster management.

shortcomings and is not widely adopted. As we focus more development outside of
the kubernetes/kubernetes repo, we expect more addon components of greater
complexity. This is one of the long-standing backlog items for
sig-cluster-lifecycle.

Use of operators is now generally accepted, and the benefits to other
applications are generally recognized. We aim to bring the benefits of
operators to addons also.

### Goals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been working on operator lifecycle manager for a couple of years, and it shares a lot of these same goals:

  • Define extensions to Kubernetes as operators managing CRDs (or extension apiservers)
  • Provide a consistent interface for describing operators and their interactions with the cluster and each other
  • Constrain the scope of where an operator is allowed to operate (e.g. cluster-scoped vs. namespace-scoped, something in between)
  • Prevent end-user privilege escalation via operators
  • Express dependencies between operators and their provided APIs
  • Provide a packaging and installation model that supports offline, air-gapped installs
  • Provide optional, automated updates across operators

There might be some places where OLM isn't perfectly aligned with what is needed for managing addons - but it seems pretty close, and we'd love to jumpstart this work by contributing what we've already done and iterating on it to meet the community's needs.

As a first step, we could try to prototype installing today's addons with OLM and see if there are any gaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course - we'd welcome your collaboration!


* Explore the use of operators for managing addons
* Create patterns, libraries & tooling so that addons are of high quality,
consistent in their API surface (common fields on CRDs, use of Application
CRD, consistent labeling of created resources), yet are easy to build.
* Build addons for the basic set of components, acting as a quality reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does bundles fit under these goals?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ideally each addon operator references a set of manifests somewhere, rather than baking them into the image, so we don't always have to update the operator (just when the underlying change is non-trivial).

Those manifests could be in any format really. They could be simple yaml files. I do think we should recommend a format.

If we actually pick bundles as our format, this gives us a bit of metadata when pulling from https, but because bundles are CRDs, this also gives us the ability to fetch those manifests from the cluster rather than relying on some https server, i.e. it's "half" of airgap support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that mean that the manifests themselves are the artifacts and the image should be embedded in them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please not files on a disk that have to be rsync'ed.

Copy link
Member Author

@justinsb justinsb Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow @smarterclayton - manifests do specify an image name, but I'm missing where you're going with that...

@bgrant0607 I don't think we would recommend files on a disk for any production configuration :-)

implementation suitable for production use. We aim also to demonstrate the
utility and explore any challenges, and to verify that the tooling does make
addon-development easy.
Copy link
Contributor

@smarterclayton smarterclayton Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add another goal here:

API review, feature addition, and scope of the project should not drastically increases as a result of this KEP.

Or said another way:

We want the minimal solution which doesn't grow the scope of Kube BUT allows Kube like things to be trivial to add.

The possible APIs for Kube-like components is unbounded and cannot be scoped, so the more API surface we have here the greater the burden on the project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt cause we discussed this recently

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are CRDs though - they aren't all going to be k8s APIs. We would prefer homogeneity, so we would welcome any input from you and @liggitt on what a "recommended" API should look like. We have a prototype where we defined a few common fields, which feels right - better to have version always at spec.version (I think we can agree). But whether it should be spec.version or spec.version.lock or something else entirely we could definitely use help on!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good points.

I think they also apply elsewhere, such as the component config effort, exported metrics, Event contents, component log messages, and kubectl commands. We're accruing operational surface area over time.

I think the only way that can be addressed is by enabling the technical oversight of these areas to be distributed, such as by mentoring/shadowing, documenting policies and conventions, writing linters and compatibility tests, etc.



### Non-Goals

* We do not intend to mandate that all installation tools use addon operators;
installation tools are free to choose their own path.
* Management of non-addons is out of scope (for example installation of end-user
applications, or of packaged software that is not an addon)


## Proposal

This is the current plan of action; it is based on experience gathered and work
done for Google’s GKE-on-prem product. However we don’t expect this will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openshift has been exploring this space as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a summary of your learnings written down?

necessarily be directly applicable in the OSS world and we are open to change as
we discover new requirements.

* Extend kubebuilder & controller-runtime to make it easy to build operators for

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the project goals for the operator SDK, bootstrapping new operator projects so that they will be set up to follow best practices. The SDK is based on the controller-runtime project.

addons
* Build addons for the primary addons currently in the cluster/ directory, at
least including those required to bring up a conformant cluster. Proposed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need principles regarding which add-ons belong as part of the Kubernetes project. In the past, the project accepted many contributions, which then later became maintenance burdens. I'd suggest we only support add-ons for components developed as part of the project, and that the owners of those components should also own the reference add-ons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the suggestion. Ideally the "cluster-addons" subproject ends up not "owning" any operators, but rather each addon operator is maintained by the team that writes the addon (on the theory that they know best how to operate their addon).

If we keep the operators sufficiently declarative, we can also make the use of the operator a tooling decision also.

list: CoreDNS, kube-proxy, dashboard, metrics-server, localdns-agent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a number of existing operators in the world for these items.

https://github.com/openshift/cluster-dns-operator
https://github.com/openshift/cluster-monitoring-operator
https://github.com/openshift/cluster-ingress-operator

can we look at the set of operators that have already been developed and enable integration of those pieces? plenty of folks are successful building operators in this space that we should take into account for any design approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll absolutely look at what other people have done in OSS, and it would be great to have you collaborate with us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't see why we couldn't incorporate those (with some SWE effort) as part of the swtich over from bash to CRDs.

* Plug in those addons operators into kube-up / cluster-api / kubeadm / kops /
others (subject to those projects being interested)
* Develop at least one addon operator outside of kubernetes/kubernetes
(LocalDNS-Cache?) and figure out how it can be used despite being out-of-tree
* Investigate use of webhooks to prevent accidental mutation of child objects
* Investigate the RBAC story for addons - currently the operator must itself
have all the permissions that the addon needs, which is not really
least-privilege. But it is not clear how to side-step this, nor that any of
the alternatives would be better or more secure.
* Investigate use of patching mechanisms (as seen in `kubectl patch` and
`kustomize`) to support advanced tailoring of addons. The goal here is to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patching might make upgrades tricky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the underlying structure changes dramatically the patches won't apply. Added a comment about this, and that we'll need some form of error state, and that a good addon won't change structure just for fun!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this referring to patching the addons that the operator is managing (the operand), like patching a Deployment?

In this context, patching opens the door to changes that could make the safety of upgrades unpredictable for the operator. Can they patch env vars, attached volumes, or command line arguments for the binary being run in the container? Will you limit what they can / can't patch?

The CRD is the safe API contract. Once the admin strays outside of that and starts patching the generated resources, the operator can only offer a best effort attempt to upgrade. What if a command line flag on the binary needs to be deprecated or changed? This isn't out of the question and has happened before with kube binaries. If the operator is in control and this feature is defined at the API level, then rollout of this flag change can be done smoothly by the operator. If an admin instead has patched the Deployment to use flags that the operator isn't managing, when the new version tries to roll out and the patch is reapplied with an invalid flag, the new pods are going to crashloop. This wouldn't have been caught at the patching step, the patch itself was not invalid, it wouldn't be caught until the middle of the upgrade when things start to fail.

make sure that everyone can use the addon operators, even if they “love it but
just need to change one thing”. This ensures that the addon operators
themselves can remain bounded in scope and complexity. Patching will fail if
the underlying addon changes dramatically, so we'll likely have a "patch
incompatible with new version" error - and generally addons should avoid
gratuitously changing their structure.
* We should develop a convention (labels / owner-refs) so that we are able to
discover which CRDs are cluster-addons, and there is no confusion with
application operators.

We expect the following functionality to be common to all operators for addons:

* A CRD per addon
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to see example(s) for common core ones, e.g. proxy.

Copy link
Member

@errordeveloper errordeveloper Mar 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, some concrete examples would help a lot. I think there are a few classes of add-ons and each of the classes has certain characteristics that are are shared within the class.

Here is what I've been thinking about the classification:

One type of addons are the components of Kubernetes that are strongly coupled with the Kubernetes releases, such as kube-dns/coredns, kube-proxy. Beside version coupling, you normally wouldn't want to run multiples of these running at the same time.

A networking component is usually less strongly coupled to Kubernetes version, but it is more critical to cluster functionality. There is also not standard way of running multiple different networks.

Secondly, there are kinds of add-ons that provide functionality that is rather critical to a user who runs workloads on the given cluster, ingress controllers and storage drivers would fall into this category. Some of these things can be seen as operators.

There is also the class of add-ons that extends functionality even further, and are often a combination of components which may or may not be defined as standalone add-ons from one of the above categories also. Istio and knative would fall into this category.

Additionally, there is a class of workloads that depends on versions of Kubernetes, but doesn't provide additional functionality to the cluster directly, observability products, dashboards, deployment tools and things like security scanners will fall into this category.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example.

@errordeveloper you raise some good categorizations - some of these are going to be "apps". For example the nginx-ingress-controller isn't really tied to the k8s cluster lifecycle. We would expect those to be deployed using tools like helm or kustomize or kubectl.

I do think we should figure out what to do about singleton objects. I've been called them "default" in "kube-system", but there are many approaches and even I only passionately hate about half of them ;-)

I don't really know how to handle "pick one of a set" type operators (i.e. CNI). Operators do let us have better cleanup procedures, but realistically it's highly disruptive to switch. kops started labelling the addons so that we could prune the other ones, but most CNI providers leave some networking plumbing behind (e.g. iptables). We can't easily do a rolling-update in most cases because the network partitions between old-CNI and new-CNI. There are usually supporting infrastructure things e.g. opening firewall rules for IPIP for calico. My preference would be that we can set up the operators to do all they can, but realistically if we get a nice migration path for any CNI providers it'll be because the tooling put in special code to move between a particular transition. We at least have somewhere to put that code now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example the nginx-ingress-controller isn't really tied to the k8s cluster lifecycle.

AWS ALB ingress controller would only ever work in AWS and needs IAM permissions, so it's tied to a few properties. When it comes to lifecycle and Cluster API, provisioning of such IAM permissions becomes a lifecycle concern.

GKE has built-in ingress controller, and that's currently something GKE user selects as an add-on and it is managed by the add-on manager. This one is actually tied into lifecycle as it gets included during cluster creation, and cannot be removed easily as far as I know.

Also, Heptio Contour ships a CRD, and that has versions dependencies.

I agree that neither of these are tied into cluster lifecycle, but they have dependencies on cluster APIs, and provide vita functionality for many users.

We would expect those to be deployed using tools like helm or kustomize or kubectl.

Even if we are to say that some things like ingress controllers are seen as "apps with deep Kubernetes integration" and in some case dependencies on cloud resources associated with the cluster, how do we expect users to make this distinctions and find the right way of managing such things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

network partitions between old-CNI and new-CNI

Not the case with Weave Net, and I don't know when it is practically the case. I would expect any well-designed CNI addon to behave well.
If it is the case, I'd say it's addon-specific, and certainly justifies the need for an addon-specific operator.

There are usually supporting infrastructure things

Seems similar to IAM dependencies problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that IAM provisioning likely brings it into cluster-scope; I don't think an ingress controller or a something like kube2iam are necessarily deal-breakers there.

I would like to avoid "deploying a CRD => cluster-lifecycle". I know there are versioning concerns, but I would hope we could come up with some reasonable patterns such that in practice helm or something could manage an app that includes CRDs. i.e. although technically there are problems with incompatible CRDs, by being careful about versions we can avoid them.

I think for users, we would expect their installation tooling (kubeadm, cluster-api etc) to include or reference a small set of cluster-addons. and for users to follow that lead. And I expect we'll naturally expect kubernetes component developers to identify whether they are really a cluster-addon or an app, and there will be natural pushback against inclusion from the installation tooling if they choose cluster-addon when they are really an app.

If the CNI providers are in fact capable of being switched, that's much easier! We can probably start with simple docs for how to turn on Weave and turn off Calico, and I'm sure Weave will write those docs, just as I'm sure Calico will write the doc in the other direction!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who reviews and approves the CRD APIs for these operators? I.e. if kube owns "official DNS installation API", do the core Kubernetes API approvers have veto power over what features DNS can support? How do we determine what features are allowed to be added to the "official DNS installation API"?

We have a lot of experience with the shortcomings of domain-specific APIs like ingress, service load balancer, storage where multiple different implementations can have wildly different feature sets. What happens when the N+1 feature addition to DNS API gets NACKed by the maintainer of the "official DNS installation API"? Does someone create "unofficial DNS installation API"? Do we support random extensions? Do we pick a very hardline and encourage people to fork and write their own operators?

I like the idea of a simple DNS config API. I don't like the idea of the DNS config API becoming just another ingress API problem. I don't see how these APIs won't fall into that trap.

@bgrant0607

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternate approach would be:

Someone goes and writes a damn good coredns operator that accepts lots of features. It's not part of the core project, but we make it really easy to install that operator. No API review in core is needed. Scope of Kube doesn't increase.

Since operators are controllers the best operators will be a simple deployment, rbac rules, and maybe a secret. Do we need to go beyond that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to distinguish the types of addons?

Several of the examples, such as kube-proxy and metrics server (https://github.com/kubernetes-incubator/metrics-server), are components developed as part of the Kubernetes project. Is there a reason why it wouldn't make sense for the owners of those components to also own the operator CRDs for those components?

I'd assume SIG Network would own the DNS-related operators.

On the simple DNS API: Offhand, I'd split the common parameters into the Ingress-like implementation-independent API (or just additional fields on some object[s]), and leave implementation-specific details in the operator CRDs. I haven't seriously looked at what such a split would look like, but candidate fields include those used to generate the default /etc/resolve.conf for containers and those documented as part of kube-dns and CoreDNS configuration:
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configure-stub-domain-and-upstream-dns-servers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think we have an "official DNS installation API" today?

* Common fields in spec that define the version and/or channel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a channel? Maintained and supported by the community?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A channel is what we're calling a stream of versions. Better names welcome, but I think channel is a good name that wise people wearing red hats use...

I think it would be up to a project e.g. CoreDNS whether to support a channel for their operator and how they would define it. But again, suggestions & recommendations are welcome.

* Common fields in status that expose the current health & version information
of the addon
* Addons follow a common structure, with an instance of the CRD as root object,
an Application CRD instance, consistent labels of all objects
* Some form of protection or rapid reconciliation to prevent accidental
modification of child objects
* Operators are declaratively driven, and can source manifests via https
(including mirrors), or from data stored in the cluster itself
(e.g. configmaps or cluster-bundle CRD, useful for airgapped)
* Operators are able to expose different update behaviours: automatic immediate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh automatic updates...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I would recommend it for prod clusters, but totally automated updates of e.g. CoreDNS would be a cool demo!

updates; notification of update-available in status; purely manual updates
* Operators are able to observe other CRDs to perform basic sequencing
* Addon manifests are able express an operator minimum version requirement, so
that an addon with new requirements can require that the operator be updated
first
* Airgapped operation should be possible by combining a registry mirror and
storage of the underlying manifest in the cluster itself.


An example can make this easier to understand, here is what a CRD instance for
kube-proxy might look like:

```yaml
apiVersion: addons.sigs.k8s.io/v1alpha1
kind: KubeProxy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm a network plugin, am I required to respect this if installed (even though calico / others might provide kube-proxy function natively)? How would a third party network plugin indicate they are ignoring this operator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the operator should be required to, no. These are declarations of user intent, and so if the user intent is unsatisfiable I believe the common pattern is to expose it under status if we know it won't work, or - often - simply not work correctly. kube-router might choose to check for kube-proxy and surface a status, for a better UX, but I don't think we would require that.

metadata:
name: default
namespace: kube-system
spec:
clusterCidr: 100.64.0.0/10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'm guessing use of bundling allows you to customize this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; other approaches include the operator discovering it by looking for a cluster-api Cluster object, or by having cluster-api create this KubeProxy instance with clusterCidr set. The latter feels more straightforward, and I do like that it is explicit vs magic defaulting, but this might actually vary for different use-cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KEP is proposing a subproject, it would be good to understand what the dependency orderings are for the project.

Is this orthogonal to the Cluster API?
Is it additive to any conformant Kubernetes cluster?
Can we get a clear statement on intended layering across the various subprojects?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KEP implies it’s open for adoption by a plurality of tooling, but the comments make that less clear. Ideally the operator can discover it’s required context information independent of installation tooling choice?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derekwaynecarr the above is really just my speculatively enumerating some of the approaches an operator could take to get a value (clusterCidr). We do a good job of covering all the tooling in sig-cluster-lifecycle (mostly due to the eternal vigilance of @timothysc ), and so if you're able to attend the meeting then we can address your questions there I think - I'm not entirely sure what you mean by some of them. Even better, it would be great to have your participation in building the right thing!

Copy link
Member

@derekwaynecarr derekwaynecarr Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinsb thanks for clarifying. I asked the question if this was decomposed as the cluster api subproject meeting yesterday was also discussing composition of its APIs moving forward. I think it’s right to handle this as a separate subproject and look forward to collaborating.

version: 1.14.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who decides what versions can be used here? Can this drift from Kube versions, or is it required to be aligned?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is something like a Calico network plugin able to dictate that they only work with certain versions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically the addon itself will define its own numbering scheme. For the particular example of kube-proxy, today it is locked to k8s versions, but that isn't always going to be the case. CoreDNS & calico define their own versioning scheme - kube-proxy is more of a special case here.

I don't think we want to dictate that a calico operator define that it only work with some e.g. k8s versions. I suspect we'll come up with some recommendations for what an operator should do if it is running outside of the tested versions (e.g. surface a warning in status?) or running with a known-incompatible version (e.g. refuse to update & surface an error condition in status?).

What do you think we should do @smarterclayton ? What did OpenShift end up doing here?

status:
healthy: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things that status would have to represent that aren't covered here:

  1. dimensions of health - conditions, message, structured information for a consumer like Deployment
  2. whether the current version has been rolled out completely and if not, how long before it is

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - we plan on developing these together as a community project. It would be great to have your help / input from people that worked on operators in OpenShift!

```

This particular manifest is pinned to `version: 1.14.4`. We could also
subscribe to a stream of updates with a field like `channel: stable`.

### Risks and Mitigations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think test and verification for a release is also super important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - added a paragraph specifically for that. I think if we get to parity with the "embedded manifest" approach that we know and love, that'll help initially. And I think we unlock approaches that are a little more flexible than embedding manifests, but I don't think we need to dictate anything there.


This will involve running a large number of new controllers. This will require
more resources; we can mitigate this by combining them into a single binary

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect operator owners will want to lifecycle and release new versions of their operators independently from each other. Especially for operators that aren't tied to a particular kubernetes release.

Combining into a single operator binary for all addons also makes your RBAC concerns worse, now there is one operator with access to do everything all of the operands can do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fair point. I'm also unhappy about the way RBAC permissions "trickle down", where an operator needs to have all the permissions of the addon - I hope we can investigate ways to address that. Is there anything we can learn from from Openshift's work here?

(similar to kube-controller-manager).

Automatically updating addons could result in new SPOFs, we can mitigate this
through mirroring (including support for air-gapped mirrors).

Providing a good set of addons could result in a monoculture where mistakes
affect most/all kubernetes clusters (even if we don’t mandate adoption, if we
succeed we hope for widespread adoption). We can continue with our strategies
that we use for core components such as kube-apiserver: primarily we must keep
the notion of stable vs less-stable releases, to stagger the risk of a bad
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have the notion of stable vs less stable releases today. Individual components, like Core DNS, have become the default once they are considered GA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - I'm saying that we would expect that addon X might continue to support 1.1 for a while after introducing 1.2 to avoid the monoculture risk. They might choose to leverage channels for this, for example choosing to define a "stable" channel that stays on 1.1 for a while, while the "beta" channel moves to 1.2 until they feel it has been derisked.

Today this work is all on the kubernetes test & release teams, which feels very centralized.

rollout. We must also consider this a trade-off against the risk that without
coordination each piece of tooling must reinvent the wheel; we expect more
mistakes (even measured per cluster) in that scenario.

Test and release may become more complicated because of fragmentation across
repos. Mitigation: be disciplined about versioning of operators and addons and
encourage installation tooling to pin to a particular version of both for a
particular release. We need to set up automated builds (with CI) for rapid
releases so installation tooling is not blocked waiting for operator releases.
We need to set up a mechanism so that addons can be updated without requiring an
operator update. With this, if tooling is able to pin to particular addon
versions, that should be at parity with the "embedded manifest" approach that is
widely used currently. (We hope to enable usage that is less lock-step, but
that itself will likely require new approaches for testing and release)

## Success Criteria

We will succeed if addon operators are:

* Used: addon operators are adopted by the majority of cluster installation
tooling
* Useful: users are generally satisfied with the functionality of addon
operators and are not trying to work around them, or making lots of proposals /
PRs to extend them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great success criteria and the one I'm skeptical about in the current scope.

Would we do an ingress controller operator? Which ingress controllers would it support? What options on those ingress controllers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these all have operators, I was imagining there would be an nginx-ingress operator, and a distinct gclb-ingress operator, aws-alb operator etc. Ideally component projects end up owning their operators.

I was not imagining we would try here to define a generic ingress operator that can install arbitrary ingress controllers - it doesn't feel very compatible with the idea of federating responsibility & of self-determination. It also feels like that would be a sig-network project.

* Ubiquitous: the majority of components include an operator
* Federated: the components maintain their own operators, encoding their
knowledge of how best to run their addon.

## Graduation Criteria

alpha: addon-operators are used to manage kube-proxy & CoreDNS:
* in kube-up
* in kubeadm (at least as an option)
* in kops
* in cluster-api-provider-aws & cluster-api-provider-gcp
* adoption is documented for use by other tooling / self-managed clusters

(post-alpha criteria will be added post-alpha)

## Implementation History

[Addon Operator session](https://www.youtube.com/watch?v=LPejvfBR5_w) given by jrjohnson & justinsb at Kubecon NA - Dec 2018
KEP created - Jan 29 2019

## Infrastructure Needed

Initial development of the tooling can probably take place as part of
kubebuilder

We should likely create a repo for holding the operators themselves. Eventually
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to have one repo per operator as opposed to a single repo for all operators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm already concerned today with the fragmentation that exists in the CAP*, and I think this would provide yet another degree of freedom where I'd really like to bake the defaults into a single canonical location.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @detiber A single repo for all operators doesn't allow the owners of the addons to build their operators where and how they want. If we know of addon operators that don't need to cycle exactly with a particular kubernetes release, a single repo doesn't seem like the right approach to enabling that scenario.

Define the API contract an operator must fulfill (status/condition reporting, how the operator says if it has rolled out the expected version, etc). Have tooling to make it easy for an operator to bootstrap a project to conform to that API contract.

we would hope these would migrate to the various addon components, so we could
also just store these under e.g. cluster-api.

We are requesting to be a subproject under sig-cluster-lifecycle.