From 252734be507a69a0e6ec771780dec2e1072832c4 Mon Sep 17 00:00:00 2001
From: Ryan Zhang
Date: Mon, 18 May 2020 11:05:42 -0700
Subject: [PATCH] add trait workload design

---
 ...er-trait-workload-interaction-mechanism.md | 204 ++++++++++++++++++
 1 file changed, 204 insertions(+)
 create mode 100644 design/one-pager-trait-workload-interaction-mechanism.md

diff --git a/design/one-pager-trait-workload-interaction-mechanism.md b/design/one-pager-trait-workload-interaction-mechanism.md
new file mode 100644
index 00000000..64ae9522
--- /dev/null
+++ b/design/one-pager-trait-workload-interaction-mechanism.md
@@ -0,0 +1,204 @@
+# Traits and workloads interaction mechanism in OAM
+
+* Owner: Ryan Zhang (@ryanzhang-oss)
+* Reviewers: Crossplane Maintainers
+* Status: Draft
+
+## Terminology
+
+* **CRD (Custom Resource Definition)** : A standard Kubernetes Custom Resource Definition
+* **CR (Custom Resource)** : An instance of a Kubernetes type that was defined using a CRD
+* **GVK (Group Version Kind)** : The API Group, Version, and Kind for a type of Kubernetes
+  resource (including CRDs)
+* **Workload child resources** : The Kubernetes resources generated by a workload controller. They
+  should all have a controller reference pointing to the parent workload instance.
+
+## Background
+Traits and workloads are two major types of resources in OAM. Traits usually affect how a
+Kubernetes resource operates, either directly (through a spec change) or indirectly (by adding an
+ingress or sidecar). However, the current OAM implementation does not contain a generic mechanism
+for traits to locate the corresponding resource to modify.
+
+We will use the following hypothetical OAM application as the baseline to illustrate the problem
+and our solution.
+
+```yaml
+apiVersion: core.oam.dev/v1alpha2
+kind: WorkloadDefinition
+metadata:
+  name: mydbs.standard.oam.dev
+spec:
+  definitionRef:
+    name: mydbs.standard.oam.dev
+---
+apiVersion: core.oam.dev/v1alpha2
+kind: TraitDefinition
+metadata:
+  name: manualscalertraits.core.oam.dev
+spec:
+  definitionRef:
+    name: manualscalertraits.core.oam.dev
+---
+apiVersion: core.oam.dev/v1alpha2
+kind: Component
+metadata:
+  name: example-db
+spec:
+  workload:
+    apiVersion: standard.oam.dev/v1alpha2
+    kind: Mydb
+    metadata:
+      name: mydb-example
+    spec:
+      containers:
+        - name: mysql
+          image: mysql:latest
+---
+apiVersion: core.oam.dev/v1alpha2
+kind: ApplicationConfiguration
+metadata:
+  name: example-appconfig
+spec:
+  components:
+    - componentName: example-db
+      traits:
+        - trait:
+            apiVersion: core.oam.dev/v1alpha2
+            kind: ManualScalerTrait
+            metadata:
+              name: example-appconfig-trait
+            spec:
+              replicaCount: 3
+```
+
+The problem is twofold:
+1. A trait controller needs a way to find the workload that it is applied to.
+    - In the example, the manual scaler trait needs to know that it is supposed to scale the
+    example-db workload. However, we want to keep the applicationConfiguration controller
+    agnostic to the schema of any `trait` or `workload` it generates to make it extensible.
+    Thus, the applicationConfiguration controller needs to emit a `ManualScalerTrait` CR that
+    contains a reference to the `example-db` workload without knowing the trait's specific schema.
+
+2. A trait controller needs to know the exact resources it should modify. Note that these
+resources are most likely not the workload itself.
+    - Using the same example, just knowing the `example-db` workload is not enough for the
+    `ManualScalerTrait` to work. The trait controller does not work with the `example-db` workload
+    directly. It needs to find the actual Kubernetes resources that the `example-db`
+    workload generates, and only then can it modify the `replicas` field in their spec.
+
+## Goals
+In order to maximize the extensibility of our OAM implementation, our solution needs to meet the
+following two design objectives.
+1. **Extensible trait system**: We want to allow a `trait` to apply to any eligible `workload`
+instead of just a list of specific ones. This means that we want to empower a trait developer to
+write the controller code once, and it will work for any new `workload` that this `trait` can
+apply to in the future.
+    - Using the example again, the `ManualScalerTrait` should work with any workload that
+    generates a Kubernetes resource that has a `replicas` field in its spec, even if the
+    workload does not exist when the `ManualScalerTrait` is implemented.
+2. **Adopting existing CRDs**: The mechanism cannot put any limit on the `trait` or `workload`
+CRDs. This means that we cannot assume any pre-defined CRD fields in any `trait` or `workload`
+beyond Kubernetes conventions (i.e. spec or status).
+    - For example, the following `EtcdBackup` operator can be used as a `trait` in an OAM
+    application to apply to an `EtcdCluster` workload. Here, the `etcdEndpoints` field in
+    the trait signals to which `workload` it applies, and we need to accommodate
+    this type of `trait`.
+    ```yaml
+    apiVersion: "etcd.database.coreos.com/v1beta2"
+    kind: "EtcdBackup"
+    metadata:
+      name: example-etcd-cluster-backup
+    spec:
+      etcdEndpoints: []
+      storageType: S3
+      s3:
+        path:
+        awsSecret:
+    ```
+
+
+## Proposal
+The overall idea is for the applicationConfiguration controller to fill critical information
+in the workload and trait CRs it emits. In addition, we will provide a helper library so that
+trait controller developers can locate the resources they need with a simple function call.
+Here is the list of changes that we propose.
+1. The applicationConfiguration controller no longer assumes that all `trait` CRDs contain a
+"spec.workloadRef" field that conforms to the OAM definition. It only fills the workload GVK and
+name into a `trait` CR if its CRD has a "spec.workloadRef" field defined as below.
+    ```yaml
+    workloadRef:
+      properties:
+        apiVersion:
+          type: string
+        kind:
+          type: string
+        name:
+          type: string
+      required:
+        - apiVersion
+        - kind
+        - name
+      type: object
+    ```
+2. Add a `childResourceKinds` field in the WorkloadDefinition.
+Currently, a workloadDefinition is nothing but a shim of a real workload CRD. We propose to add
+an **optional** field called `childResourceKinds` to the schema of the workloadDefinition. We encourage
+workload owners to fill in this field when they register their controllers to the OAM system.
+This is the way for them to declare the types of the Kubernetes resources their workload
+controller actually generates. In our example, the workload definition can claim to generate
+Deployment and Service child resources.
+    ```yaml
+    apiVersion: core.oam.dev/v1alpha2
+    kind: WorkloadDefinition
+    metadata:
+      name: mydbs.standard.oam.dev
+    spec:
+      definitionRef:
+        name: mydbs.standard.oam.dev
+      childResourceKinds:
+        - apiVersion: apps/v1
+          kind: Deployment
+        - apiVersion: v1
+          kind: Service
+    ```
+3. The OAM runtime will provide a helper library. The library uses the following logic to help a
+trait developer locate the resources for the trait to modify.
+    1. Get the corresponding `workload` instance from the Kubernetes cluster with the information
+    inserted by the applicationConfiguration controller in the `trait` CR.
+    2. Fetch the corresponding `workloadDefinition` CR following an
+    [OAM convention](https://github.com/oam-dev/spec/blob/master/3.workload.md#definitionref).
+    The convention requires that the name of the `workloadDefinition` CR is the name of the
+    `workload` CRD it refers to. For example, the name of the `workloadDefinition` CR
+    that refers to a `containerizedworkloads.core.oam.dev` CRD is exactly
+    `containerizedworkloads.core.oam.dev` as well.
+    3. Fetch all the `childResourceKinds` values in the corresponding `workloadDefinition` instance.
+    4. List each child resource by its GVK and filter by owner reference. Here, we assume that
+    all the child resources that the workload controller generates have a controller reference
+    field pointing back to the workload instance.
+
+## Impact on the existing system
+Here are the impacts of this mechanism on the existing OAM components:
+- ApplicationConfiguration: This mechanism requires minimal changes in the
+  applicationConfiguration controller.
+- Workload: This mechanism does not affect workload controller implementation.
+- Trait: This mechanism is optional, so all existing trait controllers still work. This mechanism
+requires modification to any existing trait that wants to take advantage of
+the extensibility of OAM. Any trait that only applies to a certain type of workload, such as
+the `EtcdBackup` trait, doesn't need to use this mechanism.
+- WorkloadDefinition: Workload owners can modify the existing workloadDefinition if needed.
+
+## Alternative approach
+1. One alternative approach is to make the applicationConfiguration controller watch all
+the possible workload child resources and insert each child resource's GVK and name into the
+corresponding workload CR. I would not recommend this approach as it increases the complexity
+of the applicationConfiguration controller and makes it more of an availability liability.
+2. Another approach is to implement a separate type for binding traits to workloads. This would
+work, but it seems that label/annotation is a natural place to record the information.
+Otherwise, we need a way for the trait to discover the binding instance first.
+
+## Extra labels
+There might be cases where a workload generates more than one resource with the same GVK and only
+wants to expose a subset of them to traits. In this case, we can add a pre-defined label such as
+`core.oam.dev/expose=true` for the workload owner to indicate what resources to expose. This
+section is just to illustrate that this is a solvable problem, and it's beyond the scope of this
+proposal for now.