diff --git a/docs/book/src/SUMMARY.md b/docs/book/src/SUMMARY.md index 8ecf83a26f41..256cdffbc39c 100644 --- a/docs/book/src/SUMMARY.md +++ b/docs/book/src/SUMMARY.md @@ -24,7 +24,11 @@ - [Writing a ClusterClass](./tasks/experimental-features/cluster-class/write-clusterclass.md) - [Changing a ClusterClass](./tasks/experimental-features/cluster-class/change-clusterclass.md) - [Operating a managed Cluster](./tasks/experimental-features/cluster-class/operate-cluster.md) - - [Runtime SDK](./tasks/experimental-features/runtime-sdk.md) + - [Runtime SDK](tasks/experimental-features/runtime-sdk/index.md) + - [Implementing Runtime Extensions](./tasks/experimental-features/runtime-sdk/implement-extensions.md) + - [Implementing Lifecycle Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md) + - [Implementing Topology Mutation Hook Extensions](./tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md) + - [Deploying Runtime Extensions](./tasks/experimental-features/runtime-sdk/deploy-runtime-extension.md) - [Ignition Bootstrap configuration](./tasks/experimental-features/ignition.md) - [Security Guidelines](./security/index.md) - [Pod Security Standards](./security/pod-security-standards.md) diff --git a/docs/book/src/images/runtime-sdk-lifecycle-hooks.png b/docs/book/src/images/runtime-sdk-lifecycle-hooks.png new file mode 100644 index 000000000000..7153ee288aef Binary files /dev/null and b/docs/book/src/images/runtime-sdk-lifecycle-hooks.png differ diff --git a/docs/book/src/images/runtime-sdk-topology-mutation.plantuml b/docs/book/src/images/runtime-sdk-topology-mutation.plantuml new file mode 100644 index 000000000000..ce7290c5487d --- /dev/null +++ b/docs/book/src/images/runtime-sdk-topology-mutation.plantuml @@ -0,0 +1,56 @@ +@startuml +title Figure 1. 
Cluster topology reconciliation + + +' -- GROUPS START --- + +box #LightGreen +participant "API Server" +end box + +box #LightBlue +participant "Cluster Topology Controller" +end box + +box #LightBlue +participant "External Patch Extensions" +end box + +' -- GROUPS END --- + +"API Server" --> "Cluster Topology Controller": Cluster reconcile event +activate "Cluster Topology Controller" + +"Cluster Topology Controller" -> "API Server": Get current Cluster topology +activate "API Server" +"API Server" -> "Cluster Topology Controller": +deactivate "API Server" + +group Compute desired State + "Cluster Topology Controller" -> "Cluster Topology Controller": Compute desired State + loop Ordered list of Patches + alt + "Cluster Topology Controller" -> "Cluster Topology Controller": Generate inline patches + else + "Cluster Topology Controller" -> "External Patch Extensions": Generate external patches + activate "External Patch Extensions" + "External Patch Extensions" -> "Cluster Topology Controller": + deactivate "External Patch Extensions" + end + "Cluster Topology Controller" -> "Cluster Topology Controller": Apply patches to desired State + end loop + + loop External Patches + "Cluster Topology Controller" -> "External Patch Extensions": ValidateTopology + activate "External Patch Extensions" + "External Patch Extensions" -> "Cluster Topology Controller": + deactivate "External Patch Extensions" + end loop +end group + +"Cluster Topology Controller" -> "API Server": Reconcile Cluster topology + +deactivate "Cluster Topology Controller" + +hide footbox +@enduml diff --git a/docs/book/src/images/runtime-sdk-topology-mutation.png b/docs/book/src/images/runtime-sdk-topology-mutation.png new file mode 100644 index 000000000000..c26f52ee782a Binary files /dev/null and b/docs/book/src/images/runtime-sdk-topology-mutation.png differ diff --git a/docs/book/src/tasks/experimental-features/cluster-class/index.md b/docs/book/src/tasks/experimental-features/cluster-class/index.md 
index 59433a4b8173..d708451fc465 100644 --- a/docs/book/src/tasks/experimental-features/cluster-class/index.md +++ b/docs/book/src/tasks/experimental-features/cluster-class/index.md @@ -16,7 +16,7 @@ In order to use the ClusterClass (alpha) experimental feature the Kubernetes Ver **Variable name to enable/disable the feature gate**: `CLUSTER_TOPOLOGY` Additional documentation: -* Background information: [ClusterClass and Managed Topologies CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210526-cluster-class-and-managed-topologies.md) +* Background information: [ClusterClass and Managed Topologies CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210526-cluster-class-and-managed-topologies.md) * For ClusterClass authors: * [Writing a ClusterClass](./write-clusterclass.md) * [Changing a ClusterClass](./change-clusterclass.md) diff --git a/docs/book/src/tasks/experimental-features/experimental-features.md b/docs/book/src/tasks/experimental-features/experimental-features.md index c8d43115643e..3564980e34ff 100644 --- a/docs/book/src/tasks/experimental-features/experimental-features.md +++ b/docs/book/src/tasks/experimental-features/experimental-features.md @@ -78,7 +78,7 @@ kubectl describe -n capi-system deployment.apps/capi-controller-manager * [ClusterResourceSet](./cluster-resource-set.md) * [ClusterClass](./cluster-class/index.md) * [Ignition Bootstrap configuration](./ignition.md) -* [Runtime SDK](./runtime-sdk.md) +* [Runtime SDK](runtime-sdk/index.md) **Warning**: Experimental features are unreliable, i.e., some may one day be promoted to the main repository, or they may be modified arbitrarily or even disappear altogether. In short, they are not subject to any compatibility or deprecation promise. 
diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk.md b/docs/book/src/tasks/experimental-features/runtime-sdk.md deleted file mode 100644 index f27a2b5cd0eb..000000000000 --- a/docs/book/src/tasks/experimental-features/runtime-sdk.md +++ /dev/null @@ -1,12 +0,0 @@ -# Experimental Feature: Runtime SDK - -The Runtime SDK feature provides an extensibility mechanism that allows systems, products, and services built on top of Cluster API to hook into a workload cluster’s lifecycle. - - -**Feature gate name**: `RuntimeSDK` - -**Variable name to enable/disable the feature gate**: `EXP_RUNTIME_SDK` - - -More details on the Runtime SDK can be found at: -[RuntimeSDK CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220221-runtime-SDK.md) diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/deploy-runtime-extension.md b/docs/book/src/tasks/experimental-features/runtime-sdk/deploy-runtime-extension.md new file mode 100644 index 000000000000..6157f3176cb8 --- /dev/null +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/deploy-runtime-extension.md @@ -0,0 +1,47 @@ +# Deploy Runtime Extensions + + + +Cluster API requires that each Runtime Extension must be deployed using an endpoint accessible from the Cluster API +controllers. The recommended deployment model is to deploy a Runtime Extension in the management cluster by: + +- Packing the Runtime Extension in a container image. +- Using a Kubernetes Deployment to run the above container inside the Management Cluster. +- Using a Cluster IP Service to make the Runtime Extension instances accessible via a stable DNS name. +- Using a cert-manager generated Certificate to protect the endpoint. + +For an example, please see our [test extension](https://github.com/kubernetes-sigs/cluster-api/tree/main/test/extension) +which follows, as closely as possible, the kubebuilder setup used for controllers in Cluster API. 
+ +There is a set of important guidelines that must be considered while choosing the deployment method: + +## Availability + +Runtime Extensions should leverage some form of load-balancing to provide high availability +and performance benefits. You can run multiple Runtime Extension servers behind a Kubernetes Service to leverage the +load-balancing that services support. + +## Identity and access management + +The security model for each Runtime Extension should be carefully defined, similar to any other application deployed +in the Cluster. If the Runtime Extension requires access to the apiserver the deployment must use a dedicated service +account with limited RBAC permissions. Otherwise no service account should be used. + +On top of that, the container image for the Runtime Extension should be carefully designed in order to avoid +privilege escalation (e.g. using [distroless](https://github.com/GoogleContainerTools/distroless) base images). +The Pod spec in the Deployment manifest should enforce security best practices (e.g. do not use privileged pods). + +## Alternative deployment methods + +Alternative deployment methods can be used as long as the HTTPS endpoint is accessible, e.g.: + +- deploying the HTTPS Server as a part of another component, e.g. a controller. +- deploying the HTTPS Server outside the Management Cluster. + +In those cases recommendations about availability and identity and access management still apply.
diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-extensions.md b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-extensions.md new file mode 100644 index 000000000000..2d17ffee8fb5 --- /dev/null +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-extensions.md @@ -0,0 +1,293 @@ +# Implementing Runtime Extensions + + + +## Introduction + +As a developer building systems on top of Cluster API, if you want to hook into the Cluster’s lifecycle via +a Runtime Hook, you have to implement a Runtime Extension handling requests according to the +OpenAPI specification for the Runtime Hook you are interested in. + +Runtime Extensions by design are very powerful and flexible; however, given that with great power comes +great responsibility, a few key considerations should always be kept in mind (more details in the following sections): + +- Runtime Extensions are components that should be designed, written and deployed with great caution given that they + can affect the proper functioning of the Cluster API runtime. +- Cluster administrators should carefully vet any Runtime Extension registration, thus preventing malicious components + from being added to the system. + +Please note that following similar practices is already commonly accepted in the Kubernetes ecosystem for +Kubernetes API server admission webhooks. Runtime Extensions share the same foundation and most of the same +considerations/concerns apply. + +## Implementation + +As mentioned above, as a developer building systems on top of Cluster API, if you want to hook into the Cluster’s +lifecycle via a Runtime Extension, you have to implement an HTTPS server handling a discovery request and a set +of additional requests according to the OpenAPI specification for the Runtime Hook you are interested in.
+ +The following shows a minimal example of a Runtime Extension server implementation: + +```go +package main + +import ( + "context" + "flag" + "net/http" + "os" + + "github.com/spf13/pflag" + cliflag "k8s.io/component-base/cli/flag" + "k8s.io/component-base/logs" + "k8s.io/klog/v2" + "k8s.io/utils/pointer" + ctrl "sigs.k8s.io/controller-runtime" + + runtimecatalog "sigs.k8s.io/cluster-api/exp/runtime/catalog" + runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1" + "sigs.k8s.io/cluster-api/exp/runtime/server" +) + +var ( + catalog = runtimecatalog.New() + setupLog = ctrl.Log.WithName("setup") + + // Flags. + profilerAddress string + webhookPort int + webhookCertDir string + logOptions = logs.NewOptions() +) + +func init() { + // Register the Runtime Hook types into the catalog. + _ = runtimehooksv1.AddToCatalog(catalog) +} + +// InitFlags initializes the flags. +func InitFlags(fs *pflag.FlagSet) { + logs.AddFlags(fs, logs.SkipLoggingConfigurationFlags()) + logOptions.AddFlags(fs) + + fs.StringVar(&profilerAddress, "profiler-address", "", + "Bind address to expose the pprof profiler (e.g. localhost:6060)") + + fs.IntVar(&webhookPort, "webhook-port", 9443, + "Webhook Server port") + + fs.StringVar(&webhookCertDir, "webhook-cert-dir", "/tmp/k8s-webhook-server/serving-certs/", + "Webhook cert dir, only used when webhook-port is specified.") +} + +func main() { + InitFlags(pflag.CommandLine) + pflag.CommandLine.SetNormalizeFunc(cliflag.WordSepNormalizeFunc) + pflag.CommandLine.AddGoFlagSet(flag.CommandLine) + pflag.Parse() + + if err := logOptions.ValidateAndApply(nil); err != nil { + setupLog.Error(err, "unable to start extension") + os.Exit(1) + } + + // klog.Background will automatically use the right logger. 
+ ctrl.SetLogger(klog.Background()) + + if profilerAddress != "" { + klog.Infof("Profiler listening for requests at %s", profilerAddress) + go func() { + klog.Info(http.ListenAndServe(profilerAddress, nil)) + }() + } + + ctx := ctrl.SetupSignalHandler() + + webhookServer, err := server.NewServer(server.Options{ + Catalog: catalog, + Port: webhookPort, + CertDir: webhookCertDir, + }) + if err != nil { + setupLog.Error(err, "error creating webhook server") + os.Exit(1) + } + + // Register extension handlers. + if err := webhookServer.AddExtensionHandler(server.ExtensionHandler{ + Hook: runtimehooksv1.BeforeClusterCreate, + Name: "before-cluster-create", + HandlerFunc: DoBeforeClusterCreate, + TimeoutSeconds: pointer.Int32(5), + FailurePolicy: toPtr(runtimehooksv1.FailurePolicyFail), + }); err != nil { + setupLog.Error(err, "error adding handler") + os.Exit(1) + } + if err := webhookServer.AddExtensionHandler(server.ExtensionHandler{ + Hook: runtimehooksv1.BeforeClusterUpgrade, + Name: "before-cluster-upgrade", + HandlerFunc: DoBeforeClusterUpgrade, + TimeoutSeconds: pointer.Int32(5), + FailurePolicy: toPtr(runtimehooksv1.FailurePolicyFail), + }); err != nil { + setupLog.Error(err, "error adding handler") + os.Exit(1) + } + + setupLog.Info("Starting Runtime Extension server") + if err := webhookServer.Start(ctx); err != nil { + setupLog.Error(err, "error running webhook server") + os.Exit(1) + } +} + +func DoBeforeClusterCreate(ctx context.Context, request *runtimehooksv1.BeforeClusterCreateRequest, response *runtimehooksv1.BeforeClusterCreateResponse) { + log := ctrl.LoggerFrom(ctx) + log.Info("BeforeClusterCreate is called") + // Your implementation +} + +func DoBeforeClusterUpgrade(ctx context.Context, request *runtimehooksv1.BeforeClusterUpgradeRequest, response *runtimehooksv1.BeforeClusterUpgradeResponse) { + log := ctrl.LoggerFrom(ctx) + log.Info("BeforeClusterUpgrade is called") + // Your implementation +} + +func toPtr(f runtimehooksv1.FailurePolicy) 
*runtimehooksv1.FailurePolicy { + return &f +} +``` + +For a full example see our [test extension](https://github.com/kubernetes-sigs/cluster-api/tree/main/test/extension). + +Please note that a Runtime Extension server can serve multiple Runtime Hooks (in the example above +`BeforeClusterCreate` and `BeforeClusterUpgrade`) at the same time. Each of them is handled at a different path, like the +Kubernetes API server does for different API resources. The exact format of those paths is handled by the server +automatically in accordance with the OpenAPI specification of the Runtime Hooks. + +There is an additional `Discovery` endpoint which is automatically served by the `Server`. The `Discovery` endpoint +returns a list of extension handlers to inform Cluster API which Runtime Hooks are implemented by this +Runtime Extension server. + +Please note that Cluster API is only able to enforce the correct request and response types as defined by a Runtime Hook version. +Developers are fully responsible for all other elements of the design of a Runtime Extension implementation, including: + +- Choosing which programming language to use; please note that Golang is the language of choice, and we are not planning + to test or provide tooling and libraries for other languages. Nevertheless, given that we rely on OpenAPI and plain + HTTPS calls, other languages should just work, but support will be provided on a best-effort basis. +- Choosing whether a dedicated or a shared HTTPS Server is used for the Runtime Extension (it can e.g. also be used to serve a + metric endpoint). + +When using Golang the Runtime Extension developer can benefit from the following packages (provided by the +`sigs.k8s.io/cluster-api` module) as shown in the example above: + +- `exp/runtime/hooks/api/v1alpha1` contains the Runtime Hook Golang API types, which are also used to generate the + OpenAPI specification. +- `exp/runtime/catalog` provides the `Catalog` object to register Runtime Hook definitions.
The `Catalog` is then + used by the `server` package to handle requests. `Catalog` is similar to the `runtime.Scheme` of the + `k8s.io/apimachinery/pkg/runtime` package, but it is designed to store Runtime Hook registrations. +- `exp/runtime/server` provides a `Server` object which makes it easy to implement a Runtime Extension server. + The `Server` will automatically handle tasks like marshalling/unmarshalling requests and responses. A Runtime + Extension developer only has to implement a strongly typed function that contains the actual logic. + +## Guidelines + +While writing a Runtime Extension the following important guidelines must be considered: + +### Timeouts + +Runtime Extension processing adds to reconcile durations of Cluster API controllers. Runtime Extensions should respond to requests +as quickly as possible, typically in milliseconds. Runtime Extension developers can decide how long the Cluster API Runtime +should wait for a Runtime Extension to respond before treating the call as a failure (max is 30s) by returning the timeout +during discovery. Of course a Runtime Extension can trigger long-running tasks in the background, but they shouldn't block +synchronously. + +### Availability + +Runtime Extension failure could result in errors in handling the workload cluster's lifecycle, and so the implementation +should be robust, have proper error handling, avoid panics, etc. Failure policies can be set up to mitigate the +negative impact of a Runtime Extension on the Cluster API Runtime, but this option can’t be used in all cases +(see [Error Management](#error-management)). + +### Blocking Hooks + +A Runtime Hook can be defined as "blocking" - e.g. the `BeforeClusterUpgrade` hook allows a Runtime Extension +to prevent the upgrade from starting. A Runtime Extension registered for the `BeforeClusterUpgrade` hook +can block by returning a non-zero `retryAfterSeconds` value.
The following considerations apply: + +- The system might decide to retry the same Runtime Extension even before the `retryAfterSeconds` period expires, + e.g. due to other changes in the Cluster, so `retryAfterSeconds` should be considered as an approximate maximum + time before the next reconcile. +- If there is more than one Runtime Extension registered for the same Runtime Hook and more than one returns + `retryAfterSeconds`, the shortest non-zero value will be used. +- If there is more than one Runtime Extension registered for the same Runtime Hook and at least one returns + `retryAfterSeconds`, all Runtime Extensions will be called again. + +A detailed description of what "blocking" means for each specific Runtime Hook is documented case by case +in the hook-specific implementation documentation (e.g. [Implementing Lifecycle Hook Runtime Extensions](./implement-lifecycle-hooks.md#definitions)). + +### Side Effects + +Runtime Extensions should avoid side effects if possible, which means they should operate +only on the content of the request sent to them, and not make out-of-band changes. If side effects are required, +rules defined in the following sections apply. + +### Idempotence + +An idempotent Runtime Extension is able to succeed even if it has already been completed before (the Runtime +Extension checks current state and changes it only if necessary). This is necessary because a Runtime Extension +may be called many times after it already succeeded because other Runtime Extensions for the same hook may not +succeed in the same reconcile. + +A practical example that explains why idempotence is relevant is the fact that extensions could be called more +than once for the same lifecycle transition, e.g. + +- Two Runtime Extensions are registered for the `BeforeClusterUpgrade` hook. +- Before a Cluster upgrade is started both extensions are called, but one of them temporarily blocks the operation + by asking to retry after 30 seconds.
+- After 30 seconds the system retries the lifecycle transition, and both extensions are called again to re-evaluate + if it is now possible to proceed with the Cluster upgrade. + +### Avoid dependencies + +Each Runtime Extension should accomplish its task without depending on other Runtime Extensions. Introducing +dependencies across Runtime Extensions makes the system fragile, and it is probably a consequence of poor +"Separation of Concerns" between extensions. + +### Deterministic result + +A deterministic Runtime Extension is implemented in such a way that given the same input it will always return +the same output. + +Some Runtime Hooks, e.g. external patches, might explicitly require the corresponding Runtime Extensions +to support this property. But we encourage developers to follow this pattern more generally given that it fits +well with practices like unit testing and generally makes the entire system more predictable and easier to troubleshoot. + +### Error Management + +In case a Runtime Extension returns an error, the error will be handled according to the corresponding failure policy +defined in the response of the Discovery call. + +If the failure policy is `Ignore` the error is going to be recorded in the controller's logs, but the processing +will continue. However we recognize that this failure policy cannot be used in most use cases because Runtime +Extension implementers want to ensure that the task implemented by an extension is completed before continuing with +the cluster's lifecycle. + +If instead the failure policy is `Fail` the system will retry the operation until it passes. The following general +considerations apply: + +- It is the responsibility of Cluster API components to surface Runtime Extension errors using conditions. +- Operations will be retried with an exponential backoff or whenever the state of a Cluster changes (we are going to rely + on controller runtime exponential backoff/watches).
+- If there is more than one Runtime Extension registered for the same Runtime Hook and at least one of them fails, + all the registered Runtime Extensions will be retried. See [Idempotence](#idempotence). + +Additional considerations about errors that apply only to a specific Runtime Hook will be documented in the hook-specific +implementation documentation. diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md new file mode 100644 index 000000000000..adede3e2db4c --- /dev/null +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md @@ -0,0 +1,255 @@ +# Implementing Lifecycle Hook Runtime Extensions + + + +## Introduction + +The lifecycle hooks allow hooking into the Cluster lifecycle. The following diagram provides an overview: + +![Lifecycle Hooks overview](../../../images/runtime-sdk-lifecycle-hooks.png) + +Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md) +for additional background information. + +## Guidelines + +All guidelines defined in [Implementing Runtime Extensions](implement-extensions.md#guidelines) apply to the +implementation of Runtime Extensions for lifecycle hooks as well. + +In summary, Runtime Extensions are components that should be designed, written and deployed with great caution given +that they can affect the proper functioning of the Cluster API runtime. A poorly implemented Runtime Extension could +potentially block lifecycle transitions from happening.
+ +The following recommendations are especially relevant: + +* [Blocking and non-blocking hooks](implement-extensions.md#blocking-hooks) +* [Error management](implement-extensions.md#error-management) +* [Avoid dependencies](implement-extensions.md#avoid-dependencies) + +## Definitions + +### BeforeClusterCreate + +This hook is called after the Cluster object has been created by the user, immediately before all the objects which +are part of a Cluster topology(*) are going to be created. Runtime Extension implementers can use this hook to +determine/prepare add-ons for the Cluster and block the creation of those objects until everything is ready. + +#### Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeClusterCreateRequest +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +``` + +#### Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeClusterCreateResponse +status: Success # or Failure +message: "error message if status == Failure" +retryAfterSeconds: 10 +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). + +(*) The objects which are part of a Cluster topology are the infrastructure Cluster, the Control Plane, the +MachineDeployments and the templates derived from the ClusterClass. + +### AfterControlPlaneInitialized + +This hook is called after the Control Plane for the Cluster is marked as available for the first time. Runtime Extension +implementers can use this hook to execute tasks, for example component installation on workload clusters, that are only +possible once the Control Plane is available. This hook does not block any further changes to the Cluster.
+ +#### Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterControlPlaneInitializedRequest +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +``` + +#### Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterControlPlaneInitializedResponse +status: Success # or Failure +message: "error message if status == Failure" +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). + +### BeforeClusterUpgrade + +This hook is called after the Cluster object has been updated with a new `spec.topology.version` by the user, and +immediately before the new version is going to be propagated to the control plane (*). Runtime Extension implementers +can use this hook to execute pre-upgrade add-on tasks and block upgrades of the ControlPlane and Workers. + +#### Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeClusterUpgradeRequest +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +fromKubernetesVersion: "v1.21.2" +toKubernetesVersion: "v1.22.0" +``` + +#### Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeClusterUpgradeResponse +status: Success # or Failure +message: "error message if status == Failure" +retryAfterSeconds: 10 +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). 
+ +(*) Under normal circumstances `spec.topology.version` gets propagated to the control plane immediately; however + if previous upgrades or worker machine rollouts are still in progress, the system waits for those operations + to complete before starting the new upgrade. + +### AfterControlPlaneUpgrade + +This hook is called after the control plane has been upgraded to the version specified in `spec.topology.version`, +and immediately before the new version is going to be propagated to the MachineDeployments of the Cluster. +Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks and block upgrades to workers +until everything is ready. + +#### Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterControlPlaneUpgradeRequest +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +kubernetesVersion: "v1.22.0" +``` + +#### Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterControlPlaneUpgradeResponse +status: Success # or Failure +message: "error message if status == Failure" +retryAfterSeconds: 10 +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). + +### AfterClusterUpgrade + +This hook is called after the Cluster, control plane and workers have been upgraded to the version specified in +`spec.topology.version`. Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks. +This hook does not block any further changes or upgrades to the Cluster.
+ +#### Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterClusterUpgradeRequest +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +kubernetesVersion: "v1.22.0" +``` + +#### Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: AfterClusterUpgradeResponse +status: Success # or Failure +message: "error message if status == Failure" +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). + +### BeforeClusterDelete + +This hook is called after the Cluster deletion has been triggered by the user and immediately before the topology +of the Cluster is going to be deleted. Runtime Extension implementers can use this hook to execute +cleanup tasks for the add-ons and block deletion of the Cluster and descendant objects until everything is ready. + +#### Example Request: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeClusterDeleteRequest +cluster: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: Cluster + metadata: + name: test-cluster + namespace: test-ns + spec: + ... + status: + ... +``` + +#### Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: BeforeClusterDeleteResponse +status: Success # or Failure +message: "error message if status == Failure" +retryAfterSeconds: 10 +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). 
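
Reviewer note: the blocking responses shown above all carry `retryAfterSeconds`, and the general guidelines state that when several Runtime Extensions return it for the same hook, the shortest non-zero value is used. The following is a minimal sketch of that rule only. The `hookResponse` type and the `aggregateRetryAfter` function are hypothetical names introduced for illustration; they are not part of the Runtime SDK, and the actual controller logic may differ.

```go
package main

import "fmt"

// hookResponse is a hypothetical, simplified stand-in for a blocking hook
// response (assumption: the real types live in the Runtime SDK API package).
type hookResponse struct {
	Status            string // "Success" or "Failure"
	Message           string
	RetryAfterSeconds int32 // 0 means the extension does not ask to block
}

// aggregateRetryAfter returns the shortest non-zero RetryAfterSeconds across
// all responses, or 0 if no extension asked for a retry.
func aggregateRetryAfter(responses []hookResponse) int32 {
	var shortest int32
	for _, r := range responses {
		if r.RetryAfterSeconds <= 0 {
			continue // this extension is not blocking
		}
		if shortest == 0 || r.RetryAfterSeconds < shortest {
			shortest = r.RetryAfterSeconds
		}
	}
	return shortest
}

func main() {
	responses := []hookResponse{
		{Status: "Success", RetryAfterSeconds: 30},
		{Status: "Success", RetryAfterSeconds: 10},
		{Status: "Success"}, // does not block
	}
	fmt.Println(aggregateRetryAfter(responses)) // prints 10
}
```

A result of 0 here would mean that no extension is blocking the lifecycle transition; any non-zero result means all extensions for the hook are called again on the next attempt, which is why handlers should be idempotent.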
diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md new file mode 100644 index 000000000000..ce73a29c1c96 --- /dev/null +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md @@ -0,0 +1,182 @@ +# Implementing Topology Mutation Hook Runtime Extensions + + + +## Introduction + +The Topology Mutation Hooks are going to be called during each Cluster topology reconciliation. More specifically +we are going to call two different hooks for each reconciliation: + +* **GeneratePatches**: GeneratePatches is responsible for generating patches for the entire Cluster topology. +* **ValidateTopology**: ValidateTopology is called after all patches have been applied and thus allows validating + the resulting objects. + +![Cluster topology reconciliation](../../../images/runtime-sdk-topology-mutation.png) + +Please see the corresponding [CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md) +for additional background information. + +## Inline vs. external patches + +Inline patches have the following advantages: +* Inline patches are easier when getting started with ClusterClass as they are built into + the Cluster API core controller; no external component has to be developed and managed. + +External patches have the following advantages: +* External patches can be individually written, unit tested and released/versioned. +* External patches can leverage the full feature set of a programming language and + are thus not limited to the capabilities of JSON patches and Go templating. +* External patches can use external data (e.g. from cloud APIs) during patch generation. +* External patches can be easily reused across ClusterClasses.
+ +## Using one or multiple external patch extensions + +Some considerations: +* In general a single external patch extension is simpler than many, as only one extension + then has to be built, deployed and managed. +* A single extension also requires fewer HTTP round-trips between the CAPI controller and the extension(s). +* With a single extension it is still possible to implement multiple logical features using different variables. +* When implementing multiple logical features in one extension it's recommended that they can be conditionally + enabled/disabled via variables (either via certain values or by their existence). +* [Conway's law](https://en.wikipedia.org/wiki/Conway%27s_law) might make it infeasible in large organizations + to use a single extension. In those cases it's important that boundaries between extensions are clearly defined. + +## Guidelines + +For general Runtime Extension developer guidelines please refer to the guidelines in [Implementing Runtime Extensions](implement-extensions.md#guidelines). +This section outlines considerations specific to Topology Mutation hooks: + +* **Input validation**: An External Patch Extension must always validate its input, i.e. it must validate that + all variables exist, have the right type and it must validate the kind and apiVersion of the templates which + should be patched. +* **Timeouts**: As External Patch Extensions are called during each Cluster topology reconciliation, they must + respond as fast as possible (<=200ms) to avoid delaying individual reconciles and congestion. +* **Availability**: An External Patch Extension must always be available, otherwise Cluster topologies won’t be + reconciled anymore. +* **Side Effects**: An External Patch Extension must not make out-of-band changes. If necessary external data can + be retrieved, but be aware of performance impact.
+* **Deterministic results**: For a given request (a set of templates and variables) an External Patch Extension must + always return the same response (a set of patches). Otherwise the Cluster topology will never reach a stable state. +* **Idempotence**: An External Patch Extension must only return patches if changes to the templates are required, + i.e. unnecessary patches when the template is already in the desired state must be avoided. +* **Avoid Dependencies**: An External Patch Extension must be independent of other External Patch Extensions. However + if dependencies cannot be avoided, it is possible to control the order in which patches are executed via the ClusterClass. + +## Definitions + +### GeneratePatches + +A GeneratePatches call generates patches for the entire Cluster topology. Accordingly the request contains all +templates, the global variables and the template-specific variables. The response contains generated patches. + +#### Example request: + +* Generating patches for a Cluster topology is done via a single call to allow External Patch Extensions a + holistic view of the entire Cluster topology. Additionally this allows us to reduce the number of round-trips. +* Each item in the request will contain the template as a raw object. Additionally information about where + the template is used is provided via `holderReference`. + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: GeneratePatchesRequest +variables: +- name: + value: + ... +items: +- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9 + holderReference: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: MachineDeployment + namespace: default + name: cluster-md1-xyz + fieldPath: spec.template.spec.infrastructureRef + object: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 + kind: AWSMachineTemplate + spec: + ... + variables: + - name: + value: + ... +``` + +#### Example Response: + +* The response contains patches instead of full objects to reduce the payload. 
+* Templates in the request and patches in the response will be correlated via UIDs. +* Like inline patches, external patches are only allowed to change fields in `spec.template.spec`. + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: GeneratePatchesResponse +status: Success # or Failure +message: "error message if status == Failure" +items: +- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9 + patchType: JSONPatch + patch: +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/topology-mutation-hook/runtime-sdk-openapi.yaml). + +We are considering introducing a library to facilitate development of External Patch Extensions. It would provide capabilities like: +* Accessing builtin variables +* Extracting certain templates from a GeneratePatches request (e.g. all bootstrap templates) + +If you are interested in contributing to this library please reach out to the maintainer team or +feel free to open an issue describing your idea or use case. + +### ValidateTopology + +A ValidateTopology call validates the topology after all patches have been applied. The request contains all +templates of the Cluster topology, the global variables and the template-specific variables. The response +contains the result of the validation. + +#### Example Request: + +* The request is the same as the GeneratePatches request except it doesn't have `uid` fields. We don't + need them as we don't have to correlate patches in the response. + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: ValidateTopologyRequest +variables: +- name: + value: + ...
+items: +- holderReference: + apiVersion: cluster.x-k8s.io/v1beta1 + kind: MachineDeployment + namespace: default + name: cluster-md1-xyz + fieldPath: spec.template.spec.infrastructureRef + object: + apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 + kind: AWSMachineTemplate + spec: + ... + variables: + - name: + value: + ... +``` + +#### Example Response: + +```yaml +apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 +kind: ValidateTopologyResponse +status: Success # or Failure +message: "error message if status == Failure" +``` + +For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/topology-mutation-hook/runtime-sdk-openapi.yaml). diff --git a/docs/book/src/tasks/experimental-features/runtime-sdk/index.md b/docs/book/src/tasks/experimental-features/runtime-sdk/index.md new file mode 100644 index 000000000000..d621fe7547a5 --- /dev/null +++ b/docs/book/src/tasks/experimental-features/runtime-sdk/index.md @@ -0,0 +1,34 @@ +# Experimental Feature: Runtime SDK (alpha) + +The Runtime SDK feature provides an extensibility mechanism that allows systems, products, and services built on top of Cluster API to hook into a workload cluster’s lifecycle. 
+ + + + + +**Feature gate name**: `RuntimeSDK` + +**Variable name to enable/disable the feature gate**: `EXP_RUNTIME_SDK` + +Additional documentation: + +* Background information: + * [Runtime SDK CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220221-runtime-SDK.md) + * [Topology Mutation Hook CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220330-topology-mutation-hook.md) + * [Runtime Hooks for Add-on Management CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220414-runtime-hooks.md) +* For Runtime Extension developers: + * [Implementing Runtime Extensions](./implement-extensions.md) + * [Implementing Lifecycle Hook Extensions](./implement-lifecycle-hooks.md) + * [Implementing Topology Mutation Hook Extensions](./implement-topology-mutation-hook.md) +* For Cluster operators: + * [Deploying Runtime Extensions](./deploy-runtime-extension.md) diff --git a/docs/proposals/20220221-runtime-SDK.md b/docs/proposals/20220221-runtime-SDK.md index 347cc822f69c..719e0549f21f 100644 --- a/docs/proposals/20220221-runtime-SDK.md +++ b/docs/proposals/20220221-runtime-SDK.md @@ -39,19 +39,6 @@ superseded-by: * [Cluster API Runtime Hooks vs Kubernetes admission webhooks](#cluster-api-runtime-hooks-vs-kubernetes-admission-webhooks) * [Runtime SDK rules](#runtime-sdk-rules) * [Runtime Extensions developer guide](#runtime-extensions-developer-guide) - * [Implementing Runtime Extensions](#implementing-runtime-extensions) - * [Timeouts](#timeouts) - * [Availability](#availability) - * [Blocking Hooks](#blocking-hooks) - * [Side Effects](#side-effects) - * [Idempotence](#idempotence) - * [Avoid dependencies](#avoid-dependencies) - * [Deterministic result](#deterministic-result) - * [Error Management](#error-management) - * [Deploy Runtime Extensions](#deploy-runtime-extensions) - * [Availability](#availability-1) - * [Identity and access management](#identity-and-access-management) - * 
[Alternative deployments methods](#alternative-deployments-methods) * [Registering Runtime Extensions](#registering-runtime-extensions) * [Runtime Hooks developer guide (CAPI internals)](#runtime-hooks-developer-guide-capi-internals) * [Runtime hook implementation](#runtime-hook-implementation) @@ -74,13 +61,13 @@ superseded-by: Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html). -- **Cluster API Runtime**: identifies the Cluster API execution model, a set of controllers cooperating in managing the +- **Cluster API Runtime**: identifies the Cluster API execution model, a set of controllers cooperating in managing the workload cluster’s lifecycle. -- **Runtime SDK**: a set of rules, recommendations and fundamental capabilities required to develop Runtime Hooks and +- **Runtime SDK**: a set of rules, recommendations and fundamental capabilities required to develop Runtime Hooks and Runtime Extensions. - **Runtime Hook**: a single, well identified, extension point allowing applications built on top of Cluster API to hook - into specific moments of the workload cluster’s lifecycle, e.g. Cluster.BeforeUpgrade, Machine.BeforeRemediation. -- **Runtime Extension**: an external component which is part of a system/product built on top of Cluster API that can + into specific moments of the workload cluster’s lifecycle, e.g. `BeforeClusterUpgrade`, `BeforeMachineRemediation`. +- **Runtime Extension**: an external component which is part of a system/product built on top of Cluster API that can handle requests for a specific Runtime Hook. 
## Summary @@ -105,7 +92,7 @@ Instead, with the growing adoption of Cluster API as a common layer to manage fl now a new category of systems, products and services built on top of Cluster API that require strict interactions with the lifecycle of Clusters, but at the same time they do not want to replace any “low-level” components in Cluster API, because they happily benefit from all the features available in the existing providers (built on top vs -plug-in/swap). +plug-in/swap). A common approach for this problem has been to watch for Cluster API resources; another approach has been to implement API Server admission webhooks to alter CAPI resources, but both approaches are limited by the fact that the system @@ -120,12 +107,12 @@ other lifecycle moments. This proposal aims to solve the above problem in a more structured and generic way, by introducing the Runtime SDK, a set of rules, recommendations and fundamental capabilities required to implement a new extensibility mechanism that will allow systems, products and services built on top of Cluster API to hook in the workload cluster’s -lifecycle. +lifecycle. The key elements of the above extensibility mechanism are Runtime Hooks and Runtime Extensions. Runtime Hooks and Runtime Extensions are designed to be powerful and flexible, and _by opportunity_ it will be also -possible to use this capability for allowing the user to hook into Cluster API reconcile loops at "low level", e.g. +possible to use this capability for allowing the user to hook into Cluster API reconcile loops at "low level", e.g. by allowing a Runtime Extension providing external patches to be executed on every topology reconcile. ### Goals @@ -164,7 +151,7 @@ To define the Runtime SDK and more specifically ### User Stories - As a cluster operator I want to be able to execute a particular action in well-defined moments of the Workload - Cluster’s lifecycle, e.g. + Cluster’s lifecycle, e.g. 
- As a cluster operator I want to automatically install the external CPI addon Before Upgrading the Cluster. - As a cluster operator I want to automatically check my quota management systems Before Creating a cluster. - As a cluster operator I want to automatically run Kubernetes conformance tests After a Cluster upgrade completes. @@ -221,7 +208,7 @@ Runtime Hooks are inspired by Kubernetes admission webhooks, but there is one ke - Admission webhooks are strictly linked to Kubernetes API Server/etcd **CRUD operations** e.g. Create or Update Cluster in etcd. -- Runtime Hooks can be used to define **arbitrary operations**, e.g. Cluster.BeforeUpgrade, Machine.Remediate etc. +- Runtime Hooks can be used to define **arbitrary operations**, e.g. `BeforeClusterUpgrade`, `BeforeMachineRemediation` etc. In other words, Runtime Hooks are not concerned about “low-level” details of how Kubernetes handles objects in the API Server/etcd; Runtime Hooks instead focus on “high-level” events of a Cluster’s lifecycle. @@ -232,7 +219,7 @@ defined in the following paragraphs. #### Runtime SDK rules -As this proposal is based on RESTful APIs, we are using [OpenAPI Specification v3.0.0](https://swagger.io/specification/) [1] +As this proposal is based on RESTful APIs, we are using [OpenAPI Specification v3.0.0](https://swagger.io/specification/) [1] to document Runtime Hooks supported by Cluster API. Most specifically, a single OpenAPI document providing specification for all the Runtime Hooks supported by a @@ -243,12 +230,12 @@ book as well e.g. 
![overview](images/runtime-sdk/swagger-ui.png) Each Runtime Hook will be defined by one (or more) RESTful APIs implemented as a `POST` operation; each operation -is going to receive an input parameter as a request body, and return an output value as response body, both +is going to receive a request parameter as a request body, and return a response value as response body, both `application/json` encoded and with a schema of arbitrary complexity that should be considered an integral part of the Runtime Hook definition. It is also worth noting that more than one version of the same Runtime Hook might be supported at the same time; -e.g. in the example above the `Cluster.BeforeUpgrade` Hook exist in version `v1alpha1` (old version) +e.g. in the example above the `BeforeClusterUpgrade` Hook exists in version `v1alpha1` (old version) and `v1alpha2` (current). Supporting more versions at the same time is a requirement in order to: @@ -279,231 +266,10 @@ mechanism allowing to: ## Runtime Extensions developer guide -As a developer building systems on top of Cluster API, if you want to hook in the Cluster’s lifecycle via -a Runtime Hook, you are required to implement a Runtime Extension handling requests according to the -OpenAPI specification for the Runtime Hook you are interested in. +The following sections have been moved to the Cluster API book to avoid duplication: -Runtime Extensions by design are very powerful and flexible, however given that with great power comes -great responsibility, a few key consideration should always be kept in mind (more details in the following paragraphs): - -- Runtime Extension are components that should be designed, written and deployed with great caution given that they - can affect the proper functioning of the Cluster API runtime. -- Cluster administrators should carefully vet any Runtime Extension registration, thus preventing malicious components - from being added to the system.
- -Please note that following similar practices is already commonly accepted in the Kubernetes ecosystem for -Kubernetes API server admission webhooks, and Runtime Extensions share the same foundation and most of the same -considerations/concerns apply. - -### Implementing Runtime Extensions - -As a developer building systems on top of Cluster API, if you want to hook in the Cluster’s lifecycle via a -Runtime Extension, you are required to implement an HTTP server handling a discovery request and a set of -additional requests according to the OpenAPI specification for the Runtime Hook you are interested in. - -E.g. - -```go -// Note: this is pseudo code, meant to demonstrate that implementing a RuntimeExtension requires only minimal scaffolding; -// the exact details will be defined during implementation, possibly taking advantage of Golang generics. - -var c = catalog.NewCatalog() - -func init() { - v1alpha2.AddToCatalog(c) -} - -func main() { - ctx := context.Background() - - listener, err := net.Listen("tcp", net.JoinHostPort("127.0.0.1", "8082")) - if err != nil { - panic(err) - } - - // Build an HTTP handler for a given operation, calling a strongly typed func at each request - beforeUgradeHandler, err := catalogHTTP.NewHandlerBuilder(). - WithCatalog(c). - AddService(&v1alpha2.DiscoveryHook{}, doDiscovery). - AddService(&v1alpha2.BeforeUgradeHook{}, doBeforeUpgrade). 
- Build() - if err != nil { - panic(err) - } - - srv := &http.Server{ - Handler: BeforeUgradeHandler, - } - if err := srv.Serve(listener); err != nil && err != http.ErrServerClosed { - panic(err) - } -} - -func doDiscovery(in *v1alpha2.DiscoveryInput, out *v1alpha2.DiscoveryOutput) error { - out.Items = append(out.Items, - *v1alpha2.DiscoveryExtension{ - name: "upgradeAddons", - hook: *v1alpha2.DiscoveryHook{ - apiVersion: "hook.runtime.cluster.x-k8s.io/v1alpha1", - name: "BeforeUpgrade", - }, - timeoutSeconds: 5, - failurePolicy: *v1alpha2.FailurePolicyFail, - }) - return nil -} - -func doBeforeUpgrade(in *v1alpha2.BeforeUpgradeInput, out *v1alpha2.BeforeUpgradeOutput) error { - // your actual implementation... - return nil -} -``` - -Please note that each runtime extension server could answer to different hooks calls (in the -example above `Discovery` and `BeforeUpgrade`) each one of them handled at different path, like API server does -for the different API resources. The exact format of those paths will be defined during the implementation -and this document updated accordingly. - -[Discovery](#discovery-hook) is an operation that allows each runtime extension server to inform Cluster API of the -list of Runtime Extension it implements, their version and other features. - -Please note that Cluster API only enforces input and output parameters types as defined by a Runtime Hook version, -and developers are fully responsible for all the other elements of the design of a Runtime Extension implementation, -including: - -- To choose which programming language to use; please note that for sake of this proposal Golang is the language - of choice, and we are not planning to test/provide tooling/libraries for other languages. Nevertheless, given that - we rely on Open API and plain HTTP(s) calls, other languages should just work but support will be provided at - best effort. 
-- To choose if to have a dedicated HTTP server(s) for Runtime Extensions only or if to use the HTTP server for other - purposes as well (e.g. metric endpoint). - -In case the Runtime Extension is being developed in Golang, the implementer can benefit from importing -`sigs.k8s.io/cluster-api` as show in the example above and: - -- Use Golang types defined under `/runtime` (the types from which the OpenAPI specification has been generated). -- Use the RuntimeExtension catalog object to generate the skeleton of the HTTP handler for a given Runtime extension. - The generated func will take care of scaffolding tasks like Marshal/Unmarshal; the only missing part will be to - implement a strongly typed func that contains the actual extension implementation. - -While writing the actual code of the Runtime Extension a set of important guidelines must be considered: - -#### Timeouts -Runtime Extension processing adds to network request latency, they should run as quickly as possible -(typically in milliseconds); Cluster Administrator will be allowed to configure how long the Cluster API Runtime -should wait for a Runtime Extension to respond before treating the call as a failure (max 10s). - -#### Availability -Runtime Extension failure could result in errors in handling Workload’s Clusters lifecycle, and so the implementation -should be robust, have proper error handling, avoid panics, etc.; It will be allowed to set up -failure policies preventing a Runtime Extension failure to have negative effects on the Cluster API Runtime, but -this option can’t be used in all the use cases. see [Error Management](#error-management) - -#### Blocking Hooks -A Runtime Hook can be defined as "blocking", e.g. the BeforeClusterUpgrade hook allows a Runtime Extension -to prevent the upgrade to start. - -A Runtime Extension registered for the above hook will be allowed to block by retuning a `retryAfterSeconds` value. 
-Following consideration apply: - -- The system might decide to retry the same Runtime Extension even before the `retryAfterSeconds` period expires, - e.g. due to other changes in the Cluster, so retry after should be considered as an approximate maximum - time before the next reconcile. -- If there is more than one Runtime Extension registered for the same Runtime Hook and more than one returns - `retryAfterSeconds`, the shortest one will be used. -- If there is more than one Runtime Extension registered for the same Runtime Hook and at least one returns - `retryAfterSeconds`, all the Runtime Extension be executed when the operation will be re-tried. - -Detailed description of what "blocking" means for each specific Runtime Hooks will be documented case by case. - -#### Side Effects -It is recommended that Runtime Extensions should avoid side effects if possible, which means to operate only on -the content of the Input/Output sent to them, and not make out-of-band changes. -If side effects are required, rules defined in the following paragraphs apply. - -#### Idempotence -An idempotent Runtime Extension is able to successfully accomplish its task also in case it has already been completed -(the Runtime Extension checks current state and changes it only if necessary). - -A practical example that explains why idempotence is relevant is the fact that extension could be called more than once -for the same lifecycle transition, e.g. - -- Two RuntimeExtension are registered for the BeforeUpgrade hook. -- Before a Cluster upgrades both extensions are called, but one of them temporarily block the operation asking to retry after 30s. -- After 30s the system retries the lifecycle transition, and both the extensions are called again to re-evaluate - if it is now possible to proceed with Cluster upgrade. - -#### Avoid dependencies -Each Runtime Extension should accomplish its task without dependency or relations with other Runtime Extensions. 
-Introducing dependencies across Runtime Extensions makes the system fragile, and it is probably a consequence of -poor “Separation of Concerns” while designing such components. - -#### Deterministic result -A deterministic Runtime Extension is implemented in such a way that given the same input it will always return -the same output. - -Some Runtime Hook, like e.g. external patches, might explicitly request for corresponding -Runtime Extensions to support this property, but we encourage developers to follow -this pattern more generally given that it fits well with practices like unit testing and -generally makes the entire system more predictable and easier to troubleshoot. - -#### Error Management -In case a Runtime Extension returns an error, the error will be handled according to the corresponding FailurePolicy -defined in the response to the Discovery call. - -If the failure policy is `Ignore` the error is going to be recorded in controller's logs, but the processing will continue; -however we recognize that this failure policy cannot be used in most of the use cases because Runtime Extension -implementers want to ensure that some task is completed before continuing with the cluster's lifecycle. - -If instead the failure policy is `Fail` the system will retry the operation until it passes. -Following general considerations apply: - -- It is responsibility of Cluster API components to surface RuntimeExtension errors using conditions. -- Operations will be retried with an exponential backoff or whenever the state of Cluster changes (we are going to rely - on controller runtime exponential backoff/watches). -- If the operation is defined as "blocking", the error is going to block a lifecycle transition, - e.g. an error on a Runtime Extension for the BeforeClusterUpgrade hook is going to block the Cluster upgrade to start. 
-- If there is more than one Runtime Extension registered for the same Runtime Hook and at least one of them fails, - all the registered Runtime Extension will be retried. see [Idempotence](#idempotence) - -Additional consideration about errors that apply to a specific Runtime Hooks only will be documented case by case. - -### Deploy Runtime Extensions -Cluster API requires that each Runtime Extension must be deployed using an endpoint accessible from the Cluster API -controllers; additionally, Runtime Extensions must always be executed out of process (not in the same process as -the Cluster API runtime). - -This proposal assumes that the default solution that implementers are going to adopt is to deploy Runtime Extension -in the Management Cluster by: - -- Packing the Runtime Extension in a container image; -- Using a Kubernetes Deployment to run the above container inside the Management Cluster; -- Using a Cluster IP service to make the Runtime Extension instances accessible via a stable DNS name; -- Using a cert-manager generated Certificate to protect the endpoint. - -There are a set of important guidelines that must be considered while choosing the deployment method: - -#### Availability -It is recommended that Runtime Extensions should leverage some form of load-balancing, to provide high availability -and performance benefits; you can run multiple Runtime Extension backends behind a service to leverage the -load-balancing that services support. - -#### Identity and access management -The security model for each Runtime Extensions should be carefully defined, similar to any other application deployed -in the Cluster: the deployment must use a dedicated service account with limited RBAC permission. 
- -On top of that, the container image for the Runtime Extension should be carefully designed in order to avoid -privilege escalation (e.g using [distroless](https://github.com/GoogleContainerTools/distroless) base images) and -the Pod spec in the Deployment manifest should enforce security best practices (e.g. do not use privileged pods etc.). - -#### Alternative deployments methods -Alternative deployment methods can be used given that the requirement about the HTTP endpoint accessibility -is satisfied, like e.g. - -- deploying the HTTP Server as a part of another component, e.g. a controller; -- deploying the HTTP Server outside the Management Cluster. - -In these cases above recommendations about availability and identity and access management apply as well. +* [Implementing Runtime Extensions](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-extensions.md) +* [Deploying Runtime Extensions](../../docs/book/src/tasks/experimental-features/runtime-sdk/deploy-runtime-extension.md) ### Registering Runtime Extensions @@ -516,7 +282,7 @@ By registering a Runtime Extension the Cluster API Runtime becomes aware of a Ru Runtime Hook, and as a consequence the runtime starts calling the extension at well-defined moments of the workload cluster’s lifecycle. -This process has many similarities with registering dynamic webhooks in Kubernetes, but some specific +This process has many similarities with registering dynamic webhooks in Kubernetes, but some specific behavior is introduced by this proposal: The Cluster administrator is required to register available Runtime Extension server using the following CR: @@ -544,22 +310,23 @@ spec: ``` Once the extension is registered the [discovery hook](#discovery-hook) is called and the above CR is updated with the list -of the Runtime Extensions supported by the server. The ExtensionConfig is Cluster scoped, meaning it has no namespace. The `namespaceSelector` will enable targeting of a subset of Clusters.
+of the Runtime Extensions supported by the server. The ExtensionConfig is Cluster scoped, meaning it has no namespace. +The `namespaceSelector` will enable targeting of a subset of Clusters. ```yaml -apiVersion: runtime.cluster.x-k8s.io/v1beta1 +apiVersion: runtime.cluster.x-k8s.io/v1alpha1 kind: ExtensionConfig metadata: name: "my-amazing-extensions" spec: ... status: - runtimeExtensions: ## Details of supported runtime extensions + handlers: ## Details of supported Runtime Extensions - name: "http-proxy.my-amazing-extensions" # unique name, computed - hook: + requestHook: apiVersion: "hook.runtime.cluster.x-k8s.io/v1alpha1" - name: "generatePatches" + hook: "generatePatches" timeoutSeconds: 5 # Timeout to be used when calling the extension. Max timeout allowed 10s. failurePolicy: Fail # FailurePolicy defines how unrecognized errors from the admission endpoint are handled - allowed values are Ignore or Fail. Defaults to Fail. - ... @@ -569,7 +336,7 @@ status: As you can notice, each Runtime Extension is given a unique identifier that can be used to reference it from other part of the system, e.g. from ClusterClass. Additionally, it is documented the exact reference to the hook/version -the Runtime Extension is implementing as well as the failurePolicy and the timeout the system should use when +the Runtime Extension is implementing as well as the failurePolicy and the timeout the system should use when calling the extension. 
If consensus is reached/in a follow-up iteration we consider to eventually add support for defining @@ -584,12 +351,12 @@ objectSelector: ``` -Instead, unless there's a strong and evident need for it, we are not considering adding support for defining -dependencies among Runtime Extensions, being it modeled with something similar to +Instead, unless there's a strong and evident need for it, we are not considering adding support for defining +dependencies among Runtime Extensions, being it modeled with something similar to [systemd unit options](https://www.freedesktop.org/software/systemd/man/systemd.unit.html) or alternative approaches. The main reason behind that is that such type of feature introduces complexity and creates "pet" like relations across -components making the overall system more fragile. This is also consistent with the [avoid dependencies](#avoid-dependencies) +components making the overall system more fragile. This is also consistent with the [avoid dependencies](#avoid-dependencies) recommendation above. ## Runtime Hooks developer guide (CAPI internals) @@ -603,102 +370,121 @@ The process of implementing the new Runtime Hooks is intentionally designed in o used to define API types, thus providing a familiar experience to the maintainers/the people used to look at the Cluster API codebase. Most specifically: -- Runtime Hooks versions must be defined under a `/runtime` folder. -- For each Runtime Hook, there must be one version, each one defined in its own folder e.g. `/v1alpha1`, `/v1alpha2` etc. -- Eventually we can have further grouping by "area" (TBD during implementation). +- Runtime Hooks versions must be defined under the `/exp/runtime/hooks/api` folder. +- There must be one folder per apiVersion, e.g. `/v1alpha1`, `/v1alpha2` etc. 
``` -/runtime -└── contract - ├── cluster - │ ├── v1alpha1 - │ └── v1alpha2 - └── controlplane - └── v1alpha3 +/exp/runtime/hooks/api +├── v1alpha1 +└── v1alpha2 ``` Each version folder must - Define a group version -- Provide type definitions for the RuntimeHook and its input/output parameters. +- Provide type definitions for the Runtime Hook and its request and response parameters. ``` -/runtime/contract/cluster/v1alpha1 +/exp/runtime/hooks/api/v1alpha1 ├── groupversion_info.go -└── before_upgrade_types.go +└── lifecyclehooks_types.go ``` -Type definitions are standard golang type definitions with golang json tags and a set of additional k8s/kubebuilder +Type definitions are standard Golang type definitions with Golang JSON tags and a set of additional k8s/kubebuilder markers triggering code generators for: -- DeepCopy func, making input/output parameters types to satisfy the runtime.object interface. -- Conversion func from older releases of the Runtime Hook input/output parameters types to the latest one. -- OpenAPI schema’s definition for each type. +- DeepCopy functions, so that request and response parameter types satisfy the `runtime.Object` interface. +- Conversion functions from older apiVersions of the Runtime Hook request and response parameter types to the latest one. +- OpenAPI schema definitions for each type. ```go +// BeforeClusterUpgradeRequest is the request of the BeforeClusterUpgrade hook. // +k8s:openapi-gen=true +// +kubebuilder:object:generate=true // +kubebuilder:object:root=true -type BeforeUpgradeInput struct { -metav1.TypeMeta `json:",inline"` - ... +type BeforeClusterUpgradeRequest struct { + metav1.TypeMeta `json:",inline"` + + ... +} + +// BeforeClusterUpgradeResponse is the response of the BeforeClusterUpgrade hook. +// +k8s:openapi-gen=true +// +kubebuilder:object:generate=true +// +kubebuilder:object:root=true +type BeforeClusterUpgradeResponse struct { + metav1.TypeMeta `json:",inline"` + + ... 
} + +// BeforeClusterUpgrade is the hook that will be called after a Cluster.spec.version is upgraded and +// before the updated version is propagated to the underlying objects. +func BeforeClusterUpgrade(*BeforeClusterUpgradeRequest, *BeforeClusterUpgradeResponse) {} ``` -The code generators are https://github.com/kubernetes-sigs/controller-tools and https://github.com/kubernetes/kube-openapi; -the expected output will be something similar to: +The code generators are https://github.com/kubernetes-sigs/controller-tools and https://github.com/kubernetes/kube-openapi; +the expected output will be similar to: ``` /runtime/contract/cluster/v1alpha1 ├── groupversion_info.go -├── before_upgrade_types.go +├── lifecyclehooks_types.go ├── zz_generated.conversion.go ├── zz_generated.deepcopy.go └── zz_generated.openapi.go ``` Similarly to what happens for API types and api-machinery schema, the type definitions inside every version folder -have to be added to a **catalog**, but with few notable differences: +have to be added to a `Catalog`, but with a few notable differences: -- The Runtime Hooks tracks mapping between a group/version/hook and its own corresponding input/output types - (group/version/input-kind and group/version/output-kind). +- The `Catalog` tracks mapping between a group/version/hook and its own corresponding request/response types + (group/version/request-GVK and group/version/response-GVK). - Type conversions are allowed between objects with the same group/hook (instead of being in a “flat type-space” like in the api-machinery schema). -_Note: this is pseudo code, meant to demonstrate that registering an Runtime Hook is similar to registering an API type; -the exact details will be defined during implementation._ - +`groupversion_info.go`: ```go var ( - // GroupVersion is group version identifying Runtime Hooks defined in this package - // and their request and response types. 
- GroupVersion = catalog.GroupVersion{Group: "cluster.runtime.cluster.x-k8s.io", Version: "v1alpha1"} - - // catalogBuilder is used to add Runtime Hooks and their input and output types - // to a Catalog. - catalogBuilder = catalog.NewBuilder(GroupVersion) - - // AddToCatalog adds Runtime Hooks defined in this package and their input and - // output types to a catalog. - AddToCatalog = catalogBuilder.AddToCatalog - - // localSchemeBuilder provides access to the SchemeBuilder used for managing Runtime Hooks - // input and output types defined in this package. - // NOTE: this object is required to allow registration of automatically generated - // conversions func. - localSchemeBuilder = catalogBuilder.SchemeBuilder + // GroupVersion is the group version identifying Runtime Hooks defined in this package + // and their request and response types. + GroupVersion = schema.GroupVersion{Group: "hooks.runtime.cluster.x-k8s.io", Version: "v1alpha1"} + + // catalogBuilder is used to add Runtime Hooks and their request and response types + // to a Catalog. + catalogBuilder = &runtimecatalog.Builder{GroupVersion: GroupVersion} + + // AddToCatalog adds Runtime Hooks defined in this package and their request and + // response types to a catalog. + AddToCatalog = catalogBuilder.AddToCatalog + + // localSchemeBuilder provides access to the SchemeBuilder used for managing Runtime Hooks + // and their request and response types defined in this package. + // NOTE: This object is required to allow registration of automatically generated + // conversion functions. + localSchemeBuilder = catalogBuilder ) func init() { - // Add Open API definitions for Runtime Hooks input and output types in this package - // NOTE: the GetOpenAPIDefinitions func is automatically generated by openapi-gen.
- catalogBuilder.OpenAPIDefinitions(GetOpenAPIDefinitions) + // Add Open API definitions for RuntimeHooks request and response types in this package + // NOTE: the GetOpenAPIDefinitions func is automatically generated by openapi-gen. + catalogBuilder.RegisterOpenAPIDefinitions(GetOpenAPIDefinitions) +} +``` - // Register Runtime Hooks defined in this package and their input and output types. - catalogBuilder.RegisterHook(&BeforeUgradeHook{}, &BeforeUgradeInput{}, &BeforeUgradeOutput{}) +`lifecyclehooks_types.go`: +```go +func init() { + // Register Runtime Hooks defined in this package. + catalogBuilder.RegisterHook(BeforeClusterUpgrade, &runtimecatalog.HookMeta{ + Tags: []string{"Lifecycle Hooks"}, + Summary: "Called before the Cluster is upgraded.", + Description: "This blocking hook is called after the Cluster object has been updated with a new spec.topology.version by the user, and immediately before the new version is propagated to the Control Plane.", + }) } ``` -Given the above definitions, a catalog can finally be created as follow: +Given the above definitions, a catalog can finally be created as follows: ```go var c = catalog.NewCatalog() @@ -723,7 +509,7 @@ responsibility of this controller should be to maintain an internal, shared **re at a given time. Please note that the Runtime Extensions registry also provides a single point to centralize a set of common behaviors -supporting interaction with those external components, thus making the adoption of this feature scalable - +supporting interaction with those external components, thus making the adoption of this feature scalable - in the sense of being used for an increasing numbers of use cases in Cluster API - while operating consistently across the board. @@ -732,7 +518,7 @@ in case of errors, thus preventing Cluster API from creating pressure on HTTP Se ongoing operational issues. 
Another cross-cutting concern is about ensuring that Runtime Extensions, which are external components triggered -in the middle of Cluster API controllers logic, do not block the reconciliation process indefinitely +in the middle of Cluster API controllers logic, do not block the reconciliation process indefinitely (e.g by enforcing a maximum timeout for all the Runtime Extensions calls). ### Calling Runtime Extensions @@ -744,51 +530,76 @@ Cluster API is going to implement calls to registered Runtime Extensions at well The two key elements that make the implementation of runtime extension calls simple and consistent across the codebase are: -- The Runtime Hook catalog, providing the info about all the defined Runtime Hooks, supported version and - corresponding input/output types; -- The Runtime Extensions registry, providing info about the registered Runtime Extensions implementing the - Runtime Hooks defined above. - -Given these two elements, the code for calling Runtime Extensions is: +- The catalog, providing the info about all the defined Runtime Hooks, supported version and + corresponding request/response types; +- The client, implementing the call to a Runtime Extension. -_Note: this is pseudo code, meant to demonstrate a set of elements described below the example; the exact details -will be defined during implementation, possibly taking advantage of golang generics._ +Given these two elements, the code for calling a Runtime Extension is: +`main.go`: ```go -extensions := registry.Get( - registry.Group("cluster.runtime.cluster.x-k8s.io/v1alpha2”), - registry.Hook(&v1alpha2.BeforeUgradeHook{}), +var ( + // Create a Catalog. + catalog = runtimecatalog.New() + ... ) -for _, e := range extensions{ - client := catalogHTTP.NewClientBuilder(). - WithCatalog(c). - Host(e.host). - Build() +func init() { + ... + // Register the RuntimeHook types into the catalog. + _ = runtimehooksv1.AddToCatalog(catalog) + ... 
+} - hook := &v1alpha2.BeforeUgradeHook{} - in := &v1alpha2.BeforeUgradeInput{First: 1, Second: "Hello CAPI Runtime Extensions!"} - out := &v1alpha2.BeforeUgradeOutput{} - if err := client.Extension(hook, catalogHTTP.SpecVersion(r.version)).Invoke(ctx, in, out); err != nil { - panic(err) - } +func setupReconcilers(ctx context.Context, mgr ctrl.Manager) { + ... + // Setup the runtime client. + runtimeClient = runtimeclient.New(runtimeclient.Options{ + Catalog: catalog, + Registry: runtimeregistry.New(), + Client: mgr.GetClient(), + }) + ... + // Pass the runtime client to a reconciler. + if err := (&controllers.ClusterTopologyReconciler{ + Client: mgr.GetClient(), + APIReader: mgr.GetAPIReader(), + RuntimeClient: runtimeClient, + UnstructuredCachingClient: unstructuredCachingClient, + WatchFilterValue: watchFilterValue, + }).SetupWithManager(ctx, mgr, concurrency(clusterTopologyConcurrency)); err != nil { + setupLog.Error(err, "unable to create controller", "controller", "ClusterTopology") + os.Exit(1) + } + ... +} +``` - // Do something with the output e.g. proceed with upgrade or block +`cluster_controller.go`: +```go + // Call BeforeClusterCreate Runtime Extensions. + hookRequest := &runtimehooksv1.BeforeClusterCreateRequest{ + Cluster: *s.Current.Cluster, + } + hookResponse := &runtimehooksv1.BeforeClusterCreateResponse{} + if err := r.RuntimeClient.CallAllExtensions(ctx, runtimehooksv1.BeforeClusterCreate, s.Current.Cluster, hookRequest, hookResponse); err != nil { + return ctrl.Result{}, err + } } ``` A couple of elements are worth noting: -- Registered Runtime Extensions are returned by group and hook; this will also include Runtime Extensions - implementing older versions of the same Runtime Hook; -- The call is implemented using the last version of the Runtime Hook/Input/Output types; the Invoke function - will take care of version conversions, if required. 
+- `CallAllExtensions` will call all registered Runtime Extensions of the corresponding group and hook. + This will also include Runtime Extensions implementing older versions of the same Runtime Hook. +- The call is implemented using the latest version of the Runtime Hook/request/response types; the + `CallAllExtensions` function will take care of version conversions, if required. ## Security Model Following threats were considered: -- Malicious Runtime Extensions being registered +- Malicious Runtime Extensions being registered Mitigation: The same mitigations used for avoiding malicious dynamic webhooks in Kubernetes apply (defining RBAC rules for the ExtensionConfig assigning this responsibility to cluster admin only). @@ -833,10 +644,10 @@ However, rules for evolving Runtime Hook across Cluster API versions are introdu ### Test Plan -While in alpha phase it is expected that the Runtime SDK will have unit tests covering all the main components: +While in alpha phase it is expected that the Runtime SDK will have unit tests covering all the main components: catalog, discovery controller, tooling. -With the increasing adoption of this feature, we expect more unit tests, integration tests and E2E tests +With the increasing adoption of this feature, we expect more unit tests, integration tests and E2E tests to be added covering specific Runtime Hooks. ### Graduation Criteria @@ -852,28 +663,28 @@ See upgrade strategy. 
### Runtime SDK rules -**Rule #1: Runtime Hooks and input/output parameter elements may only be removed by incrementing the version of the +**Rule #1: Runtime Hooks and request/response parameter elements may only be removed by incrementing the version of the Runtime Hook.** -Once a Runtime Hook or a Runtime Hook input/output parameter element has been added to a particular version, +Once a Runtime Hook or a Runtime Hook request/response parameter element has been added to a particular version, it can not be removed from that version or have its behavior significantly changed. -**Rule #2 Runtime Hook’s input parameters must be down-convertible, output parameters must be up-convertible. +**Rule #2 Runtime Hook’s request parameters must be down-convertible, response parameters must be up-convertible. Most specifically** -- input parameters must be able to be down-converted from the latest version to previous versions of the same +- request parameters must be able to be down-converted from the latest version to previous versions of the same Runtime Hook; this might imply information loss, but the behavior of the previous version of the Runtime Hook must not be affected by this. -- output parameters must be able to be up-converted from previous versions to current versions of the same +- response parameters must be able to be up-converted from previous versions to current versions of the same Runtime Hook; this means that new information should be nullable or have defaults. 
-For example assume that we have a Cluster.BeforeUpgrade Runtime Hook with version `v1alpha1` and `v1alpha2`; -In order to avoid duplicating code, Cluster API internally will always work at the latest version, `v1alpha2` +For example, assume that we have a `BeforeClusterUpgrade` Runtime Hook with versions `v1alpha1` and `v1alpha2`. +In order to avoid duplicating code, Cluster API internally will always work at the latest version, `v1alpha2` in the example, but there could be still a deployed Runtime Extension on `v1alpha1`. This rule makes it possible to call the Runtime Extensions still using the `v1alpha1` by ensuring it is possible -to down-converting the input parameter for the `v1alpha2` call implemented in CAPI, make the call, and then -up-converting the `v1alpha1` output parameter to the v1alpha2 version `CAPI` expects. +to down-convert the request parameter for the `v1alpha2` call implemented in CAPI, make the call, and then +up-convert the `v1alpha1` response parameter to the `v1alpha2` version CAPI expects. **Rule #3: A Runtime Hook version in a given track may not be deprecated until a new version at least as stable is released.** @@ -898,19 +709,19 @@ When invoked the discovery hook is expected to provide the following answer: ```yaml status: Success # or Failure -message: "error message if status == failure" -items: # Info about implemented runtime extensions +message: "error message if status == Failure" +handlers: # Info about implemented runtime extensions - name: http-proxy # Unique name identifying the runtime extension - hook: + requestHook: apiVersion: "hook.runtime.cluster.x-k8s.io/v1alpha1" - name: "generatePatches" + hook: "generatePatches" timeoutSeconds: 5 # Default value suggested by the RuntimeExtension developers failurePolicy: Fail # Default value suggested by the RuntimeExtension developers - ... ``` Please note that the above struct supports defining more than one Runtime Extension for the same hook, e.g.
-defining more than one "generatePatches" extensions. +defining more than one "generatePatches" extensions. ## Implementation History diff --git a/docs/proposals/20220330-topology-mutation-hook.md b/docs/proposals/20220330-topology-mutation-hook.md index d9abf0049e8f..5abb5cc971e5 100644 --- a/docs/proposals/20220330-topology-mutation-hook.md +++ b/docs/proposals/20220330-topology-mutation-hook.md @@ -33,8 +33,7 @@ superseded-by: * [ClusterClass author guide](#clusterclass-author-guide) * [Developer guide](#developer-guide) * [Cluster topology reconciliation](#cluster-topology-reconciliation) - * [GeneratePatches Hook](#generatepatches-hook) - * [ValidateTopology Hook](#validatetopology-hook) + * [Definitions](#definitions) * [Guidelines](#guidelines) * [clusterctl alpha topology plan](#clusterctl-alpha-topology-plan) * [Security Model](#security-model) @@ -169,115 +168,22 @@ This section provides guidance for developers on the implementation of an Extern #### Cluster topology reconciliation -This section documents when the Topology Mutation Hook is going to be called during each Cluster topology reconciliation. More specifically we are going to call two different hooks for each reconciliation: - -* **GeneratePatches**: GeneratePatches is responsible for generating patches for the entire Cluster topology. -* **[optional] ValidateTopology**: ValidateTopology is called after all patches have been applied and thus allows the External Patch Extension developer to validate the resulting objects. - - **Note**: ValidateTopology is optional, i.e. it will be only called if an External Patch Extension implements it (and returns it during discovery). +This section documents when the Topology Mutation Hook is going to be called during each Cluster topology reconciliation. ![Cluster topology reconciliation](./images/topology-mutation-hook/topology-reconciliation.png) -#### GeneratePatches Hook - -A GeneratePatches call generates patches for the entire Cluster topology. 
Accordingly the request contains all templates, the global variables and the template-specific variables. The response contains generated patches. - -Example request: -* Generating patches for a Cluster topology is done via a single call to allow External Patch Extensions a holistic view of the entire Cluster topology. Additionally this allows us to reduce the number of round-trips. -* Each item in the request will contain the template as a raw object. Additionally information about where the template is used is provided via `holderReference`. -```yaml -variables: -- name: - value: - ... -items: -- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9 - holderReference: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: MachineDeployment - namespace: default - name: cluster-md1-xyz - fieldPath: spec.template.spec.infrastructureRef - object: - apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 - kind: AWSMachineTemplate - spec: - ... - variables: - - name: - value: - ... -``` - -Example response: -* The response contains patches instead of full objects to reduce the payload. -* Templates in the request and patches in the response will be correlated via UIDs. -* Like for inline patches external patches are only allowed to change `spec.template.spec`. -```yaml -status: Success # or Failure -message: "error message if status == Failure" -items: -- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9 - patchType: JSONPatch - patch: -``` - -The full OpenAPI specification (draft) of the GeneratePatches hook can be seen via the [Swagger Editor](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/topology-mutation-hook/runtime-sdk-openapi.yaml). - -During implementation we will consider introducing a library to facilitate development of External Patch Extensions. It will provide capabilities like: -* Access builtin variables -* Extract certain templates from a GeneratePatches request (e.g. 
all bootstrap templates) +The remainder of this section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md#introduction) +to avoid duplication. -#### ValidateTopology Hook - -A ValidateTopology call validates the topology after all patches have been applied. The request contains all templates of the Cluster topology, the global variables and the template-specific variables. The response contains the result of the validation. - -Example request: -* The request is the same as the GeneratePatches request except the `uid` fields. We don't - need them as we don't have to correlate anything in the response. -```yaml -variables: -- name: - value: - ... -items: -- holderReference: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: MachineDeployment - namespace: default - name: cluster-md1-xyz - fieldPath: spec.template.spec.infrastructureRef - object: - apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 - kind: AWSMachineTemplate - spec: - ... - variables: - - name: - value: - ... -``` - -Example response: -```yaml -status: Success # or Failure -message: "error message if status == Failure" -``` - -The full OpenAPI specification (draft) of the ValidateTopology hook can be seen via the [Swagger Editor](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/topology-mutation-hook/runtime-sdk-openapi.yaml). +#### Definitions +This section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md#definitions) +to avoid duplication. #### Guidelines -For general Runtime Extension developer guidelines please refer to the [developer guide in the Runtime SDK proposal](https://github.com/kubernetes-sigs/cluster-api/blob/75b39db545ae439f4f6203b5e07496d3b0a6aa75/docs/proposals/20220221-runtime-SDK.md#runtime-extensions-developer-guide). 
This section outlines guidelines specific to External Patch Extensions: - -* **Input validation**: An External Patch Extension must always validate its input, i.e. it must validate that all variables exist and have the right type and it must validate the kind and apiVersion of the templates which should be patched. -* **Timeouts**: As External Patch Extensions are called during each Cluster topology reconciliation, they must respond as fast as possible (<=200ms) to avoid delaying individual reconciles and congestion. -* **Availability**: An External Patch Extension must be always available, otherwise Cluster topologies won’t be reconciled anymore. -* **Side Effects**: An External Patch Extension must not make out-of-band changes. If necessary external data can be retrieved, but be aware of performance impact. -* **Deterministic results**: For a given request (a set of templates and variables) an External Patch Extension must always return the same response (a set of patches). Otherwise the Cluster topology will never reach a stable state. -* **Idempotence**: An External Patch Extension must only return patches if changes to the templates are required, i.e. unnecessary patches when the template is already in the desired state must be avoided. -* **Avoid Dependencies**: An External Patch Extension must be independent of other External Patch Extensions. However if dependencies cannot be avoided, it is possible to control the order in which patches are executed via the ClusterClass. +This section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-topology-mutation-hook.md#guidelines) +to avoid duplication. 
#### clusterctl alpha topology plan diff --git a/docs/proposals/20220414-runtime-hooks.md b/docs/proposals/20220414-runtime-hooks.md index 3ba27cc48007..25b9d97c7823 100644 --- a/docs/proposals/20220414-runtime-hooks.md +++ b/docs/proposals/20220414-runtime-hooks.md @@ -29,12 +29,6 @@ superseded-by: * [Proposal](#proposal) * [User Stories](#user-stories) * [Runtime hook definitions](#runtime-hook-definitions) - * [Before Cluster Create](#before-cluster-create) - * [After Control Plane Initialized](#after-control-plane-initialized) - * [Before Cluster Upgrade](#before-cluster-upgrade) - * [After Control Plane Upgrade](#after-control-plane-upgrade) - * [After Cluster Upgrade](#after-cluster-upgrade) - * [Before Cluster Delete](#before-cluster-delete) * [Runtime Extensions developer guide](#runtime-extensions-developer-guide) * [Security Model](#security-model) * [Risks and Mitigations](#risks-and-mitigations) @@ -126,227 +120,13 @@ Below is a description for the Runtime Hooks introduced by this proposal. ![runtime-hooks](images/runtime-hooks/runtime-hooks.png) - -#### Before Cluster Create - -This hook is called after the Cluster object has been created by the user, immediately before all the objects which are part of a Cluster topology(*) are going to be created. Runtime Extension implementers can use this hook to determine/prepare add-ons for the Cluster and block the creation of those objects until everything is ready. - -##### Example Request: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: BeforeClusterCreateRequest -cluster: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: Cluster - metadata: - name: test-cluster - namespace: test-ns - spec: - ... - status: - ... 
-``` - -##### Example Response: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: BeforeClusterCreateResponse -status: Success -message: "error message if status == Failure" -retryAfterSeconds: 10 -``` - -For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). - -(*) The objects which are part of a Cluster topology are the infrastructure Cluster, the control plane, the MachineDeployments and the templates derived from the ClusterClass. - - -#### After Control Plane Initialized - -This hook is called after the ControlPlane for the Cluster is marked as available for the first time. Runtime Extension implementers can use this hook to execute tasks, for example component installation on workload clusters, that are only possible once the Control Plane is available. This hook does not block any further changes to the Cluster. - -##### Example Request: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: AfterControlPlaneInitializedRequest -cluster: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: Cluster - metadata: - name: test-cluster - namespace: test-ns - spec: - ... - status: - ... -``` - -##### Example Response: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: AfterControlPlaneInitializedResponse -status: Success -message: "error message if status == Failure" -``` - -For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). - - -#### Before Cluster Upgrade - -This hook is called after the Cluster object has been updated with a new spec.topology.version by the user, and immediately before the new version is going to be propagated to the control plane (*). 
Runtime Extension implementers can use this hook to execute pre-upgrade add-on tasks and block upgrades of the ControlPlane and Workers. - -##### Example Request: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: BeforeClusterUpgradeRequest -cluster: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: Cluster - metadata: - name: test-cluster - namespace: test-ns - spec: - ... - status: - ... -fromKubernetesVersion: "v1.21.2" -toKubernetesVersion: "v1.22.0" -``` - -##### Example Response: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: BeforeClusterUpgradeResponse -status: Success -message: "error message if status == Failure" -retryAfterSeconds: 10 -``` - -For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). - -* Under normal circumstances spec.topology.version gets propagated to the control plane immediately; however if previous upgrades or worker machine rollouts are still in progress, the system waits for those operations to complete before starting the new upgrade. - -#### After Control Plane Upgrade - -This hook is called after the control plane has been upgraded to the version specified in spec.topology.version, and immediately before the new version is going to be propagated to the MachineDeployments existing in the Cluster. Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks and block upgrades to workers until everything is ready. - -##### Example Request: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: AfterControlPlaneUpgradeRequest -cluster: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: Cluster - metadata: - name: test-cluster - namespace: test-ns - spec: - ... - status: - ... 
-kubernetesVersion: "v1.22.0" -``` - -##### Example Response: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: AfterControlPlaneUpgradeResponse -status: Success -message: "error message if status == Failure" -retryAfterSeconds: 10 -``` - -For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). - - -#### After Cluster Upgrade - -This hook is called after the Cluster, control plane and workers have been upgraded to the version specified in spec.topology.version. Runtime Extensions implementers can use this hook to execute post-upgrade add-on tasks. This hook does not block any further changes or upgrades to the Cluster. - -##### Example Request: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: AfterClusterUpgradeRequest -cluster: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: Cluster - metadata: - name: test-cluster - namespace: test-ns - spec: - ... - status: - ... -kubernetesVersion: "v1.22.0" -``` - -##### Example Response: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: AfterClusterUpgradeResponse -status: Success -message: "error message if status == Failure" -``` - -For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). - -#### Before Cluster Delete - -This hook is called after the Cluster has been deleted by the user, and immediately before objects existing in the Cluster are going to be deleted. Runtime Extension implementers can use this hook to execute cleanup tasks for the add-ons and block deletion of the Cluster and descendant objects until everything is ready. 
- -##### Example Request: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: BeforeClusterDeleteRequest -cluster: - apiVersion: cluster.x-k8s.io/v1beta1 - kind: Cluster - metadata: - name: test-cluster - namespace: test-ns - spec: - ... - status: - ... -``` - -##### Example Response: - -```yaml -apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1 -kind: BeforeClusterDeleteResponse -status: Success -message: "error message if status == Failure" -retryAfterSeconds: 10 -``` - -For additional details, refer to the [Draft OpenAPI spec](https://editor.swagger.io/?url=https://raw.githubusercontent.com/kubernetes-sigs/cluster-api/main/docs/proposals/images/runtime-hooks/runtime-hooks-openapi.yaml). - +The remainder of this section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md#definitions) +to avoid duplication. ### Runtime Extensions developer guide -All guidelines defined in the [Runtime SDK](https://github.com/kubernetes-sigs/cluster-api/blob/b48a6ed07ac2bd353f99000270510369f4baa1a5/docs/proposals/20220221-runtime-SDK.md) apply to the implementation of Runtime Extensions of the hooks defined in this proposal. - -TL;DR; Runtime Extensions are components that should be designed, written and deployed with great caution given that they can affect the proper functioning of the Cluster API runtime. A poorly implemented Runtime Extension could potentially block lifecycle transitions from happening. 
- -Following recommendations are especially relevant: - -* [Blocking and non Blocking](https://github.com/kubernetes-sigs/cluster-api/blob/b48a6ed07ac2bd353f99000270510369f4baa1a5/docs/proposals/20220221-runtime-SDK.md#blocking-hooks) -* [Error management](https://github.com/kubernetes-sigs/cluster-api/blob/b48a6ed07ac2bd353f99000270510369f4baa1a5/docs/proposals/20220221-runtime-SDK.md#error-management) -* [Avoid dependencies](https://github.com/kubernetes-sigs/cluster-api/blob/b48a6ed07ac2bd353f99000270510369f4baa1a5/docs/proposals/20220221-runtime-SDK.md#avoid-dependencies) - +This section has been moved to the Cluster API [book](../../docs/book/src/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.md#guidelines) +to avoid duplication. ### Security Model diff --git a/test/extension/main.go b/test/extension/main.go index 8acd24bfa38b..e60b9d98eef9 100644 --- a/test/extension/main.go +++ b/test/extension/main.go @@ -45,7 +45,7 @@ var ( setupLog = ctrl.Log.WithName("setup") - // flags. + // Flags. profilerAddress string webhookPort int webhookCertDir string @@ -197,7 +197,7 @@ func main() { os.Exit(1) } - setupLog.Info("starting RuntimeExtension", "version", version.Get().String()) + setupLog.Info("Starting Runtime Extension server", "version", version.Get().String()) if err := webhookServer.Start(ctx); err != nil { setupLog.Error(err, "error running webhook server") os.Exit(1)