diff --git a/docs/proposals/20220725-managed-kubernetes.md b/docs/proposals/20220725-managed-kubernetes.md
index 9012d0c3355f..3e19df72e709 100644
--- a/docs/proposals/20220725-managed-kubernetes.md
+++ b/docs/proposals/20220725-managed-kubernetes.md
@@ -72,10 +72,10 @@ superseded-by:
 - **Unmanaged Kubernetes** - a Kubernetes cluster where a cluster admin is responsible for provisioning and operating the control plane and worker nodes. In Cluster API this traditionally means a Kubeadm bootstrapped cluster on infrastructure machines (virtual or physical).
 - **Managed Worker Node** - an individual Kubernetes worker node where the underlying compute (vm or bare-metal) is provisioned and managed by the service provider. This usually includes the joining of the newly provisioned node into a Managed Kubernetes cluster. The lifecycle is normally controlled via a higher level construct such as a Managed Node Group.
 - **Managed Node Group** - a service that a service provider offers that automates the provisioning of managed worker nodes. Depending on the service provider this group of nodes could contain a fixed number of replicas or it might contain a dynamic pool of replicas that auto-scales up and down. Examples are Node Pools in GCP and EKS managed node groups.
-- **Cluster Infrastructure Provider (Infrastructure)** - an Infrastructure provider supplies whatever prerequisites are necessary for creating & running clusters such as networking, load balancers, firewall rules, and so on. ([docs](https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html))
-- **ControlPlane Provider (ControlPlane)** - a control plane provider instantiates a Kubernetes control plane consisting of k8s control plane components such as kube-apiserver, etcd, kube-scheduler and kube-controller-manager. ([docs](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/control-plane.html#control-plane-provider))
-- **MachineDeployment** - a MachineDeployment orchestrates deployments over a fleet of MachineSets, which is an immutable abstraction over Machines. ([docs](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-deployment.html))
-- **MachinePool (experimental)** - a MachinePool is similar to a MachineDeployment in that they both define configuration and policy for how a set of machines are managed. While the MachineDeployment uses MachineSets to orchestrate updates to the Machines, MachinePool delegates the responsibility to a cloud provider specific resource such as AWS Auto Scale Groups, GCP Managed Instance Groups, and Azure Virtual Machine Scale Sets. ([docs](https://github.com/kubernetes-sigs/cluster-api/blob/bf51a2502f9007b531f6a9a2c1a4eae1586fb8ca/docs/proposals/20190919-machinepool-api.md))
+- **Cluster Infrastructure Provider (Infrastructure)** - an Infrastructure provider supplies whatever prerequisites are necessary for creating & running clusters such as networking, load balancers, firewall rules, and so on. ([docs](../book/src/developer/providers/cluster-infrastructure.md))
+- **ControlPlane Provider (ControlPlane)** - a control plane provider instantiates a Kubernetes control plane consisting of k8s control plane components such as kube-apiserver, etcd, kube-scheduler and kube-controller-manager. ([docs](../book/src/developer/architecture/controllers/control-plane.md#control-plane-provider))
+- **MachineDeployment** - a MachineDeployment orchestrates deployments over a fleet of MachineSets, which is an immutable abstraction over Machines. ([docs](../book/src/developer/architecture/controllers/machine-deployment.md))
+- **MachinePool (experimental)** - a MachinePool is similar to a MachineDeployment in that they both define configuration and policy for how a set of machines are managed. While the MachineDeployment uses MachineSets to orchestrate updates to the Machines, MachinePool delegates the responsibility to a cloud provider specific resource such as AWS Auto Scaling Groups, GCP Managed Instance Groups, and Azure Virtual Machine Scale Sets. ([docs](./20190919-machinepool-api.md))
 
 ## Summary
 
@@ -98,34 +98,38 @@ A good example here is the API server load balancer:
 
 ### Goals
 
-- Provide a recommendation of a consistent approach for representing managed Kubernetes services in CAPI for new implementations.
-- Reach a consensus on how ClusterClass should be supported for managed Kubernetes
-- As a result of the recommendations of this proposal we should update the [Provider Implementers](https://cluster-api.sigs.k8s.io/developer/providers/implementers.html) documentation to aid with future provider implementations.
+- Provide a recommendation of a consistent approach for representing Managed Kubernetes services in CAPI for new implementations.
+  - It would be ideal for there to be consistency between providers when it comes to representing Managed Kubernetes services. However, it's unrealistic to ask providers to refactor their existing implementations.
+- Ensure the recommendation provides a working model for Managed Kubernetes integration with ClusterClass.
+- As a result of the recommendations of this proposal we should update the [Provider Implementers](../book/src/developer/providers/implementers.md) documentation to aid with future provider implementations.
 
 ### Non-Goals/Future Work
 
-- Require the existing implementations in CAPA & Cluster API Provider Azure (CAPZ) to converge on the chosen approach.
-  - It would be ideal for there to be consistency between providers when it comes to representing managed Kubernetes services. However, it's unrealistic to ask providers to refactor their implementations.
-  - If providers such as CAPA & CAPZ would like to follow the guidance then discussions should be facilitated.
+- Enforce the Managed Kubernetes recommendations as a requirement for Cluster API providers when they implement Managed Kubernetes.
+  - If providers that have already implemented Managed Kubernetes would like guidance on if/how they could move to be aligned with the recommendations of this proposal, discussions should be facilitated.
 - Provide advice in this proposal on how to refactor the existing implementations of managed Kubernetes in CAPA & CAPZ.
 - Propose a new architecture or API changes to CAPI for managed Kubernetes
 - Be a concrete design for the GKE implementation in Cluster API Provider GCP (CAPG).
-  - A separate proposal will be created for CAPG based on the recommendations of this proposal.
+  - A separate CAPG proposal will be created for the GKE implementation based on the recommendations of this proposal.
 
 ## Proposal
 
 ### Personas
 
 #### Cluster Service Provider
+
 The user hosting cluster control planes, responsible for up-time, UI for fleet wide alerts, configuring a cloud account to host control planes in, views user provisioned infra (available compute). Has cluster admin management.
 
 #### Cluster Service Consumer
+
 A user empowered to request control planes, request workers to a service provider, and drive upgrades or modify externalized configuration.
 
 #### Cluster Admin
+
 A user with cluster-admin role in the provisioned cluster, but may or may not have power over when/how cluster is upgraded or configured.
 
 #### Cluster User
+
 A user who uses a provisioned cluster, which usually maps to a developer.
 
 ### User Stories
 
@@ -133,7 +137,7 @@ A user who uses a provisioned cluster, which usually maps to a developer.
 #### Story 1
 
 As a cluster service consumer,
-I want to use Cluster API to provision and manage the lifecycle of a control plane that utilizes my service provider's managed Kubernetes control plane (i.e. EKS, AKS, GKE),
+I want to use Cluster API to provision and manage Kubernetes Clusters that utilize my service provider's Managed Kubernetes Service (e.g. EKS, AKS, GKE),
 So that I don’t have to worry about the management/provisioning of control plane nodes, and so I can take advantage of any value add services offered by the service provider.
 
 #### Story 2
@@ -171,7 +175,7 @@ So that I can eliminate the responsibility of owning and SREing the Control Plan
 
 #### EKS in CAPA
 
-- https://cluster-api-aws.sigs.k8s.io/topics/eks/index.html
+- [Docs](https://cluster-api-aws.sigs.k8s.io/topics/eks/index.html)
 - Feature Status: GA
 - CRDs
   - AWSManagedControlPlane - provision EKS cluster
   - AWSManagedMachinePool - corresponds to EKS managed node group
@@ -182,19 +186,19 @@ So that I can eliminate the responsibility of owning and SREing the Control Plan
 - Supported Flavors
   - AWSManagedControlPlane with MachineDeployment / AWSMachineTemplate
   - AWSManagedControlPlane with MachinePool / AWSMachinePool
   - AWSManagedControlPlane with MachinePool / AWSManagedMachinePool
 - Bootstrap Provider
   - Cluster API bootstrap provider EKS (CABPE)
-* Features
+- Features
   - Provisioning/managing an Amazon EKS Cluster
   - Upgrading the Kubernetes version of the EKS Cluster
   - Attaching self-managed machines as nodes to the EKS cluster
   - Creating a machine pool and attaching it to the EKS cluster (experimental)
   - Creating a managed machine pool and attaching it to the EKS cluster
   - Managing “EKS Addons”
-  - Creating an EKS fargate profile (experimental)
+  - Creating an EKS Fargate profile (experimental)
   - Managing aws-iam-authenticator configuration
 
 #### AKS in CAPZ
 
-- https://capz.sigs.k8s.io/topics/managedcluster.html
+- [Docs](https://capz.sigs.k8s.io/topics/managedcluster.html)
 - Feature Status: Experimental
 - CRDs
   - AzureManagedControlPlane, AzureManagedCluster - provision AKS cluster
   - AzureManagedMachinePool - corresponds to AKS node pool
 - Supported Flavor
   - AzureManagedControlPlane + AzureManagedCluster with AzureManagedMachinePool
 
@@ -204,7 +208,7 @@ So that I can eliminate the responsibility of owning and SREing the Control Plan
 
 #### OKE in CAPOCI
 
-- https://github.com/oracle/cluster-api-provider-oci/issues/110
+- [Issue](https://github.com/oracle/cluster-api-provider-oci/issues/110)
 - Design discussion starting
 
 ### Managed Kubernetes API Design Approaches
 
@@ -223,29 +227,29 @@ This option introduces a new single resource kind:
 
 ```go
 type GCPManagedControlPlaneSpec struct {
-// Project is the name of the project to deploy the cluster to.
-Project string `json:"project"`
+  // Project is the name of the project to deploy the cluster to.
+  Project string `json:"project"`
 
-// NetworkSpec encapsulates all things related to the GCP network.
-// +optional
-Network NetworkSpec `json:"network"`
+  // NetworkSpec encapsulates all things related to the GCP network.
+  // +optional
+  Network NetworkSpec `json:"network"`
 
-// AddonsConfig defines the addons to enable with the GKE cluster.
-// +optional
-AddonsConfig *AddonsConfig `json:"addonsConfig,omitempty"`
+  // AddonsConfig defines the addons to enable with the GKE cluster.
+  // +optional
+  AddonsConfig *AddonsConfig `json:"addonsConfig,omitempty"`
 
-// Logging contains the logging configuration for the GKE cluster.
-// +optional
-Logging *ControlPlaneLoggingSpec `json:"logging,omitempty"`
+  // Logging contains the logging configuration for the GKE cluster.
+  // +optional
+  Logging *ControlPlaneLoggingSpec `json:"logging,omitempty"`
 
-// EnableKubernetesAlpha will indicate the kubernetes alpha features are enabled
-// +optional
-EnableKubernetesAlpha bool
+  // EnableKubernetesAlpha will indicate the kubernetes alpha features are enabled
+  // +optional
+  EnableKubernetesAlpha bool
 
-// ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
-// +optional
-ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
-....
+  // ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
+  // +optional
+  ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
+  ....
 }
 ```
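+To make the dual use of the kind concrete, the following rough, non-normative sketch shows a CAPI `Cluster` whose `infrastructureRef` and `controlPlaneRef` both point at the same `GCPManagedControlPlane` object. The API group/version and object names are illustrative assumptions, not part of this proposal:
+
+```go
+package sketch
+
+import (
+	corev1 "k8s.io/api/core/v1"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
+)
+
+// newOption1Cluster shows the defining property of option 1: both contract
+// references resolve to the same object, because a single kind satisfies the
+// infrastructure and control plane contracts at once.
+func newOption1Cluster() *clusterv1.Cluster {
+	ref := corev1.ObjectReference{
+		APIVersion: "infrastructure.cluster.x-k8s.io/v1beta1", // assumed API group
+		Kind:       "GCPManagedControlPlane",
+		Name:       "my-gke-cluster",
+	}
+	return &clusterv1.Cluster{
+		ObjectMeta: metav1.ObjectMeta{Name: "my-gke-cluster", Namespace: "default"},
+		Spec: clusterv1.ClusterSpec{
+			InfrastructureRef: ref.DeepCopy(),
+			ControlPlaneRef:   ref.DeepCopy(),
+		},
+	}
+}
+```
+
+Referencing one kind from both fields is precisely what the two-kind options below avoid.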
@@ -276,41 +280,41 @@ Note that CAPZ had a similar discussion and an [issue](https://github.com/kubern
 This option introduces 2 new resource kinds:
 
 - **GCPManagedControlPlane**: same as in option 1
-- **GCPManagedCluster**: contains the minimum properties in its spec and status to satisfy the [CAPI contract for an infrastructure cluster](https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html) (i.e. ControlPlaneEndpoint, Ready condition). Its controller watches GCPManagedControlPlane and copies the ControlPlaneEndpoint field to GCPManagedCluster to report back to CAPI. This is used as a pass-through layer only.
+- **GCPManagedCluster**: contains the minimum properties in its spec and status to satisfy the [CAPI contract for an infrastructure cluster](../book/src/developer/providers/cluster-infrastructure.md) (i.e. ControlPlaneEndpoint, Ready condition). Its controller watches GCPManagedControlPlane and copies the ControlPlaneEndpoint field to GCPManagedCluster to report back to CAPI. This is used as a pass-through layer only.
 
 ```go
 type GCPManagedControlPlaneSpec struct {
-// Project is the name of the project to deploy the cluster to.
-Project string `json:"project"`
+  // Project is the name of the project to deploy the cluster to.
+  Project string `json:"project"`
 
-// NetworkSpec encapsulates all things related to the GCP network.
-// +optional
-Network NetworkSpec `json:"network"`
+  // NetworkSpec encapsulates all things related to the GCP network.
+  // +optional
+  Network NetworkSpec `json:"network"`
 
-// AddonsConfig defines the addons to enable with the GKE cluster.
-// +optional
-AddonsConfig *AddonsConfig `json:"addonsConfig,omitempty"`
+  // AddonsConfig defines the addons to enable with the GKE cluster.
+  // +optional
+  AddonsConfig *AddonsConfig `json:"addonsConfig,omitempty"`
 
-// Logging contains the logging configuration for the GKE cluster.
-// +optional
-Logging *ControlPlaneLoggingSpec `json:"logging,omitempty"`
+  // Logging contains the logging configuration for the GKE cluster.
+  // +optional
+  Logging *ControlPlaneLoggingSpec `json:"logging,omitempty"`
 
-// EnableKubernetesAlpha will indicate the kubernetes alpha features are enabled
-// +optional
-EnableKubernetesAlpha bool
+  // EnableKubernetesAlpha will indicate the kubernetes alpha features are enabled
+  // +optional
+  EnableKubernetesAlpha bool
 
-// ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
-// +optional
-ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
-....
+  // ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
+  // +optional
+  ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
+  ....
 }
 ```
 
 ```go
 type GCPManagedClusterSpec struct {
-// ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
-// +optional
-ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
+  // ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
+  // +optional
+  ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
 }
 ```
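+The pass-through layer can be sketched as follows. This is a rough illustration only: it assumes a controller-runtime based reconciler, full `GCPManagedCluster`/`GCPManagedControlPlane` kinds built around the specs above, and a `Status.Ready` field on GCPManagedCluster; none of the helper names are prescribed by this proposal.
+
+```go
+package sketch
+
+import (
+	"context"
+
+	"k8s.io/apimachinery/pkg/types"
+	"sigs.k8s.io/controller-runtime/pkg/client"
+)
+
+// GCPManagedClusterReconciler is a hypothetical controller for the
+// pass-through GCPManagedCluster kind.
+type GCPManagedClusterReconciler struct {
+	Client client.Client
+}
+
+// reconcileEndpoint copies the endpoint reported by GCPManagedControlPlane
+// onto GCPManagedCluster, so the core Cluster controller can consume it.
+func (r *GCPManagedClusterReconciler) reconcileEndpoint(ctx context.Context, managedCluster *GCPManagedCluster, controlPlaneKey types.NamespacedName) error {
+	controlPlane := &GCPManagedControlPlane{}
+	if err := r.Client.Get(ctx, controlPlaneKey, controlPlane); err != nil {
+		return err
+	}
+	// Nothing to report until the service provider has exposed an endpoint.
+	if controlPlane.Spec.ControlPlaneEndpoint.IsZero() {
+		return nil
+	}
+	managedCluster.Spec.ControlPlaneEndpoint = controlPlane.Spec.ControlPlaneEndpoint
+	if err := r.Client.Update(ctx, managedCluster); err != nil {
+		return err
+	}
+	// Fulfil the infrastructure contract by reporting readiness.
+	managedCluster.Status.Ready = true
+	return r.Client.Status().Update(ctx, managedCluster)
+}
+```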
@@ -335,54 +339,54 @@ This option more closely follows the original separation of concerns with the di
 ```go
 type GCPManagedControlPlaneSpec struct {
-// ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
-// +optional
-ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint,omitempty"`
+  // ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
+  // +optional
+  ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint,omitempty"`
 
-// AddonsConfig defines the addons to enable with the GKE cluster.
-// +optional
-AddonsConfig *AddonsConfig `json:"addonsConfig,omitempty"`
+  // AddonsConfig defines the addons to enable with the GKE cluster.
+  // +optional
+  AddonsConfig *AddonsConfig `json:"addonsConfig,omitempty"`
 
-// Logging contains the logging configuration for the GKE cluster.
-// +optional
-Logging *ControlPlaneLoggingSpec `json:"logging,omitempty"`
+  // Logging contains the logging configuration for the GKE cluster.
+  // +optional
+  Logging *ControlPlaneLoggingSpec `json:"logging,omitempty"`
 
-// EnableKubernetesAlpha will indicate the kubernetes alpha features are enabled
-// +optional
-EnableKubernetesAlpha bool
+  // EnableKubernetesAlpha will indicate the kubernetes alpha features are enabled
+  // +optional
+  EnableKubernetesAlpha bool
 
-...
+  ...
 }
 ```
 
 ```go
 type GCPManagedClusterSpec struct {
-// Project is the name of the project to deploy the cluster to.
-Project string `json:"project"`
+  // Project is the name of the project to deploy the cluster to.
+  Project string `json:"project"`
 
-// The GCP Region the cluster lives in.
-Region string `json:"region"`
+  // The GCP Region the cluster lives in.
+  Region string `json:"region"`
 
-// ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
-// +optional
-ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
+  // ControlPlaneEndpoint represents the endpoint used to communicate with the control plane.
+  // +optional
+  ControlPlaneEndpoint clusterv1.APIEndpoint `json:"controlPlaneEndpoint"`
 
-// NetworkSpec encapsulates all things related to the GCP network.
-// +optional
-Network NetworkSpec `json:"network"`
+  // NetworkSpec encapsulates all things related to the GCP network.
+  // +optional
+  Network NetworkSpec `json:"network"`
 
-// FailureDomains is an optional field which is used to assign selected availability zones to a cluster
-// FailureDomains if empty, defaults to all the zones in the selected region and if specified would override
-// the default zones.
-// +optional
-FailureDomains []string `json:"failureDomains,omitempty"`
+  // FailureDomains is an optional field used to assign selected availability zones to a cluster.
+  // If empty, FailureDomains defaults to all the zones in the selected region; if specified, it
+  // overrides the default zones.
+  // +optional
+  FailureDomains []string `json:"failureDomains,omitempty"`
 
-// AdditionalLabels is an optional set of tags to add to GCP resources managed by the GCP provider, in addition to the
-// ones added by default.
-// +optional
-AdditionalLabels Labels `json:"additionalLabels,omitempty"`
+  // AdditionalLabels is an optional set of tags to add to GCP resources managed by the GCP provider, in addition to the
+  // ones added by default.
+  // +optional
+  AdditionalLabels Labels `json:"additionalLabels,omitempty"`
 
-...
+  ...
 }
 ```
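+Both kinds also need status fields to satisfy their respective contracts. The hypothetical status types below illustrate one possible field set; the names follow the generic CAPI contracts (`ready`, `initialized`, `externalManagedControlPlane`, `failureDomains`), but the exact shape is an assumption for illustration:
+
+```go
+package sketch
+
+import clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
+
+// GCPManagedClusterStatus sketches the infrastructure cluster contract.
+type GCPManagedClusterStatus struct {
+	// Ready is set once networking and other prerequisites are provisioned.
+	Ready bool `json:"ready"`
+
+	// FailureDomains reports the zones that are usable for placement.
+	// +optional
+	FailureDomains clusterv1.FailureDomains `json:"failureDomains,omitempty"`
+}
+
+// GCPManagedControlPlaneStatus sketches the control plane contract.
+type GCPManagedControlPlaneStatus struct {
+	// Ready denotes that the managed API server is reachable.
+	Ready bool `json:"ready"`
+
+	// Initialized denotes that the API server can accept requests.
+	// +optional
+	Initialized bool `json:"initialized"`
+
+	// ExternalManagedControlPlane tells CAPI that the control plane machines
+	// are managed by the service provider, so no Machine objects will exist.
+	ExternalManagedControlPlane bool `json:"externalManagedControlPlane"`
+}
+```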
 
@@ -449,40 +453,40 @@ Some cloud providers also offer Managed Node Groups as part of their Managed Kub
 There are 2 different ways to represent a group of machines in CAPI:
 
 - **MachineDeployments** - you specify the number of replicas of a machine template and CAPI will manage the creation of immutable Machine-Infrastructure Machine pairs via MachineSets. The user is responsible for explicitly declaring how many machines (a.k.a replicas) they want and these are provisioned and joined to the cluster.
-- **MachinePools** - are similar to MachineDeployments in that they specify a number of machine replicas to be created and joined to the cluster. However, instead of using MachineSets to manage the lifecycle of individual machines a provider implementer utilses a cloud provided solution to manage the lifecycle of the individual machines instead. Generally with a pool you don’t have to define an exact amount of replicas and instead you have the option to supply a minimum and maximum number of nodes and let the cloud service manage the scaling up and down the number of replicas/nodes. Examples of cloud provided solutions are Auto Scale Groups (ASG) in AWS and Virtual Machine Scale Sets (VMSS) in Azure.
+- **MachinePools** - are similar to MachineDeployments in that they specify a number of machine replicas to be created and joined to the cluster. However, instead of using MachineSets to manage the lifecycle of individual machines, a provider implementer utilizes a cloud provided solution to manage the lifecycle of the individual machines instead. Generally with a pool you don’t have to define an exact number of replicas; instead you have the option to supply a minimum and maximum number of nodes and let the cloud service manage scaling the number of replicas/nodes up and down. Examples of cloud provided solutions are Auto Scaling Groups (ASG) in AWS and Virtual Machine Scale Sets (VMSS) in Azure.
 
 With the implementation of a managed node group the cloud provider is responsible for managing the lifecycle of the individual machines that are used as nodes. This implies that a machine pool representation is needed which utilizes a cloud provided solution to manage the lifecycle of machines.
 
-For our example, GCP offers Node Pools that will manage the lifecycle of a pool machines that can scale up and down. We can use this service to implement machine pools:
+For our example, GCP offers Node Pools that will manage the lifecycle of a pool of machines that can scale up and down. We can use this service to implement machine pools:
 
 ```go
 type GCPManagedMachinePoolSpec struct {
-// Location specifies where the nodes should be created.
-Location []string `json:"location"`
+  // Location specifies where the nodes should be created.
+  Location []string `json:"location"`
 
-// The Kubernetes version for the node group.
-Version string `json:"version"`
+  // The Kubernetes version for the node group.
+  Version string `json:"version"`
 
-// MinNodeCount is the minimum number of nodes for one location.
-MinNodeCount int `json:"minNodeCount"`
+  // MinNodeCount is the minimum number of nodes for one location.
+  MinNodeCount int `json:"minNodeCount"`
 
-// MaxNodeCount is the maximum number of nodes for one location.
-MaxNodeCount int `json:"minNodeCount"`
+  // MaxNodeCount is the maximum number of nodes for one location.
+  MaxNodeCount int `json:"maxNodeCount"`
 
-...
+  ...
 }
 ```
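+Because the cloud service owns scaling between the two bounds, a provider's admission logic mainly has to check that the bounds are coherent. A minimal illustrative check (not mandated by this proposal) might look like:
+
+```go
+package sketch
+
+import "fmt"
+
+// validateScaling checks the bounds of the hypothetical
+// GCPManagedMachinePoolSpec above before it is persisted.
+func validateScaling(spec GCPManagedMachinePoolSpec) error {
+	if spec.MinNodeCount < 0 {
+		return fmt.Errorf("minNodeCount must be >= 0, got %d", spec.MinNodeCount)
+	}
+	if spec.MaxNodeCount < spec.MinNodeCount {
+		return fmt.Errorf("maxNodeCount (%d) must not be less than minNodeCount (%d)", spec.MaxNodeCount, spec.MinNodeCount)
+	}
+	return nil
+}
+```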
 
 ### Provider Implementers Documentation
 
-Its recommended that changes are made to the [Provider Implementers documentation](https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html) based on the recommending approach for representing managed Kubernetes in Cluster API.
+It's recommended that changes are made to the [Provider Implementers documentation](../book/src/developer/providers/cluster-infrastructure.md) based on the recommended approach for representing managed Kubernetes in Cluster API.
 
 Some of the areas of change (this is not an exhaustive list):
 
 - A new "implementing managed kubernetes" guide that contains details about how to represent a managed Kubernetes service in CAPI. The content will be based on option 3 from this proposal along with other considerations such as managed node and addon management.
-- Update the [Provider contracts documentation](https://cluster-api.sigs.k8s.io/developer/providers/contracts.html) to state that the same kind should not be used to satisfy 2 different provider contracts.
-- Update the [Cluster Infrastructure documentation](https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html) to provide guidance on how to populate the `controlPlaneEndpoint` in the scenario where the control plane creates the api server load balancer. We should include sample code.
-- Update the [Control Plane Controller](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/control-plane.html) diagram for managed k8s services case. The Control Plane reconcile needs to start when `InfrastructureReady` is true.
+- Update the [Provider contracts documentation](../book/src/developer/providers/contracts.md) to state that the same kind should not be used to satisfy 2 different provider contracts.
+- Update the [Cluster Infrastructure documentation](../book/src/developer/providers/cluster-infrastructure.md) to provide guidance on how to populate the `controlPlaneEndpoint` in the scenario where the control plane creates the API server load balancer. We should include sample code (see the sketch below).
+- Update the [Control Plane Controller](../book/src/developer/architecture/controllers/control-plane.md) diagram for the managed k8s services case. The Control Plane reconcile needs to start when `InfrastructureReady` is true.
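+As a starting point for that sample code, here is a rough, non-normative sketch of a control plane reconciler that both honours the `InfrastructureReady` gate and publishes the endpoint of the provider-created API server load balancer. The reconciler shape and helper names are assumptions for illustration:
+
+```go
+package sketch
+
+import (
+	"context"
+	"time"
+
+	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
+	ctrl "sigs.k8s.io/controller-runtime"
+	"sigs.k8s.io/controller-runtime/pkg/client"
+)
+
+// GCPManagedControlPlaneReconciler is a hypothetical controller for the
+// GCPManagedControlPlane kind.
+type GCPManagedControlPlaneReconciler struct {
+	Client client.Client
+}
+
+func (r *GCPManagedControlPlaneReconciler) reconcileNormal(ctx context.Context, cluster *clusterv1.Cluster, controlPlane *GCPManagedControlPlane) (ctrl.Result, error) {
+	// The control plane reconcile must not start until the infrastructure
+	// provider reports readiness.
+	if !cluster.Status.InfrastructureReady {
+		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
+	}
+
+	// createOrGetCluster stands in for the provider's cloud API calls; it
+	// returns the endpoint of the managed API server load balancer.
+	host, err := r.createOrGetCluster(ctx, controlPlane)
+	if err != nil {
+		return ctrl.Result{}, err
+	}
+
+	// Publish the endpoint so that the infrastructure cluster kind can
+	// mirror it back to CAPI.
+	controlPlane.Spec.ControlPlaneEndpoint = clusterv1.APIEndpoint{Host: host, Port: 443}
+	return ctrl.Result{}, r.Client.Update(ctx, controlPlane)
+}
+
+// createOrGetCluster is a hypothetical helper wrapping the GKE admin API.
+func (r *GCPManagedControlPlaneReconciler) createOrGetCluster(ctx context.Context, cp *GCPManagedControlPlane) (string, error) {
+	// ... provider-specific API calls would go here ...
+	return "203.0.113.10", nil
+}
+```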
 
 ## Other Considerations for CAPI
 
@@ -493,8 +497,8 @@ Some of the areas of change (this is not an exhaustive list):
 
 ### clusterctl integration
 
-- `clusterctl` assumes a minimal set of providers (core, bootstrap, control plane, infra) is required to form a valid management cluster.Currently, it does not expect a single provider being many things at the same time.
-- EKS in CAPA has its own control plane provider and a bootstrap provider packaged in a single manager. Moving forward, it will be great to separate them out.
+- `clusterctl` assumes a minimal set of providers (core, bootstrap, control plane, infra) is required to form a valid management cluster. Currently, it does not expect a single provider being many things at the same time.
+- EKS in CAPA has its own control plane provider and a bootstrap provider packaged in a single manager. Moving forward, it would be great to separate them out.
 
 ### Add-ons management
 
@@ -502,8 +506,8 @@ Some of the areas of change (this is not an exhaustive list):
 - [EKS add-ons](https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html)
 - [AKS add-ons](https://docs.microsoft.com/en-us/azure/aks/integrations)
 - CAPA and CAPZ enabled support for cloud provider managed addons via API
-  - CAPA: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/controlplane/eks/api/v1beta1/awsmanagedcontrolplane_types.go#L155
-  - CAPZ: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2095
+  - [CAPA](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/controlplane/eks/api/v1beta1/awsmanagedcontrolplane_types.go#L155)
+  - [CAPZ](https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2095)
 - Managed Kubernetes implementations should be able to opt-in/opt-out of what will be provided by [CAPI’s add-ons orchestration solution](https://github.com/kubernetes-sigs/cluster-api/issues/5491)
 
 ## Upgrade Strategy