Add reference use cases for Cluster API #903
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: vincepri. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
75314e8 to 55cdc68
This looks great to me. I have a bunch of small comments, but they're really nits, nothing that would block you from committing this.
docs/reference-use-cases.md
Outdated
### Specific Architecture Approaches
- As an operator of a management cluster, given that I give operators of workload clusters access to my management cluster, they can launch new workload clusters with control planes that run in the management cluster while the nodes of those workload clusters run elsewhere.
- As a multi-cluster operator, I would like to provide an EKS-like experience. Where the workload control plane nodes are joined to the management cluster and the control plane config isn’t exposed to the consumer of the workload cluster. This enables me as an operator to manage the control plane nodes for all clusters using tooling like prometheus and fluentd. I can also control the configuration of the workload control plane in accordance with business policy.
The sentence beginning with "Where the workload control plane nodes..." is an incomplete sentence. Perhaps "... an EKS-like experience in which the workload control plane nodes..."
docs/reference-use-cases.md
Outdated
- As a multi-cluster operator, given that I have deployed my clusters via Cluster API, I want to find general information (name, status, access details) about my clusters across multiple providers.
- As a multi-cluster operator, I want to know what versions of k8s all of my workload clusters are running across multiple providers.
Nit: The entire document uses "Kubernetes" except for this line, which uses "k8s". Perhaps normalize to Kubernetes.
docs/reference-use-cases.md
Outdated
- As an operator of a management cluster, given that I have a cluster running Cluster API, I would like to upgrade the Cluster API and provider(s) without the users of Cluster API noticing (e.g. due to their API requests not working).
- As an operator of a management cluster, I want to know what versions of kublet, control plane, OS, etc all of the associated workload clusters are running, so that I can plan upgrades to the management cluster that will not break anyone’s ability to manage their workload clusters.
kublet -> kubelet
comma after "etc"
docs/reference-use-cases.md
Outdated
- As an operator, I want an external CA to sign certificates for the workload cluster control plane.
- 🔭 As an operator, given I have a management cluster and a workload cluster, I want to rotate all the certificates and credentials my machines are using.
  - Some certificates might get rotated as part of machine upgrade, but are otherwise the above is out of scope.
This comment makes it seem like the use case is invalid. Is it? I wonder if either the use case or the comment should be removed.
I believe it was an attempt to clarify a case where rotating certificates might not be out of scope, compared to the general case where it would be out of scope. That said, it would definitely need more clarity and refinement when transferred to a CAEP
docs/reference-use-cases.md
Outdated
- As an operator of a management cluster, I want to configure whether operators of workload clusters are allowed to open interactive shells onto those clusters machines.
## Multi-cluster/Multi-provider
How about adding a Multi-vendor section? There are now many different k8s distributions, like IBM Cloud Private, OpenShift, K3S, etc., and end users may need to install those distributions, not only native k8s clusters.
I opened an issue at #853
I saw we do mention the vendor's cluster at https://github.com/kubernetes-sigs/cluster-api/blob/55cdc685cd0319a07b875d793d94ec54e7441d95/docs/reference-use-cases.md#creating-clusters, this is great!
docs/reference-use-cases.md
Outdated
- Uses Cluster API.
- Cares about keeping config similar between many clusters.
![Cluster API Use Cases](https://user-images.githubusercontent.com/3118335/56234753-d639e980-603a-11e9-8d45-944c547a3280.png)
Some questions for this diagram:
- How about adding a table to describe Management Cluster, Control Plane Cluster, and Workload Cluster?
- In the second picture there is a Workload control plane running on the Control Plane Cluster. What do you mean by this? I think the Workload control plane is the workload cluster's master node, so why is it running on the Control Plane Cluster?
- I found no content mentioning whether the cluster is HA or not, and I assume the clusters here should support both HA and non-HA, right? How about highlighting this in the document?
- What is the difference between those three pictures? How about adding some description for each picture?
+1 to providing additional info, at least some descriptions of the 3 diagrams, including their basic info and some differences.
I'm open to removing the diagram; it's meant to be just a reference. Most of this work has been done in https://docs.google.com/document/d/13OQjn5lxRyiW9itNPjVuP0aN_JvYJByKznQESQEDDt8/edit#heading=h.8b9zw0k5lf83. I'm not sure who specifically contributed the image.
I like the diagram. It makes it easier to see some of the use cases. :)
docs/reference-use-cases.md
Outdated
- 🔭 As an operator, I want an external CA to sign certificates for workload cluster kubelets.
### Upgrades
- As an operator, given I have a management cluster and a workload cluster, I want to patch the OS running on all of my machines (e.g. for a CVE).
Should this be in scope for Cluster API? I assume this is the job of the cloud admin (off-premise and on-premise)?
After all, Cluster API should not see the operating system it's running on and should not be authorized to do so; otherwise this might be a security hole?
It's probably OK to keep it; patching could mean (in some form) rolling out an image upgrade.
docs/reference-use-cases.md
Outdated
- As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage another Kubernetes cluster (create, upgrade, scale, delete).
- As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage a vendor’s Kubernetes conformant cluster (create, upgrade, scale, delete).
Here I just want to confirm: vendor's Kubernetes means different Kubernetes distributions, like IBM Cloud Private, OpenShift, K3S, etc., right?
Maybe reuse the wording in scope and objectives:
https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/scope-and-objectives.md
External: A control plane offered and controlled by some system other than Cluster API (e.g., GKE, AKS, EKS, IKS).
Technically those should be similar to ICP/OpenShift/K3S?
light-PSA: Waiting until Monday/Tuesday to go over the feedback, update the PR, and present the resulting document at next week's community meeting.
docs/reference-use-cases.md
Outdated
- As an operator, I need to have a way to apply labels to Nodes created through ClusterAPI. This will allow me to partition workloads across Nodes/Machines and MachineDeployments. Examples of labels include datacenter, subnet, and hypervisor, which applications can use in affinity policies.
- As an operator, I want to be able to provision the nodes of a workload cluster on an existing vnet that I don’t have admin control of.
This seems to be overlapping with "I want to be able to use existing infrastructure". Shall we merge both?
I think this is more specific and it's probably fine to keep it
docs/reference-use-cases.md
Outdated
### Specify Affinity and Antiaffinity Groups of Worker Nodes
- As an operator, I want to be able to set affinity rules on the deployment of Nodes, making sure Machines are either colocated or not colocated on the same host, rack or other dimension. These rules may span multiple sets of Machines and MachineDeployments.
This use case looks similar to the ones in the section above, "Creating Clusters". Not sure why it deserves to be under a separate section?
docs/reference-use-cases.md
Outdated
- As an operator, when I create a new cluster using Cluster API, I want to be able to take advantage of resource aware topology (e.g. compute, device availability, etc.) to place machines.
- As an operator, I need to have a way to make minor customizations before kubelet starts while using a standard node image and standard boot script. Example: write a file, disable hyperthreading, load a kernel module.
This point repeats in Configuration Updates
docs/reference-use-cases.md
Outdated
- As a multi-cluster operator, given that I have a management cluster, I want to create workload clusters across multiple providers that are all similarly configured.
- As a multi-cluster operator, given that I have deployed my clusters via Cluster API, I want to find general information (name, status, access details) about my clusters across multiple providers.
Does this mean that you want to provide the ability to view all of the cluster resources in different cloud providers for the multi-cluster operator, like viewing all of the cluster nodes, pods, applications, etc.? Or is the general information just limited to cluster name, status, and access details?
I don't think this would include what's running in the cluster; rather, it would show the status of those clusters from the information provided by Cluster API.
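For a rough idea of what that could look like in practice, here is a minimal Go sketch (not part of this PR) that lists the Cluster objects a management cluster knows about and prints their names and API endpoints. It assumes the v1alpha1 API group cluster.k8s.io in use at the time of this PR, a reasonably recent client-go (List takes a context), a kubeconfig at ~/.kube/config pointing at the management cluster, and that access details live under status.apiEndpoints; all of these are assumptions, not guarantees.

```go
// Sketch: list Cluster objects from a management cluster and print basic info.
// Assumes Cluster API v1alpha1 ("cluster.k8s.io") and a kubeconfig whose
// current context points at the management cluster.
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// GroupVersionResource for the v1alpha1 Cluster resource managed by Cluster API.
	clusterGVR := schema.GroupVersionResource{
		Group:    "cluster.k8s.io",
		Version:  "v1alpha1",
		Resource: "clusters",
	}

	list, err := client.Resource(clusterGVR).
		Namespace(metav1.NamespaceAll).
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}

	for _, c := range list.Items {
		// "apiEndpoints" is assumed here as the "access details" field;
		// the exact status schema depends on the Cluster API version.
		endpoints, _, _ := unstructured.NestedSlice(c.Object, "status", "apiEndpoints")
		fmt.Printf("%s/%s\tendpoints=%v\n", c.GetNamespace(), c.GetName(), endpoints)
	}
}
```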
docs/reference-use-cases.md
Outdated
## Multi-cluster/Multi-provider
### Managing Providers
- As an operator, given I have a management cluster with at least one provider, I would like to install a new provider.
Does installing a new provider mean that, if scaled, nodes of the management cluster can be provisioned by any of the installed providers?
For the control plane of the workload clusters, is it incorrect to say that the provider of choice will run as part of the control plane pods? Am I incorrect in saying that each workload control plane can run with the provider of choice by deploying the particular provider actuator in the Cluster API stack?
Signed-off-by: Vince Prignano <[email protected]>
- As an operator, given I have a Kubernetes-conformant cluster, I would like to install Cluster API and a provider on it in a straight-forward process.
- As an operator, given I have a management cluster that was deployed by Cluster API (via the pivot workflow), I want to manage the lifecycle of my management cluster using Cluster API.
Would really appreciate any reference link for the pivot workflow.
Not sure if there are any links to it; this is referring to the pivot phase, I think. The phase is only documented in code, AFAIK.
## Table of Contents
* [Reference Use Cases](#cluster-api-reference-use-cases)
For the ToC, we can use doctoc to generate it.
### Creating Clusters
- As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage another Kubernetes cluster (create, upgrade, scale, delete).
Sometimes my cloud provider may not be able to access the external network without a proxy, so here I'd like to be able to set a proxy for my provisioned VMs to access the external network.
I'm not sure if it's in scope for Cluster API to set up the proxy.
I think we do not need Cluster API to set up the proxy, but it should be able to use a proxy.
### Creating Clusters
- As an operator, given that I have a cluster running Cluster API, I want to be able to use declarative APIs to manage another Kubernetes cluster (create, upgrade, scale, delete).
Sometimes I also want to create a mixed environment, like one cluster including both Ubuntu and CentOS nodes, etc.
FYI @hchenxa
This should be doable with MachineClass / imageID, depending on the provider.
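To illustrate, here is a hedged Go sketch (not part of this PR) of what a mixed-OS setup might look like: two MachineDeployment manifests that differ only in the machine image, one pool on Ubuntu and one on CentOS. The imageID field inside providerSpec.value, the image names, and the labels are placeholders; each provider defines its own providerSpec schema, so the exact fields may differ.

```go
// Sketch: two MachineDeployments for a mixed-OS cluster (Ubuntu + CentOS).
// The providerSpec contents (here just "imageID") are placeholders; each
// infrastructure provider defines its own providerSpec schema.
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// machineDeployment returns a v1alpha1 MachineDeployment manifest as a plain map,
// so the sketch stays provider-agnostic and needs no Cluster API Go dependencies.
func machineDeployment(name, imageID string, replicas int64) map[string]interface{} {
	// Illustrative selector labels; real manifests would also carry whatever
	// cluster-linkage labels the provider expects.
	labels := map[string]interface{}{"node-pool": name}
	return map[string]interface{}{
		"apiVersion": "cluster.k8s.io/v1alpha1",
		"kind":       "MachineDeployment",
		"metadata":   map[string]interface{}{"name": name, "namespace": "default"},
		"spec": map[string]interface{}{
			"replicas": replicas,
			"selector": map[string]interface{}{"matchLabels": labels},
			"template": map[string]interface{}{
				"metadata": map[string]interface{}{"labels": labels},
				"spec": map[string]interface{}{
					// Kubelet version field as in the v1alpha1 MachineSpec.
					"versions": map[string]interface{}{"kubelet": "1.14.1"},
					// Placeholder provider-specific config: only the image differs.
					"providerSpec": map[string]interface{}{
						"value": map[string]interface{}{"imageID": imageID},
					},
				},
			},
		},
	}
}

func main() {
	// Hypothetical image identifiers for the two node pools.
	pools := []map[string]interface{}{
		machineDeployment("workers-ubuntu", "ubuntu-18.04-image", 3),
		machineDeployment("workers-centos", "centos-7-image", 2),
	}
	for _, p := range pools {
		out, err := json.MarshalIndent(p, "", "  ")
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(string(out))
	}
}
```

The printed manifests could then be applied to the management cluster (e.g. with kubectl apply) to get two node pools with different operating systems in the same workload cluster.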
Since this has been pending for a bit now, let's go ahead and get it merged. Please address any additional follow-up or additions as a separate PR. /lgtm
/assign @detiber for final review
- As an operator, I need to have a way to apply labels to Nodes created through ClusterAPI. This will allow me to partition workloads across Nodes/Machines and MachineDeployments. Examples of labels include datacenter, subnet, and hypervisor, which applications can use in affinity policies.
- As an operator, I want to be able to provision the nodes of a workload cluster on an existing vnet that I don’t have admin control of.
As an operator, if there are no mandatory provider-specific fields, it should be possible to create a cluster by only specifying the provider (i.e. there should be defaults for all required upstream Cluster API types).
Signed-off-by: Vince Prignano <[email protected]>
* Adding steps for how to use existing cluster as bootstrap cluster. (#877)
* Add proposals template (#879) Signed-off-by: Vince Prignano <[email protected]>
* Clarify how to use kuebconfig. (#869)
* Add discuss link to README (#885) Signed-off-by: Vince Prignano <[email protected]>
* Fix a broken link to Cluster API KEP (#900)
* Add Cluster API project scope and objectives (#882) Signed-off-by: Vince Prignano <[email protected]>
* Fix API group name written in kubebuilder's annotation comments (#883)
* Update github org for the baremetal provider. (#911) The github org was renamed, so use the new URL for the location of the bare metal cluster-api provider.
* Add Talos to list of providers (#915)
* Add reference use cases for Cluster API (#903) Signed-off-by: Vince Prignano <[email protected]>
* Add missing bullet point to staging-use-cases.md (#920)
* Added IBM Cloud to Cluster API README. (#928)
* Update documentation links to published content (#927)
* Add Cluster API logos from CNCF (#916)
* Adding Reference Use Cases to README. (#931)
* Updating release docs for branching approach now that we are 0.x (#862) I think the previous approach was for pre-versioned branches, but now we probably want to start maintaining release branches - even if in practice we would cut 0.2.0 instead of 0.1.1
* Used doctoc generated toc. (#932)
* Update to go 1.12.5 (#937)
* Attempt to fix integration tests (#942)
  - Use specific versions of kind and kustomize instead of installing with `go get`
  - Update golang version for example provider
* Update README.md (#941)
* Add shortnames for crds. (#943)
* Fix machine object pivoting to the target cluster (#944)
* [docs] Update RBAC annotations for example provider (#947)
* Remove workstreams from scope and objectives (#948) Signed-off-by: Vince Prignano <[email protected]>
* Added ibm cloud to architecture diagram. (#946)
* Added comment about cluster API vs cloud providers to readme (#954)
* Quit MachineSet reconcile early if DeletionTimestamp is set (#956) Signed-off-by: Vince Prignano <[email protected]>
* Cleanup controllers (#958) Signed-off-by: Vince Prignano <[email protected]>
* updates Google Cloud branding to mach other usages (#973)
* Cannot retrieve machine name (#960) Signed-off-by: clyang82 <[email protected]>
* Allow to use foregroundDeletion for MachineDeployments and MachineSets (#953)
* Rename controllers test files (#968) Signed-off-by: Vince Prignano <[email protected]>
* make cluster-api-manager container run with a non-root user (#955)
* Update Gitbook release process (#659)
  * Remove mermaid module because it is currently unused and does not always install cleanly.
  * Introduce npm-shrinkwrap so that npm installations are reproducable.
  * Update gitbook release documentation.
  * Clarify verification instructions.
  * Update GitBook.
  * Remove rendered Gitbook from repo in preparation for using firebase instead.
  * Install gitbook cli.
  * Update documentation for netlify.
  * Add Netlify configuration toml.
  * Update link to homepage so that it points to the book and not the GitHub repository.
  * Remove base from netlify.toml. The build script already accounts for the correct location...
  * Remove reference to no longer existent KEP. :(
  * Disable redirects until the cluster-api.sigs.k8s.io domain has been created.
  * Reenable netlify redirects now that the cluster-api.sigs.k8s.io domain exists.
* Add versioning and releases to README (#967) Signed-off-by: Vince Prignano <[email protected]>
* Add more log for cluster deletion failure (#978)
* Update dependencies (#982) Signed-off-by: Vince Prignano <[email protected]>
Signed-off-by: Vince Prignano [email protected]
What this PR does / why we need it:
This PR adds a document outlining use cases for Cluster API.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note: