---
title: NodePool governance capability
authors:
  - "@rambohe-ch"
  - "@Peeknut"
reviewers:
  - "@Fei-Guo"
  - "@huangyuqi"
creation-date: 2022-03-07
last-updated: 2022-03-07
status: provisional
---

# Proposal to add the governance capability of NodePool
## Table of Contents

- [Proposal to add the governance capability of NodePool](#proposal-to-add-the-governance-capability-of-nodepool)
  - [Table of Contents](#table-of-contents)
  - [Glossary](#glossary)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-Goals/Future Work](#non-goalsfuture-work)
  - [Proposal](#proposal)
    - [Principles](#principles)
    - [Architecture](#architecture)
    - [Implementation Details](#implementation-details)
      - [spirit-controller](#spirit-controller)
      - [pool-spirit](#pool-spirit)
      - [Operations and Maintenance in NodePool](#operations-and-maintenance-in-nodepool)
        - [Write Resources to pool-spirit](#write-resources-to-pool-spirit)
        - [kubeconfig for users to access pool-spirit](#kubeconfig-for-users-to-access-pool-spirit)
      - [NodePool Autonomy](#nodepool-autonomy)
    - [User Stories](#user-stories)
    - [Other Problems](#other-problems)
      - [H/A Consideration](#ha-consideration)

## Glossary
Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

## Summary

In order to better manage edge nodes, OpenYurt introduces the concept of NodePool.
Because of the characteristics of a node pool, such as stable networking between its nodes and convenient management
(workloads: YurtAppSet, YurtAppDaemon), more and more users organize their work around node pools.
Meanwhile, we want to use these characteristics of the node pool to solve the problems caused by the network in the cloud-edge scenario.

Therefore, in this proposal we introduce the `NodePool governance capability`.
By deploying a spirit Pod (kube-apiserver and etcd in one pod) in a node pool and designing a cooperation mechanism between YurtHubs,
OpenYurt can allow users to operate and maintain within the node pool scope and solve the problems caused by the cloud-edge network.

## Motivation

In native Kubernetes, if the master does not receive a node's heartbeat, the node is marked as NotReady,
and kube-controller-manager then evicts the pods on the NotReady node.

In edge scenarios, not only node failures but also network disconnections can cause the master to miss node heartbeats.
If pods are evicted and rebuilt merely because the cloud-edge network is disconnected, it poses challenges to the business.
So there needs to be a more precise way of distinguishing between the two cases.

OpenYurt provides CRDs such as NodePool, YurtAppSet, and YurtAppDaemon for regional management of resources and workloads.
As the cluster grows, users have to filter through a large amount of data in order to obtain the resources of a single NodePool,
which greatly reduces efficiency and easily causes errors.

This proposal aims to solve the above problems through the `NodePool governance capability`.

### Goals

- NodePool can serve some non-resource API requests, such as /version and /api-resources.
- The master can distinguish whether a NotReady node is caused by a node failure or a network disconnection,
  as long as the NodePool is not completely disconnected from the cloud (at least one node can still reach the cloud).
  If it is a node failure, the pods are evicted; if it is a network disconnection, they are not.
- Provide NodePool operation and maintenance capabilities. Users can obtain resources in the node pool (kubectl get nodes/pods),
  as well as operate on the pods of the nodes in the node pool (kubectl logs/exec/cp/top/port-forward/attach, etc.).

### Non-Goals/Future Work

- The `NodePool governance capability` provides basic operation and maintenance capabilities,
  but does not support CRUD operations on resources (such as creating, upgrading, or deleting pods).
- When the node pool is completely disconnected from the cloud, the cloud cannot distinguish
  whether a NotReady node is caused by a node failure or a network disconnection.
  Therefore, if an edge node fails at this time, the cloud will not evict and rebuild its pods.
  In other words, high availability of services within the node pool is not covered by this solution.

## Proposal

### Principles

- No modification to Kubernetes; keep the solution non-invasive.
- The `NodePool governance capability` has no impact on existing OpenYurt components and can be enabled and disabled freely.
- The cloud center is the source of truth, ensuring the consistency of cloud and edge data.
- The design is simple, reliable, and stable.

### Architecture

To provide the `NodePool governance capability`, we add a component to the NodePool, tentatively named `pool-spirit`.
pool-spirit does not replace the kube-apiserver on the cloud; it only provides governance capabilities within the NodePool scope.
Its structure is as follows:

![nodepool-governance](../img/nodepool-governance/img-1.png)

- The `spirit-controller` deployed in the cloud manages pool-spirit through YurtAppSet.
  When a NodePool enables the NodePool governance capability, a pool-spirit is automatically deployed into that NodePool by spirit-controller.
- When pool-spirit starts, all YurtHubs in the NodePool upload the node-scope resources cached on their nodes to pool-spirit,
  including pods, configmaps, secrets, services, nodes, leases, serviceaccounts, etc.
- When an edge node can connect to the master, YurtHub accesses the kube-apiserver directly,
  caches the data returned by the cloud locally, and updates the data in pool-spirit in time (see the sketch after this list).
  This ensures that users obtain the latest resources when operating in the node pool (for example, kubectl get).
- When a node can connect to the cloud, YurtHub sends the node lease to the cloud. However,
  when the node is disconnected from the cloud, YurtHub adds an agent-forwarding (delegate) annotation to the node lease and
  sends it to pool-spirit; the leader YurtHub then forwards it to the cloud in real time.
- When a NodePool disables the NodePool governance capability, spirit-controller cleans up the pool-spirit belonging to that NodePool.
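
A minimal sketch of the write-through step above, assuming a dynamic client pointed at pool-spirit; the helper name and the get-then-create-or-update strategy are illustrative assumptions, not the actual YurtHub code:

```go
package sketch

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// syncToPoolSpirit illustrates the dual write: after YurtHub has cached an object
// returned by the cloud kube-apiserver, it also pushes the object to pool-spirit
// so that in-pool queries stay fresh.
func syncToPoolSpirit(ctx context.Context, poolSpirit dynamic.Interface,
	gvr schema.GroupVersionResource, obj *unstructured.Unstructured) error {

	client := poolSpirit.Resource(gvr).Namespace(obj.GetNamespace())

	existing, err := client.Get(ctx, obj.GetName(), metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = client.Create(ctx, obj, metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return fmt.Errorf("failed to read %s from pool-spirit: %w", obj.GetName(), err)
	}

	// pool-spirit keeps its own resourceVersion; carry it over before updating.
	obj = obj.DeepCopy()
	obj.SetResourceVersion(existing.GetResourceVersion())
	_, err = client.Update(ctx, obj, metav1.UpdateOptions{})
	return err
}
```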

### Implementation Details

#### spirit-controller

spirit-controller manages the life cycle of the pool-spirit in each NodePool, and is deployed as a Deployment.

spirit-controller can be described as follows:

- Initialization work at startup:
  1. spirit-controller blocks until the YurtAppSet CRDs are ready.
  2. spirit-controller prepares the client certificate that pool-spirit uses to access the kubelet, saves the certificate in a secret and mounts it into pool-spirit.
     Note that all pool-spirits can share this client certificate.
  3. spirit-controller prepares the client certificate used by the leader yurthub to forward node leases to the cloud,
     and saves this client certificate in a secret.
  4. spirit-controller creates the service for pool-spirit.
  5. spirit-controller generates a YurtAppSet object for managing pool-spirit, with the 'pool' field set to empty (a sketch follows this list).
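
A rough sketch of the YurtAppSet object generated in step 5, built with an unstructured object; the apiVersion, the field layout (spec.topology.pools, the workload template) and the object name are assumptions for illustration, since the real schema is defined by the YurtAppSet CRD:

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// newPoolSpiritAppSet builds the YurtAppSet object that spirit-controller creates
// at startup with an empty pool list; the reconcile loop later appends an entry
// for every NodePool that enables the governance capability.
func newPoolSpiritAppSet(namespace string) *unstructured.Unstructured {
	obj := &unstructured.Unstructured{}
	obj.SetUnstructuredContent(map[string]interface{}{
		"apiVersion": "apps.openyurt.io/v1alpha1",
		"kind":       "YurtAppSet",
		"metadata": map[string]interface{}{
			"name":      "pool-spirit",
			"namespace": namespace,
		},
		"spec": map[string]interface{}{
			// The pool list starts empty.
			"topology": map[string]interface{}{
				"pools": []interface{}{},
			},
			// The workload template describing the pool-spirit pod
			// (kube-apiserver + etcd) is omitted here for brevity.
		},
	})
	return obj
}
```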

- Reconcile:
  1. spirit-controller lists/watches PoolSpirit CRs. When a user creates a PoolSpirit CR, spirit-controller adds the NodePool information
     to the YurtAppSet, so that a pool-spirit instance is deployed in that NodePool. Note that spirit-controller refuses to
     deploy pool-spirit when the number of nodes in the NodePool is less than 3, or when a pool-spirit has already been deployed in the NodePool
     (see the sketch after this list).
  2. When pool-spirit is scheduled, spirit-controller prepares the tls server certificate for pool-spirit,
     saves the certificate in a secret and mounts it into pool-spirit. Note that the tls server certificate of each pool-spirit is different,
     because the certificate includes the pool-spirit service clusterIP and the node IP.
  3. spirit-controller generates a kubeconfig for users to access pool-spirit. The server address in the kubeconfig is set to
     https://{nodeIP}:10270. In addition, the client certificate authority in the kubeconfig should be restricted. For details,
     please refer to [kubeconfig for users to access pool-spirit](#kubeconfig-for-users-to-access-pool-spirit).
  4. When pool-spirit is rebuilt, spirit-controller cleans up and rebuilds the tls server certificate.
  5. When a PoolSpirit CR is deleted, spirit-controller deletes the NodePool information in the YurtAppSet. It also cleans up
     the certificates of pool-spirit (tls server certificate and kubeconfig).
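
The deployment guard in Reconcile step 1 can be captured by a small helper; the function name and error messages are illustrative only:

```go
package sketch

import "fmt"

// minNodesForPoolSpirit is the smallest NodePool size that spirit-controller
// accepts before it deploys a pool-spirit instance (see Reconcile step 1).
const minNodesForPoolSpirit = 3

// shouldDeployPoolSpirit mirrors the checks described above: refuse to deploy
// when the NodePool has fewer than three nodes, or when a pool-spirit has
// already been deployed in that NodePool.
func shouldDeployPoolSpirit(nodeCount int, alreadyDeployed bool) (bool, error) {
	if nodeCount < minNodesForPoolSpirit {
		return false, fmt.Errorf("nodepool has %d nodes, at least %d are required", nodeCount, minNodesForPoolSpirit)
	}
	if alreadyDeployed {
		return false, fmt.Errorf("a pool-spirit is already deployed in this nodepool")
	}
	return true, nil
}
```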

Since the node autonomy already supported by OpenYurt and [NodePool Autonomy](#nodepool-autonomy) are applicable to
different scenarios, the two capabilities cannot be enabled at the same time. We enforce this through an admission webhook (a sketch follows).
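
A minimal sketch of the check the admission webhook performs; the node autonomy annotation key and the validation function are assumptions for illustration:

```go
package sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// nodeAutonomyAnnotation is the per-node autonomy switch; the exact annotation
// key used by OpenYurt node autonomy is assumed here.
const nodeAutonomyAnnotation = "node.beta.openyurt.io/autonomy"

// validatePoolSpiritCreation rejects enabling NodePool autonomy (creating a
// PoolSpirit CR) when any node in the pool already has node autonomy enabled,
// since the two capabilities are mutually exclusive.
func validatePoolSpiritCreation(poolNodes []corev1.Node) error {
	for _, node := range poolNodes {
		if node.Annotations[nodeAutonomyAnnotation] == "true" {
			return fmt.Errorf("node %s has node autonomy enabled; NodePool autonomy cannot be enabled at the same time", node.Name)
		}
	}
	return nil
}
```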

#### pool-spirit

pool-spirit stores various resources in the node pool, including nodes, pods, services, endpoints, endpointslices, etc.
pool-spirit is managed by YurtAppSet and deploys kube-apiserver and etcd in one pod (see the sketch below). Resources in etcd are stored in memory instead of on disk.
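
A sketch of the pool-spirit pod spec stamped out by YurtAppSet, assuming a memory-backed emptyDir as one way of keeping etcd data in memory; image tags, ports other than 10270, and flag values are illustrative, not the exact manifest:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// poolSpiritPodTemplate: kube-apiserver and etcd run as two containers of one
// pod, the pod uses the host network, and the etcd data directory sits on a
// memory-backed emptyDir so that data lives in memory instead of on disk.
func poolSpiritPodTemplate() corev1.PodSpec {
	return corev1.PodSpec{
		HostNetwork: true,
		Containers: []corev1.Container{
			{
				Name:  "etcd",
				Image: "registry.k8s.io/etcd:3.5.1-0",
				Command: []string{
					"etcd",
					"--data-dir=/var/lib/etcd",
					"--listen-client-urls=http://127.0.0.1:12379",
					"--advertise-client-urls=http://127.0.0.1:12379",
				},
				VolumeMounts: []corev1.VolumeMount{{Name: "etcd-data", MountPath: "/var/lib/etcd"}},
			},
			{
				Name:  "kube-apiserver",
				Image: "registry.k8s.io/kube-apiserver:v1.22.7",
				Command: []string{
					"kube-apiserver",
					"--etcd-servers=http://127.0.0.1:12379",
					"--secure-port=10270", // https server address: https://{nodeIP}:10270
					"--tls-cert-file=/etc/pool-spirit/pki/server.crt",
					"--tls-private-key-file=/etc/pool-spirit/pki/server.key",
				},
				VolumeMounts: []corev1.VolumeMount{{Name: "server-certs", MountPath: "/etc/pool-spirit/pki"}},
			},
		},
		Volumes: []corev1.Volume{
			{
				Name: "etcd-data",
				// Memory medium keeps etcd data in RAM rather than on disk.
				VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{Medium: corev1.StorageMediumMemory}},
			},
			{
				Name: "server-certs",
				VolumeSource: corev1.VolumeSource{
					Secret: &corev1.SecretVolumeSource{SecretName: "pool-spirit-server-certs"},
				},
			},
		},
	}
}
```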

The PoolSpirit CRD can be described as:

```go
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// PoolSpirit CRD
type PoolSpirit struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	Spec   PoolSpiritSpec
	Status PoolSpiritStatus
}

type PoolSpiritSpec struct {
	// Version of pool-spirit, which corresponds to the Kubernetes version.
	Version string
	// The NodePool managed by pool-spirit.
	NodePool string
}

type PoolSpiritStatus struct {
	// The node where pool-spirit is located.
	NodeName string
	// Conditions represent the status of pool-spirit, which is filled in by spirit-controller.
	Conditions []PoolSpiritCondition
	// DelegatedNodes are the nodes in the node pool that are disconnected from the cloud.
	DelegatedNodes []string
	// OutsidePoolNodes are nodes in the node pool that cannot connect to pool-spirit.
	OutsidePoolNodes []string
}

type PoolSpiritCondition struct {
	Type               PoolSpiritConditionType
	Status             ConditionStatus
	LastProbeTime      metav1.Time
	LastTransitionTime metav1.Time
	Reason             string
	Message            string
}

type PoolSpiritConditionType string

const (
	// PoolSpiritPending indicates that the deployment of pool-spirit is blocked.
	// This happens, for example, if the number of nodes in the node pool is less than 3.
	PoolSpiritPending PoolSpiritConditionType = "Pending"
	// PoolSpiritCertsReady indicates that the certificates used by pool-spirit are ready.
	PoolSpiritCertsReady PoolSpiritConditionType = "CertsReady"
	// PoolSpiritReady indicates that pool-spirit is ready.
	PoolSpiritReady PoolSpiritConditionType = "Ready"
)

type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)
```

- Https Server Certificate

spirit-controller prepares the tls server certificate for the kube-apiserver in pool-spirit and mounts it into the pod through
a secret by using the patch feature of YurtAppSet. pool-spirit runs in hostNetwork mode,
and the https server listening address is https://{nodeIP}:10270 (a sketch of the certificate generation follows).
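
A minimal sketch of why each pool-spirit gets its own serving certificate: the SAN list must contain both the shared service clusterIP and the IP of the node the instance is scheduled to. Signing CA handling, secret creation, and YurtAppSet patching are omitted; names are illustrative:

```go
package sketch

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"net"
	"time"
)

// newPoolSpiritServerCert creates a serving certificate whose IP SANs are the
// pool-spirit service clusterIP and the node IP, signed by the given CA.
func newPoolSpiritServerCert(caCert *x509.Certificate, caKey *ecdsa.PrivateKey,
	clusterIP, nodeIP net.IP) ([]byte, *ecdsa.PrivateKey, error) {

	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: "pool-spirit-server"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		// Each pool-spirit certificate differs because these SANs differ.
		IPAddresses: []net.IP{clusterIP, nodeIP},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, caCert, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	return der, key, nil
}
```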

- Service Discovery

pool-spirit provides service through a ClusterIP Service in Kubernetes, and all pool-spirits share the service IP.

In order to ensure that pool-spirit only serves nodes in the same node pool, the service topology annotation needs to be added to the pool-spirit service, as sketched below.
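
A sketch of the ClusterIP service with the service topology annotation; the annotation key/value follow OpenYurt's service topology feature and are assumptions here, as are the namespace and port numbers:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// poolSpiritService is the ClusterIP service created by spirit-controller; the
// topology annotation limits endpoints to the caller's own NodePool.
func poolSpiritService() *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "pool-spirit",
			Namespace: "kube-system",
			Annotations: map[string]string{
				// Only endpoints in the same NodePool are returned to clients.
				"openyurt.io/topologyKeys": "openyurt.io/nodepool",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeClusterIP,
			Selector: map[string]string{"app": "pool-spirit"},
			Ports: []corev1.ServicePort{{
				Name:       "https",
				Port:       443,
				TargetPort: intstr.FromInt(10270),
			}},
		},
	}
}
```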

#### Operations and Maintenance in NodePool

In terms of operation and maintenance, pool-spirit supports two types of requests:

- GET requests for resources in the NodePool, such as nodes, pods, etc.
- Native Kubernetes operation and maintenance requests for pods in the NodePool, such as kubectl logs/exec/cp/attach, etc.

To support the above capabilities, the following problems need to be solved:

##### Write Resources to pool-spirit

In OpenYurt, the data flow between cloud and edge is: kube-apiserver --> yurthub --> kubelet (and other clients).
In order to ensure data consistency and efficiency, pool-spirit reuses the current data flow of OpenYurt.
The data flow for pool-spirit is: kube-apiserver --> yurthub --> pool-spirit. Data in pool-spirit is written by each YurtHub.

YurtHub updates data in pool-spirit, so it requires create/update permissions on the resources. After pool-spirit starts,
we need to prepare the NodePool CRD and a clusterrolebinding that associates the `system:nodes` group with the admin clusterrole in its kube-apiserver
(a sketch follows). This ensures that YurtHub can successfully write to etcd using the node client certificate.
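
A sketch of that clusterrolebinding, created inside the pool-spirit kube-apiserver; the object name is an assumption for illustration:

```go
package sketch

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// nodesWriteBinding binds the `system:nodes` group to the `admin` ClusterRole so
// that every YurtHub, authenticating with its node client certificate, is allowed
// to create and update the resources it uploads to pool-spirit.
func nodesWriteBinding() *rbacv1.ClusterRoleBinding {
	return &rbacv1.ClusterRoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "openyurt:yurthub:pool-spirit-writer"},
		Subjects: []rbacv1.Subject{{
			Kind:     rbacv1.GroupKind,
			APIGroup: rbacv1.GroupName,
			Name:     "system:nodes",
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "ClusterRole",
			Name:     "admin",
		},
	}
}
```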

![](../img/nodepool-governance/img-2.png)

##### kubeconfig for users to access pool-spirit

The kubeconfig is generated by spirit-controller, and the organization of its client certificate is `openyurt:spirits`.

In addition, the get permission on resources and the operation and maintenance permissions (logs/exec) are granted to the
group `openyurt:spirits` in the kube-apiserver of pool-spirit, as sketched below.
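
A sketch of the permissions granted to the `openyurt:spirits` group inside the pool-spirit kube-apiserver; the exact resource list, verbs, and role name are assumptions for illustration:

```go
package sketch

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// spiritsUserRole grants read access to resources plus the pod subresources
// behind kubectl logs/exec/cp/attach/port-forward.
func spiritsUserRole() *rbacv1.ClusterRole {
	return &rbacv1.ClusterRole{
		ObjectMeta: metav1.ObjectMeta{Name: "openyurt:spirits:viewer"},
		Rules: []rbacv1.PolicyRule{
			{
				APIGroups: []string{""},
				Resources: []string{"nodes", "pods", "services", "endpoints", "configmaps"},
				Verbs:     []string{"get", "list", "watch"},
			},
			{
				APIGroups: []string{""},
				Resources: []string{"pods/log"},
				Verbs:     []string{"get"},
			},
			{
				APIGroups: []string{""},
				Resources: []string{"pods/exec", "pods/attach", "pods/portforward"},
				Verbs:     []string{"create"},
			},
		},
	}
}
```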

#### NodePool Autonomy

In order to know whether the failure to receive a node lease is caused by a node failure or a network disconnection,
we design a `node-lease proxy mechanism`.

In the same node pool, when a node is disconnected from the cloud, the leader YurtHub that is still connected to the cloud forwards
the node lease to the cloud (as sketched below). It can be described as:
![](../img/nodepool-governance/img-3.png)
![](../img/nodepool-governance/img-4.png)
**Note:** If the lease of the pool-spirit node also needs to be forwarded, the leader YurtHub gives priority to forwarding the lease of that node.
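
A minimal sketch of what a disconnected YurtHub does before writing its node lease to pool-spirit; the annotation key and helper name are placeholders for illustration, not confirmed OpenYurt constants:

```go
package sketch

import (
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// delegateHeartbeatAnnotation marks a node lease that could not be sent to the
// cloud directly and should be forwarded by the leader YurtHub.
const delegateHeartbeatAnnotation = "openyurt.io/delegate-heartbeat"

// markLeaseForDelegation renews the lease and tags it so that the leader YurtHub
// knows to forward it to the cloud on this node's behalf.
func markLeaseForDelegation(lease *coordinationv1.Lease) *coordinationv1.Lease {
	lease = lease.DeepCopy()
	if lease.Annotations == nil {
		lease.Annotations = map[string]string{}
	}
	lease.Annotations[delegateHeartbeatAnnotation] = "true"
	now := metav1.NewMicroTime(time.Now())
	lease.Spec.RenewTime = &now
	return lease
}
```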

The policy of the cloud controller is as follows:

| | node lease received | node lease received with delegate annotation | no node lease received |
| ------ | ------ | ------ | ------ |
| Policy | Node: Ready;<br>Pod: Maintain;<br>Endpoints: Maintain | Node: NotReady;<br>Pod: Maintain;<br>Endpoints: Maintain | Node: NotReady;<br>Pod: Evicted;<br>Endpoints: Updated |

### User Stories

1. As a user, I can operate and maintain in the NodePool dimension, and can directly get the resources in the NodePool.
2. As a user, I want the pods on a node not to be evicted when the node is only disconnected from the cloud, but to be rebuilt
   on healthy nodes when the node really goes down.

### Other Problems

#### H/A Consideration

When pool-spirit fails, every component can fall back to its original behavior completely.
Therefore, in order to save resources, only one pool-spirit instance is deployed in each NodePool.