diff --git a/docs/proposals/20211017-adding-dynamic-load-balancer.md b/docs/proposals/20211017-adding-dynamic-load-balancer.md
new file mode 100644
index 00000000000..3071aeb646f
--- /dev/null
+++ b/docs/proposals/20211017-adding-dynamic-load-balancer.md
@@ -0,0 +1,308 @@
+---
+title: Proposal Template
+authors:
+  - "@lindayu17"
+  - "@gnunu"
+  - "@zzguang"
+reviewers:
+  - "@rambohe-ch"
+  - "@Fei-Guo"
+  - "@kadisi"
+creation-date: 2021-10-17
+status: provisional
+---
+
+# A dynamic load balancer for edge cluster
+
+## Table of Contents
+
+[Tools for generating](https://github.com/ekalinin/github-markdown-toc) a table of contents from markdown are available.
+
+- [A dynamic load balancer for edge cluster](#a-dynamic-load-balancer-for-edge-cluster)
+  - [Table of Contents](#table-of-contents)
+  - [Glossary](#glossary)
+  - [Summary](#summary)
+  - [Motivation](#motivation)
+    - [Goals](#goals)
+    - [Non-Goals/Future Work](#non-goalsfuture-work)
+  - [Proposal](#proposal)
+    - [User Stories](#user-stories)
+      - [Story 1](#story-1)
+      - [Story 2](#story-2)
+      - [Story 3](#story-3)
+      - [Story 4](#story-4)
+    - [Requirements (Optional)](#requirements-optional)
+      - [Functional Requirements](#functional-requirements)
+        - [FR1](#fr1)
+        - [FR2](#fr2)
+        - [FR3](#fr3)
+        - [FR4](#fr4)
+      - [Non-Functional Requirements](#non-functional-requirements)
+        - [NFR1](#nfr1)
+        - [NFR2](#nfr2)
+    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
+    - [Risks and Mitigations](#risks-and-mitigations)
+  - [Alternatives](#alternatives)
+  - [Upgrade Strategy](#upgrade-strategy)
+  - [Additional Details](#additional-details)
+    - [Test Plan [optional]](#test-plan-optional)
+  - [Implementation History](#implementation-history)
+
+## Glossary
+
+Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).
+
+## Summary
+
+A Dynamic Load Balancer (DLB) is a key feature of a cloud/edge native cluster.
Requests, after being dispatched by the edge ingress, should be further routed to the most appropriate PODs based on various criteria:
+1) nodes/PODs with specific devices such as GPUs or other accelerators for AI inference;
+2) currently available resources of PODs, including CPU, memory, GPU, etc.;
+3) other considerations such as debugging, testing, fault injection, rate limiting, etc.
+
+## Motivation
+
+Some workloads, typically video analytics and cloud gaming, serve requests that are not simple network exchanges but incur sustained consumption of CPU, memory, GPU, and other resources. These workloads are especially suited to edge deployment and need traffic management that takes into account the currently available resources of the backend PODs and nodes. A dynamic load balancer should be inserted after the ingress and before the backend PODs to manage traffic for optimal performance of the edge cluster.
+
+### Goals
+
+- Allow users to specify request routing policies;
+- Collect system metrics through metrics monitoring services such as Prometheus;
+- Analyze and verify the requests and match them with the cluster's system capabilities;
+- Route requests to proper PODs according to user-specified algorithms and policies.
+
+### Non-Goals/Future Work
+
+- Metrics services for OpenYurt are not part of this proposal.
+
+## Proposal
+
+This proposal introduces a Dynamic Load Balancer (DLB) operator. Its CRD definition is listed below:
+
+```go
+
+// Device setting description:
+// Device setting defines how to route workloads considering different xPUs.
+// cpu: use CPU only as compute device;
+// gpu: use GPU only as compute device;
+// auto:cpu,gpu: multiple devices can be used as compute devices automatically;
+//               this setting means the CPU is used first, until it is exhausted.
+
+// Algorithm setting description:
+// Algorithm setting defines how to distribute workloads among different PODs/nodes.
+// balance: schedule the workload to the POD/node with the most free compute resource of the kind specified by the Device field;
+// round-robin: schedule workloads to the PODs/nodes in round-robin mode; the threshold (e.g., FPS) should be taken into consideration as well:
+//              if the threshold runs lower than a watermark, the next candidate will be evaluated;
+// squeeze: schedule workloads onto as few PODs/nodes as possible; when the threshold can no longer be met, a new node is brought in;
+// manual:node1,node2,...: schedule workloads to the assigned nodes, mostly used for debugging.
+
+type DynamicLBPolicy struct {
+	Device    string `json:"device,omitempty"`    // cpu (default), gpu, auto:cpu,gpu...
+	Algorithm string `json:"algorithm,omitempty"` // balance (default), round-robin, squeeze, manual:nodename
+	Threshold string `json:"threshold"`           // e.g. fps:24
+}
+
+type UseCase struct {
+	UseCaseClass   string          `json:"useCaseClass"` // AI, media, gaming...
+	UseCaseName    string          `json:"useCaseName"`  // detect, classify...
+	DeploymentName string          `json:"deploymentName"`
+	DLBPolicy      DynamicLBPolicy `json:"dlbPolicy"`
+}
+
+type Resource struct {
+	CPU int32 `json:"cpu,omitempty"`
+	GPU int32 `json:"gpu,omitempty"`
+	MEM int32 `json:"mem,omitempty"`
+	FPS int32 `json:"fps,omitempty"`
+}
+
+type ReqStat struct {
+	ReqID   string   `json:"reqID,omitempty"`
+	PodID   string   `json:"podID,omitempty"`
+	Content string   `json:"content,omitempty"` // full request content
+	Status  string   `json:"status,omitempty"`  // receiving, running, stopping
+	ReqTop  Resource `json:"reqTop,omitempty"`  // resource consumption of this request
+}
+
+type PodStat struct {
+	PodID    string   `json:"podID,omitempty"`
+	NodeID   string   `json:"nodeID,omitempty"`
+	PodQuota Resource `json:"podQuota,omitempty"` // assigned resources of this POD
+	PodTop   Resource `json:"podTop,omitempty"`   // resource consumption of this POD
+}
+
+type NodeStat struct {
+	NodeID    string   `json:"nodeID,omitempty"`
+	NodeQuota Resource `json:"nodeQuota,omitempty"` // all resources
of this node
+	NodeTop   Resource `json:"nodeTop,omitempty"`   // resource consumption of this node
+}
+
+type UseCaseStat struct {
+	Usecase UseCase   `json:"usecase,omitempty"`
+	ReqStat []ReqStat `json:"reqStat,omitempty"` // every request for the use case
+	PodStat []PodStat `json:"podStat,omitempty"` // every POD stat for the use case
+}
+
+// DynamicLBSpec defines the desired state of DynamicLB
+type DynamicLBSpec struct {
+	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
+	// Important: Run "make" to regenerate code after modifying this file
+
+	Usecase UseCase `json:"usecase"`
+}
+
+// DynamicLBStatus defines the observed state of DynamicLB
+type DynamicLBStatus struct {
+	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
+	// Important: Run "make" to regenerate code after modifying this file
+	UsecaseStatList []UseCaseStat `json:"usecaseStatList,omitempty"`
+	NodeStatList    []NodeStat    `json:"nodeStatList,omitempty"`
+	Watermark       float32       `json:"watermark,omitempty"` // total resource consumption percentage
+
+}
+
+//+kubebuilder:object:root=true
+//+kubebuilder:subresource:status
+
+// DynamicLB is the Schema for the dynamiclbs API
+type DynamicLB struct {
+	metav1.TypeMeta   `json:",inline"`
+	metav1.ObjectMeta `json:"metadata,omitempty"`
+
+	Spec   DynamicLBSpec   `json:"spec,omitempty"`
+	Status DynamicLBStatus `json:"status,omitempty"`
+}
+
+//+kubebuilder:object:root=true
+
+// DynamicLBList contains a list of DynamicLB
+type DynamicLBList struct {
+	metav1.TypeMeta `json:",inline"`
+	metav1.ListMeta `json:"metadata,omitempty"`
+	Items           []DynamicLB `json:"items"`
+}
+
+```
+
+Development plan: we hope this feature can be implemented and merged in OpenYurt release 0.8.
+
+### User Stories
+
+#### Story 1
+
+Workload requests are routed to the proper PODs according to specific load-balancing policies.
+
+#### Story 2
+
+Users want to be able to customize request routing rules.
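
The routing rules of Story 2 are carried by the string fields of `DynamicLBPolicy`. As a minimal sketch of how a controller might interpret them, the Go snippet below parses the `Device` and `Threshold` values using the formats given in the CRD comments (`cpu`, `gpu`, `auto:cpu,gpu`, `fps:24`). The helpers `ParseDevices` and `ParseThreshold` are hypothetical names that do not appear in the proposal, and restricting devices to `cpu` and `gpu` is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseDevices is a hypothetical helper. It expands a Device setting into an
// ordered preference list: "" gives [cpu] (the documented default), "gpu"
// gives [gpu], and "auto:cpu,gpu" gives [cpu gpu].
// Assumption: only "cpu" and "gpu" are accepted as device names.
func ParseDevices(device string) ([]string, error) {
	if device == "" {
		return []string{"cpu"}, nil
	}
	names := []string{device}
	if strings.HasPrefix(device, "auto:") {
		names = strings.Split(strings.TrimPrefix(device, "auto:"), ",")
	}
	for _, n := range names {
		if n != "cpu" && n != "gpu" {
			return nil, fmt.Errorf("unknown device %q in %q", n, device)
		}
	}
	return names, nil
}

// ParseThreshold is a hypothetical helper. It splits a Threshold setting of
// the form "metric:value" (for example "fps:24") into its two parts.
func ParseThreshold(threshold string) (string, int32, error) {
	parts := strings.SplitN(threshold, ":", 2)
	if len(parts) != 2 || parts[0] == "" {
		return "", 0, fmt.Errorf("threshold %q is not of the form metric:value", threshold)
	}
	v, err := strconv.ParseInt(parts[1], 10, 32)
	if err != nil {
		return "", 0, fmt.Errorf("threshold %q: %w", threshold, err)
	}
	return parts[0], int32(v), nil
}

func main() {
	devices, err := ParseDevices("auto:cpu,gpu")
	fmt.Println(devices, err)

	metric, value, err := ParseThreshold("fps:24")
	fmt.Println(metric, value, err)
}
```

Validating these strings when the CR is admitted, rather than at routing time, would let a malformed policy be rejected before any traffic depends on it.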
+
+#### Story 3
+
+Users want to be able to decide the target device for their workloads.
+
+#### Story 4
+
+Users expect their workloads to run with optimal performance.
+
+### Requirements (Optional)
+
+#### Functional Requirements
+
+##### FR1
+
+The DLB controller collects configuration information, such as use case descriptions and user-defined rules from CRs, the exposed Service, and the nodes' basic information, and stores it in ConfigMaps to share with the worker container.
+
+##### FR2
+
+The DLB receives requests from the cluster's ingress, then analyzes and verifies them and matches them with the cluster's system capabilities.
+
+##### FR3
+
+The DLB worker container reads the pushed configuration, which includes the basic information of the PODs and the nodes, in order to get the current metrics data.
+
+##### FR4
+
+Based on the metrics data and the specified algorithm, the DLB worker container routes the requests to the proper PODs.
+
+#### Non-Functional Requirements
+
+##### NFR1
+
+We assume that the metrics service works correctly in the OpenYurt nodepool environment, so metrics services for OpenYurt are not part of this proposal.
+
+##### NFR2
+
+We assume that the required xPU device plugins are available, e.g., those for Intel's discrete GPUs.
+
+### Implementation Details/Notes/Constraints
+
+The DLB is located after the Ingress and, like the ingress, runs at the nodepool level. A label/annotation will be defined to indicate whether the DLB is enabled for a nodepool.
+
+The DLB consists of two parts: a controller and a worker. For the worker container, we can reuse off-the-shelf proxy solutions such as the Envoy proxy (https://github.com/envoyproxy/envoy), because the DLB worker container is essentially a proxy for traffic management and its core function overlaps with such products.
Envoy already implements HTTP/gRPC and L4 traffic management and is designed to handle networking transparently, for instance as a sidecar, so we can augment it with metrics-based traffic management and ingestion of the corresponding configuration information. This is the data plane.
+
+The controller, which runs in the control plane, performs metrics collection and configuration management through go-control-plane (https://github.com/envoyproxy/go-control-plane) and pushes configuration and metrics updates to the worker container. The worker, based on the traffic routing rules and current metrics, routes requests to the appropriate PODs/nodes.
+
+When the DLB is enabled, traffic should be routed to the PODs chosen by the specified policies, so the original load-balancing function of the Kubernetes Service should be bypassed. To ease the use of the DLB, the worker proxy could be injected, automatically or manually, to replace the Service of a deployment; the proxy would then discover the Service's backend PODs and route traffic to them using the augmented algorithms.
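
The worker's routing decision described above can be sketched as follows. This only illustrates the `balance` and `squeeze` algorithms from the CRD comments, not the Envoy integration itself. `PodMetrics`, `PickBalance`, and `PickSqueeze` are hypothetical names; the single `Quota`/`Used` pair stands in for whichever device the policy selects; and the reading of `squeeze` (fill the busiest POD that still meets the FPS threshold, and report failure when none does) is an assumption.

```go
package main

import "fmt"

// PodMetrics is a hypothetical, simplified view of one backend POD.
type PodMetrics struct {
	PodID string
	Quota int32 // resource assigned to the POD for the selected device
	Used  int32 // resource the POD currently consumes
	FPS   int32 // frames per second the POD currently achieves
}

// free returns the unused share of the POD's quota.
func (p PodMetrics) free() int32 { return p.Quota - p.Used }

// PickBalance returns the POD with the most free resource.
// The boolean is false when no POD is available.
func PickBalance(pods []PodMetrics) (PodMetrics, bool) {
	var best PodMetrics
	found := false
	for _, p := range pods {
		if !found || p.free() > best.free() {
			best, found = p, true
		}
	}
	return best, found
}

// PickSqueeze returns the busiest POD that still has free resource and still
// meets minFPS, so that load concentrates on as few PODs as possible.
// A false result means every POD is saturated and a new node is needed.
func PickSqueeze(pods []PodMetrics, minFPS int32) (PodMetrics, bool) {
	var best PodMetrics
	found := false
	for _, p := range pods {
		if p.FPS < minFPS || p.free() <= 0 {
			continue // POD is saturated or already below the threshold
		}
		if !found || p.free() < best.free() {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	pods := []PodMetrics{
		{PodID: "a", Quota: 100, Used: 80, FPS: 30},
		{PodID: "b", Quota: 100, Used: 20, FPS: 30},
		{PodID: "c", Quota: 100, Used: 95, FPS: 20},
	}
	b, _ := PickBalance(pods)
	s, ok := PickSqueeze(pods, 24)
	fmt.Println("balance:", b.PodID, "squeeze:", s.PodID, ok)
}
```

In a real deployment the `PodMetrics` values would come from the metrics pushed by the controller, and the chosen POD would be expressed as an endpoint update to the proxy rather than returned to a caller.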
+
+        |----------------------------------------------------------------------------|
+        |                                                                            |
+        |  OpenYurt Node Pool                           -----------------------      |
+        |                                               |node                 |      |
+        |                                               |       ------------  |      |
+        |                                           ----|------>| Workload |  |      |
+        |                                           |   |       | service  |  |      |
+        |                                           |   |       ------------  |      |
+        |                   ----------------------  |   |---------------------|      |
+        |                   |node                |  |                                |
+        | -------------     |     -------------  |  |                                |
+------->|   service   |-----|---->|Envoy proxy|--|--|   -----------------------      |
+        | -------------  |  |    --------------- |  |   |node                 |      |
+        |                |  |    |dynamic      | |  |   |       ------------  |      |
+        | -------------  |  |    |load balancer| |  |---|------>| Workload |  |      |
+------->|   Ingress   |--|  |    --------------- |  |   |       | service  |  |      |
+        | -------------     |                    |  |   |       ------------  |      |
+        |                   |--------------------|  |   |---------------------|      |
+        |                                           |                                |
+        |                                           |   -----------------------      |
+        |                                           |   |node                 |      |
+        |                                           |   |       ------------  |      |
+        |                                           ----|------>| Workload |  |      |
+        |                                               |       | service  |  |      |
+        |                                               |       ------------  |      |
+        |                                               |---------------------|      |
+        |----------------------------------------------------------------------------|
+
+### Risks and Mitigations
+
+- What are the risks of this proposal and how do we mitigate? Think broadly.
+  If we reuse an off-the-shelf product, we need to augment it cleanly to keep development and maintenance easy.
+- How will UX be reviewed and by whom?
+  The DLB is best used through an end-to-end deployment operator that injects the proxy automatically.
+- How will security be reviewed and by whom?
+  Security is addressed by reusing proxy products.
+- Consider including folks that also work outside the SIG or subproject.
+
+## Alternatives
+
+The `Alternatives` section is used to highlight and record other possible approaches to delivering the value proposed by a proposal.
+
+## Upgrade Strategy
+
+If applicable, how will the component be upgraded? Make sure this is in the test plan.
+
+Consider the following in developing an upgrade strategy for this enhancement:
+- What changes (in invocations, configurations, API use, etc.)
is an existing cluster required to make on upgrade in order to keep previous behavior?
+- What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade in order to make use of the enhancement?
+  If the hardware changes, especially if a new CPU/GPU is introduced, or if a new inference algorithm is involved, for example, the software may need to be upgraded, since the DLB algorithm depends on these factors.
+
+## Additional Details
+
+### Test Plan [optional]
+
+## Implementation History
+
+- [ ] MM/DD/YYYY: Proposed idea in an issue or [community meeting]
+- [ ] MM/DD/YYYY: Compile a Google Doc following the CAEP template (link here)
+- [ ] MM/DD/YYYY: First round of feedback from community
+- [ ] MM/DD/YYYY: Present proposal at a [community meeting]
+- [ ] MM/DD/YYYY: Open proposal PR
+