diff --git a/docs/proposals/20220930-unifying-cloud-edge-comms.md b/docs/proposals/20220930-unifying-cloud-edge-comms.md
new file mode 100644
index 00000000000..cf137b68aa1
--- /dev/null
+++ b/docs/proposals/20220930-unifying-cloud-edge-comms.md
@@ -0,0 +1,244 @@
+---
+title: Unify cloud edge comms solution for OpenYurt
+authors:
+  - "@zzguang"
+reviewers:
+  - "@gnunu"
+  - "@LindaYu17"
+creation-date: 2022-09-30
+last-updated:
+status: provisional
+---
+
+# Unify cloud edge comms solution for OpenYurt
+
+## Table of Contents
+
+- [Unify cloud edge comms solution for OpenYurt](#unify-cloud-edge-comms-solution-for-openyurt)
+  - [Table of Contents](#table-of-contents)
+  - [Summary](#summary)
+  - [Motivation](#motivation)
+    - [Goals](#goals)
+    - [Non-Goals/Future Work](#non-goalsfuture-work)
+  - [Proposal](#proposal)
+    - [User Stories](#user-stories)
+      - [Story 1](#story-1)
+      - [Story 2](#story-2)
+      - [Story 3](#story-3)
+      - [Story 4](#story-4)
+    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
+  - [Implementation History](#implementation-history)
+
+## Summary
+
+OpenYurt currently provides two independent solutions for cloud-edge communication: Raven and YurtTunnel.
+Although they were implemented to meet different user requirements, users see them as part of the same domain.
+They live in different repos, which makes them hard to maintain from a project management perspective.
+More importantly, even though docs are provided for both, users can be confused about which one to
+select for their own usage scenarios.
+This proposal aims to fix these issues by integrating YurtTunnel into Raven.
+
+## Motivation
+
+Once Raven and YurtTunnel are combined, the data plane implementation for cloud-edge comms can be
+refined and the source code organization streamlined, which will make it much easier
+to maintain in the future.
+Besides, providing a single entry point for cloud-edge comms usage scenarios should improve
+the user experience.
+
+### Goals
+
+To integrate YurtTunnel into Raven, we want to achieve the following goals:
+- Move the YurtTunnel implementation from the openyurt repo to the raven repo.
+- Optimize the YurtTunnel implementation, including the ANP upgrade, removal of the iptables manager, etc.
+- Fuse Raven and YurtTunnel into one unified cloud-edge comms solution.
+
+### Non-Goals/Future Work
+
+At the current stage, we mainly focus on fusing Raven and YurtTunnel into one solution; we will not try to
+add new features to them.
+
+## Proposal
+
+YurtTunnel is a DevOps traffic tunnel from cloud to edge, while Raven is more like a data
+traffic channel for cloud-edge and edge-edge communication. To unify these two solutions, we prefer to
+integrate YurtTunnel into Raven, extending Raven's scope to cover YurtTunnel's features.
+To achieve this gracefully, we considered several alternative solutions.
+
+### Raven & YurtTunnel fusion
+The alternatives are described in detail below:
+
+1). Solution 1: Integrate yurttunnel-server and yurttunnel-agent into raven-agent on cloud and edge nodes
+    This solution integrates the YurtTunnel logic into raven-agent and hides its details from users completely,
+    so when users deploy Raven into the cluster, YurtTunnel is enabled by default. We call it "deep fusion".
+
+            -----------------------------------------
+            |              Cloud Node               |
+            |      ---------------------------      |
+            |      |       raven-agent       |      |
+            |      |  ---------------------  |      |
+            |      |  | yurttunnel-server |  |      |
+            |      |  ---------------------  |      |
+            |      ---------------------------      |
+            --------------------|--------------------
+    Cloud                       |
+    ----------------------------|---------------------------
+    Edge                        |
+            --------------------|--------------------
+            |               Edge Node               |
+            |      ---------------------------      |
+            |      |       raven-agent       |      |
+            |      |  ---------------------  |      |
+            |      |  | yurttunnel-agent  |  |      |
+            |      |  ---------------------  |      |
+            |      ---------------------------      |
+            -----------------------------------------
+
+
+To achieve it, we mainly need to solve two problems:
+a). On the edge side, integrate the yurttunnel-agent logic into raven-agent, regardless of whether the edge node acts as
+    a gateway or an ordinary node.
+b). On the cloud side, integrate the yurttunnel-server logic into raven-agent.
+
+For a), since both raven-agent and yurttunnel-agent are deployed to edge nodes as DaemonSets, combining them seems feasible.
+But for b), we found several tricky issues:
+  I). raven-agent is deployed as a DaemonSet on every cloud node, but yurttunnel-server is deployed as a Deployment with several replicas
+      for HA. How do we decide which cloud nodes host yurttunnel-server?
+  II). If we select the gateway cloud node to host yurttunnel-server, there is another issue:
+      the gateway role is not elected until the user creates a "gateway" CR, so yurttunnel-server would depend on gateway CR
+      creation, which is obviously not reasonable.
+  III). Even if we find a way to pick cloud nodes to host yurttunnel-server, how do we expose the yurttunnel-server service when yurttunnel-server
+      is embedded in only some of the raven-agent pods?
+
+From the analysis above, this "deep fusion" design is too idealistic to implement; it doesn't make sense to
+hide all the YurtTunnel details and integrate it deeply into raven-agent.
+
+2). Solution 2: Integrate yurttunnel-agent into raven-agent while deploying yurttunnel-server independently on the cloud side
+    Since integrating yurttunnel-server into raven-agent on the cloud side raises several tricky problems, how about
+    deploying yurttunnel-server independently on the cloud side?
+
+            -------------------------------------------
+            |               Cloud Node                |
+            |  ---------------  --------------------- |
+            |  | raven-agent |  | yurttunnel-server | |
+            |  ---------------  --------------------- |
+            ----------|-------------------|------------
+    Cloud             |                   |
+    ------------------|-------------------|-----------------
+    Edge              |                   |
+            ----------|-------------------|------------
+            |                Edge Node                |
+            |       ---------------------------       |
+            |       |       raven-agent       |       |
+            |       |  ---------------------  |       |
+            |       |  | yurttunnel-agent  |  |       |
+            |       |  ---------------------  |       |
+            |       ---------------------------       |
+            -------------------------------------------
+
+This solution is theoretically feasible. However, users don't have to enable the Raven and YurtTunnel
+features simultaneously, so how do we handle the case where users only want to enable one of them?
+Besides, this solution fuses Raven and YurtTunnel on the edge side but leaves them separate on the cloud side, which is not
+a consistent design.
+
+Are there any other solutions? Let's keep going...
+
+3). Solution 3: Implement a new CRD as a wrapper layer for users
+    From the user experience point of view, how about defining a new CRD as the main entry point for users to
+    configure cloud-edge communication? For example, we could abstract three types of comms usage: nodeName, podIP and nodeIP.
+
+            ------------------------------------------------------
+            |                     Cloud Node                     |
+            |               ----------------------               |
+            |               | new CRD controller |               |
+            |               ----------------------               |
+            |            ----------------------------            |
+            |            | raven-controller-manager |            |
+            |            ----------------------------            |
+            |     ---------------        ----------------------- |
+            |     | raven-agent |        |  yurttunnel-server  | |
+            |     ---------------        ----------------------- |
+            -------------|--------------------------|-------------
+    Cloud                |                          |
+    ---------------------|--------------------------|------------------
+    Edge                 |                          |
+            -------------|--------------------------|-------------
+            |                     Edge Node                      |
+            |     ---------------        ----------------------  |
+            |     | raven-agent |        |  yurttunnel-agent  |  |
+            |     ---------------        ----------------------  |
+            ------------------------------------------------------
+
+This solution adds an abstraction layer to hide the technical details of the current Raven and YurtTunnel. The new
+CRD operator is responsible for deploying the corresponding components to the cluster, but it may introduce new issues:
+I). It requires implementing a new operator, which increases complexity.
+II). When users select the podIP comms method, they also need to create a gateway CR for further configuration, while for
+    the nodeName method, users don't need to create other CRs, so the user experience is not consistent.
+III). If we want to integrate the gateway CRD into the new CRD, that also seems tricky because the new CRD is a cluster-level
+    singleton CRD, while users can create many gateway CRs for their usage scenarios.
+
+It seems we need to think more about it...
+
+4). Solution 4: Divide Raven into 2 subdomains: DevOps traffic and business data traffic
+    When we considered why it is so hard to integrate YurtTunnel into Raven in a deep-fusion way, we found the reason is that
+    they are two entirely different solutions for different user requirements: they don't depend on each other and
+    have almost nothing in common from design to implementation. From the user's perspective, they can select
+    none, one, or both of them according to their usage scenarios. Therefore, compared to "deep fusion", how about implementing
+    it in a "shallow" way?
+    That is, we bring YurtTunnel into Raven's scope as well, but do not merge the YurtTunnel component logic into the Raven
+    components. As a result, the extended Raven includes two independent subdomains: a cloud-to-edge DevOps channel and a cloud-edge
+    or edge-edge data traffic channel. They are not coupled to each other, and users can select them conveniently by
+    deploying the related components into their cluster.
+    Of course, to keep the whole design aligned, the current Raven and YurtTunnel components need to be renamed to
+    follow a common style, for example:
+
+- `yurttunnel-agent` -> `raven-tunnel-agent`
+- `yurttunnel-server` -> `raven-tunnel-server`
+- `raven-agent` -> `raven-gateway-agent`
+- `raven-controller-manager` -> `raven-gateway-manager`
+
+            ------------------------------------------------------
+            |                     Cloud Node                     |
+            |             -------------------------              |
+            |             | raven-gateway-manager |              |
+            |             -------------------------              |
+            | -----------------------    ----------------------- |
+            | | raven-gateway-agent |    | raven-tunnel-server | |
+            | -----------------------    ----------------------- |
+            -------------|--------------------------|-------------
+    Cloud                |                          |
+    ---------------------|--------------------------|------------------
+    Edge                 |                          |
+            -------------|--------------------------|-------------
+            |                     Edge Node                      |
+            | -----------------------    ----------------------  |
+            | | raven-gateway-agent |    | raven-tunnel-agent |  |
+            | -----------------------    ----------------------  |
+            ------------------------------------------------------
+
+This "shallow fusion" solution has several advantages:
+I). DevOps traffic is separated from business data traffic, so they do not affect each other.
+II). The architecture is clear, and it's convenient for users to select what fits their usage scenarios.
+III). It keeps the core logic of the current Raven and YurtTunnel unchanged, so it can be implemented without much effort.
+
+Preference:
+    Having evaluated all the alternatives above, I prefer Solution 4 at the current stage. If there are no objections,
+    I will follow it to implement the unified cloud-edge comms solution for OpenYurt.
+
+### User Stories
+
+#### Story 1
+As an end user, I want to perform DevOps operations from cloud to edge, such as kubectl logs/exec.
+#### Story 2
+As an end user, I want to collect edge node metrics from the cloud through Prometheus/Metrics Server.
+#### Story 3
+As an end user, I want to access a business pod's data in one NodePool from another NodePool.
+#### Story 4
+As an end user, I want to send AI data from an edge NodePool to the cloud for further processing or storage.
+
+### Implementation Details/Notes/Constraints
+
+## Implementation History
+
+- [ ] 09/30/2022: Draft proposal created
+