Skip to content

Commit

Permalink
add proposal to unify cloud edge comms solution
Browse files Browse the repository at this point in the history
Current OpenYurt provides 2 independent solutions in cloud edge comms domain,
which are Raven and YurtTunnel, this proposal aims to integrate YurtTunnel
into Raven, and provide an unified cloud edge comms solution to users.

Signed-off-by: zzguang <[email protected]>
  • Loading branch information
zzguang committed Oct 11, 2022
1 parent eb5f406 commit 0e50085
Showing 1 changed file with 244 additions and 0 deletions.
244 changes: 244 additions & 0 deletions docs/proposals/20220930-unifying-cloud-edge-comms.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,244 @@
---
title: Unify cloud edge comms solution for OpenYurt
authors:
- "@zzguang"
reviewers:
- "@gnunu"
- "@LindaYu17"
creation-date: 2022-09-30
last-updated:
status: provisional
---

# Unify cloud edge comms solution for OpenYurt

## Table of Contents

- [Unify Cloud Edge Comms Solution](#unify-cloud-edge-comms-solution)
- [Table of Contents](#table-of-contents)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
- [User Stories](#user-stories)
- [Story 1](#story-1)
- [Story 2](#story-2)
- [Story 3](#story-3)
- [Story 4](#story-4)
- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
- [Implementation History](#implementation-history)

## Summary

Current OpenYurt provides 2 independent solutions in cloud edge comms domain, which are Raven and YurtTunnel.
Although they are implemented to meet different user requirements, they belong to the same domain for users.
They are located in different repos, so it's hard to maintain them from the project management perspective.
What's more important, although related docs are provided to users, it may lead to user confusion on how to
select them for their own usage scenarios.
This proposal aims to fix these issues by integrating YurtTunnel into Raven.

## Motivation

When Raven and YurtTunnel are combined together, the related implementation for cloud edge comms in dataplane
will be refined so that the related source codes organization will be optimized, so it will be much more easier
to maintain in future.
Besides, providing only one entry to users for their cloud edge comms usage scenarios will definitely improve
the user experience.

### Goals

To integrate YurtTunnel into Raven, we want to achieve the following goals:
- Move YurtTunnel implementation from openyurt repo to raven repo.
- Optimize YurtTunnel implementation which include ANP upgrade, iptables manager removement and etc.
- Fuse Raven and YurtTunnel into one unified cloud edge comms solution.

### Non-Goals/Future Work

At current stage, we mainly focus on fusing Raven and YurtTunnel into one solution, we will not try to
extend new features for them.

## Proposal

We know that YurtTunnel is a DevOps traffic tunnel from cloud to edge, while Raven is more like a data
traffic channel between cloud-edge and edge-edge. When we think to unify these 2 solutions, we prefer to
integrate YurtTunnel into Raven to extend Raven scope to cover YurtTunnel features.
About how to achieve the target in a graceful way, we thought about several solution alternatives.

### Raven & YurtTunnel fusion
The related solution alternatives are described below in details:

1). Solution 1: Integrate yurttunnel-server and yurttunnel-agent into raven-agent on cloud and edge node
This solution aims to integrate YurtTunnel logic into raven-agent and hide its details to users completely,
so when users deploy Raven into the cluster, YurtTunnel is enabled by default, we can call it "deep fusion".

-----------------------------------------
| Cloud Node |
| --------------------------- |
| | raven-agent | |
| | --------------------- | |
| | | yurttunnel-server | | |
| | --------------------- | |
| --------------------------- |
--------------------|--------------------
Cloud |
----------------------------|---------------------------
Edge |
--------------------|--------------------
| Edge Node |
| --------------------------- |
| | raven-agent | |
| | --------------------- | |
| | | yurttunnel-agent | | |
| | --------------------- | |
| --------------------------- |
-----------------------------------------


To achieve it, we mainly need to solve the 2 problems:
a). On Edge side, integrate yurttunnel-agent logic into raven-agent, no matter the edge node acts as
gateway or ordinary role.
b). On Cloud side, Integrate yurttunnel-server logic into raven-agent.

For a), since both raven-agent and yurttunnel-agent are deployed by daemonset to edge nodes, it seems applicable to combine them together.
But for b), we found several tricky issues:
I). The raven-agent is deployed as daemonset on every cloud node, but yurttunnel-server is deployed as deployment with several replicas
for HA scenario, how to judge which cloud nodes to host the yurttunnel-server?
II). If we select the gateway cloud node to host the yurttunnel-server, there would be another issue:
The gateway role will not be elected until user creates a "gateway" CR, so it will lead to yurttunnel-server function depends on gateway CR
creation, which is obviously not reasonable.
III). Even we have ways to find some cloud nodes to host yurttunnel-server, how to expose the yurttunnel-server service since the yurttunnel-server
is integrated into some of the raven-agent pods?

By the analysis above, we can see that this "deep fusion" design is too ideal to be implemented, it doesn't make sense to
hide all the YurtTunnel details and integrate it deeply into raven-agent.

2). Solution 2: Integrate yurttunnel-agent into raven-agent while deploying yurttunnel-server independently on cloud side
Since we met several tricky problems while integrating yurttunnel-server into raven-agent on cloud side, how about to
deploy yurttunnel-server independently on cloud side?

-------------------------------------------
| Cloud Node |
| --------------- --------------------- |
| | raven-agent | | yurttunnel-server | |
| --------------- --------------------- |
----------|-------------------|------------
Cloud | |
------------------|-------------------|-----------------
Edge | |
----------|-------------------|------------
| Edge Node |
| --------------------------- |
| | raven-agent | |
| | --------------------- | |
| | | yurttunnel-agent | | |
| | --------------------- | |
| ---------------------------- |
-------------------------------------------

This solution is feasible theoretically,however we know that users don't have to enable Raven and YurtTunnel
features simultaneously, how to handle the condition that users only want to enable one of them?
Besides, this solution aims to fuse Raven and YurtTunnel on Edge side, but leave it alone on Cloud side, which seems not
a consistent design.

Any other solutions for it? Let's continue to go forward...

3). Solution 3: Implement a new CRD as a wrapper layer for users
From the user experience point of view, how about to define a new CRD as the main entry for users to
configure Cloud Edge communication? For example, we abstract 3 types of comms usage: nodeName, podIP and nodeIP.

------------------------------------------------------
| Cloud Node |
| ---------------------- |
| | new CRD controller | |
| ---------------------- |
| ---------------------------- |
| | raven-controller-manager | |
| ---------------------------- |
| --------------- ----------------------- |
| | raven-agent | | yurttunnel-server | |
| --------------- ----------------------- |
-------------|--------------------------|-------------
Cloud | |
---------------------|--------------------------|------------------
Edge | |
-------------|--------------------------|-------------
| Edge Node |
| --------------- ---------------------- |
| | raven-agent | | yurttunnel-agent | |
| --------------- ---------------------- |
------------------------------------------------------

This solution aims to add an abstraction layer to hide the technical details of current Raven and YurtTunnel, the new
CRD operator is responsible for deploying the corresponding components to the cluster, but it may introduce new issues:
I). It needs to implement a new operator, which improves the complexity.
II). When users select podIP comms method, they need to create gateway CR as well for further configuration, while for
the nodeName method, users don't need to create other CRs, so the user experience is not consistent.
III). If we want to integrate gateway CRD into the new CRD, it also seems tricky because the new CRD is a cluster level
singleton CRD, while users can create many gateway CRs for their usage scenarios.

It seems we need to think more about it...

4). Solution 4: Divide Raven into 2 subdomains: DevOps traffic and business data traffic
When we thought why it's so hard to integrate YurtTunnel into Raven in a deep fusion way, we found the reason is
they are totally 2 different solutions for different user requirements, they don't depend on each other and there
are almost nothing in common from design to implementation between them. From the users perspective, they can select
none/one/both of them according to their usage scenarios. Therefore, comparing to the "deep fusion", how about to implement
it in a "shallow" way?
It means that we take YurtTunnel into Raven scope as well, but not merge YurtTunnel components logic into Raven
components, as a result, the extended Raven includes 2 independent subdomains: Cloud to Edge DevOps channel and Cloud-Edge
or Edge-Edge data traffic channel, they are not coupled to each other, users can select them conveniently by
deploying the related components into their cluster.
Of course, to make alignment for the whole design, current Raven and YurtTunnel components need to be renamed to
keep a common style, for example:

```yurttunnel-agent``` -> ```raven-tunnel-agent```
```yurttunnel-server``` -> ```raven-tunnel-server```
```raven-agent``` -> ```raven-gateway-agent```
```raven-controller-manager``` -> ```raven-gateway-manager```

------------------------------------------------------
| Cloud Node |
| ------------------------- |
| | raven-gateway-manager | |
| ------------------------- |
| ----------------------- ----------------------- |
| | raven-gateway-agent | | raven-tunnel-server | |
| ----------------------- ----------------------- |
-------------|--------------------------|-------------
Cloud | |
---------------------|--------------------------|------------------
Edge | |
-------------|--------------------------|-------------
| Edge Node |
| ----------------------- ---------------------- |
| | raven-gateway-agent | | raven-tunnel-agent | |
| ----------------------- ---------------------- |
------------------------------------------------------

This "shallow fusion" solution has several advantages:
I). The DevOps traffic is separated from the business data traffic, so they will not affect each other.
II). The architecture is clear and it's convenient for users to select for their usage scenarios.
III). It keeps the core logic of current Raven and YurtTunnel unchanged, it can be implemented without much effort.

Preference:
By evaluating all the alternatives above, I prefer to solution 4 at current stage, if no different opinions,
I will follow it to implement the cloud edge unified comms solution for OpenYurt.

### User Stories

#### Story 1
As an end user, I want to make some DevOps from Cloud to Edge, such as kubectl logs/exec.
#### Story 2
As an end user, I want to get the edge nodes metrics status through Prometheus/Metrics server from Cloud.
#### Story 3
As an end user, I want to access another business pod data from one NodePool to another NodePool.
#### Story 4
As an end user, I want to send some AI data from Edge NodePool to Cloud for next-step processing or storage.

### Implementation Details/Notes/Constraints

## Implementation History

- [ ] 09/30/2022: Draft proposal created

0 comments on commit 0e50085

Please sign in to comment.