doc: proposal for enhancement of edge autonomy #2015

Merged: 2 commits merged on May 6, 2024
Binary file added docs/img/state_machine.jpg
63 changes: 63 additions & 0 deletions docs/proposals/20240411-edge-autonomy-enhancement.md
@@ -0,0 +1,63 @@
| title | authors | reviewers | creation-date | last-updated | status |
|:-------------------------:|----------------------------------------|-------------|---------------|--------------|--------|
| Edge-autonomy-enhancement | @vie-serendipity @rambohe-ch @JameKeal | @rambohe-ch | 2024-03-26 | 2024-04-07 | |
# Edge-autonomy-enhancement
## Table of Contents
<!-- TOC -->
* [Edge-autonomy-enhancement](#edge-autonomy-enhancement)
  * [Table of Contents](#table-of-contents)
  * [Summary](#summary)
  * [Motivation](#motivation)
    * [Goals](#goals)
    * [Non-Goals/Future Work](#non-goalsfuture-work)
  * [Proposal](#proposal)
    * [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
      * [Error key](#error-key)
      * [Erase error key on each request](#erase-error-key-on-each-request)
      * [Autonomy manager](#autonomy-manager)
    * [Controller](#controller)
  * [Implementation History](#implementation-history)
<!-- TOC -->
## Summary
This proposal aims to enhance the autonomy capability of yurthub. Building on [improve yurthub cache](https://github.com/openyurtio/openyurt/blob/224082855310f4a60d74301d935d1c316340a72b/docs/proposals/20230715-improve-yurthub-cache.md),
which optimizes the cache mechanism, this proposal enables node autonomy asynchronously and adds a new module for
reporting autonomy status.
## Motivation
Currently, users can enable autonomy for a node by adding the annotation "node.beta.openyurt.io/autonomy". With this annotation, the control plane does not evict the pods on the node.
However, the node itself is not directly aware of the annotation,
which means that the node only starts serving requests from its local cache data when it is actually disconnected from the network and autonomy takes effect.
As a result, the control plane currently does not verify whether a node is really capable of autonomy, and the node does not report any problems on its side.
This does not match real-world scenarios. For example, deploying new pods may cause a write to the disk cache to fail,
which in turn compromises the node's ability to run autonomously.
### Goals
- Enable node autonomy asynchronously
- Add a new module for reporting autonomy status
- Enhance the caching capability
### Non-Goals/Future Work
## Proposal
The user's annotation operation and the control plane's approval should be asynchronous.
Therefore, a new controller needs to be added to the control plane to approve or reject users' autonomy requests.
The node side has to report its own status through a node condition, so that the controller can list/watch nodes to get their real autonomy status.
### Implementation Details/Notes/Constraints
#### Error key
The hub proxies requests from local components to the api server and writes every object in the responses to the local disk cache.
Keys whose objects fail to be written should be saved in an error key set, which is kept in memory as a hash map and then persisted to disk.
Notably, it is important to record the failure reason of each error key so that it can be reported in the node condition.
#### Erase error key on each request
On every request (get, list, and so on), check whether the requested key is in the error key set; if it is, write the object to disk again, and remove the key from the error key set once the write succeeds.
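The per-request erase step could look roughly like the sketch below. It covers only the erase step and leaves out the normal cache write path. `Storage`, `flakyStorage`, and `eraseOnRequest` are hypothetical stand-ins for yurthub's disk cache and proxy path, and the error keys are a plain map from cache key to failure reason.

```go
package main

import (
	"errors"
	"fmt"
)

// Storage is a hypothetical stand-in for yurthub's disk cache.
type Storage interface {
	Write(key string, obj []byte) error
}

// eraseOnRequest is called for every proxied request with the object from the
// response. If the key is recorded in errorKeys, the object is written again;
// on success the key is erased, otherwise its failure reason is refreshed.
func eraseOnRequest(s Storage, errorKeys map[string]string, key string, obj []byte) {
	if _, failed := errorKeys[key]; !failed {
		return // nothing to repair for this key
	}
	if err := s.Write(key, obj); err != nil {
		errorKeys[key] = err.Error()
		return
	}
	delete(errorKeys, key)
}

// flakyStorage is an in-memory Storage whose writes fail while fail is true.
type flakyStorage struct {
	fail bool
	data map[string][]byte
}

func (f *flakyStorage) Write(key string, obj []byte) error {
	if f.fail {
		return errors.New("no space left on device")
	}
	f.data[key] = obj
	return nil
}

func main() {
	s := &flakyStorage{data: map[string][]byte{}}
	errorKeys := map[string]string{"kubelet/pods/default/nginx": "no space left on device"}
	eraseOnRequest(s, errorKeys, "kubelet/pods/default/nginx", []byte(`{"kind":"Pod"}`))
	fmt.Println(len(errorKeys)) // 0
}
```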
#### Autonomy manager
Introduce a new module that is mainly responsible for updating the node autonomy status and re-fetching objects according to the error keys.
- At a fixed interval, retrieve the objects from the api server according to the error keys, and then remove the error keys whose objects are written successfully.
- Check the error keys periodically; if they are non-empty for three consecutive checks, set the node condition AutonomyState to Unknown.

Unknown state: the node became autonomous successfully at the moment the user annotated it, but afterwards its autonomy state was affected by bursts of disk writes or other reasons, so the condition is changed to Unknown.
`const AutonomyState v1.NodeConditionType = "AutonomyState"`
### Controller
The user annotates a node to request node autonomy. The controller lists and watches edge nodes and, based on each node's AutonomyState condition, labels the node with `node.autonomy.openyurt.io/status=true` or `node.autonomy.openyurt.io/status=false`,
thereby indicating whether the control plane approves or denies the node's autonomy request.
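The controller's decision for a single node can be sketched as a pure function, leaving out the informer (list/watch) and API update machinery. The `Node` type and `reconcile` are hypothetical, the annotation value `"true"` is assumed, and so is the choice to deny autonomy when the condition is `False`, `Unknown`, or missing, since the proposal only specifies the `true`/`false` label values.

```go
package main

import "fmt"

const (
	// Annotation a user sets to request autonomy; the value "true" is assumed.
	autonomyAnnotation = "node.beta.openyurt.io/autonomy"
	// Label the controller writes to approve (true) or deny (false) the request.
	autonomyStatusLabel = "node.autonomy.openyurt.io/status"
	autonomyState       = "AutonomyState"
)

// Node is a hypothetical, trimmed-down view of a Node object as the controller
// would see it through list/watch.
type Node struct {
	Name        string
	Annotations map[string]string
	Labels      map[string]string
	Conditions  map[string]string // condition type -> status: "True", "False" or "Unknown"
}

// reconcile labels an annotated node according to its AutonomyState condition.
// Assumption: only "True" approves; "False", "Unknown" or a missing condition denies.
func reconcile(n *Node) {
	if n.Annotations[autonomyAnnotation] != "true" {
		return // the user has not requested autonomy; nothing to approve or deny
	}
	if n.Labels == nil {
		n.Labels = make(map[string]string)
	}
	if n.Conditions[autonomyState] == "True" {
		n.Labels[autonomyStatusLabel] = "true"
	} else {
		n.Labels[autonomyStatusLabel] = "false"
	}
}

func main() {
	n := &Node{
		Name:        "edge-node-1",
		Annotations: map[string]string{autonomyAnnotation: "true"},
		Conditions:  map[string]string{autonomyState: "Unknown"},
	}
	reconcile(n)
	fmt.Println(n.Name, n.Labels[autonomyStatusLabel]) // edge-node-1 false
}
```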
- State transitions of the node condition (finite state machine):

![node's condition state change](../img/state_machine.jpg)
## Implementation History
- [ ] 04/11/2024: Draft proposal created
- [ ] 04/17/2024: Present proposal at the community meeting
2 changes: 1 addition & 1 deletion pkg/yurthub/yurtcoordinator/coordinator.go
@@ -77,7 +77,7 @@ type Coordinator interface {
// 2. Pool-Scoped resources have been synced with cloud, through list/watch
// 3. local cache has been uploaded to yurt-coordinator
IsReady() (cachemanager.CacheManager, bool)
- // IsCoordinatorHealthy will return the poolCacheManager and true if the yurt-coordinator is healthy.
+ // IsHealthy will return the poolCacheManager and true if the yurt-coordinator is healthy.
// We assume coordinator is healthy when the elect status is LeaderHub and FollowerHub.
IsHealthy() (cachemanager.CacheManager, bool)
}