[feature request]NodePool Governance Capability: remove nodelifecycle controller and add pool-coordinator controller in yurt-controller-manager component. #776

Closed
Peeknut opened this issue Mar 10, 2022 · 3 comments · Fixed by #1040 or #1151
Peeknut (Member) commented Mar 10, 2022

What would you like to be added:

We will remove the original NodeLifeCycle controller from the Yurt-Controller-Manager component, so that the native Kube-Controller-Manager no longer needs to disable its NodeLifeCycle controller at startup. At the same time, we will add a new controller and webhook to the Yurt-Controller-Manager component to provide the edge autonomy feature.

  • Provide two edge autonomy modes for workloads: node autonomy and nodepool autonomy.

    • node autonomy: the pod will not be evicted from the node even if the node crashes. If the end user wants to recreate or migrate the pod, they must delete and recreate it manually.
    • nodepool autonomy: the pod will not be evicted from the node when only the cloud-edge network is disconnected, but it will be evicted to another healthy node when the edge node itself is abnormal (for example, the node has crashed).
  • Add a new annotation (apps.openyurt.io/autonomy) for workloads to select the autonomy mode that fits their needs.
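
To make the two modes concrete, here is a minimal sketch of the eviction decision they describe. It is only an illustration under stated assumptions: the annotation key comes from this issue, but the literal values "node" and "nodepool", the function names, and the fallback behaviour for workloads without the annotation are hypothetical and not taken from the proposal.

```python
from enum import Enum
from typing import Dict, Optional

# Annotation key as named in this issue.
AUTONOMY_ANNOTATION = "apps.openyurt.io/autonomy"


class AutonomyMode(Enum):
    # Assumption: the issue names the two modes but not the literal
    # annotation values, so these strings are placeholders.
    NODE = "node"
    NODEPOOL = "nodepool"


def autonomy_mode(annotations: Dict[str, str]) -> Optional[AutonomyMode]:
    """Return the mode a workload asked for, or None if unset or unrecognized."""
    try:
        return AutonomyMode(annotations.get(AUTONOMY_ANNOTATION))
    except ValueError:
        return None


def should_evict(mode: Optional[AutonomyMode],
                 node_crashed: bool,
                 network_disconnected: bool) -> bool:
    """Decide whether a pod may be evicted from its node."""
    if mode is AutonomyMode.NODE:
        # Node autonomy: never evict, even if the node crashed.
        return False
    if mode is AutonomyMode.NODEPOOL:
        # Nodepool autonomy: keep the pod on a pure network disconnect,
        # evict only when the node itself is abnormal.
        return node_crashed
    # Assumption: without the annotation, any unreachable node leads to
    # eviction, as with the native node lifecycle behaviour.
    return node_crashed or network_disconnected
```

For example, `should_evict(AutonomyMode.NODEPOOL, node_crashed=False, network_disconnected=True)` keeps the pod in place, while the same call with `node_crashed=True` allows eviction.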

This issue tracks the newly added controller (named the pool-coordinator controller), which will provide the following feature:

  • When the node lease has been reported by a delegate continuously for more than 40s (4 times the heartbeat interval), the node should be considered disconnected from the cloud-edge network, but the pods on the node should be kept. The controller therefore adds an unschedulable taint to the node, so that newly created pods will not be scheduled to it.
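
The rule above could be sketched roughly as follows. This is not the controller's actual code: the 10s heartbeat interval is inferred from "40s = 4 x heartbeat interval", and the taint key, the `NodeState` type, the `reconcile` function, and the removal of the taint once the node renews its own lease again are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Inferred from the issue: 40s equals 4 heartbeat intervals.
HEARTBEAT_INTERVAL = timedelta(seconds=10)
DELEGATION_THRESHOLD = 4 * HEARTBEAT_INTERVAL

# Hypothetical taint; the issue only says "an unschedulable taint".
UNSCHEDULABLE_TAINT = {"key": "example.openyurt.io/unschedulable",
                       "effect": "NoSchedule"}


@dataclass
class NodeState:
    name: str
    # When a delegate started renewing this node's lease;
    # None while the node renews its own lease.
    delegated_since: Optional[datetime] = None
    taints: List[Dict[str, str]] = field(default_factory=list)


def reconcile(node: NodeState, now: datetime) -> NodeState:
    """Add the taint after more than 40s of delegated lease renewal."""
    delegated_too_long = (
        node.delegated_since is not None
        and now - node.delegated_since > DELEGATION_THRESHOLD
    )
    has_taint = UNSCHEDULABLE_TAINT in node.taints
    if delegated_too_long and not has_taint:
        # Existing pods stay; only new scheduling is blocked.
        node.taints.append(dict(UNSCHEDULABLE_TAINT))
    elif not delegated_too_long and has_taint:
        # Assumption: the taint is dropped once the node reports again.
        node.taints.remove(UNSCHEDULABLE_TAINT)
    return node
```

The function is idempotent, so it can be called on every lease update without stacking duplicate taints.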

Notes: For details, please refer to the proposal: #772

Why is this needed:
As described in the "NodePool Autonomy" proposal (#772), all yurthubs in a node pool that are connected to the cloud elect a leader, and the leader yurthub acts as a heartbeat proxy that reports the heartbeats of disconnected nodes to the cloud. The logic by which Yurt-Controller-Manager judges node status therefore has to change: it needs to handle both node autonomy and NodePool autonomy.

others
/kind feature

Peeknut added the kind/feature label Mar 10, 2022
gnunu (Member) commented Mar 11, 2022

/assign @gnunu

gnunu (Member) commented May 24, 2022

Progress update:
We will re-enable the nodelifecycle controller in the Kubernetes kube-controller-manager, to minimize modifications to vanilla Kubernetes. From the workload perspective, we will focus on pod management instead of node management. To do this, we can add a validating webhook that checks pod operations, especially deletes in the case of API-initiated eviction.

Work done so far:

  1. CA and server certificate generation on init;
  2. pod delete validation for eviction on nodes annotated for autonomy;
  3. some tests related to eviction.
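
As a rough illustration of item 2, the sketch below shows how a validating webhook might answer an AdmissionReview for an API-initiated eviction. It assumes the eviction arrives as a CREATE on the pods/eviction subresource, that the node's annotations have already been looked up by the caller, and that autonomy is marked with the annotation key and value shown. The function names are hypothetical, and the actual webhook in this work may validate other operations as well.

```python
from typing import Any, Dict

# Assumption: nodes opt in to autonomy with this annotation set to "true".
NODE_AUTONOMY_ANNOTATION = "node.beta.openyurt.io/autonomy"


def is_eviction(request: Dict[str, Any]) -> bool:
    """True if the admission request is an API-initiated pod eviction."""
    return (
        request.get("operation") == "CREATE"
        and request.get("resource", {}).get("resource") == "pods"
        and request.get("subResource") == "eviction"
    )


def review_response(request: Dict[str, Any],
                    node_annotations: Dict[str, str]) -> Dict[str, Any]:
    """Build an admission.k8s.io/v1 AdmissionReview response."""
    response: Dict[str, Any] = {"uid": request["uid"], "allowed": True}
    autonomous = node_annotations.get(NODE_AUTONOMY_ANNOTATION) == "true"
    if is_eviction(request) and autonomous:
        # Deny the eviction so the pod stays on the autonomous node.
        response["allowed"] = False
        response["status"] = {
            "code": 403,
            "message": "eviction denied: pod runs on an autonomous node",
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Passing the node annotations in as an argument keeps the decision logic free of API calls, which makes it straightforward to cover in the eviction tests mentioned in item 3.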

stale bot commented Aug 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Aug 30, 2022
stale bot removed the wontfix label Sep 1, 2022
rambohe-ch moved this to In Progress in controlplane-v1.2 Nov 15, 2022
rambohe-ch changed the title from "[feature request]NodePool Governance Capability: Modify Yurt-Controller-Manager" to "[feature request]NodePool Governance Capability: remove nodelifecycle controller and add pool-coordinator controller in yurt-controller-manager component." Nov 29, 2022
github-project-automation bot moved this from In Progress to Done in controlplane-v1.2 Jan 18, 2023