What happened:
There is only a single yurt-controller-manager Pod in my Kubernetes cluster. When an edge node fails to reach the apiserver but can still communicate with other nodes, the node.openyurt.io/unschedulable taint is added to it successfully.
At some point yurt-controller-manager failed to renew its leader-election lease and restarted; before that it hung for ~12 minutes doing nothing. During this window the failed node recovered, but after the yurt-controller-manager restart the taint is never removed.
```
I0213 06:25:52.270685 1 poolcoordinator_controller.go:122] taint edgeworker01: key node.openyurt.io/unschedulable already exists, nothing to do
E0213 06:26:02.361335 1 poolcoordinator_controller.go:164] Operation cannot be fulfilled on nodes "edgeworker01": the object has been modified; please apply your changes to the latest version and try again
E0213 06:37:51.646804 1 leaderelection.go:330] error retrieving resource lock kube-system/yurt-controller-manager: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/yurt-controller-manager?timeout=10s": context deadline exceeded
I0213 06:37:51.647067 1 leaderelection.go:283] failed to renew lease kube-system/yurt-controller-manager: timed out waiting for the condition
F0213 06:37:51.647111 1 controllermanager.go:248] leaderelection lost
```
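For context on the fatal line above: this is standard client-go leader-election behavior. When the lease cannot be renewed within the renew deadline, OnStoppedLeading fires, the process exits fatally, and the Pod restarts with all in-memory controller state lost. A minimal sketch of that mechanism, assuming typical timings (only the lease namespace/name are taken from the logs; everything else is illustrative):

```go
package main

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	hostname, _ := os.Hostname()
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"kube-system", "yurt-controller-manager", // lease from the logs above
		client.CoreV1(), client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: hostname},
	)
	if err != nil {
		klog.Fatal(err)
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // illustrative values, not OpenYurt's
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// run the controllers; taint counters live in memory here
			},
			OnStoppedLeading: func() {
				// This is the "leaderelection lost" fatal exit: the process
				// dies, the Pod restarts, and in-memory state is gone.
				klog.Fatalf("leaderelection lost")
			},
		},
	})
}
```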
What you expected to happen:
The node.openyurt.io/unschedulable taint should be removed when the failed node recovers, regardless of whether yurt-controller-manager has restarted in the meantime.
How to reproduce it (as minimally and precisely as possible):
Rare case.
Anything else we need to know?:
From pkg/controller/poolcoordinator/delegatelease/poolcoordinator_controller.go: after the restart, the Counter never gets a chance to increment, so when the failed node recovers the taint-removal path is never triggered.
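To illustrate, here is a minimal, hypothetical sketch of that state-loss pattern, not the actual controller code (nodeController, delegateCounter, threshold, and observeLease are made-up names; the real logic lives in the file above): an in-memory counter drives both tainting and untainting, so a restart that wipes the counter leaves an existing taint permanently orphaned.

```go
// Hypothetical sketch only: shows how an in-memory counter plus a
// persistent taint can get out of sync across a controller restart.
package main

import "fmt"

const threshold = 2 // illustrative: delegated intervals before tainting

type nodeController struct {
	// delegateCounter is kept only in memory; it is lost on restart.
	delegateCounter map[string]int
	tainted         map[string]bool // stands in for the taint on the Node object
}

func newNodeController(tainted map[string]bool) *nodeController {
	// On restart the counters start from zero, while the taints persist
	// on the Node objects in the apiserver.
	return &nodeController{
		delegateCounter: make(map[string]int),
		tainted:         tainted,
	}
}

// observeLease is called each time a node lease is processed; delegated
// means the lease was renewed via pool-coordinator, i.e. the node cannot
// reach the apiserver directly.
func (c *nodeController) observeLease(node string, delegated bool) {
	if delegated {
		c.delegateCounter[node]++
		if c.delegateCounter[node] >= threshold && !c.tainted[node] {
			c.tainted[node] = true
			fmt.Println("taint added:", node)
		}
		return
	}
	// The node renews its own lease again: the taint is removed only if
	// the in-memory counter shows it was previously delegated.
	if c.delegateCounter[node] >= threshold && c.tainted[node] {
		c.tainted[node] = false
		fmt.Println("taint removed:", node)
	}
	c.delegateCounter[node] = 0
}

func main() {
	taints := map[string]bool{}
	c := newNodeController(taints)
	c.observeLease("edgeworker01", true)
	c.observeLease("edgeworker01", true) // taint added

	// yurt-controller-manager restarts: counters are gone, taint remains.
	c = newNodeController(taints)
	c.observeLease("edgeworker01", false) // counter is 0 -> taint never removed
}
```

One possible direction (an assumption on my part, not a confirmed fix) would be to rebuild the counter state from the existing taints or delegated leases on startup, so the removal path still works after a restart.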
Environment:
- Kubernetes version (use kubectl version): 1.22
- others:
/kind bug