kured reboot lock held by non-existent (scaled in) node #74

Open
jackfrancis opened this issue Jun 7, 2021 · 2 comments
@jackfrancis
Owner

In clusters that scale dynamically, I've observed that kured can deadlock because the reboot lock is held by a node that has since been removed from the cluster. Let's investigate a preferred configuration that keeps reboots progressing when this occurs.

@jackfrancis
Owner Author

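E.g., kured keeps reporting that the lock is held by k8s-pool1-38125224-vmss00000h, a node that no longer appears in the `k get nodes` output: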

time="2021-06-07T16:14:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:15:11Z" level=info msg="Reboot required"
time="2021-06-07T16:15:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:16:11Z" level=info msg="Reboot required"
time="2021-06-07T16:16:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:17:11Z" level=info msg="Reboot required"
time="2021-06-07T16:17:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:18:11Z" level=info msg="Reboot required"
time="2021-06-07T16:18:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:19:11Z" level=info msg="Reboot required"
time="2021-06-07T16:19:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:20:11Z" level=info msg="Reboot required"
time="2021-06-07T16:20:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:21:11Z" level=info msg="Reboot required"
time="2021-06-07T16:21:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:22:11Z" level=info msg="Reboot required"
time="2021-06-07T16:22:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:23:11Z" level=info msg="Reboot required"
time="2021-06-07T16:23:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:24:11Z" level=info msg="Reboot required"
time="2021-06-07T16:24:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"
time="2021-06-07T16:25:11Z" level=info msg="Reboot required"
time="2021-06-07T16:25:11Z" level=warning msg="Lock already held: k8s-pool1-38125224-vmss00000h"

$ k get nodes
NAME                            STATUS   ROLES    AGE     VERSION
k8s-master-38125224-0           Ready    master   2d19h   v1.20.7
k8s-master-38125224-1           Ready    master   2d19h   v1.20.7
k8s-master-38125224-2           Ready    master   2d19h   v1.20.7
k8s-pool1-38125224-vmss000000   Ready    agent    2d19h   v1.20.7
k8s-pool1-38125224-vmss000002   Ready    agent    2d19h   v1.20.7
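
The mismatch between the log and the node list above can be checked mechanically. Here is a minimal Go sketch, with the node names copied from the output; `holderPresent` is a hypothetical helper written for illustration, not something kured provides as far as this thread shows.

```go
package main

import "fmt"

// holderPresent reports whether the lock holder is among the current node names.
// Hypothetical helper for illustration only.
func holderPresent(holder string, nodes []string) bool {
	for _, n := range nodes {
		if n == holder {
			return true
		}
	}
	return false
}

func main() {
	// Node names as listed by `k get nodes` above.
	nodes := []string{
		"k8s-master-38125224-0",
		"k8s-master-38125224-1",
		"k8s-master-38125224-2",
		"k8s-pool1-38125224-vmss000000",
		"k8s-pool1-38125224-vmss000002",
	}
	// Lock holder as reported in the kured log above.
	holder := "k8s-pool1-38125224-vmss00000h"

	if !holderPresent(holder, nodes) {
		fmt.Printf("lock holder %s is not in the cluster; the lock is orphaned\n", holder)
	}
}
```

With the values above the holder is not found, which is exactly the stuck state shown in the log.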

@jackfrancis
Owner Author

The log shows why: no lock TTL is configured, so the lock is held until it is explicitly released:

time="2021-06-07T15:46:40Z" level=info msg="Lock TTL not set, lock will remain until being released"

We need to make sure we include the lock-ttl configuration:

https://github.com/weaveworks/kured/blob/main/cmd/kured/main.go#L115
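
To illustrate why a TTL unblocks this case, here is a minimal Go sketch of a lock that expires. It is not kured's implementation: the `lock` type and the `tryAcquire` helper are hypothetical names made up for illustration, and the 30-minute TTL is an arbitrary value, not a recommendation.

```go
package main

import (
	"fmt"
	"time"
)

// lock is a hypothetical stand-in for the reboot lock record (not kured's actual type).
type lock struct {
	Holder  string
	Created time.Time
	TTL     time.Duration // zero means the lock never expires
}

// expired reports whether the lock may be taken over at time now.
func (l lock) expired(now time.Time) bool {
	return l.TTL > 0 && now.After(l.Created.Add(l.TTL))
}

// tryAcquire returns a new lock held by node if the current lock is absent,
// expired, or already held by node. Otherwise it returns the current lock and false.
func tryAcquire(cur *lock, node string, ttl time.Duration, now time.Time) (*lock, bool) {
	if cur == nil || cur.expired(now) || cur.Holder == node {
		return &lock{Holder: node, Created: now, TTL: ttl}, true
	}
	return cur, false
}

func main() {
	start := time.Date(2021, 6, 7, 15, 46, 40, 0, time.UTC)
	later := start.Add(45 * time.Minute)
	gone := "k8s-pool1-38125224-vmss00000h" // node that was scaled in
	live := "k8s-pool1-38125224-vmss000000" // node that still needs a reboot

	// No TTL: the removed node holds the lock forever.
	noTTL := &lock{Holder: gone, Created: start}
	_, ok := tryAcquire(noTTL, live, 0, later)
	fmt.Println("without TTL, acquired:", ok)

	// With a TTL: the stale lock expires and a live node can take over.
	withTTL := &lock{Holder: gone, Created: start, TTL: 30 * time.Minute}
	l, ok := tryAcquire(withTTL, live, 30*time.Minute, later)
	fmt.Println("with 30m TTL, acquired:", ok, "holder:", l.Holder)
}
```

In this sketch the first attempt fails and the second succeeds, since 45 minutes have passed against a 30-minute TTL. The trade-off is that the TTL needs to be comfortably longer than a normal drain and reboot, otherwise a second node could take the lock while the first is still rebooting.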
