Leader election after node reboot fails #397

anishakj · 2021-10-04T04:08:39Z

Description

In a VMware cluster, some pods get stuck in the ProviderFailed state after a node reboot. The leader election function provided by the Operator SDK cannot handle that state, so new pods are stuck in a wait cycle.
The fix scans the owners of the leader lock ConfigMap and deletes pods that are in ProviderFailed status.

Importance

must-have

Location

cmd/manager/main.go

Suggestions for an improvement

Customise the leader.Become function of operator-sdk to include pre-checks

anishakj self-assigned this Oct 4, 2021

anishakj mentioned this issue Oct 4, 2021

Fixing Leader election failure #396

Merged

anishakj changed the title ~~Leader election after node fails~~ Leader election after node reboot fails Oct 4, 2021

derekm closed this as completed in #396 Oct 6, 2021

anishakj added this to the Release 0.2.14 milestone May 5, 2022
