Introduced leaderElection options including: --leader-elect-lease-duration, --leader-elect-renew-deadline, --leader-elect-retry-period #4158
Signed-off-by: huangyanfeng <[email protected]>
/assign @RainbowMango
--leader-elect-lease-duration, --leader-elect-renew-deadline, --leader-elect-retry-period
If I remember correctly, they all have default values. Could you please explain why the default values do not fit?
Have you figured out why the instance can't renew the lease? Normally the default value is long enough, so just raising the timeout might not work. (I'm not saying I don't like this patch; I'm trying to figure out the root cause.)
karmada-apiserver disruption can happen for multiple reasons, including

We have seen all of these cases, and more, disrupt connections. Many controllers and operators rely on the karmada-apiserver for leader election. The karmada-apiserver downtime tolerance is floor(renewDeadline/retryPeriod)*retryPeriod - retryPeriod. With the default configuration, the tolerance is floor(10/2)*2 - 2 = 8s. In actual production, leader election needs to be able to tolerate 60s of interruption. Recommended defaults are LeaseDuration=137s, RenewDeadline=107s, RetryPeriod=26s.

In addition, let me explain the origin of this PR and why we are so concerned about component restarts. In our large-scale environment, some CRs number in the tens of thousands.
/lgtm
/approve
Thanks for the detailed clarification and feedback.
It looks good:
--leader-elect
Enable leader election, which must be true when running multi instances. (default true)
--leader-elect-lease-duration duration
The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively
the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 15s)
--leader-elect-renew-deadline duration
The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable
if leader election is enabled. (default 10s)
--leader-elect-resource-name string
The name of resource object that is used for locking during leader election. (default "karmada-scheduler")
--leader-elect-resource-namespace string
The namespace of resource object that is used for locking during leader election. (default "karmada-system")
--leader-elect-retry-period duration
The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 2s)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds LeaderElection configuration options to the scheduler.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: