Introduced leaderElection options including: `--leader-elect-lease-duration`, `--leader-elect-renew-deadline`, `--leader-elect-retry-period` #4158

yanfeng1992 · 2023-10-20T07:29:44Z

What type of PR is this?
/kind feature

What this PR does / why we need it:

Adds leader election configuration options to `karmada-scheduler`.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-scheduler`: Introduced leaderElection options including: `--leader-elect-lease-duration`, `--leader-elect-renew-deadline`, `--leader-elect-retry-period`. The default values are unchanged from the previous version.

Signed-off-by: huangyanfeng <[email protected]>

yanfeng1992 · 2023-10-20T08:52:11Z

/assign @RainbowMango

RainbowMango · 2023-10-20T11:49:32Z

If I remember correctly, they all have default values. Could you please explain why the default values do not fit?

yanfeng1992 · 2023-10-20T12:43:22Z

Because in our environment, karmada-scheduler has restarted many times after failing to renew its lease.

The default timeouts are too short and too strict for our actual environment.

RainbowMango · 2023-10-20T12:48:40Z

Have you figured out why the instance can't renew the lease? Normally the default values are long enough, so simply raising the timeouts might not help.

(I'm not saying that I don't like this patch; I'm just trying to figure out the root cause.)

yanfeng1992 · 2023-10-23T02:58:34Z

karmada-apiserver disruption can happen for multiple reasons, including:
1. karmada-apiserver rollout on a non-HA cluster (this does not exist in a production environment)
2. networking disruption on the host running the client
3. networking disruption on the host running the server

We have seen all of these cases, and more, disrupt connections. Many controllers and operators rely on the karmada-apiserver for leader election.

The karmada-apiserver downtime tolerance is `floor(renewDeadline/retryPeriod)*retryPeriod - retryPeriod`. With the default configuration (RenewDeadline=10s, RetryPeriod=2s), the tolerance is `floor(10/2)*2 - 2 = 8s`. In actual production, leader election needs to be able to tolerate 60s of interruption. Recommended values are LeaseDuration=137s, RenewDeadline=107s, RetryPeriod=26s.

In addition, let me explain where this PR comes from and why we are so concerned about component restarts. In our large-scale environment, some CRs number in the tens of thousands.
Every time karmada-scheduler restarts, the crb and rb objects of the Duplicated type are rescheduled as well, so subsequent rb and crb objects have to wait in the queue.
Every time karmada-controller-manager restarts, all objects need to be reconciled. If the informer cache has not finished syncing within 30s, the apiserver of the member cluster is accessed frequently, which triggers client-side rate limiting and makes the execution_controller's overall syncWork very slow.
These are some of the problems I have encountered in large-scale scenarios, along with my preliminary analysis of their causes. If anything is wrong or incomplete, please point it out.

@RainbowMango

RainbowMango

/lgtm
/approve

Thanks for the detailed clarification and feedback.

It looks good:

      --leader-elect
                Enable leader election, which must be true when running multi instances. (default true)
      --leader-elect-lease-duration duration
                The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively
                the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 15s)
      --leader-elect-renew-deadline duration
                The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable
                if leader election is enabled. (default 10s)
      --leader-elect-resource-name string
                The name of resource object that is used for locking during leader election. (default "karmada-scheduler")
      --leader-elect-resource-namespace string
                The namespace of resource object that is used for locking during leader election. (default "karmada-system")
      --leader-elect-retry-period duration
                The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 2s)

karmada-bot · 2023-10-24T03:07:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

Needs approval from an approver in each of these files:

~~cmd/OWNERS~~ [RainbowMango]


scheduler add LeaderElection config

5f1c05d

Signed-off-by: huangyanfeng <[email protected]>

karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 20, 2023

karmada-bot requested review from lonelyCZ and RainbowMango October 20, 2023 07:29

karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 20, 2023

karmada-bot assigned RainbowMango Oct 20, 2023

yanfeng1992 changed the title ~~scheduler add LeaderElection config~~ Introduced leaderElection options including: --leader-elect-lease-duration, --leader-elect-renew-deadline, --leader-elect-retry-period Oct 20, 2023

RainbowMango approved these changes Oct 24, 2023

karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2023

karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 24, 2023

karmada-bot merged commit 670d3c3 into karmada-io:master Oct 24, 2023
12 checks passed
