add configurable throughput for clients #91

sonasingh46 · 2021-08-27T19:45:11Z

Signed-off-by: Ashutosh Kumar [email protected]

What type of PR is this?

/kind bug

What this PR does / why we need it:
This change adds two new flags, "kube-api-qps" and "kube-api-burst", to configure the QPS and burst of the clients that talk to the API server, because the default values are not effective at larger scale.
This change also uses a separate client for the leader election goroutine.
This change also replaces the native Go flag library with github.com/spf13/pflag.
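
For readers following along, here is a minimal sketch of how two such flags could be parsed and carried into a client configuration. It is not the code from this PR: it uses the standard library `flag` package instead of pflag so that it runs without external modules, `clientConfig` is a hypothetical stand-in for the `QPS` and `Burst` fields of client-go's `rest.Config`, and the burst default of 10 and the positivity check are assumptions. The standard library has no `Float32` flag, so the value is parsed as `float64` and converted.

```go
package main

import (
	"flag"
	"fmt"
)

// clientConfig is a hypothetical stand-in for the QPS and Burst fields of
// client-go's rest.Config, so the sketch compiles without external modules.
type clientConfig struct {
	QPS   float32
	Burst int
}

// parseThrottleFlags parses --kube-api-qps and --kube-api-burst from args.
// The QPS default (5) matches the diff in this PR; the burst default (10)
// and the validation below are assumptions made for this sketch.
func parseThrottleFlags(args []string) (clientConfig, error) {
	fs := flag.NewFlagSet("csi-external-health-monitor-controller", flag.ContinueOnError)
	qps := fs.Float64("kube-api-qps", 5, "QPS to use while communicating with the Kubernetes API server.")
	burst := fs.Int("kube-api-burst", 10, "Burst to use while communicating with the Kubernetes API server.")
	if err := fs.Parse(args); err != nil {
		return clientConfig{}, err
	}
	if *qps <= 0 || *burst <= 0 {
		return clientConfig{}, fmt.Errorf("kube-api-qps and kube-api-burst must be positive, got %v and %d", *qps, *burst)
	}
	return clientConfig{QPS: float32(*qps), Burst: *burst}, nil
}

func main() {
	cfg, err := parseThrottleFlags([]string{"--kube-api-qps=20", "--kube-api-burst=40"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// In a real controller these values would be copied onto the REST config
	// before any clientset is built from it.
	fmt.Printf("qps=%v burst=%d\n", cfg.QPS, cfg.Burst)
}
```

Since the PR also uses a separate client for leader election, the same two values would presumably need to be applied to each client configuration independently.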
Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

None

k8s-ci-robot · 2021-08-27T19:45:19Z

Welcome @sonasingh46!

It looks like this is your first PR to kubernetes-csi/external-health-monitor 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/external-health-monitor has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2021-08-27T19:45:19Z

Hi @sonasingh46. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

xing-yang · 2021-08-27T22:44:30Z

/ok-to-test

xing-yang · 2021-08-27T22:44:38Z

Please add a release note.

xing-yang · 2021-08-27T22:50:09Z

cmd/csi-external-health-monitor-controller/main.go

+
+	retryIntervalStart = flag.Duration("retry-interval-start", time.Second, "Initial retry interval of failed pv monitoring. It doubles with each failure, up to retry-interval-max. Default is 1 second.")
+	retryIntervalMax   = flag.Duration("retry-interval-max", 5*time.Minute, "Maximum retry interval of failed pv monitoring. Default is 5 minutes.")
+	kubeAPIQPS         = flag.Float32("kube-api-qps", 5, "QPS to use while communicating with the kubernetes apiserver. Defaults to 5.0.")


Since we are trying to decrease the API call frequency, I wonder if we should set the default lower than 5.0.
@pohly @msau42 what do you think?

I'm not sure. You would have to run a scalability experiment to determine which default works better. Also note that client-side throttling is being replaced by API priority and fairness.

IMHO more important than tweaks like this is to look at why health monitor causes API traffic. Wasn't it because it establishes watches for objects that change a lot (nodes, pods)? Then throttling won't help at all because it is applied to outbound requests, not incoming watch updates.

API priority and fairness is now beta. Have you run any experiments that can demonstrate that this change is still effective?

xing-yang · 2021-08-27T23:00:40Z

Can you make the following workqueue ratelimited as well? Also check if there's anything else in the NodeWatcher that should be configurable.
https://github.com/kubernetes-csi/external-health-monitor/blob/v0.4.0/pkg/controller/node_watcher.go#L86
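
As an illustration of what a rate-limited workqueue buys here, the sketch below implements per-item exponential backoff: the delay starts at a base interval and doubles with each failure up to a maximum, which is the behaviour the retry-interval-start and retry-interval-max flag descriptions state. It is a standalone approximation, not the node watcher's code; `itemBackoff`, `When`, and `Forget` are hypothetical names chosen to resemble the shape of a workqueue rate limiter.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the wait before the next retry after `failures`
// consecutive failures: base doubled once per earlier failure, capped at max.
func backoffDelay(base, max time.Duration, failures int) time.Duration {
	if failures <= 0 {
		return 0
	}
	if base > max {
		return max
	}
	delay := base
	for i := 1; i < failures; i++ {
		// Doubling would exceed max (and could eventually overflow), so stop.
		if delay > max/2 {
			return max
		}
		delay *= 2
	}
	return delay
}

// itemBackoff tracks failure counts per item (hypothetical helper).
type itemBackoff struct {
	base, max time.Duration
	failures  map[string]int
}

func newItemBackoff(base, max time.Duration) *itemBackoff {
	return &itemBackoff{base: base, max: max, failures: map[string]int{}}
}

// When records one more failure for item and returns how long to wait.
func (b *itemBackoff) When(item string) time.Duration {
	b.failures[item]++
	return backoffDelay(b.base, b.max, b.failures[item])
}

// Forget resets the failure count after the item was processed successfully.
func (b *itemBackoff) Forget(item string) {
	delete(b.failures, item)
}

func main() {
	b := newItemBackoff(time.Second, 5*time.Minute)
	for i := 0; i < 4; i++ {
		fmt.Println("node-a retry in", b.When("node-a"))
	}
	b.Forget("node-a")
	fmt.Println("after success, node-a retry in", b.When("node-a"))
}
```

Keeping the backoff per item means one repeatedly failing node does not delay retries for other nodes in the queue.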

sonasingh46 · 2021-08-28T07:48:31Z

@xing-yang

Can you make the following workqueue ratelimited as well? Also check if there's anything else in the NodeWatcher that should be configurable.

Sure, I pushed a commit for this. I do not see anything else in the node watcher that needs to be configurable. There is DefaultNodeNotReadyTimeDuration, but I am not sure whether it should be configurable.
https://github.com/kubernetes-csi/external-health-monitor/blob/master/pkg/controller/node_watcher.go#L40

pohly · 2021-08-30T08:17:38Z

Fixes ##76

I have my doubts whether it actually fixes any of that... please remove that and manually close the issue after it has been confirmed that the issue is gone.

sonasingh46 · 2021-08-31T18:27:19Z

I have my doubts whether it actually fixes any of that... please remove that and manually close the issue after it has been confirmed that the issue is gone.

Done

a0f195c Merge pull request kubernetes-csi#106 from msau42/fix-canary
7100c12 Only set staging registry when running canary job
b3c65f9 Merge pull request kubernetes-csi#99 from msau42/add-release-process
e53f3e8 Merge pull request kubernetes-csi#103 from msau42/fix-canary
d129462 Document new method for adding CI jobs are new K8s versions
e73c2ce Use staging registry for canary tests
2c09846 Add cleanup instructions to release-notes generation
60e1cd3 Merge pull request kubernetes-csi#98 from pohly/kubernetes-1-19-fixes
0979c09 prow.sh: fix E2E suite for Kubernetes >= 1.18
3b4a2f1 prow.sh: fix installing Go for Kubernetes 1.19.0
1fbb636 Merge pull request kubernetes-csi#97 from pohly/go-1.15
82d108a switch to Go 1.15
d8a2530 Merge pull request kubernetes-csi#95 from msau42/add-release-process
843bddc Add steps on promoting release images
0345a83 Merge pull request kubernetes-csi#94 from linux-on-ibm-z/bump-timeout
1fdf2d5 cloud build: bump timeout in Prow job
41ec6d1 Merge pull request kubernetes-csi#93 from animeshk08/patch-1
5a54e67 filter-junit: Fix gofmt error
0676fcb Merge pull request kubernetes-csi#92 from animeshk08/patch-1
36ea4ff filter-junit: Fix golint error
f5a4203 Merge pull request kubernetes-csi#91 from cyb70289/arm64
43e50d6 prow.sh: enable building arm64 image
0d5bd84 Merge pull request kubernetes-csi#90 from pohly/k8s-staging-sig-storage
3df86b7 cloud build: k8s-staging-sig-storage
c5fd961 Merge pull request kubernetes-csi#89 from pohly/cloud-build-binfmt
db0c2a7 cloud build: initialize support for running commands in Dockerfile
be902f4 Merge pull request kubernetes-csi#88 from pohly/multiarch-windows-fix
340e082 build.make: optional inclusion of Windows in multiarch images
5231f05 build.make: properly declare push-multiarch
4569f27 build.make: fix push-multiarch ambiguity
17dde9e Merge pull request kubernetes-csi#87 from pohly/cloud-build
bd41690 cloud build: initial set of shared files
9084fec Merge pull request kubernetes-csi#81 from msau42/add-release-process
6f2322e Update patch release notes generation command
0fcc3b1 Merge pull request kubernetes-csi#78 from ggriffiths/fix_csi_snapshotter_rbac_version_set
d8c76fe Support local snapshot RBAC for pull jobs
c1bdf5b Merge pull request kubernetes-csi#80 from msau42/add-release-process
ea1f94a update release tools instructions
152396e Merge pull request kubernetes-csi#77 from ggriffiths/snapshotter201_update
7edc146 Update snapshotter to version 2.0.1

git-subtree-dir: release-tools
git-subtree-split: a0f195cc2ddc2a1f07d4d3e46fc08187db358f94

k8s-triage-robot · 2021-11-29T22:22:38Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

xing-yang · 2021-12-07T03:50:36Z

/remove-lifecycle stale

xing-yang · 2021-12-20T15:34:38Z

Although this may not have fixed #76, it is still useful to make the QPS and Burst settings configurable. API server priority and fairness provides granular rate limiting of in-flight requests on the server side, whereas the QPS and Burst settings rate limit requests on the Kube API client side.
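
To make the client-side semantics concrete, here is a small token-bucket sketch: the bucket holds at most `burst` tokens, refills at `qps` tokens per second, and each outbound request spends one token. This is a simplified illustration under those assumptions rather than client-go's implementation; `tokenBucket` and `countAllowed` are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// tokenBucket is a minimal client-side limiter: up to `burst` requests may go
// out back to back, after which requests are admitted at `qps` per second.
type tokenBucket struct {
	qps    float64
	burst  float64
	tokens float64
	last   time.Time
}

func newTokenBucket(qps float64, burst int, now time.Time) *tokenBucket {
	return &tokenBucket{qps: qps, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow reports whether one request may be sent at time now.
func (b *tokenBucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		// Refill proportionally to elapsed time, never beyond the burst size.
		b.tokens = math.Min(b.burst, b.tokens+elapsed*b.qps)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// countAllowed simulates `requests` attempts spaced spacingMillis apart and
// returns how many the limiter admits.
func countAllowed(qps float64, burst, requests, spacingMillis int) int {
	now := time.Unix(0, 0)
	b := newTokenBucket(qps, burst, now)
	allowed := 0
	for i := 0; i < requests; i++ {
		if b.allow(now) {
			allowed++
		}
		now = now.Add(time.Duration(spacingMillis) * time.Millisecond)
	}
	return allowed
}

func main() {
	// 30 requests fired at the same instant: only the burst gets through.
	fmt.Println("instant burst:", countAllowed(5, 10, 30, 0))
	// 30 requests spaced 200ms apart match 5 QPS, so none are throttled.
	fmt.Println("paced at 5/s:", countAllowed(5, 10, 30, 200))
}
```

A limiter of this kind only gates requests the client sends. It does not reduce the volume of watch events the API server pushes to the client, which is the concern raised earlier in this thread.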

Signed-off-by: Ashutosh Kumar <[email protected]>

k8s-triage-robot · 2022-05-07T21:25:24Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-06-06T22:06:46Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot · 2022-06-14T23:36:41Z

@sonasingh46: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-triage-robot · 2022-07-14T23:49:33Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2022-07-14T23:49:35Z

@k8s-triage-robot: Closed this PR.

xing-yang · 2022-09-12T23:03:22Z

/reopen

k8s-ci-robot · 2022-09-12T23:03:24Z

@xing-yang: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2022-09-12T23:03:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sonasingh46
Once this PR has been reviewed and has the lgtm label, please ask for approval from msau42 by writing /assign @msau42 in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2022-09-12T23:03:39Z

@sonasingh46: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-kubernetes-csi-external-health-monitor	`d2eaf25`	link	true	`/test pull-kubernetes-csi-external-health-monitor`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-triage-robot · 2022-10-12T23:57:33Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Reopen this PR with /reopen
Mark this PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2022-10-12T23:57:35Z

@k8s-triage-robot: Closed this PR.

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 27, 2021

k8s-ci-robot requested review from chrishenzie and j-griffith August 27, 2021 19:45

k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 27, 2021

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 27, 2021

xing-yang reviewed Aug 27, 2021

xing-yang assigned xing-yang and msau42 Aug 27, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2021

sonasingh46 added 2 commits February 7, 2022 02:42

add configurable throughput for clients

fba209e

Signed-off-by: Ashutosh Kumar <[email protected]>

add rate limiting workqueue for node watcher

d2eaf25

sonasingh46 force-pushed the api_throttle branch from 11d561e to d2eaf25 Compare February 6, 2022 21:13

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 7, 2022

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 6, 2022

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2022

k8s-ci-robot closed this Jul 14, 2022

pohly mentioned this pull request Aug 8, 2022

KEP-1432 Move volume health monitoring to beta kubernetes/enhancements#3321

Closed

k8s-ci-robot reopened this Sep 12, 2022

k8s-ci-robot closed this Oct 12, 2022