✨ Enable configuring the kube config timeout #8865

cnmcavoy
Contributor

What this PR does / why we need it:

Allows configuring the timeout of the Kubernetes client configuration. Also moves the related client configuration (the QPS and burst settings) into one shared place and removes logic that was duplicated across the various main functions.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #8864

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 15, 2023
@k8s-ci-robot
Contributor

Welcome @cnmcavoy!

It looks like this is your first PR to kubernetes-sigs/cluster-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 15, 2023
@k8s-ci-robot
Contributor

Hi @cnmcavoy. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cecilerobertmichon for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cnmcavoy cnmcavoy force-pushed the cnmcavoy/configurable-kube-config branch from c85cc2d to 1339b83 Compare June 15, 2023 17:15
controllers/remote/flags.go (outdated)
Comment on lines +54 to +47
restConfig := ctrl.GetConfigOrDie()
restConfig.QPS = restConfigQPS
restConfig.Burst = restConfigBurst
Contributor

Is there a reason Timeout is not also defined here?

Contributor Author

The Kubernetes clients used for the management clusters never had a timeout assigned in any of the various main.go files, and I didn't want to introduce an unintended change in behavior.

If we prefer consistency, I can add the timeout here as well.
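
For illustration, here is a minimal sketch of what setting the timeout alongside QPS and burst could look like. It is not the code from this PR. `restConfig` is a hypothetical stand-in for the `QPS`, `Burst` and `Timeout` fields of client-go's `rest.Config`, so the example builds with the standard library only. The `kube-api-burst` flag and its default of 30 appear in the diff; the `kube-api-qps` flag, its default of 20, and the `kube-api-timeout` flag name are assumptions. A default of zero leaves the timeout unset, which would preserve the existing behavior of the management cluster clients.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// restConfig is a hypothetical stand-in for the QPS, Burst and Timeout
// fields of client-go's rest.Config.
type restConfig struct {
	QPS     float32
	Burst   int
	Timeout time.Duration
}

// clientFlags holds the values parsed from the command line.
type clientFlags struct {
	qps     float64
	burst   int
	timeout time.Duration
}

// addFlags registers the client flags on fs. Only "kube-api-burst" and its
// default come from the diff; the other two flags are assumptions.
func addFlags(fs *flag.FlagSet, f *clientFlags) {
	fs.Float64Var(&f.qps, "kube-api-qps", 20, "Maximum queries per second to the API server.")
	fs.IntVar(&f.burst, "kube-api-burst", 30, "Maximum number of queries allowed in one burst.")
	fs.DurationVar(&f.timeout, "kube-api-timeout", 0, "Client request timeout; 0 means no timeout.")
}

// buildConfig parses args and copies every flag value, including the
// timeout, onto a fresh config.
func buildConfig(args []string) (restConfig, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	var f clientFlags
	addFlags(fs, &f)
	if err := fs.Parse(args); err != nil {
		return restConfig{}, err
	}
	return restConfig{
		QPS:     float32(f.qps),
		Burst:   f.burst,
		Timeout: f.timeout,
	}, nil
}

func main() {
	cfg, err := buildConfig([]string{"--kube-api-timeout=30s"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```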


fs.IntVar(&restConfigBurst, "kube-api-burst", 30,
	"Maximum number of queries that should be allowed in one burst from the controller client to the Kubernetes API server. Default 30")
remote.AddRestConfigFlags(fs)
Contributor

This helper is very obviously pulling its weight! 🙌

@nojnhuh
Contributor

nojnhuh commented Jun 19, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 19, 2023
@fabriziopandini
Member

/hold
If I'm not wrong the solution proposed is not addressing the problem being discussed (making the timeout for draining workload clusters configurable), but instead it configures the timeout for management cluster clients.
see also #8864 (comment)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 20, 2023
@killianmuldoon
Contributor

If I'm not wrong the solution proposed is not addressing the problem being discussed (making the timeout for draining workload clusters configurable), but instead it configures the timeout for management cluster clients.

Agreed this doesn't address the issue with drain, but this does configure both workload and management cluster clients AFAIK - just not for drain.

@fabriziopandini
Member

Agreed this doesn't address the issue with drain, but this does configure both workload and management cluster clients AFAIK - just not for drain.

I'm not sure this applies to workload clusters. Could you kindly point me to where this is happening?

@killianmuldoon
Contributor

I'm not sure this applies to workload clusters. Could you kindly point me to where this is happening?

You can see this on the change in #8882. The defaultClientTimeout is set in the remote.RESTConfig function. This is called in a number of places to create clients for workload clusters, including in the ClusterCacheTracker's createAccessor here:

config, err := RESTConfig(ctx, t.controllerName, t.client, cluster)
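
As a rough sketch of the mechanism described here (not the actual `remote` package code): when a helper stamps a package-level timeout onto every config it builds, changing that one variable changes the timeout for every workload-cluster client created afterwards. `restConfig` and `restConfigFor` are hypothetical stand-ins, the 10-second value is an assumption, and the real `RESTConfig` also takes a context, a client and a cluster object, which are omitted.

```go
package main

import (
	"fmt"
	"time"
)

// defaultClientTimeout mirrors the variable name mentioned in the thread.
// The value used here is an assumption.
var defaultClientTimeout = 10 * time.Second

// restConfig is a hypothetical stand-in for client-go's rest.Config.
type restConfig struct {
	Host    string
	Timeout time.Duration
}

// restConfigFor is a simplified, hypothetical helper in the spirit of
// remote.RESTConfig. It only shows the package-level timeout being copied
// onto each config it returns.
func restConfigFor(clusterName string) restConfig {
	return restConfig{
		Host:    "https://" + clusterName + ".example.invalid:6443",
		Timeout: defaultClientTimeout,
	}
}

func main() {
	before := restConfigFor("workload-a")

	// A flag that writes the package variable affects every later caller.
	defaultClientTimeout = 30 * time.Second
	after := restConfigFor("workload-b")

	fmt.Println(before.Timeout, after.Timeout)
}
```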

@sbueringer
Member

This feels like a very implicit way of configuring the ClusterCacheTracker

@cnmcavoy cnmcavoy force-pushed the cnmcavoy/configurable-kube-config branch from 1339b83 to 2f681d3 Compare June 20, 2023 18:42
@cnmcavoy
Contributor Author

This feels like a very implicit way of configuring the ClusterCacheTracker

Can you clarify? I'm not really sure what you mean.

If I'm not wrong the solution proposed is not addressing the problem being discussed (making the timeout for draining workload clusters configurable), but instead it configures the timeout for management cluster clients.

I commented over in the issue, but the HTTP timeout is very much what we are interested in. If we feel that making the QPS flags consistent is an unwanted change, I can revert that part and make this PR only the timeout.

@sbueringer
Member

sbueringer commented Jun 21, 2023

This feels like a very implicit way of configuring the ClusterCacheTracker

Can you clarify? I'm not really sure what you mean.

Sorry, that was not very actionable or clear feedback :). What I meant is that I don't like introducing flags that write package-global variables and thereby affect the behavior of the ClusterCacheTracker and the RESTConfig func.

There are folks who use CAPI as a library and use ClusterCacheTracker directly. I don't want to force them to use the flags in order to configure the timeouts.

I would prefer something like what we did with flags.TLSOptions. Provide a util to define the flags and then explicitly hand them over (in the TLSOptions case to the webhookserver, in our case here to the ClusterCacheTracker).
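
A minimal sketch of that shape, under stated assumptions: `ClientOptions`, `AddClientFlags`, `Tracker`, `NewTracker` and the `remote-client-timeout` flag are all hypothetical names invented for this example, and none of them is the API of `flags.TLSOptions` or `ClusterCacheTracker`. The point is only that the flag helper fills a struct, and the struct is passed in explicitly, so a library user can build the same struct in code without touching any flags.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// ClientOptions is a hypothetical options struct that carries the timeout.
type ClientOptions struct {
	Timeout time.Duration
}

// AddClientFlags registers flags that write into opts instead of into
// package-level variables. The flag name and default are assumptions.
func AddClientFlags(fs *flag.FlagSet, opts *ClientOptions) {
	fs.DurationVar(&opts.Timeout, "remote-client-timeout", 10*time.Second,
		"Timeout for clients talking to workload clusters.")
}

// parseClientOptions is a small helper for binaries that do use flags.
func parseClientOptions(args []string) (ClientOptions, error) {
	var opts ClientOptions
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	AddClientFlags(fs, &opts)
	err := fs.Parse(args)
	return opts, err
}

// Tracker is a hypothetical stand-in for a component such as
// ClusterCacheTracker that receives its options explicitly.
type Tracker struct {
	clientTimeout time.Duration
}

// NewTracker takes the options as an argument, so no global state is read.
func NewTracker(opts ClientOptions) *Tracker {
	return &Tracker{clientTimeout: opts.Timeout}
}

// ClientTimeout reports the timeout the tracker was configured with.
func (t *Tracker) ClientTimeout() time.Duration {
	return t.clientTimeout
}

func main() {
	// A binary: parse flags, then hand the options over explicitly.
	opts, err := parseClientOptions([]string{"--remote-client-timeout=45s"})
	if err != nil {
		panic(err)
	}
	fromFlags := NewTracker(opts)

	// A library user: same constructor, no flags involved.
	fromCode := NewTracker(ClientOptions{Timeout: 5 * time.Second})

	fmt.Println(fromFlags.ClientTimeout(), fromCode.ClientTimeout())
}
```

With this layout, two trackers in the same process can carry different timeouts, which a shared package-level variable cannot express.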

If I'm not wrong the solution proposed is not addressing the problem being discussed (making the timeout for draining workload clusters configurable), but instead it configures the timeout for management cluster clients.

I commented over in the issue, but the HTTP timeout is very much what we are interested in. If we feel that making the QPS flags consistent is an unwanted change, I can revert that part and make this PR only the timeout.

I think you're right that this would change the timeout used for draining (but also more). Let's continue the discussion on the issue and once we have consensus come back to the PR.

@cnmcavoy
Contributor Author

Closed in preference to #8917

@cnmcavoy cnmcavoy closed this Jun 26, 2023
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Allow configuring kube client timeouts
6 participants