feat(helm chart): Make env vars configurable and auto configure go runtime #1412
Welcome @sslavic! |
Hi @sslavic. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
8c0f077 to 9a9d2a4
Thanks for the PR @sslavic. I think a boolean value to set the GOMAXPROCS & GOMEMLIMIT API might be better, I think it could be a single value for the whole chart. Then adding extraEnvVars to each container would allow this to be infinitely customisable for anyone who wanted an alternate implementation.
9a9d2a4 to e8a8138
@stevehipwell PTAL |
Thanks @sslavic, this looks good to me in principle. @serathius can you see any issue with setting /ok-to-test |
Could we please move this forward to unblock 0.7.0 chart release? |
/triage accepted |
e8a8138 to 20e4322
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sslavic The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@dgrisonnet I'm happy from the Helm perspective but I'd like a second opinion from one of the core maintainers. @sslavic could you add an entry under the |
…ntime Signed-off-by: Stevo Slavic <[email protected]>
20e4322 to 755bec0
Chart CHANGELOG has been updated, @stevehipwell PTAL |
Hard to say. I haven't seen those variables used anywhere in the Kubernetes ecosystem, maybe because of a lack of awareness, maybe because they don't bring any tangible benefit. I would not recommend making this a default setting without prior testing. |
Looking at golang/go#33803, it proposes to set |
@serathius can you please expand on this? Please also consider and compare the tradeoffs against the current state, where the Go runtime for metrics-server (and the sidecar) is left at its defaults, which e.g. for the GKE-managed metrics-server results in lots of CPUThrottlingHigh paging. This PR is a workaround for Go runtime issue golang/go#33803. Many projects use https://github.com/uber-go/automaxprocs as a workaround.
Using automaxprocs would be even more invasive than the approach this PR proposes. GOMEMLIMIT is very useful too, but relatively new, so we can't expect many projects to be using it at this point. The new defaults can be opted out completely or tuned, by
IMO these new defaults make metrics-server better out of the box, reducing the chance of CPUThrottlingHigh paging. The hope is that the GKE-managed metrics-server will have this change propagated to it too 🤞🏻 |
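To make the "runtime left to defaults" point concrete, here is a minimal Go sketch that reports the settings the GOMAXPROCS and GOMEMLIMIT environment variables influence. It is illustrative only and not taken from metrics-server; the helper name runtimeSettings is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// runtimeSettings is a hypothetical helper that reports what the Go runtime
// is currently using. It only reads settings and changes nothing.
func runtimeSettings() (procs int, cpus int, memLimit int64) {
	// GOMAXPROCS(0) queries the current value without modifying it.
	procs = runtime.GOMAXPROCS(0)
	// NumCPU is the number of logical CPUs usable by the process.
	cpus = runtime.NumCPU()
	// A negative argument makes SetMemoryLimit return the current soft
	// memory limit without adjusting it; math.MaxInt64 means no limit.
	memLimit = debug.SetMemoryLimit(-1)
	return procs, cpus, memLimit
}

func main() {
	procs, cpus, memLimit := runtimeSettings()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d memory limit=%d bytes\n", procs, cpus, memLimit)
}
```

Running this inside a container with and without the two environment variables set is one way to see whether the runtime picked up the intended values.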
@serathius is there a reason why MS couldn't use uber-go/automaxprocs? |
automaxprocs is based on CPU limits, so in the case of metrics-server, which has no limits by default, it wouldn't change anything. That is good from a backward compatibility perspective, but not for the goal, which is to reduce the chance of the CPU throttling high issue by default, out of the box. Btw there are articles devoted to this issue, see https://github.com/robusta-dev/alert-explanations/wiki/CPUThrottlingHigh-on-metrics-server-(Prometheus-alert) by @aantn. IMO it's not good that e.g. on GKE the only option is to silence the alert and let metrics-server misbehave, living with its Go runtime not being configured. Using automaxprocs has another downside compared to the solution proposed in the PR: it can't be opted out of as easily. We'd need at least extra env vars support for that, and even then it wouldn't be as effective, e.g. when it comes to ease of propagating the high CPU throttling fix by default even to managed metrics-server services like the one on GKE. |
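The limit-based behaviour described in this comment can be sketched in Go as follows. This is not the automaxprocs implementation; procsFromCPULimit and its parameters are hypothetical, and the rounding direction is left as a parameter because tools differ on it. The point it illustrates is that with no CPU limit there is nothing to derive a value from.

```go
package main

import (
	"fmt"
	"math"
)

// procsFromCPULimit is a hypothetical helper. milliCPU is the container CPU
// limit in millicores; 0 means no limit is set.
// It returns ok=false when there is no limit, in which case a limit-based
// approach would leave GOMAXPROCS at the runtime default.
func procsFromCPULimit(milliCPU int64, roundUp bool) (procs int, ok bool) {
	if milliCPU <= 0 {
		return 0, false
	}
	cores := float64(milliCPU) / 1000.0
	if roundUp {
		procs = int(math.Ceil(cores))
	} else {
		procs = int(math.Floor(cores))
	}
	// GOMAXPROCS must be at least 1, even for fractional limits below one core.
	if procs < 1 {
		procs = 1
	}
	return procs, true
}

func main() {
	for _, m := range []int64{0, 250, 1500, 4000} {
		procs, ok := procsFromCPULimit(m, false)
		fmt.Printf("limit=%dm procs=%d applied=%t\n", m, procs, ok)
	}
}
```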
Uber may open-source an automaxprocs equivalent for GOMEMLIMIT, see uber-go/automaxprocs#56 (comment). I still think env vars calculated from the resources assigned in the infra code are more transparent, lighter weight, less invasive and more flexible when compared to using the libraries. |
No, someone just needs to test it, compile the results, show an improvement, and send a PR.
It's just that the proposed solution is not a complete fix, and without tests showing an improvement we should not enable it by default. My suggestion would be to keep MS components and Helm releases consistent. If we want to add envs in Helm, I would recommend not making the Go envs default, but waiting for the binary to test and adopt https://github.com/uber-go/automaxprocs |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I ran into this while proposing similar changes in other Helm charts, inspired by traefik/traefik-helm-chart#1029. I'm definitely not the expert on whether or not these changes are actually beneficial, but on that PR there are a series of references that look quite promising to me. Maybe they can help in understanding the possible benefits of merging this? |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does / why we need it: This PR extends the metrics-server Helm chart with support for configuring environment variables, and it automatically configures the Go runtime (GOMAXPROCS and GOMEMLIMIT) to make the runtime aware of the resources assigned to the metrics-server and addon-resizer containers. This reduces the likelihood of CPUThrottlingHigh paging and OOM crashes.
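As a rough illustration of deriving the two variables from container resources, here is a minimal Go sketch. It is not the chart's template logic: goRuntimeEnv is a hypothetical helper, and whether the inputs come from requests or limits is not specified in this description, so the parameters simply stand for "the resources assigned to the container".

```go
package main

import (
	"fmt"
	"strconv"
)

// goRuntimeEnv is a hypothetical helper, not the chart's actual logic.
// cpuMilli is the CPU amount in millicores and memBytes the memory amount in
// bytes; 0 means "not set", in which case the variable is omitted so the Go
// runtime keeps its default.
func goRuntimeEnv(cpuMilli, memBytes int64) map[string]string {
	env := map[string]string{}
	if cpuMilli > 0 {
		// Integer ceiling of millicores to whole cores, so 100m becomes 1.
		env["GOMAXPROCS"] = strconv.FormatInt((cpuMilli+999)/1000, 10)
	}
	if memBytes > 0 {
		// GOMEMLIMIT accepts a plain number of bytes.
		env["GOMEMLIMIT"] = strconv.FormatInt(memBytes, 10)
	}
	return env
}

func main() {
	env := goRuntimeEnv(100, 200*1024*1024)
	fmt.Println("GOMAXPROCS =", env["GOMAXPROCS"])
	fmt.Println("GOMEMLIMIT =", env["GOMEMLIMIT"])
}
```

Since GOMEMLIMIT is a soft limit, a real setup might choose a value somewhat below the container's memory amount; the sketch passes the value through unchanged to keep the mapping easy to follow.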
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #