
node-installer: remove resource limits #948

Merged
merged 1 commit into main from tom/no-limits on Oct 23, 2024
Conversation

Contributor

@Freax13 Freax13 commented Oct 22, 2024

There's no strong reason for having resource limits on the node-installer, and given the unpredictable nature of the kernel's resource accounting, we should remove them so that we don't run into unexpected OOMs.

Fixes f5da52d
Cc @blenessy

@Freax13 Freax13 added the "no changelog" (PRs not listed in the release notes) label on Oct 22, 2024
Contributor

@burgerdev burgerdev left a comment

I think we should still set a request based on the average case, but removing the limit to accommodate the worst case sgtm.
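
As a rough illustration of this suggestion, here is a minimal sketch of a requests-only resource spec built with the Kubernetes Go client types. It is not taken from the Contrast code base, and the request values are placeholders, not the node-installer's measured footprint.

```go
// Sketch: keep a scheduling request based on the average case, but set no
// limits, so a memory burst during installation is not OOM-killed by the
// cgroup. The values below are placeholders.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func nodeInstallerResources() corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("100m"),
			corev1.ResourceMemory: resource.MustParse("64Mi"),
		},
		// Limits intentionally left empty: with requests set but no limits
		// the pod gets Burstable QoS, and page cache written while copying
		// files cannot push the container over a cgroup memory limit.
	}
}

func main() {
	r := nodeInstallerResources()
	fmt.Printf("requests: cpu=%s, memory=%s; limits set: %v\n",
		r.Requests.Cpu(), r.Requests.Memory(), len(r.Limits) > 0)
}
```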

@katexochen
Member

On Kubernetes, the best practice is to set memory limit=request.

https://home.robusta.dev/blog/kubernetes-memory-limit

cc @3u13r

@3u13r
Member

3u13r commented Oct 23, 2024

On Kubernetes, the best practice is to set memory limit=request.

https://home.robusta.dev/blog/kubernetes-memory-limit

Ah, so I remembered correctly. I just wasn't sure whether to bring it up. Ideally the node-installer is more akin to a (DaemonSet) job than an application deployment that actually does something with the requested memory, right?
The question is: can we set a feasible limit if the needed memory scales with the amount of memory on the host?
Alternatively, we could set a somewhat high limit and then explain the failure mode in the docs, along with how the user can increase it.

@burgerdev
Contributor

On Kubernetes, the best practice is to set memory limit=request.

https://home.robusta.dev/blog/kubernetes-memory-limit

cc @3u13r

This may be true for apps, but it is not for essential system components. We don't want to compete with normal pods for resources.

@katexochen
Member

This may be true for apps, but is not for essential system components. We don't want to compete with normal pods for resources.

I'd say that's a rather subjective judgment of what counts as "essential". There could be other important applications running in the same cluster. Contrast's concept is to run "next to" existing workloads, so we shouldn't assume we're the most critical component.

@burgerdev
Contributor

The question is, can we have a feasible limit if the needed memory scales with the amount of memory on the host? Alternatively, we can add a somewhat high limit and then explain the failure in the docs and how the user can increase it?

The problem is that we need the memory in a single burst and don't have much influence over how it's accounted. Setting request=limit=1G seems a little extreme. I'd say a small request with a large limit may be fair, too, but I don't quite see what the limit would buy us.

@burgerdev
Contributor

Funny thing: the linked blog post even has a section on our issue, I think ("Unintuitive page-cache behaviour can lead to unnecessary OOMs").
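
For background on that page-cache point, here is a minimal sketch, assuming cgroup v2 with the unified hierarchy mounted at /sys/fs/cgroup, that splits a container's charged memory into anonymous memory and page cache. The file pages created while the node-installer copies large files are reclaimable, yet they are still charged against the cgroup's memory limit, which is the kind of accounting the blog post describes.

```go
// Sketch: read the cgroup v2 memory.stat file and report how much of the
// charged memory is anonymous vs. page cache ("file"). Assumes cgroup v2
// is mounted at /sys/fs/cgroup.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	f, err := os.Open("/sys/fs/cgroup/memory.stat")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	stats := make(map[string]uint64)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("anon: %d MiB, page cache: %d MiB\n",
		stats["anon"]>>20, stats["file"]>>20)
}
```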


@Freax13 Freax13 merged commit 7f50e0c into main Oct 23, 2024
9 checks passed
@Freax13 Freax13 deleted the tom/no-limits branch October 23, 2024 07:39