workspaces get sluggish as you approach 9GB or the pod memory limit #11156

Closed
kylos101 opened this issue Jul 5, 2022 · 3 comments
Labels: aspect: performance, type: bug

Comments

@kylos101
Contributor

kylos101 commented Jul 5, 2022

Bug description

When a workspace's memory usage approaches 9GB or the pod memory limit, the IDE becomes slow or inaccessible.

Steps to recreate

Run this internal sample program to recreate.
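The internal sample program is not linked here, so below is a minimal, hypothetical stand-in (my own sketch, not the referenced program). It just allocates memory in 256 MiB chunks and touches every page so the allocation stays resident, which should be enough to push a workspace toward its memory limit:

```go
// memhog.go: hypothetical stand-in for the internal sample program referenced above.
// Usage: go run memhog.go [target GiB]   (defaults to 10 GiB)
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

func main() {
	targetGiB := 10
	if len(os.Args) > 1 {
		if v, err := strconv.Atoi(os.Args[1]); err == nil {
			targetGiB = v
		}
	}

	const step = 256 << 20 // allocate in 256 MiB chunks
	var blocks [][]byte    // keep references so the GC cannot reclaim anything
	for allocated := 0; allocated < targetGiB<<30; allocated += step {
		block := make([]byte, step)
		// Touch every page so the memory is actually resident, not just reserved.
		for i := 0; i < len(block); i += 4096 {
			block[i] = 1
		}
		blocks = append(blocks, block)
		fmt.Printf("allocated %d MiB\n", (allocated+step)>>20)
		time.Sleep(200 * time.Millisecond)
	}

	fmt.Println("target reached; keeping memory resident (Ctrl+C to stop)")
	for {
		time.Sleep(time.Minute)
	}
}
```

Watching IDE responsiveness while this runs should show the slowdown as the allocation approaches ~9GB on a default workspace.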

Workspace affected

No response

Expected behavior

The workspace would ideally not become sluggish.

Example repository

No response

Anything else?

Next steps:
Recreate the behavior and determine the cause. Is it simply exceeding 9GB, regardless of the pod limit being 12 or 16GB? Our XL nodes have a memory limit of 16GB...do they encounter the same trouble at 9GB or 12GB?

Internal Slack reference

kylos101 added the type: bug label Jul 5, 2022
kylos101 moved this to In Progress in 🌌 Workspace Team Jul 5, 2022
kylos101 added the aspect: performance label Jul 5, 2022
@Furisto
Member

Furisto commented Jul 6, 2022

Testing on XL workspaces does not reveal sluggishness when approaching the memory limit.
https://www.loom.com/share/74a0dcb1d2c1403599ae446d509340ac

@Furisto
Member

Furisto commented Jul 6, 2022

Default workspaces get extremely sluggish if a lot of memory is consumed. The reason is that their cgroup values differ from those of XL workspaces.

Default
memory.high -> 10307919872 (9.6 Gibibyte)
memory.max -> 12884901888 (12 Gibibyte)
memory.min -> 3435970560 (3.2 Gibibyte)

XL
memory.high -> max
memory.max -> 17179869184 (16 Gibibyte)
memory.min -> 13743894528 (12.8 Gibibyte)
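For reference, these knobs can be read directly from the unified cgroup hierarchy inside the workspace. A minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup (adjust the path if your setup differs):

```go
// cgroupmem.go: dump the cgroup v2 memory knobs quoted above.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Assumption: the workspace sees its own cgroup at /sys/fs/cgroup.
	for _, name := range []string{"memory.min", "memory.high", "memory.max"} {
		raw, err := os.ReadFile("/sys/fs/cgroup/" + name)
		if err != nil {
			fmt.Printf("%-12s <error: %v>\n", name, err)
			continue
		}
		val := strings.TrimSpace(string(raw))
		if val == "max" { // "max" means unlimited, as for memory.high on XL workspaces
			fmt.Printf("%-12s max (unlimited)\n", name)
			continue
		}
		b, err := strconv.ParseUint(val, 10, 64)
		if err != nil {
			fmt.Printf("%-12s %s\n", name, val)
			continue
		}
		fmt.Printf("%-12s %d (%.1f GiB)\n", name, b, float64(b)/(1<<30))
	}
}
```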

Once the workspace goes over the memory.high limit, performance is abysmal. This also explains why we do not see this behavior in XL workspaces, where memory.high is unlimited. Looking at the memory PSI of the workspace confirms that memory is the culprit:

some avg10=95.00 avg60=95.32 avg300=79.69 total=507270123
full avg10=94.33 avg60=94.72 avg300=79.05 total=502552889

There is no pressure at all on cpu and io.
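The PSI lines above come from the pressure files exposed by cgroup v2 (the same format also appears system-wide under /proc/pressure/). A small sketch for dumping all three resources at once, again assuming the unified hierarchy at /sys/fs/cgroup:

```go
// pressure.go: print PSI ("pressure stall information") for cpu, memory, and io.
package main

import (
	"fmt"
	"os"
)

func main() {
	for _, res := range []string{"cpu", "memory", "io"} {
		raw, err := os.ReadFile("/sys/fs/cgroup/" + res + ".pressure")
		if err != nil {
			fmt.Printf("%s: %v\n", res, err)
			continue
		}
		// Each file has "some" (and usually "full") lines with avg10/avg60/avg300
		// percentages plus a cumulative stall total in microseconds.
		fmt.Printf("--- %s ---\n%s", res, raw)
	}
}
```

The avg10/avg60/avg300 values are the percentage of time tasks stalled on that resource over the last 10/60/300 seconds; a "full" figure around 95%, as quoted above, means essentially all non-idle tasks in the workspace were stalled on memory reclaim nearly all the time.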

@Furisto
Member

Furisto commented Jul 6, 2022

This behavior is described here and in more detail here. The relevant parts from the KEP are:

This proposal sets requests.memory to memory.min for protecting container memory requests. limits.memory is set to memory.max (this is consistent with existing memory.limit_in_bytes for cgroups v1, we do nothing because cgroup_v2 has implemented for that).
We also introduce memory.high to throttle container memory overcommit allocation. It will be set based on a formula:
memory.high=limits.memory/node allocatable memory * memory throttling factor

If container sets limits.memory, we set memory.high=pod.spec.containers[i].resources.limits[memory] * memory throttling factor for container level cgroup if memory.high>memory.min

The value of the throttling factor is 0.8 by default and can be influenced through the Kubelet configuration:
https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

memoryThrottlingFactor float64 | MemoryThrottlingFactor specifies the factor multiplied by the memory limit or node allocatable memory when setting the cgroupv2 memory.high value to enforce MemoryQoS. Decreasing this factor will set lower high limit for container cgroups and put heavier reclaim pressure while increasing will put less reclaim pressure. See http://kep.k8s.io/2570 for more details. Default: 0.8
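Plugging the default-workspace numbers into that formula lines up with the observed value: 12884901888 bytes (the 12 GiB limit) × 0.8 ≈ 10307921510, which matches the memory.high of 10307919872 quoted above to within a couple of KiB (presumably rounding by the kubelet/runtime). A quick check:

```go
// Arithmetic check of the KEP formula against the default-workspace values above.
package main

import "fmt"

func main() {
	const (
		limitBytes       int64   = 12 << 30    // limits.memory of a default workspace (12884901888)
		throttlingFactor float64 = 0.8         // kubelet default memoryThrottlingFactor
		observedHigh     int64   = 10307919872 // memory.high read from the default workspace cgroup
	)
	computed := float64(limitBytes) * throttlingFactor
	fmt.Printf("computed memory.high ~ %.0f bytes (%.2f GiB)\n", computed, computed/(1<<30))
	fmt.Printf("observed memory.high = %d bytes (%.2f GiB)\n", observedHigh, float64(observedHigh)/(1<<30))
	fmt.Printf("difference           = %.0f bytes\n", computed-float64(observedHigh))
}
```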

aledbf closed this as completed Jul 6, 2022
Repository owner moved this from In Progress to Done in 🌌 Workspace Team Jul 6, 2022