workspaces get sluggish as you approach 9GB or the pod memory limit #11156

Closed
kylos101 opened this issue Jul 5, 2022 · 3 comments
Labels: aspect: performance, type: bug

Comments

@kylos101
Contributor

kylos101 commented Jul 5, 2022

Bug description

When a workspace's memory usage approaches 9GB or the pod memory limit, the IDE becomes slow or inaccessible.

Steps to recreate

Run this internal sample program to recreate.
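The internal sample program is not linked here, so below is a minimal, hypothetical stand-in (my own sketch, not the referenced program). It just allocates memory in 256 MiB chunks and touches every page so the allocation stays resident, which should be enough to push a workspace toward its memory limit:

```go
// memhog.go: hypothetical stand-in for the internal sample program referenced above.
// Usage: go run memhog.go [target GiB]   (defaults to 10 GiB)
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

func main() {
	targetGiB := 10
	if len(os.Args) > 1 {
		if v, err := strconv.Atoi(os.Args[1]); err == nil {
			targetGiB = v
		}
	}

	const step = 256 << 20 // allocate in 256 MiB chunks
	var blocks [][]byte    // keep references so the GC cannot reclaim anything
	for allocated := 0; allocated < targetGiB<<30; allocated += step {
		block := make([]byte, step)
		// Touch every page so the memory is actually resident, not just reserved.
		for i := 0; i < len(block); i += 4096 {
			block[i] = 1
		}
		blocks = append(blocks, block)
		fmt.Printf("allocated %d MiB\n", (allocated+step)>>20)
		time.Sleep(200 * time.Millisecond)
	}

	fmt.Println("target reached; keeping memory resident (Ctrl+C to stop)")
	for {
		time.Sleep(time.Minute)
	}
}
```

Watching IDE responsiveness while this runs should show the slowdown as the allocation approaches ~9GB on a default workspace.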

Workspace affected

No response

Expected behavior

The workspace would ideally not become sluggish.

Example repository

No response

Anything else?

Next steps:
Recreate the behavior and determine the cause. Is it simply exceeding 9GB, regardless of the pod limit being 12 or 16GB? Our XL nodes have a memory limit of 16GB...do they encounter the same trouble at 9GB or 12GB?

Internal Slack reference

kylos101 added the type: bug label Jul 5, 2022
kylos101 moved this to In Progress in 🌌 Workspace Team Jul 5, 2022
kylos101 added the aspect: performance label Jul 5, 2022
@Furisto
Member

Furisto commented Jul 6, 2022

Testing on XL workspaces does not reveal sluggishness when approaching the memory limit.
https://www.loom.com/share/74a0dcb1d2c1403599ae446d509340ac

@Furisto
Member

Furisto commented Jul 6, 2022

Default workspaces get extremely sluggish if a lot of memory is consumed. The reason is that their cgroup values differ from those of XL workspaces.

Default
memory.high -> 10307919872 (9.6 Gibibyte)
memory.max -> 12884901888 (12 Gibibyte)
memory.min -> 3435970560 (3.2 Gibibyte)

XL
memory.high -> max
memory.max -> 17179869184 (16 Gibibyte)
memory.min -> 13743894528 (12.8 Gibibyte)
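For reference, these knobs can be read directly from the unified cgroup hierarchy inside the workspace. A minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup (adjust the path if your setup differs):

```go
// cgroupmem.go: dump the cgroup v2 memory knobs quoted above.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Assumption: the workspace sees its own cgroup at /sys/fs/cgroup.
	for _, name := range []string{"memory.min", "memory.high", "memory.max"} {
		raw, err := os.ReadFile("/sys/fs/cgroup/" + name)
		if err != nil {
			fmt.Printf("%-12s <error: %v>\n", name, err)
			continue
		}
		val := strings.TrimSpace(string(raw))
		if val == "max" { // "max" means unlimited, as for memory.high on XL workspaces
			fmt.Printf("%-12s max (unlimited)\n", name)
			continue
		}
		b, err := strconv.ParseUint(val, 10, 64)
		if err != nil {
			fmt.Printf("%-12s %s\n", name, val)
			continue
		}
		fmt.Printf("%-12s %d (%.1f GiB)\n", name, b, float64(b)/(1<<30))
	}
}
```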

Once the workspace goes over the memory.high limit, performance is abysmal. This also explains why we do not see this behavior in XL workspaces, where memory.high is unlimited. Looking at the memory PSI of the workspace confirms that memory is the culprit:

some avg10=95.00 avg60=95.32 avg300=79.69 total=507270123
full avg10=94.33 avg60=94.72 avg300=79.05 total=502552889

There is no pressure at all on cpu and io.
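The PSI lines above come from the pressure files exposed by cgroup v2 (the same format also appears system-wide under /proc/pressure/). A small sketch for dumping all three resources at once, again assuming the unified hierarchy at /sys/fs/cgroup:

```go
// pressure.go: print PSI ("pressure stall information") for cpu, memory, and io.
package main

import (
	"fmt"
	"os"
)

func main() {
	for _, res := range []string{"cpu", "memory", "io"} {
		raw, err := os.ReadFile("/sys/fs/cgroup/" + res + ".pressure")
		if err != nil {
			fmt.Printf("%s: %v\n", res, err)
			continue
		}
		// Each file has "some" (and usually "full") lines with avg10/avg60/avg300
		// percentages plus a cumulative stall total in microseconds.
		fmt.Printf("--- %s ---\n%s", res, raw)
	}
}
```

The avg10/avg60/avg300 values are the percentage of time tasks stalled on that resource over the last 10/60/300 seconds; a "full" figure around 95%, as quoted above, means essentially all non-idle tasks in the workspace were stalled on memory reclaim nearly all the time.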

@Furisto
Member

Furisto commented Jul 6, 2022

This behavior is described here and in more detail here. The relevant parts from the KEP are:

This proposal sets requests.memory to memory.min for protecting container memory requests. limits.memory is set to memory.max (this is consistent with existing memory.limit_in_bytes for cgroups v1, we do nothing because cgroup_v2 has implemented for that).
We also introduce memory.high to throttle container memory overcommit allocation. It will be set based on a formula:
memory.high=limits.memory/node allocatable memory * memory throttling factor

If container sets limits.memory, we set memory.high=pod.spec.containers[i].resources.limits[memory] * memory throttling factor for container level cgroup if memory.high>memory.min

The value of the throttling factor is 0.8 by default and can be influenced through the Kubelet configuration:
https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

memoryThrottlingFactor float64 | MemoryThrottlingFactor specifies the factor multiplied by the memory limit or node allocatable memory when setting the cgroupv2 memory.high value to enforce MemoryQoS. Decreasing this factor will set lower high limit for container cgroups and put heavier reclaim pressure while increasing will put less reclaim pressure. See http://kep.k8s.io/2570 for more details. Default: 0.8
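Plugging the default-workspace numbers into that formula lines up with the observed value: 12884901888 bytes (the 12 GiB limit) × 0.8 ≈ 10307921510, which matches the memory.high of 10307919872 quoted above to within a couple of KiB (presumably rounding by the kubelet/runtime). A quick check:

```go
// Arithmetic check of the KEP formula against the default-workspace values above.
package main

import "fmt"

func main() {
	const (
		limitBytes       int64   = 12 << 30    // limits.memory of a default workspace (12884901888)
		throttlingFactor float64 = 0.8         // kubelet default memoryThrottlingFactor
		observedHigh     int64   = 10307919872 // memory.high read from the default workspace cgroup
	)
	computed := float64(limitBytes) * throttlingFactor
	fmt.Printf("computed memory.high ~ %.0f bytes (%.2f GiB)\n", computed, computed/(1<<30))
	fmt.Printf("observed memory.high = %d bytes (%.2f GiB)\n", observedHigh, float64(observedHigh)/(1<<30))
	fmt.Printf("difference           = %.0f bytes\n", computed-float64(observedHigh))
}
```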

aledbf closed this as completed Jul 6, 2022
Repository owner moved this from In Progress to Done in 🌌 Workspace Team Jul 6, 2022