Starting new workspace causes it to fail with OOM error from kubelet #8253

sagor999 · 2022-02-16T18:00:47Z

Bug description

Sometimes workspaces fail to start due to an OOM error from the kubelet.
We suspect it happens when a node is at capacity but still has workspaces that are terminating.
It seems the k8s scheduler ignores terminating pods, but the kubelet doesn't.
So the scheduler places a pod on a node that it thinks has room, but the kubelet then rejects it with an OOM error.
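
To make the suspected mismatch concrete, here is a toy model of the two accounting views. It is only a sketch of the hypothesis above, not of actual scheduler or kubelet code; the `Pod` class, the function names, and the memory figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical, simplified pod: only a memory request and a terminating flag.
    name: str
    memory_request_mib: int
    terminating: bool = False


def scheduler_free_memory(capacity_mib, pods):
    """Free memory as seen by a scheduler that ignores terminating pods."""
    used = sum(p.memory_request_mib for p in pods if not p.terminating)
    return capacity_mib - used


def kubelet_free_memory(capacity_mib, pods):
    """Free memory as seen by a kubelet that still counts terminating pods."""
    used = sum(p.memory_request_mib for p in pods)
    return capacity_mib - used


def admit(capacity_mib, pods, new_pod):
    """Return (scheduler_accepts, kubelet_accepts) for the new pod."""
    need = new_pod.memory_request_mib
    scheduler_ok = scheduler_free_memory(capacity_mib, pods) >= need
    kubelet_ok = kubelet_free_memory(capacity_mib, pods) >= need
    return scheduler_ok, kubelet_ok


# A full node where one workspace is still terminating.
node_pods = [Pod("ws-a", 4096), Pod("ws-b", 4096, terminating=True)]
print(admit(8192, node_pods, Pod("ws-c", 4096)))  # (True, False)
```

In this model the scheduler accepts `ws-c` because it no longer counts `ws-b`, while the kubelet rejects it because `ws-b` still holds its memory.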

This seems to be related to:
kubernetes/kubernetes#106884
kubernetes/kubernetes#104560

As a temporary workaround, I will create a controller that cordons a node once it reaches its maximum capacity of workspaces.
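
A rough sketch of the decision such a controller would make, assuming it already knows how many workspaces (including terminating ones) sit on each node. The function name, the input shape, and the `max_workspaces` limit are hypothetical; a real controller would apply the result by setting `spec.unschedulable` on the node, which is what `kubectl cordon` does.

```python
def desired_cordon_state(workspaces_per_node, max_workspaces):
    """Map each node name to True if it should be cordoned, False otherwise.

    workspaces_per_node: dict of node name -> number of workspace pods on it,
    counting pods that are still terminating (hypothetical input shape).
    """
    return {
        node: count >= max_workspaces
        for node, count in workspaces_per_node.items()
    }


counts = {"node-1": 10, "node-2": 7, "node-3": 12}
print(desired_cordon_state(counts, max_workspaces=10))
# {'node-1': True, 'node-2': False, 'node-3': True}
```

Returning a state for every node, rather than only the nodes to cordon, lets the same loop uncordon a node once its workspace count drops below the limit.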

Steps to reproduce

Workspace affected

No response

Expected behavior

No response

Example repository

No response

Anything else?

No response

sagor999 · 2022-02-16T18:01:59Z

Related: #8238

sagor999 · 2022-02-16T18:03:53Z

Related: #7969

sagor999 · 2022-02-17T22:28:26Z

New approach, based on @aledbf's fix:
Currently ws-manager just creates the pod and expects that it will start.
Since k8s 1.22 that is no longer a valid approach. Instead, it should try to create the pod and, if that fails, try again, because creation can fail due to out-of-resource errors (e.g. CPU or memory).
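
The create-and-retry idea can be sketched roughly as follows. This is not the ws-manager implementation: `create_pod`, `delete_pod`, and `OutOfResourcesError` are hypothetical stand-ins for whatever creates the pod, cleans up a rejected one, and signals that the node refused it for lack of CPU or memory.

```python
import time


class OutOfResourcesError(Exception):
    """Hypothetical error: the node rejected the pod for lack of CPU or memory."""


def start_workspace_pod(create_pod, delete_pod, max_attempts=3,
                        backoff_seconds=1.0, sleep=time.sleep):
    """Try to start a pod, retrying when it is rejected for lack of resources.

    Returns the attempt number that succeeded, or re-raises the last
    OutOfResourcesError once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            create_pod()
            return attempt
        except OutOfResourcesError:
            # Remove the rejected pod so the next attempt starts clean.
            delete_pod()
            if attempt == max_attempts:
                raise
            # Linear backoff gives terminating pods time to free resources.
            sleep(backoff_seconds * attempt)


# Demo with a fake cluster that rejects the first two attempts.
calls = {"create": 0, "delete": 0}


def fake_create():
    calls["create"] += 1
    if calls["create"] < 3:
        raise OutOfResourcesError("OutOfmemory")


def fake_delete():
    calls["delete"] += 1


result = start_workspace_pod(fake_create, fake_delete, sleep=lambda _: None)
print(result, calls)  # 3 {'create': 3, 'delete': 2}
```

Only the out-of-resource error is retried here; any other failure propagates immediately, since retrying would not help.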

sagor999 added the "team: workspace" label Feb 16, 2022

sagor999 self-assigned this Feb 16, 2022

sagor999 added this to 🌌 Workspace Team Feb 16, 2022

sagor999 moved this to In Progress in 🌌 Workspace Team Feb 16, 2022

This was referenced Feb 17, 2022

[ws-manager] Wait for workspace pod to be ready #8258

Closed

[ws-manager] Wait for workspace pod to be ready #8289

Merged

roboquat closed this as completed in #8289 Feb 18, 2022

Repository owner moved this from In Progress to Done in 🌌 Workspace Team Feb 18, 2022
