[ws-daemon] avoid hitting workspace memory limits #13254
Comments
@kylos101 This problem happens with cgroup v2 (SaaS), right? I'd like to know if we have to consider cgroup v1.
@utam0k it happens with cgroup v1 on self-hosted; not sure about SaaS and cgroup v2.
Oops, that was unexpected. Thanks.
cgroup v1 does not support memory QoS. Currently we avoid OOM by clearing the page cache periodically, but this will not be enough for self-hosted customers. The easiest workaround is to upgrade to cgroup v2, but I'll give it some thought.
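For context, a minimal sketch of the kind of periodic cache clearing mentioned above, assuming a cgroup v1 memory controller and a made-up workspace cgroup path; this is not the actual ws-daemon implementation:

```go
package main

import (
	"log"
	"os"
	"time"
)

// Hypothetical cgroup v1 path for a workspace; real paths differ.
const workspaceMemCgroup = "/sys/fs/cgroup/memory/workspace-example"

// reclaimPageCache asks the kernel to reclaim as much page cache as possible
// for this cgroup by writing to memory.force_empty (cgroup v1 only).
func reclaimPageCache(cgroupPath string) error {
	return os.WriteFile(cgroupPath+"/memory.force_empty", []byte("0"), 0644)
}

func main() {
	ticker := time.NewTicker(30 * time.Second) // interval is a made-up value
	defer ticker.Stop()
	for range ticker.C {
		if err := reclaimPageCache(workspaceMemCgroup); err != nil {
			log.Printf("cache reclaim failed: %v", err)
		}
	}
}
```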
I'll have to experiment a bit more to fully understand the behavior, but what about having a two-level cgroup?
In cgroup v2 this is different, but there are already two levels of cgroups.
It might be possible to make this second-level value user-configurable by giving the user the cgroup namespace; as far as I know there is no reason not to use the cgroup namespace with cgroup v1, so it should be possible. That said, this would be an optional feature.
@kylos101 Let me restate this to ensure that we are on the same page: we currently have (slightly simplified) workspacekit, supervisor, the IDE and the user processes all running in the same cgroup.
What you would like is for the user processes to run inside a different cgroup than workspacekit, supervisor and the IDE. Is that correct?
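To make the two-level idea concrete, here is a rough sketch of creating a second-level cgroup for user processes under cgroup v1 with a slightly lower hard limit; the paths and the 5% headroom are assumptions, not the actual layout ws-daemon uses:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// Hypothetical cgroup v1 path for a workspace; the real layout differs.
const workspaceCgroup = "/sys/fs/cgroup/memory/workspace-example"

// createUserCgroup creates a child cgroup for user processes whose hard
// limit sits headroom (e.g. 5%) below the workspace limit, so the kernel
// OOM-kills inside the child before the whole workspace container is at risk.
func createUserCgroup(workspacePath string, headroom float64) (string, error) {
	raw, err := os.ReadFile(filepath.Join(workspacePath, "memory.limit_in_bytes"))
	if err != nil {
		return "", err
	}
	limit, err := strconv.ParseInt(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		return "", err
	}

	userPath := filepath.Join(workspacePath, "user")
	if err := os.MkdirAll(userPath, 0755); err != nil {
		return "", err
	}

	userLimit := int64(float64(limit) * (1 - headroom))
	err = os.WriteFile(filepath.Join(userPath, "memory.limit_in_bytes"),
		[]byte(strconv.FormatInt(userLimit, 10)), 0644)
	return userPath, err
}

func main() {
	path, err := createUserCgroup(workspaceCgroup, 0.05)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("attach user processes to", path, "via its cgroup.procs file")
}
```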
As a side note, in support tickets I am asking users who report this to run the following in a workspace and then send me the resulting log file: for i in `seq 1 60` ; do date | tee -a memory.log ; free --mega | tee -a memory.log ; echo '' >> memory.log ; echo '' >> memory.log ; sleep 60 ; done
@Furisto Potentially? What would the user experience be like if we did that? For example, would user processes be OOM-killed within the workspace, making it less likely for the workspace itself to be OOM-killed, assuming the IDE and extension memory usage remains low?
I expect the user's process to be OOM-killed because the memory hard limit of the user's cgroup is smaller. The supervisor and the workspace should remain.
Another way to use
I have been thinking in the same direction; specifically, I was inspired by what Facebook has done with oomd. oomd would not work for self-hosted as it only supports cgroup v2, but the principle is interesting. We could use oomd for SaaS and write our own implementation for cgroup v1, or just use our own implementation for both. Another option that could be simpler to implement and would work for both cgroup v1 and v2 is to modify the oom_score_adj value. If we adjust the OOM score of the IDE, we would have to run this in a goroutine, I think, as child processes inherit the value of the parent, but that is not too bad. I will spend some time prototyping this approach.
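A rough sketch of that oom_score_adj idea, assuming the adjusting process has CAP_SYS_RESOURCE and some way (not shown here) to discover the IDE's PID; the -500 value and the polling interval are made up:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Hypothetical adjustment: make the IDE less attractive to the OOM killer
// than user processes (valid range is -1000..+1000, lower = less likely).
const ideOOMScoreAdj = -500

// setOOMScoreAdj writes /proc/<pid>/oom_score_adj; lowering the value
// requires CAP_SYS_RESOURCE.
func setOOMScoreAdj(pid, adj int) error {
	path := fmt.Sprintf("/proc/%d/oom_score_adj", pid)
	return os.WriteFile(path, []byte(strconv.Itoa(adj)), 0644)
}

// watchIDE periodically re-applies the adjustment, e.g. after the IDE is
// restarted under a new PID. How the PID is discovered is left out here
// and is an assumption about what supervisor could provide.
func watchIDE(getIDEPID func() (int, error)) {
	for range time.Tick(10 * time.Second) {
		pid, err := getIDEPID()
		if err != nil {
			continue
		}
		if err := setOOMScoreAdj(pid, ideOOMScoreAdj); err != nil {
			fmt.Fprintf(os.Stderr, "oom_score_adj: %v\n", err)
		}
	}
}

func main() {
	// Placeholder PID lookup for the sketch; a real version would resolve
	// the IDE process instead of using our own PID.
	watchIDE(func() (int, error) { return os.Getpid(), nil })
}
```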
Some more info on the second approach: the badness heuristic is used to determine which process should be killed in out-of-memory situations by the OOM killer. The result of this calculation is surfaced in /proc/[pid]/oom_score for each process. A higher score means that a process is more likely to be killed. The score can be adjusted (provided the writing process has CAP_SYS_RESOURCE) by writing to either /proc/[pid]/oom_adj (deprecated) or the newer /proc/[pid]/oom_score_adj. Values between -1000 and +1000 can be written to /proc/[pid]/oom_score_adj; a value of -1000 means that the process will never be killed by the OOM killer. Whenever an out-of-memory situation occurs, out_of_memory is called. Depending on how the out-of-memory condition was caused (either memory is scarce system-wide or just in the cgroup), either all processes or the processes in the cgroup and its child cgroups are scanned, and for each process the badness score is calculated. Some tasks, like kernel tasks and the init process, are exempt and cannot be killed. The badness heuristic finds the task with the highest memory consumption (as measured by the task's resident set size (RSS), page table and swap space usage) and then modifies the result with oom_score_adj. The resident set size is the memory consumed by file memory, anonymous memory and shmem (which covers tmpfs memory, SysV shared memory and shared anonymous mappings).
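To observe the effect of the heuristic described above, the current badness score and adjustment of a process can simply be read back from procfs, for example:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readProcValue returns the trimmed contents of /proc/<pid>/<name>.
func readProcValue(pid int, name string) (string, error) {
	b, err := os.ReadFile(fmt.Sprintf("/proc/%d/%s", pid, name))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	pid := os.Getpid()
	score, _ := readProcValue(pid, "oom_score")   // badness as seen by the OOM killer
	adj, _ := readProcValue(pid, "oom_score_adj") // the adjustment, -1000..+1000
	fmt.Printf("pid %d: oom_score=%s oom_score_adj=%s\n", pid, score, adj)
}
```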
@Furisto PSI is a very interesting technology to me, but I was concerned that a PSI-based OOM approach might lead to complications for our product. WDYT? I think simple is better. I think all approaches are effective in their own way. Excellent. As for cgroup v2, can't memory QoS deal with this to some extent? But this issue hasn't been put into the schedules column. Why not list the methods in order of cost-effectiveness first, and then try them one at a time to examine their effectiveness? How about trying
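For reference on the memory QoS question, the cgroup v2 knobs involved are memory.min, memory.high and memory.max; a sketch with made-up paths and values, not what is actually configured for workspaces:

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// Hypothetical cgroup v2 path for a workspace.
const wsCgroup = "/sys/fs/cgroup/workspace-example"

func writeCgroupFile(name, value string) error {
	return os.WriteFile(filepath.Join(wsCgroup, name), []byte(value), 0644)
}

func main() {
	// memory.high applies reclaim pressure and throttling above the soft
	// limit, giving back-pressure before memory.max triggers the OOM killer.
	// All values below are illustrative only.
	settings := map[string]string{
		"memory.min":  "268435456",  // 256 MiB guaranteed
		"memory.high": "3221225472", // 3 GiB soft limit
		"memory.max":  "3435973836", // ~3.2 GiB hard limit
	}
	for name, value := range settings {
		if err := writeCgroupFile(name, value); err != nil {
			log.Fatalf("write %s: %v", name, err)
		}
	}
}
```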
I think we should socialize this in an RFC before attempting it. WDYT, @Furisto?
@Furisto Thanks ❤️
Here's an example of a workspace getting OOMKilled on the SaaS with cgroup v2. 🤔 @Furisto @utam0k may I ask you to update the RFC accordingly?
@kylos101 @Furisto FYI: it happened four times in the last 7 days on gitpod.io.
This has not been deployed with gen77 (probably due to the way it was deployed). We need to wait for gen78.
Is your feature request related to a problem? Please describe
Sometimes the workspace container gets OOMKilled because the node's OOM killer is triggered due to high memory utilization.
Thanks, @aledbf, for the idea!
@csweichel @Furisto @utam0k wdyt? 🤔 @atduarte for awareness.
Describe the behaviour you'd like
Wrap the cgroup inside the workspace with a lower memory limit, for example 5% lower, so that a process inside the pod is killed instead of the workspace container.
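A minimal sketch of how a process would be attached to such a lower-limit child cgroup under cgroup v1; the path is a placeholder, not the actual workspace layout:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
)

// Hypothetical child cgroup carrying the ~5% lower memory limit.
const userCgroupProcs = "/sys/fs/cgroup/memory/workspace-example/user/cgroup.procs"

// moveToUserCgroup attaches a PID to the child cgroup so that an OOM in
// that subtree kills the user process rather than the workspace container.
func moveToUserCgroup(pid int) error {
	return os.WriteFile(userCgroupProcs, []byte(strconv.Itoa(pid)), 0644)
}

func main() {
	pid := os.Getpid() // in practice, each user process PID would be attached
	if err := moveToUserCgroup(pid); err != nil {
		log.Fatalf("attach pid %d: %v", pid, err)
	}
	fmt.Printf("pid %d is now accounted in the lower-limit cgroup\n", pid)
}
```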
Describe alternatives you've considered
n/a
Additional context
Refer to this internal incident for background.
Front conversations