Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent workspaces from being OOM killed #14507

Merged
merged 3 commits into from
Nov 10, 2022
Merged

Prevent workspaces from being OOM killed #14507

merged 3 commits into from
Nov 10, 2022

Conversation

Furisto
Copy link
Member

@Furisto Furisto commented Nov 8, 2022

Description

Prevent workspaces from being OOM killed. See the RFC for details.

Related Issue(s)

Fixes #13254

How to test

  • Open workspace
  • SSH into the VM
  • Check the processes in the workspace (use /proc/[pid]/oom_score_adj). It should be -1000 for tier 1 processes and -100 for tier 2 processes

Release Notes

Ensure that workspace do not crash due to OOM kills

Werft options:

  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-integration-tests=all
    Valid options are all, workspace, webapp, ide

@Furisto Furisto added the team: workspace Issue belongs to the Workspace team label Nov 8, 2022
@Furisto Furisto self-assigned this Nov 8, 2022
@Furisto Furisto requested review from a team November 8, 2022 10:37
@werft-gitpod-dev-com
Copy link

started the job as gitpod-build-fo-oom-score.4 because the annotations in the pull request description changed
(with .werft/ from main)

Copy link
Contributor

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just read the RFC, sounds like a clever solution 🙂

I've made a small suggestion just in case we want to easily observe if this strategy is working as we expected or not, feel free to just ignore the suggestion if it doesn't make sense!

}
}

func (c *ProcessPriorityV2) adaptOOMScore(procType ProcessType, pid int64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add 2 counters here:

  • One for everytime the we try to adjust the OOM Score
  • Another everytime we fail to adjust the OOM Score

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I feel that just logging if we cannot adapt the oom score is enough.

Comment on lines +91 to +96
cgroup.ProcessWorkspaceKit: config.OOMScores.Tier1,
cgroup.ProcessSupervisor: config.OOMScores.Tier1,
cgroup.ProcessCodeServer: config.OOMScores.Tier1,
cgroup.ProcessIDE: config.OOMScores.Tier1,
cgroup.ProcessCodeServerHelper: config.OOMScores.Tier2,
cgroup.ProcessWebIDEHelper: config.OOMScores.Tier2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Furisto may I ask, did you socialize this improvement with the IDE team? I ask because it would make sense to get their feedback on this grouping and tier assignment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gitpod-io/engineering-ide What do you think about this assignment? Tier 1 processes would not be oom killed with this change and for Tier 2 processes it is less likely. You can see how we identity the processes here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

@Furisto Furisto requested a review from a team November 10, 2022 10:35
Copy link
Contributor

@jenting jenting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@roboquat roboquat merged commit 8c61412 into main Nov 10, 2022
@roboquat roboquat deleted the fo/oom-score branch November 10, 2022 12:43
@roboquat roboquat added the deployed: IDE IDE change is running in production label Nov 11, 2022
@atduarte
Copy link
Contributor

atduarte commented Nov 29, 2022

@Furisto is this missing the deployed: workspace and deployed label? Noticed as this didn't appear in the November's changelog 🤔

@Furisto
Copy link
Member Author

Furisto commented Nov 29, 2022

@atduarte It is deployed, but not enabled currently on SaaS because gen77 was not deployed from this.

@atduarte
Copy link
Contributor

atduarte commented Nov 29, 2022

@Furisto got it! Thank you 🙏

@roboquat roboquat added the deployed: workspace Workspace team change is running in production label Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deployed: IDE IDE change is running in production deployed: workspace Workspace team change is running in production release-note size/L team: IDE team: SID team: workspace Issue belongs to the Workspace team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ws-daemon] avoid hitting workspace memory limits
7 participants