[RFE] Prevent the OOM killer to hit critical services #1427

pothos · 2024-04-16T09:17:01Z

Current situation

When running low on memory, Flatcar currently relies on the kernel's OOM killer to kill processes. Flatcar does not make use of systemd-oomd yet. When the kernel picks processes to kill, it can hit critical system services.

Impact

Hitting critical system services can render the system unresponsive as observed by @jepio.

Ideal future situation

Instead of killing processes as a last resort, we can use systemd-oomd to evaluate cgroup memory usage and terminate whole cgroups rather than single processes, and do this earlier than the kernel would, so that the system stays responsive. Terminating a whole cgroup is more coordinated and has more impact than killing an arbitrary child or parent process. Using the cgroup memory accounting also makes it more likely that the termination hits whatever is actually responsible for the memory shortage than the kernel OOM killer's choice would.
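How early systemd-oomd acts is governed by its own thresholds in oomd.conf. As a rough sketch only: the drop-in file name below is hypothetical, and the values are illustrative (they are believed to match the upstream defaults of recent systemd versions, but check oomd.conf(5) for the version Flatcar ships before relying on them).

```ini
# /etc/systemd/oomd.conf.d/10-thresholds.conf   (hypothetical file name)
[OOM]
# Act on units with ManagedOOMSwap=kill once this much swap is in use.
SwapUsedLimit=90%
# Act on units with ManagedOOMMemoryPressure=kill once memory pressure
# stays above this limit ...
DefaultMemoryPressureLimit=60%
# ... for at least this long.
DefaultMemoryPressureDurationSec=30s
```

Lowering the limits makes systemd-oomd step in sooner, at the cost of killing workloads that might have recovered on their own.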

To prevent both the kernel OOM killer and systemd-oomd from hitting critical services, one can set OOMScoreAdjust= and MemoryMin=.
To steer systemd-oomd towards killing a certain unit, one can set ManagedOOMSwap=kill and ManagedOOMMemoryPressure=kill.

Implementation options

Enable systemd-oomd by default on Flatcar.
Set OOMScoreAdjust= and MemoryMin= for critical service units.
Set a drop-in for docker .scope units to have ManagedOOMSwap=kill and ManagedOOMMemoryPressure=kill.
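The last two options could look roughly like the drop-ins below. This is a sketch under assumptions, not a tested configuration: `example-critical.service` is a hypothetical unit name, the file names and the numeric values are placeholders, and the `docker-.scope.d` directory relies on systemd's prefix-matching drop-in directories applying to the transient docker scopes. Whether systemd-oomd then selects the intended cgroups should be verified on a test node.

```ini
# File 1: /etc/systemd/system/example-critical.service.d/10-oom-protect.conf
# (example-critical.service is a hypothetical unit name)
[Service]
# Make the kernel OOM killer less likely to pick this service (range -1000..1000).
OOMScoreAdjust=-900
# Protect this much of the unit's memory from reclaim (cgroup v2 only; placeholder value).
MemoryMin=64M

# File 2: /etc/systemd/system/docker-.scope.d/10-managed-oom.conf
# (intended to apply to every unit whose name starts with "docker-" and ends in ".scope")
[Scope]
# Mark these scopes as candidates for systemd-oomd.
ManagedOOMSwap=kill
ManagedOOMMemoryPressure=kill
```

Note that systemd.resource-control(5) says memory protection such as MemoryMin= generally needs a corresponding allocation on the ancestor cgroups to be effective, so the per-unit setting alone may not be enough.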

Additional information

Docker containers run under docker-….scope, which is part of system.slice. The same is true for other user-defined workloads that don't spawn new cgroups directly under the root slice. Therefore, setting protections for the whole system slice is probably too broad. We would instead have to identify which units need to keep running and maintain this "allow list" for as long as the upstream units don't already set OOMScoreAdjust= and MemoryMin= themselves.


till · 2024-04-16T09:27:51Z

We move workloads into a slice to avoid them breaking the system.

We have been doing it for a couple of years at this point and never had Flatcar/OS components crash.

jepio · 2024-04-17T08:31:17Z

@till can you share the details of your config? we might draw inspiration from that

till · 2024-04-17T10:16:48Z

@jepio We do this for docker currently, so we configure cgroup-parent in /etc/docker/daemon.json.

The slice itself looks similar to this:

# https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html
[Slice]
CPUAccounting=yes
CPUQuota={{ cpu_quota_percent }}%
MemoryAccounting=yes
# Systemd > 231 (ignored for older versions)
MemoryHigh={{ memory_high_percent }}%
MemoryMax={{ memory_max_percent }}%
MemorySwapMax=0
# Systemd 219, as on CoreOS7
MemoryLimit={{ memory_limit_mb }}M

# Ordering dependencies such as Before= belong in [Unit]; systemd ignores them in [Install].
[Unit]
Before=docker.service

pothos added the kind/feature A feature request label Apr 16, 2024

github-project-automation bot added this to Flatcar tactical, release planning, and roadmap Apr 16, 2024

github-project-automation bot moved this to 📝 Needs Triage in Flatcar tactical, release planning, and roadmap Apr 16, 2024

pothos moved this from 📝 Needs Triage to 🪵Backlog in Flatcar tactical, release planning, and roadmap Apr 16, 2024

github-actions bot mentioned this issue Apr 22, 2024

Monthly contributions report 2024-03-22 - 2024-04-21 #1435

Closed
