Current situation
When running low on memory, Flatcar currently relies on the kernel's OOM killer to kill processes. Flatcar does not make use of systemd-oomd yet. When the kernel kills processes, it can hit critical system services.
Impact
Hitting critical system services can render the system unresponsive as observed by @jepio.
Ideal future situation
Instead of killing processes as a last resort, we can use systemd-oomd to evaluate cgroup memory usage and terminate whole cgroups rather than single processes, and to do so earlier than the kernel would, so that the system stays responsive. Terminating whole cgroups means that the action is more coordinated and impactful than killing random child or parent processes. Using cgroup memory accounting means that the termination is more likely to hit something that is actually responsible for the OOM than the kernel OOM killer would be.
To prevent both the kernel OOM killer and systemd-oomd from hitting critical services, one can set OOMScoreAdjust= and MemoryMin=.
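As a rough sketch of what such protection could look like, the drop-in below applies both settings to a single unit. The unit name `example-critical.service`, the file name, and the values are assumptions chosen for illustration, not settings that Flatcar ships. Note that `MemoryMin=` only takes effect on the unified (v2) cgroup hierarchy.

```ini
# /etc/systemd/system/example-critical.service.d/10-oom-protect.conf
# Hypothetical drop-in: the unit name and both values are illustrative only.
[Service]
# Valid range is -1000..1000; lower values make the kernel OOM killer
# less likely to select this unit's processes.
OOMScoreAdjust=-900
# Amount of memory for this unit's cgroup that should be protected from reclaim.
MemoryMin=64M
```

After adding or changing a drop-in like this, `systemctl daemon-reload` followed by a restart of the unit would be needed for the values to apply.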
To steer systemd-oomd towards killing a certain unit, one can set ManagedOOMSwap=kill and ManagedOOMMemoryPressure=kill.
Implementation options
Enable systemd-oomd by default on Flatcar.
Set OOMScoreAdjust= and MemoryMin= for critical service units.
Set a drop-in for Docker .scope units so that they have ManagedOOMSwap=kill and ManagedOOMMemoryPressure=kill.
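The last option might look like the following drop-in. This is a sketch, not a tested configuration: it assumes that systemd's truncated-prefix drop-in directories (here `docker-.scope.d`, which matches unit names starting with `docker-` and ending in `.scope`) are honoured for the transient scopes Docker creates, and the file name is hypothetical.

```ini
# /etc/systemd/system/docker-.scope.d/10-managed-oom.conf
# Hypothetical drop-in; assumes prefix drop-ins apply to transient Docker scopes.
[Scope]
# Let systemd-oomd consider these cgroups as kill candidates when swap runs low...
ManagedOOMSwap=kill
# ...and when memory pressure on the cgroup stays above the configured limit.
ManagedOOMMemoryPressure=kill
```

These settings only have an effect while `systemd-oomd.service` is running, which is what the first option would ensure.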
Additional information
Docker containers run under docker-….scope, which is part of system.slice. The same is true for other user-defined workloads that don't spawn new cgroups directly under the root slice. Therefore, setting protections for the whole system slice is probably too broad. Instead, we would have to identify which units we need to keep running and maintain this "allow list" for as long as the upstream units don't already set OOMScoreAdjust= and MemoryMin=.