Background
The autoscaler-agent calculates the "goal CU" based on demand for CPU, using the guest kernel's 1-minute load average metric.
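As a concrete (hypothetical) sketch of that calculation: the function and parameter names below, the target load fraction, and the CPU-per-CU conversion are assumptions for illustration, not the agent's actual code.

```go
// Rough sketch (not the actual autoscaler-agent code): deriving a compute
// unit (CU) goal from the guest's 1-minute load average, so that the load
// stays at or below some target fraction of the provisioned CPUs.
package main

import (
	"fmt"
	"math"
)

// goalCU returns the number of CUs needed for the observed load average,
// given an assumed target fraction and an assumed CPUs-per-CU ratio.
func goalCU(loadAvg1Min, loadAverageFractionTarget, cpusPerCU float64) uint32 {
	goalCPUs := loadAvg1Min / loadAverageFractionTarget
	return uint32(math.Ceil(goalCPUs / cpusPerCU))
}

func main() {
	// e.g. a load of 3.6 with a 0.9 target and 1 CPU per CU -> 4 CUs.
	fmt.Println(goalCU(3.6, 0.9, 1.0))
}
```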
Load average is an exponentially weighted moving average, updated every 5 seconds, of the instantaneous number of running or runnable tasks at each sample point; in other words, it's an average of the run queue size.
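The update rule behind that moving average can be modeled like the sketch below (a simplified floating-point version for illustration; the kernel does the equivalent in fixed-point arithmetic).

```go
// Simplified model of how the 1-minute load average evolves: every 5
// seconds the current run-queue length is folded into an exponentially
// weighted moving average with decay factor e^(-5/60).
package main

import (
	"fmt"
	"math"
)

func main() {
	const tick = 5.0    // seconds between updates
	const window = 60.0 // 1-minute averaging window
	decay := math.Exp(-tick / window)

	load := 0.0
	// Assume a steady queue of 2 running/runnable tasks for 10 minutes.
	for t := 0.0; t < 600.0; t += tick {
		n := 2.0 // instantaneous queue length at this sample
		load = load*decay + n*(1.0-decay)
	}
	fmt.Printf("1-minute load average: %.2f\n", load) // converges to 2.00
}
```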
For workloads with spiky parallelism, this can result in dramatic over-estimations if we interpret it as "demand" for CPU time. If there are 4x as many tasks as CPUs, each task may contribute 4x as much as it should to our measure of "demand" (because fair scheduling keeps every task in the queue for 4x as long).
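To make that 4x figure concrete, here's a toy calculation with an assumed burst pattern (1 CPU, bursts of 4 one-second tasks arriving together every 16 seconds); the numbers are illustrative only.

```go
// Toy calculation (illustration only) of why spiky parallelism inflates
// load average: 4 tasks, each needing 1 CPU-second, arrive together on a
// 1-CPU machine every 16 seconds. Under fair scheduling all 4 stay
// runnable until the burst drains, so the queue length is 4 for a quarter
// of the period and 0 otherwise.
package main

import "fmt"

func main() {
	const period = 16.0     // seconds between bursts
	const tasks = 4.0       // tasks per burst
	const workPerTask = 1.0 // CPU-seconds each task needs
	const cpus = 1.0

	// Actual demand: CPU-seconds of work per second of wall clock.
	demand := tasks * workPerTask / period // 0.25 CPUs

	// How long the burst keeps all tasks runnable: the whole burst takes
	// tasks*workPerTask/cpus seconds to drain under fair scheduling.
	busy := tasks * workPerTask / cpus // 4 seconds with queue length 4

	// Time-averaged queue length, which is what load average tracks.
	avgQueue := (tasks*busy + 0.0*(period-busy)) / period // 1.0

	fmt.Printf("average CPU demand: %.2f CPUs\n", demand)
	fmt.Printf("average queue length (≈ load average): %.2f\n", avgQueue)
	fmt.Printf("over-estimation factor: %.1fx\n", avgQueue/demand) // 4.0x
}
```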
In practice we believe this issue is quite rare (hence why this is marked as "tech debt"), but it's still worth addressing.
For more on load average, refer to:
Example of a user hitting this: https://neondb.slack.com/archives/C03TN5G758R/p1728409813336859
Implementation ideas
Don't use load average...?
It'd still be useful to get a measure of how much demand for CPU there is, but load average clearly doesn't give us that (and unfortunately CPU time won't, either).