kvserver: unbounded memory use when falling behind on sideloaded MsgApp #73376
Comments
Tracking this in the 22.2 stability list since it's tripped up several roachtests through opaque OOMs (backlinks above), and users are likely to run into it. Needs an owner, ideally someone from the Repl side.
We understand what seems like the dominant class of memory build-up better now thanks to #98576. In short, when a follower is overloaded, SSTs will pile up in both the raft receive queue and, to a larger degree, the …
We see that on 2xlarge this test likely runs into its EBS bandwidth limits. The easiest way to avoid that is to switch to a beefier machine, which doubles its bandwidth limits. We should also survive being bandwidth-limited, but currently don't do so reliably - this is tracked in cockroachdb#73376. Epic: CRDB-25503 Release note: None
Description
In #71802 (comment), we are seeing occasional failures due to nodes running out of memory. The heap profiles show large amounts of memory allocated by loading sideloaded SSTs into memory for appending to followers. Each individual raft leader will only pull ~one SST per append (due to our 32kb max-append-size target) but it may do so for each follower, meaning that for every leader in the system, we can expect at most
num_followers * sst_size
to be pulled into memory per raft cycle. Unfortunately, outgoing messages are buffered, so even a single raft group could in theory put up to 10k SSTs into memory (the shared outgoing message buffer holds up to 10k messages). We don't have a single group but potentially tens of thousands of them, and in theory each of them can do the above (though they all share the 10k limit, beyond which messages are dropped wholesale). In practice, the quota pool should, on each leader, prevent too many SSTs from entering the raft layer before they have been fully distributed to the followers. The quota pool size is half the raft log truncation threshold of 16mb, i.e. an 8mb proposal quota, so assuming SSTs no larger than 8mb, we expect only about 8mb * num_followers in flight at any given time, per local raft leader.
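To make those numbers concrete, here is a minimal Go sketch (illustrative only, not CockroachDB code; the constants merely mirror the figures quoted above, and `numFollowers`/`sstSize` are hypothetical inputs) contrasting the quota-pool-bounded estimate with the theoretical worst case in which the shared 10k outgoing-message buffer fills up with SST-carrying appends:

```go
package main

import "fmt"

// Illustrative constants mirroring the figures quoted in this issue; they are
// not actual CockroachDB constants or APIs.
const (
	raftLogTruncationThreshold = 16 << 20                       // 16mb
	proposalQuota              = raftLogTruncationThreshold / 2 // 8mb
	outgoingMsgBufferLen       = 10000                          // shared outgoing message buffer (10k messages)
)

// expectedInFlightPerLeader is the quota-pool-bounded estimate: assuming SSTs
// no larger than the proposal quota, at most quota * numFollowers bytes of
// sideloaded payload should be in flight per local raft leader.
func expectedInFlightPerLeader(numFollowers int) int64 {
	return int64(proposalQuota) * int64(numFollowers)
}

// worstCaseBuffered is the theoretical ceiling if the shared outgoing message
// buffer fills up with appends that each carry one SST of sstSize bytes.
func worstCaseBuffered(sstSize int64) int64 {
	return sstSize * outgoingMsgBufferLen
}

func main() {
	// Hypothetical inputs: 2 followers, 8mb SSTs.
	fmt.Printf("quota-pool bound per leader: %d MiB\n", expectedInFlightPerLeader(2)>>20)
	fmt.Printf("worst-case buffered:         %d MiB\n", worstCaseBuffered(8<<20)>>20)
}
```

With those inputs the quota-pool bound is 16 MiB per leader, while the buffer ceiling is ~78 GiB; this is only back-of-the-envelope arithmetic, but it illustrates why the buffered-message path, not the per-append path, is the dangerous one.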
Here we saw the heap profile track 2.11GiB. Unfortunately, we no longer have the artifacts, but even with them it might be difficult to tell whether we are dealing with a small number of extraordinarily large SSTs or a homogeneous flood of reasonably-sized SSTs. Still, investigating another occurrence would be helpful, in particular with an eye toward when during the restore the problem occurs.
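If someone does get their hands on another set of artifacts: one way to approach the "few huge SSTs vs. many moderate ones" question is to look at the average allocation size per allocation site in the heap profile. The sketch below is a hypothetical helper (not part of this repo) that does this with the public github.com/google/pprof/profile package; the heap.pprof path is a placeholder, and `go tool pprof` can surface the same data interactively.

```go
package main

// Hypothetical helper for digging into a heap profile artifact: for each leaf
// allocation site it prints in-use bytes, in-use object count, and the average
// allocation size, which helps distinguish a few very large allocations from a
// flood of moderately sized ones.

import (
	"fmt"
	"log"
	"os"
	"sort"

	"github.com/google/pprof/profile"
)

// sampleIndex returns the index of the sample type named typ, or -1.
func sampleIndex(p *profile.Profile, typ string) int {
	for i, st := range p.SampleType {
		if st.Type == typ {
			return i
		}
	}
	return -1
}

func main() {
	f, err := os.Open("heap.pprof") // placeholder path to the heap profile artifact
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	p, err := profile.Parse(f)
	if err != nil {
		log.Fatal(err)
	}
	bytesIdx, objsIdx := sampleIndex(p, "inuse_space"), sampleIndex(p, "inuse_objects")
	if bytesIdx < 0 || objsIdx < 0 {
		log.Fatal("profile lacks inuse_space/inuse_objects sample types")
	}

	// Attribute each sample's in-use bytes and objects to its leaf function.
	type stats struct{ bytes, objs int64 }
	perFunc := map[string]*stats{}
	for _, s := range p.Sample {
		if len(s.Location) == 0 || len(s.Location[0].Line) == 0 || s.Location[0].Line[0].Function == nil {
			continue
		}
		name := s.Location[0].Line[0].Function.Name
		st := perFunc[name]
		if st == nil {
			st = &stats{}
			perFunc[name] = st
		}
		st.bytes += s.Value[bytesIdx]
		st.objs += s.Value[objsIdx]
	}

	// Print the top allocation sites by in-use bytes, with average size.
	names := make([]string, 0, len(perFunc))
	for name := range perFunc {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool { return perFunc[names[i]].bytes > perFunc[names[j]].bytes })
	top := 10
	if len(names) < top {
		top = len(names)
	}
	for _, name := range names[:top] {
		st := perFunc[name]
		avg := float64(0)
		if st.objs > 0 {
			avg = float64(st.bytes) / float64(st.objs)
		}
		fmt.Printf("%8.1f MiB in %6d objs (avg %8.1f KiB)  %s\n",
			float64(st.bytes)/(1<<20), st.objs, avg/(1<<10), name)
	}
}
```

A large average allocation size concentrated at the sideload-loading site would point at a few extraordinarily large SSTs, while many small-to-medium allocations would point at the homogeneous-flood scenario.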
Action items
Jira issue: CRDB-11564
Epic CRDB-39898