
kvserver: unbounded memory use when falling behind on sideloaded MsgApp #73376

Open
2 tasks
tbg opened this issue Dec 2, 2021 · 2 comments
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-investigation Further steps needed to qualify. C-label will change. T-kv KV Team

Comments

@tbg
Member

tbg commented Dec 2, 2021

Description

In #71802 (comment), we are seeing occasional failures due to nodes running out of memory. The heap profiles show large amounts of memory allocated by loading sideloaded SSTs into memory for appending to followers. Each individual raft leader will only pull roughly one SST per append (due to our 32 KiB max-append-size target), but it may do so for each follower, so for every leader in the system we can expect at most num_followers * sst_size to be pulled into memory per raft cycle. Unfortunately, outgoing messages are buffered (in a queue bounded at 10k messages), so even a single group could in theory hold up to 10k SSTs in memory.
We don't have a single group but potentially tens of thousands of them, and in theory each of them can do the above (though they all share the 10k limit, beyond which messages are dropped wholesale). In practice, the quota pool should, on each leader, prevent too many SSTs from entering the raft layer before they've been fully distributed to the followers. The quota pool size is half the raft log truncation threshold of 16 MB, i.e. an 8 MB proposal quota. So, assuming SSTs no larger than 8 MB, we expect at most 8 MB * num_followers in flight at any given time, per local raft leader.
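
For a rough sense of scale, here is a back-of-the-envelope sketch of the worst case implied by the numbers above. This is illustrative only, not code from the repo; the truncation threshold mirrors the default referenced above, while the follower and leader counts are assumptions.

```go
// Illustrative worst-case arithmetic only; constants are assumptions.
package main

import "fmt"

func main() {
	const (
		raftLogTruncationThreshold = 16 << 20                       // 16 MiB, per the default referenced above
		proposalQuota              = raftLogTruncationThreshold / 2 // 8 MiB quota pool per range
		numFollowers               = 2                              // e.g. a 3x-replicated range (assumption)
		numLocalLeaders            = 10000                          // hypothetical leaseholder count on one node
	)

	// With the quota pool as the only bound, each local leader can have up to
	// proposalQuota bytes in flight per follower.
	perLeader := int64(proposalQuota * numFollowers)
	fmt.Printf("per-leader worst case: %d MiB\n", perLeader>>20)

	// Across many local leaders the theoretical aggregate dwarfs available RAM,
	// which is why the quota pool and the 10k message cap alone don't bound memory.
	fmt.Printf("aggregate worst case: ~%d GiB\n", (perLeader*numLocalLeaders)>>30)
}
```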

Here we saw the heap profile track 2.11 GiB. Unfortunately, we no longer have the artifacts, and even with them it might be difficult to tell whether we are dealing with a small number of extraordinarily large SSTs or a homogeneous flood of reasonably sized SSTs. Still, investigating another occurrence would be helpful, in particular with an eye toward when during the restore the problem occurs.


Action items

  • add a histogram of raft append sizes (making sure it lets us distinguish a few large messages from many reasonably sized ones)
  • switch the outgoing message queue from cardinality-based to message-size-based bounding, and selectively drop messages that don't fit into the queue (how to size the queue is an open question; see the sketch after this list)
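
A minimal sketch of what the size-based queueing in the second item could look like, assuming the etcd-io raftpb message types; the names (sizedQueue, maxBytes) are hypothetical and this is not the CockroachDB implementation:

```go
package raftq

import (
	"sync"

	"go.etcd.io/etcd/raft/v3/raftpb"
)

// sizedQueue bounds the outgoing raft message queue by total byte size rather
// than by message count, dropping messages that would exceed the budget.
type sizedQueue struct {
	mu       sync.Mutex
	msgs     []raftpb.Message
	bytes    int64
	maxBytes int64 // sizing this budget is the open question noted above
}

// tryEnqueue accepts m if it fits into the remaining byte budget and reports
// whether it was queued; callers would drop (and ideally count) rejections.
func (q *sizedQueue) tryEnqueue(m raftpb.Message) bool {
	sz := int64(m.Size()) // proto-generated size; dominated by sideloaded SST payloads once inlined
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.bytes+sz > q.maxBytes {
		return false
	}
	q.msgs = append(q.msgs, m)
	q.bytes += sz
	return true
}

// drain returns everything queued so far and resets the byte budget.
func (q *sizedQueue) drain() []raftpb.Message {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := q.msgs
	q.msgs, q.bytes = nil, 0
	return out
}
```

Recording a histogram of the same m.Size() values at this point would also cover the first action item.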

Jira issue: CRDB-11564

Epic CRDB-39898

@irfansharif
Contributor

irfansharif commented Aug 18, 2022

Tracking this in the 22.2 stability list since it has tripped up several roachtests through opaque OOMs (backlinks above) and users are likely to run into it. Needs an owner, ideally someone from the Repl side.

@tbg
Member Author

tbg commented Mar 16, 2023

We now understand what appears to be the dominant class of memory build-up better, thanks to #98576. In short, when a follower is overloaded, SSTs pile up both in the raft receive queue and, to a larger degree, in the RawNode.raft.raftLog.unstable slice. We see many GBs of data in these locations combined, but no individual replica seems to dominate - it's death by a few dozen moderately severe cuts (i.e. 30 * 200 MB or thereabouts).
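
As a sketch of the kind of byte accounting that would make this visible, one could sum the sizes of a replica's not-yet-stable entries (illustrative only, assuming the etcd-io raftpb types; this is not how CockroachDB instruments it today):

```go
package raftmem

import "go.etcd.io/etcd/raft/v3/raftpb"

// unstableBytes approximates the in-memory footprint of a replica's unstable
// raft log, which for sideloaded proposals is dominated by the SST payloads
// carried in Entry.Data.
func unstableBytes(ents []raftpb.Entry) int64 {
	var n int64
	for i := range ents {
		n += int64(ents[i].Size())
	}
	return n
}
```

Summed over a few dozen such replicas at ~200 MB each, this kind of accounting lines up with the multi-GB totals observed.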

@tbg tbg changed the title kvserver: unbounded memory use when appending sideloaded proposals kvserver: unbounded memory use when falling behind on sideloaded MsgApp Mar 16, 2023
tbg added a commit to tbg/cockroach that referenced this issue Mar 16, 2023
We see that on 2xlarge this test likely runs into its EBS bandwidth
limits. The easiest way to avoid that is to switch to a beefier machine,
which doubles the bandwidth limits.

We should also survive being bandwidth-limited, but currently don't do so
reliably - this is tracked in cockroachdb#73376.

Epic: CRDB-25503
Release note: None