kvserver: snapshot rate limiting can overshoot #58920

Open
tbg opened this issue Jan 13, 2021 · 7 comments
Labels
A-kv-distribution Relating to rebalancing and leasing. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@tbg
Member

tbg commented Jan 13, 2021

Describe the problem

We limit the transfer rate of outgoing snapshots via the snapshot.{recovery,rebalance}.max_rate cluster settings. However, this is not the per-node limit it ought to be, because:

  1. each Store has a raft snapshot queue, so there can be #Stores outgoing snapshots at any given time
  2. within each Store, the replicate queue and the raft snapshot queue may both be sending a snapshot (to different recipients) at any given time

In aggregate, with a configured rate limit of N MB/s, we may in the worst case see 2 * #Stores * N MB/s of outgoing bandwidth used. This can saturate the available bandwidth and cause high tail latencies or, worse, stability issues.
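
For illustration, here is the worst-case arithmetic as a minimal Go sketch; the store count and configured rate are hypothetical values, not defaults:

package main

import "fmt"

func main() {
	const (
		stores          = 4   // hypothetical node with four stores
		sendersPerStore = 2   // replicate queue + raft snapshot queue
		maxRateMBps     = 8.0 // hypothetical snapshot.*.max_rate value, in MB/s
	)
	// Worst case: every sender on every store streams at the full configured rate.
	fmt.Printf("worst-case outgoing snapshot bandwidth: %.0f MB/s\n",
		stores*sendersPerStore*maxRateMBps) // prints 64 MB/s
}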

To Reproduce

Expected behavior

The cluster settings mentioned above should place an upper bound on the bandwidth allocated to snapshots. Note that it isn't good enough to simply share a rate limiter between all snapshot senders on the node, because the queues also use the rate limit to compute their context deadline. We need to explicitly allocate from the budget. It may be easier to allow only one snapshot in flight at a time, though that comes with awkward blocking.
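
Purely as a sketch of the "explicitly allocate from the budget" idea, under assumed names (Budget, Acquire, and LimiterAndTimeout are made up; golang.org/x/time/rate is the only real dependency): a sender reserves a share of a node-level budget and uses that same share both for its limiter and for its timeout, so the deadline matches the rate it will actually get.

package snapbudget

import (
	"errors"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// Budget is a node-wide bytes/sec budget for outgoing snapshots.
type Budget struct {
	mu        sync.Mutex
	totalBps  int64
	allocated int64
}

func NewBudget(totalBps int64) *Budget { return &Budget{totalBps: totalBps} }

// Acquire reserves bps bytes/sec from the budget, or fails if it is exhausted.
// The caller must invoke the returned release func when the snapshot is done.
func (b *Budget) Acquire(bps int64) (release func(), err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.allocated+bps > b.totalBps {
		return nil, errors.New("snapshot bandwidth budget exhausted")
	}
	b.allocated += bps
	return func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		b.allocated -= bps
	}, nil
}

// LimiterAndTimeout derives both the rate limiter and the processing timeout
// from the reserved share, mirroring the slowdown factor used today.
func LimiterAndTimeout(bps, totalBytes, slowdown int64) (*rate.Limiter, time.Duration) {
	limiter := rate.NewLimiter(rate.Limit(bps), int(bps)) // burst of ~1s worth of bytes
	timeout := time.Duration(totalBytes/bps) * time.Second * time.Duration(slowdown)
	return limiter, timeout
}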

Additional data / screenshots

Environment:

Additional context

Jira issue: CRDB-3359

@tbg tbg added A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jan 13, 2021
@giorgosp
Contributor

giorgosp commented Feb 21, 2021

Hey @tbg! I could help with this one. Is it up for grabs?

I am thinking that each queue may keep using the original setting value for the process timeout. makeRateLimitedTimeoutFunc() already multiplies the snapshot flight duration by an order of magnitude, due to permittedRangeScanSlowdown (= 10).

totalBytes := stats.KeyBytes + stats.ValBytes + stats.IntentBytes + stats.SysBytes
estimatedDuration := time.Duration(totalBytes/snapshotRate) * time.Second
timeout := estimatedDuration * permittedRangeScanSlowdown

So the process timeout would still be computed from the full setting (e.g. 8s * 10), rather than being recomputed from the queue's smaller budget (say, 2 MB/s).

For rate limiting snapshot streaming, each queue could budget a separate rate limiter that would be propagated all the way down to sendSnapshot(). sendSnapshot() would stop using snapshotRateLimit() to create a rate limiter directly from the setting and would always use the passed-in limiter instead (a rough sketch of that shape follows the quoted code).

// Consult cluster settings to determine rate limits and batch sizes.
targetRate, err := snapshotRateLimit(st, header.Priority)
if err != nil {
	return errors.Wrapf(err, "%s", to)
}
batchSize := snapshotSenderBatchSize.Get(&st.SV)
// Convert the bytes/sec rate limit to batches/sec.
//
// TODO(peter): Using bytes/sec for rate limiting seems more natural but has
// practical difficulties. We either need to use a very large burst size
// which seems to disable the rate limiting, or call WaitN in smaller than
// burst size chunks which caused excessive slowness in testing. Would be
// nice to figure this out, but the batches/sec rate limit works for now.
limiter := rate.NewLimiter(targetRate/rate.Limit(batchSize), 1 /* burst size */)
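
A rough sketch of that shape, with made-up names rather than the actual kvserver signatures: the queue builds the limiter from its own share of the budget and passes it down, so the sender no longer derives one from the cluster setting.

package snapsketch

import (
	"context"

	"golang.org/x/time/rate"
)

// sendSnapshotWithLimiter streams pre-built batches, waiting on the caller's
// limiter once per batch to mirror the existing batches/sec limiting.
func sendSnapshotWithLimiter(
	ctx context.Context,
	limiter *rate.Limiter, // built by the queue from its share of the budget
	batches [][]byte,
	send func(context.Context, []byte) error,
) error {
	for _, b := range batches {
		if err := limiter.Wait(ctx); err != nil {
			return err
		}
		if err := send(ctx, b); err != nil {
			return err
		}
	}
	return nil
}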

This fix should be simple to implement, and even simpler if we consolidated the two settings into one; see #39200.

We could also follow up with something smarter and more complex, like sendSnapshot() dynamically adjusting the rate limit depending on how many snapshots it thinks are in flight. But I haven't thought this through at all.
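
As a thought experiment only for that "smarter" variant (made-up names, not a concrete proposal): track in-flight sends on the node and split the configured rate evenly among them, rebalancing whenever a send starts or finishes.

package snapdynamic

import (
	"sync"

	"golang.org/x/time/rate"
)

// dynamicLimiter splits a node-wide rate evenly across in-flight snapshot sends.
type dynamicLimiter struct {
	mu       sync.Mutex
	totalBps float64
	inFlight map[*rate.Limiter]struct{}
}

func newDynamicLimiter(totalBps float64) *dynamicLimiter {
	return &dynamicLimiter{totalBps: totalBps, inFlight: make(map[*rate.Limiter]struct{})}
}

// start registers a new send, returning its limiter and a done func that
// unregisters it; both trigger a rebalance of the per-sender limits.
func (d *dynamicLimiter) start() (*rate.Limiter, func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	l := rate.NewLimiter(0, 1) // real limit is set by rebalanceLocked below
	d.inFlight[l] = struct{}{}
	d.rebalanceLocked()
	return l, func() {
		d.mu.Lock()
		defer d.mu.Unlock()
		delete(d.inFlight, l)
		d.rebalanceLocked()
	}
}

func (d *dynamicLimiter) rebalanceLocked() {
	if n := len(d.inFlight); n > 0 {
		share := rate.Limit(d.totalBps / float64(n))
		for l := range d.inFlight {
			l.SetLimit(share)
		}
	}
}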

What do you think? :)

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@tbg tbg added A-kv-distribution Relating to rebalancing and leasing. and removed A-kv-replication Relating to Raft, consensus, and coordination. labels Sep 22, 2021
@tbg
Member Author

tbg commented Nov 26, 2021

PS: for posterity, I'd talked to Giorgos on the community Slack back in February and we decided it wasn't a good project for now.

@tbg
Member Author

tbg commented Nov 26, 2021

A similar problem to the one stated above exists with the store snapshot semaphore. A node allows one incoming snapshot per store, but this is supposed to be a per-node limit. We have users running with multiple stores, and there are reports of instability during periods when the allocator sends multiple snapshots at once.
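
A minimal sketch of what a node-level cap could look like, assuming that is in fact what we want (the follow-up comment below argues per-store may actually be preferable); the type and functions are hypothetical:

package snapsem

// nodeSnapshotSem is one semaphore shared by every store's snapshot
// application path, rather than one semaphore per Store.
type nodeSnapshotSem chan struct{}

func newNodeSnapshotSem(capacity int) nodeSnapshotSem {
	return make(nodeSnapshotSem, capacity)
}

// acquire blocks until a slot is free; the returned func releases it.
func (s nodeSnapshotSem) acquire() (release func()) {
	s <- struct{}{}
	return func() { <-s }
}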

@erikgrinaker
Contributor

A node allows one incoming snapshot per store, but this is supposed to be a per-node limit.

I'm not sure this is necessarily true. I believe the reason for limiting these is mostly that applying multiple snapshots at once would exceed the IO capacity of the underlying storage. However, each store is expected to have separate storage, so there is a large performance benefit in applying snapshots to multiple stores in parallel. This will come with some memory overhead, but I suppose that's the cost of running multiple stores, and operators need to provision nodes accordingly.

@tbg
Member Author

tbg commented Dec 2, 2021

IIRC the original motivation was avoiding overloading the network (which is shared between stores), but both are valid reasons for throttling.

@github-actions

github-actions bot commented Sep 5, 2023

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

@erikgrinaker
Contributor

Related to #103879.
