kvserver: snapshot rate limiting can overshoot #58920

Open
tbg opened this issue Jan 13, 2021 · 7 comments
Labels
A-kv-distribution Relating to rebalancing and leasing. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@tbg
Member

tbg commented Jan 13, 2021

Describe the problem

We limit the transfer rate of outgoing snapshots via the snapshot.{recovery,rebalance}.max_rate cluster settings. However, this is not the per-node limit it ought to be, because:

  1. each Store has a raft snapshot queue, so there can be #Stores outgoing snapshots at any given time
  2. within each Store, the replicate queue and the raft snapshot queue may both be sending a snapshot (to different recipients) at any given time

In aggregate, with a configured rate limit of N MB/s, we may in the worst case see 2 * #Stores * N MB/s of outgoing bandwidth used. This can saturate the available bandwidth and cause high tail latencies or, worse, stability issues.
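
For illustration, here is the worst-case arithmetic as a minimal Go sketch; the store count and configured rate are hypothetical values, not defaults:

package main

import "fmt"

func main() {
	const (
		stores          = 4   // hypothetical node with four stores
		sendersPerStore = 2   // replicate queue + raft snapshot queue
		maxRateMBps     = 8.0 // hypothetical snapshot.*.max_rate value, in MB/s
	)
	// Worst case: every sender on every store streams at the full configured rate.
	fmt.Printf("worst-case outgoing snapshot bandwidth: %.0f MB/s\n",
		stores*sendersPerStore*maxRateMBps) // prints 64 MB/s
}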

To Reproduce

Expected behavior

The cluster settings mentioned above should place an upper bound on the bandwidth allocated to snapshots. Note that it isn't good enough to simply share a rate limiter between all snapshot senders on the node, because the queues also use the rate limit to compute their context deadline. We need to explicitly allocate from the budget. It may be easier to allow only one snapshot in flight at a time, though that comes with awkward blocking.
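
Purely as a sketch of the "explicitly allocate from the budget" idea, under assumed names (Budget, Acquire, and LimiterAndTimeout are made up; golang.org/x/time/rate is the only real dependency): a sender reserves a share of a node-level budget and uses that same share both for its limiter and for its timeout, so the deadline matches the rate it will actually get.

package snapbudget

import (
	"errors"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// Budget is a node-wide bytes/sec budget for outgoing snapshots.
type Budget struct {
	mu        sync.Mutex
	totalBps  int64
	allocated int64
}

func NewBudget(totalBps int64) *Budget { return &Budget{totalBps: totalBps} }

// Acquire reserves bps bytes/sec from the budget, or fails if it is exhausted.
// The caller must invoke the returned release func when the snapshot is done.
func (b *Budget) Acquire(bps int64) (release func(), err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.allocated+bps > b.totalBps {
		return nil, errors.New("snapshot bandwidth budget exhausted")
	}
	b.allocated += bps
	return func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		b.allocated -= bps
	}, nil
}

// LimiterAndTimeout derives both the rate limiter and the processing timeout
// from the reserved share, mirroring the slowdown factor used today.
func LimiterAndTimeout(bps, totalBytes, slowdown int64) (*rate.Limiter, time.Duration) {
	limiter := rate.NewLimiter(rate.Limit(bps), int(bps)) // burst of ~1s worth of bytes
	timeout := time.Duration(totalBytes/bps) * time.Second * time.Duration(slowdown)
	return limiter, timeout
}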

Additional data / screenshots

Environment:

Additional context

Jira issue: CRDB-3359

@tbg tbg added A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jan 13, 2021
@giorgosp
Contributor

giorgosp commented Feb 21, 2021

Hey @tbg! I could help with this one. Is it up for grabs?

I am thinking that each queue may keep using the original setting value for the process timeout. makeRateLimitedTimeoutFunc() already multiplies the snapshot flight duration by an order of magnitude, due to permittedRangeScanSlowdown (= 10).

totalBytes := stats.KeyBytes + stats.ValBytes + stats.IntentBytes + stats.SysBytes
estimatedDuration := time.Duration(totalBytes/snapshotRate) * time.Second
timeout := estimatedDuration * permittedRangeScanSlowdown

So the process timeout would still be computed from the full setting (e.g. 8s * 10), rather than being recomputed from the queue's smaller budget (say, 2 MB/s).

For rate limiting snapshot streaming, each queue could budget a separate rate limiter that would be propagated all the way down to sendSnapshot(). sendSnapshot() would stop using snapshotRateLimit() to create a rate limiter directly from the setting and would always use the passed-in limiter instead (a rough sketch of that shape follows the quoted code).

// Consult cluster settings to determine rate limits and batch sizes.
targetRate, err := snapshotRateLimit(st, header.Priority)
if err != nil {
	return errors.Wrapf(err, "%s", to)
}
batchSize := snapshotSenderBatchSize.Get(&st.SV)
// Convert the bytes/sec rate limit to batches/sec.
//
// TODO(peter): Using bytes/sec for rate limiting seems more natural but has
// practical difficulties. We either need to use a very large burst size
// which seems to disable the rate limiting, or call WaitN in smaller than
// burst size chunks which caused excessive slowness in testing. Would be
// nice to figure this out, but the batches/sec rate limit works for now.
limiter := rate.NewLimiter(targetRate/rate.Limit(batchSize), 1 /* burst size */)
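
A rough sketch of that shape, with made-up names rather than the actual kvserver signatures: the queue builds the limiter from its own share of the budget and passes it down, so the sender no longer derives one from the cluster setting.

package snapsketch

import (
	"context"

	"golang.org/x/time/rate"
)

// sendSnapshotWithLimiter streams pre-built batches, waiting on the caller's
// limiter once per batch to mirror the existing batches/sec limiting.
func sendSnapshotWithLimiter(
	ctx context.Context,
	limiter *rate.Limiter, // built by the queue from its share of the budget
	batches [][]byte,
	send func(context.Context, []byte) error,
) error {
	for _, b := range batches {
		if err := limiter.Wait(ctx); err != nil {
			return err
		}
		if err := send(ctx, b); err != nil {
			return err
		}
	}
	return nil
}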

This fix should be simple to implement, and even simpler if we consolidated the two settings into one; see #39200.

We could also follow up with something smarter and more complex, like sendSnapshot() dynamically adjusting the rate limit depending on how many snapshots it thinks are in flight. But I haven't thought this through at all.
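
As a thought experiment only for that "smarter" variant (made-up names, not a concrete proposal): track in-flight sends on the node and split the configured rate evenly among them, rebalancing whenever a send starts or finishes.

package snapdynamic

import (
	"sync"

	"golang.org/x/time/rate"
)

// dynamicLimiter splits a node-wide rate evenly across in-flight snapshot sends.
type dynamicLimiter struct {
	mu       sync.Mutex
	totalBps float64
	inFlight map[*rate.Limiter]struct{}
}

func newDynamicLimiter(totalBps float64) *dynamicLimiter {
	return &dynamicLimiter{totalBps: totalBps, inFlight: make(map[*rate.Limiter]struct{})}
}

// start registers a new send, returning its limiter and a done func that
// unregisters it; both trigger a rebalance of the per-sender limits.
func (d *dynamicLimiter) start() (*rate.Limiter, func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	l := rate.NewLimiter(0, 1) // real limit is set by rebalanceLocked below
	d.inFlight[l] = struct{}{}
	d.rebalanceLocked()
	return l, func() {
		d.mu.Lock()
		defer d.mu.Unlock()
		delete(d.inFlight, l)
		d.rebalanceLocked()
	}
}

func (d *dynamicLimiter) rebalanceLocked() {
	if n := len(d.inFlight); n > 0 {
		share := rate.Limit(d.totalBps / float64(n))
		for l := range d.inFlight {
			l.SetLimit(share)
		}
	}
}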

What do you think? :)

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@tbg tbg added A-kv-distribution Relating to rebalancing and leasing. and removed A-kv-replication Relating to Raft, consensus, and coordination. labels Sep 22, 2021
@tbg
Member Author

tbg commented Nov 26, 2021

PS: for posterity, I'd talked to Giorgos on the community Slack back in February and we decided it wasn't a good project for now.

@tbg
Member Author

tbg commented Nov 26, 2021

A similar problem to the one stated above exists with the store snapshot semaphore. A node allows one incoming snapshot per store, but this is supposed to be a per-node limit. We have users running with multiple stores, and there are reports of instability during periods when the allocator sends multiple snapshots at once.
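
A minimal sketch of what a node-level cap could look like, assuming that is in fact what we want (the follow-up comment below argues per-store may actually be preferable); the type and functions are hypothetical:

package snapsem

// nodeSnapshotSem is one semaphore shared by every store's snapshot
// application path, rather than one semaphore per Store.
type nodeSnapshotSem chan struct{}

func newNodeSnapshotSem(capacity int) nodeSnapshotSem {
	return make(nodeSnapshotSem, capacity)
}

// acquire blocks until a slot is free; the returned func releases it.
func (s nodeSnapshotSem) acquire() (release func()) {
	s <- struct{}{}
	return func() { <-s }
}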

@erikgrinaker
Contributor

A node allows one incoming snapshot per store, but this is supposed to be a per-node limit.

I'm not sure this is necessarily true. I believe the reason for limiting these is mostly that applying multiple snapshots at once would exceed the IO capacity of the underlying storage. However, each store is expected to have separate storage, so there is a large performance benefit in applying snapshots to multiple stores in parallel. This will come with some memory overhead, but I suppose that's the cost of running multiple stores, and operators need to provision nodes accordingly.

@tbg
Member Author

tbg commented Dec 2, 2021

IIRC the original motivation was avoiding overloading the network (which is shared between stores), but both are valid reasons for throttling.

@github-actions

github-actions bot commented Sep 5, 2023

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

@erikgrinaker
Contributor

Related to #103879.
