kvserver: snapshot rate limiting can overshoot #58920
Comments
Hey @tbg! I could help with this one, is it up for grabs? I am thinking that each queue may keep using the original setting value for the process timeout (see cockroach/pkg/kv/kvserver/queue.go, lines 86 to 88 at ea9074b).
So the process timeout would still be 8s * 10 instead of, let's say, 2s * 10 (where 2 MB/s would be the queue's budget). For rate limiting snapshot streaming, each queue could budget a separate rate limiter, which would propagate all the way down to cockroach/pkg/kv/kvserver/store_snapshot.go, lines 1113 to 1127 at be2ed80.
This fix should be simple to implement, and even simpler if we consolidated the two settings into one; see #39200. We could also follow up with something smarter and more complex. What do you think? :)
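To make the idea above concrete, here is a minimal Go sketch, not the actual CockroachDB code, of splitting the configured per-node rate into one limiter per potential sender so the node-wide total stays bounded; `newSnapshotBudget`, `sendersPerNode`, and `sendChunk` are hypothetical names, and the 8 MB/s figure is only an example value:

```go
// Hypothetical sketch of a per-queue rate budget; not CockroachDB's API.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// sendersPerNode is an assumed worst case, e.g. 2 stores, each with a raft
// snapshot queue and a replicate queue that may stream concurrently.
const sendersPerNode = 4

// newSnapshotBudget splits the cluster-setting rate (bytes/sec) into one
// limiter per potential sender so the aggregate stays at maxRate.
func newSnapshotBudget(maxRate float64) []*rate.Limiter {
	per := maxRate / sendersPerNode
	limiters := make([]*rate.Limiter, sendersPerNode)
	for i := range limiters {
		// Burst of roughly one second's worth of data per sender.
		limiters[i] = rate.NewLimiter(rate.Limit(per), int(per))
	}
	return limiters
}

// sendChunk waits on the queue's own limiter before streaming a chunk,
// analogous to how the snapshot sender paces outgoing data.
func sendChunk(ctx context.Context, lim *rate.Limiter, chunk []byte) error {
	return lim.WaitN(ctx, len(chunk))
}

func main() {
	limiters := newSnapshotBudget(8 << 20) // example: 8 MB/s node-wide
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	// Each queue uses its own limiter, while the process timeout can still be
	// derived from the full setting value, as suggested above.
	if err := sendChunk(ctx, limiters[0], make([]byte, 256<<10)); err != nil {
		fmt.Println("send failed:", err)
	}
}
```

A real change would also need to react to updates of the cluster setting and to the number of active senders; the fixed split here is only for illustration.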
PS, for posterity: I'd talked to Giorgo on the community Slack back in February and we decided it wasn't a good project for now.
A similar problem to the one stated above exists with the store snapshot semaphore. A node allows one incoming snapshot per store, but this is supposed to be a per-node limit. We have users running with multiple stores, and there are reports of instability during periods when the allocator sends multiple snapshots at once.
I'm not sure this is necessarily true. I believe the reason for limiting these is mostly that applying multiple snapshots can exceed the IO capacity of the underlying storage. However, each store is expected to have separate storage, so there is a large performance benefit in applying snapshots to multiple stores in parallel. This comes with some memory overhead, but I suppose that's the cost of running multiple stores, and operators need to provision nodes accordingly.
IIRC the original motivation was avoiding overloading the network (which is shared between stores), but both are valid reasons for throttling.
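As a rough illustration of the per-node (rather than per-store) limit being discussed, here is a Go sketch under assumed names; `nodeSnapshotSem` and `applySnapshot` are hypothetical and not CockroachDB's implementation:

```go
// Hypothetical sketch of a node-level incoming-snapshot semaphore shared by
// all stores; not the actual CockroachDB code.
package main

import (
	"context"
	"log"

	"golang.org/x/sync/semaphore"
)

// nodeSnapshotSem caps concurrent incoming snapshots across every store on
// the node, i.e. the per-node limit argued for above.
var nodeSnapshotSem = semaphore.NewWeighted(1)

// applySnapshot reserves a node-wide slot before applying. With multiple
// stores this serializes applications, trading the parallel-disk throughput
// mentioned above for bounded network and memory usage.
func applySnapshot(ctx context.Context, storeID int, data []byte) error {
	if err := nodeSnapshotSem.Acquire(ctx, 1); err != nil {
		return err
	}
	defer nodeSnapshotSem.Release(1)
	// ... write the snapshot into storeID's engine here ...
	return nil
}

func main() {
	ctx := context.Background()
	// Two stores receiving snapshots; only one applies at a time.
	for storeID := 1; storeID <= 2; storeID++ {
		if err := applySnapshot(ctx, storeID, make([]byte, 1<<20)); err != nil {
			log.Fatal(err)
		}
	}
}
```

A per-store variant would keep one such semaphore per store instead, allowing parallel application to separate disks at the cost of the memory and network overhead weighed in the comments above.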
Related to #103879. |
Describe the problem
We limit the transfer rate of outgoing snapshots via the `snapshot.{recovery,rebalance}.max_rate` cluster settings. However, this is not a per-node limit as it ought to be, because:

- each `Store` has a raft snapshot queue, so there can be `#Stores` outgoing snapshots at any given time
- for each `Store`, the replicate queue and the raft snapshot queue may both be sending a snapshot (to different recipients) at any given time

In aggregate, with a configured rate limit of `N mb/s`, we may in the worst case see `2 * #Stores * N mb/s` of outgoing bandwidth used. This can then saturate the available bandwidth and cause high tail latencies or, worse, stability issues.

To Reproduce
Expected behavior
The cluster settings mentioned above should place an upper bound on the bandwidth allocated to snapshots. Note that it isn't good enough to simply share a rate limiter between all snapshot senders on the node, because the queues also use the rate limit to compute their context deadline. We need to explicitly allocate from the budget. It may be easier to allow only one snapshot in flight at a time, though that comes with awkward blocking.
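As a loose sketch of explicit budget allocation under assumed names (`snapshotBudget`, `Reserve`, and `timeoutFor` are hypothetical and not part of CockroachDB), a node-wide budget hands out shares and the sender derives its deadline from the share it actually received rather than from the full setting value:

```go
// Hypothetical sketch of explicit allocation from a node-wide snapshot rate
// budget; not CockroachDB's API.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// snapshotBudget hands out shares of the node-wide rate limit so that the
// sum over all concurrent senders never exceeds maxRate.
type snapshotBudget struct {
	mu        sync.Mutex
	maxRate   float64 // bytes/sec, from the cluster setting
	allocated float64
}

// Reserve claims rate bytes/sec for one outgoing snapshot, or fails if the
// budget is exhausted (a caller could also block until a share frees up).
func (b *snapshotBudget) Reserve(rate float64) (release func(), err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.allocated+rate > b.maxRate {
		return nil, errors.New("snapshot rate budget exhausted")
	}
	b.allocated += rate
	return func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		b.allocated -= rate
	}, nil
}

// timeoutFor derives the sender's context deadline from the share it was
// actually granted, rather than from the full, unshared setting value.
func timeoutFor(snapshotBytes int64, reservedRate, slack float64) time.Duration {
	return time.Duration(float64(snapshotBytes)/reservedRate*slack) * time.Second
}

func main() {
	b := &snapshotBudget{maxRate: 8 << 20} // example: 8 MB/s node-wide
	release, err := b.Reserve(2 << 20)     // one queue claims 2 MB/s
	if err != nil {
		fmt.Println(err)
		return
	}
	defer release()
	fmt.Println(timeoutFor(512<<20, 2<<20, 10)) // deadline for a 512 MB snapshot
}
```

The alternative mentioned above, allowing only one snapshot in flight per node, would remove the need for shares entirely but blocks every other sender for the duration of the transfer.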
Additional data / screenshots
Environment:
Additional context
Jira issue: CRDB-3359