kvserver: replicate queue should more aggressively upreplicate #79318
Comments
FYI @lidorcarmel @kvoli
I'm adding a roachperf benchmark for this issue - ref #79940 and #80383. We currently take priority into account, with actions associated with up-replication reasonably high up; this determines which replica is processed next in the base queue. Do you think the issue is the cadence of the replicate queue? i.e., it is a blocking single consumer, where each action may take a while to perform. Are we processing replicas faster than we queue them here? It may be that we are limited by the processing rate during up-replication. I'll look further into the queue length on the benchmark.
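To make the priority point concrete, here is a minimal sketch of priority-ordered processing in a base-queue-like structure, assuming a plain `container/heap` keyed on priority. `replicaItem`, `priorityQueue`, and the priority values are illustrative, not the actual kvserver types:

```go
package main

import (
	"container/heap"
	"fmt"
)

// replicaItem is a hypothetical queue entry; higher priority pops first.
type replicaItem struct {
	rangeID  int64
	priority float64
}

type priorityQueue []*replicaItem

func (pq priorityQueue) Len() int           { return len(pq) }
func (pq priorityQueue) Less(i, j int) bool { return pq[i].priority > pq[j].priority }
func (pq priorityQueue) Swap(i, j int)      { pq[i], pq[j] = pq[j], pq[i] }
func (pq *priorityQueue) Push(x any)        { *pq = append(*pq, x.(*replicaItem)) }
func (pq *priorityQueue) Pop() any {
	old := *pq
	n := len(old)
	item := old[n-1]
	*pq = old[:n-1]
	return item
}

func main() {
	pq := &priorityQueue{}
	heap.Init(pq)
	// An up-replication action carries a high priority, so it is
	// processed first -- but only once its replica has actually been
	// added to the queue.
	heap.Push(pq, &replicaItem{rangeID: 1, priority: 100}) // e.g. add a voter
	heap.Push(pq, &replicaItem{rangeID: 2, priority: 1})   // e.g. rebalance
	for pq.Len() > 0 {
		item := heap.Pop(pq).(*replicaItem)
		fmt.Printf("processing r%d (priority %.0f)\n", item.rangeID, item.priority)
	}
}
```

Note that ordering only helps among items already in the heap, which is exactly the limitation raised in the next comment.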
Related: #79453
Thanks for looking into this @kvoli. I'm not very familiar with the details here, but I think the priority only applies to replicas that have already been added to the queue. However, replicas are only added to the queue every 10 minutes, either in random order or ordered by range ID:

cockroach/pkg/kv/kvserver/scanner.go lines 234 to 236 in ea9ca8f

cockroach/pkg/kv/kvserver/scanner.go lines 281 to 285 in ea9ca8f

So I suppose the priority would only come into play when there is a queue backlog, and then only for the replicas in the backlog. So I think it could take up to 10 minutes even with those priorities?
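For reference, the pacing those scanner lines implement can be approximated with the sketch below, assuming the 10-minute pass target and 10 ms floor described in this issue. `waitInterval` is an illustrative helper, not the actual scanner.go code:

```go
package main

import (
	"fmt"
	"time"
)

const (
	targetInterval = 10 * time.Minute      // target duration of one full pass
	minIdleTime    = 10 * time.Millisecond // floor between consecutive replicas
)

// waitInterval returns the pause between enqueueing consecutive replicas,
// spreading one pass over the target interval but never dropping below the
// minimum idle time.
func waitInterval(replicaCount int) time.Duration {
	if replicaCount == 0 {
		return targetInterval
	}
	interval := targetInterval / time.Duration(replicaCount)
	if interval < minIdleTime {
		// With enough replicas the 10ms floor dominates, and a full
		// pass necessarily takes longer than the 10-minute target.
		interval = minIdleTime
	}
	return interval
}

func main() {
	for _, n := range []int{1000, 60000, 1000000} {
		w := waitInterval(n)
		fmt.Printf("%7d replicas -> %v between replicas, full pass ~ %v\n",
			n, w, time.Duration(n)*w)
	}
}
```

With these constants, 60,000 replicas is the break-even point; a million replicas would stretch a single pass to nearly three hours.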
That seems right. I believe this issue also comes up in decommissioning, as @lidorcarmel linked above. Aayush's solution of retrying replicas in #81005 seems like a promising direction. Even then, the 10 minutes is still an issue. Is there any reason why we couldn't "push" leaseholders that need up-replication into the queue, given that 10 minutes is a long worst-case tail? I can see how this pattern might devolve into multiple scanner cadences with different objectives if we don't have an event to trigger enqueueing, but at the moment the indiscriminate store scanner seems too blunt an instrument for differing timeliness requirements.
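A hedged sketch of what such a push model could look like: an event (e.g. a descriptor change revealing under-replication) enqueues the leaseholder immediately rather than waiting for the next scanner pass. `maybeEnqueue`, `rangeState`, and this `replicateQueue` are hypothetical, not existing kvserver APIs:

```go
package main

import "fmt"

// rangeState is a hypothetical snapshot of a range as seen by a store.
type rangeState struct {
	rangeID       int64
	voters        int
	targetVoters  int // desired voter count, e.g. from the zone config
	isLeaseholder bool
}

type replicateQueue struct {
	pending chan int64
}

// maybeEnqueue pushes the leaseholder of an under-replicated range into the
// queue immediately, instead of relying on the periodic scanner pass.
func (q *replicateQueue) maybeEnqueue(r rangeState) {
	if r.isLeaseholder && r.voters < r.targetVoters {
		select {
		case q.pending <- r.rangeID:
		default:
			// Queue is full; the scanner will pick it up on its next pass.
		}
	}
}

func main() {
	q := &replicateQueue{pending: make(chan int64, 16)}
	q.maybeEnqueue(rangeState{rangeID: 7, voters: 2, targetVoters: 3, isLeaseholder: true})
	fmt.Println("enqueued:", <-q.pending)
}
```

The non-blocking send is one way to keep the event path cheap while leaving the scanner as a fallback, which sidesteps the multiple-cadence concern above.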
We've often seen the replicate queue be very slow to process under-replicated ranges in large clusters (up to 1 million ranges).
The replica scanner is responsible for feeding replicas into the queues, with a target of 10 minutes per pass and a minimum of 10 milliseconds between replicas, on top of the queue processing time itself. It does not take replica state into account at all. Note that with the 10 ms floor, a pass over more than 60,000 replicas necessarily exceeds the 10-minute target.
We need to make sure that the replicate queue prioritizes under-replicated ranges and aggressively tries to upreplicate them.
Jira issue: CRDB-14689