
kvserver: replicate queue should more aggressively upreplicate #79318

Open
erikgrinaker opened this issue Apr 4, 2022 · 6 comments
Labels
- A-kv-replication: Relating to Raft, consensus, and coordination.
- C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception).
- T-kv: KV Team

Comments

erikgrinaker (Contributor) commented Apr 4, 2022

We've often seen the replicate queue being very slow to process underreplicated ranges in large clusters (up to 1 million ranges).

The replica scanner is responsible for enqueueing replicas into the queues, with a target of 10 minutes per pass and a minimum of 10 milliseconds between each replica, on top of the queue processing time itself. It does not take replica state into account at all.
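
For a rough sense of the pacing, here is a minimal sketch (assuming only the 10-minute target and 10ms floor described above; this is not the actual kvserver scanner code) of how the per-replica interval works out:

package main

import (
	"fmt"
	"time"
)

// paceInterval sketches how a scanner might spread one pass over a target
// duration: divide the target by the replica count, but never drop below a
// minimum idle time between replicas.
func paceInterval(targetPerPass, minIdle time.Duration, replicaCount int) time.Duration {
	if replicaCount == 0 {
		return targetPerPass
	}
	interval := targetPerPass / time.Duration(replicaCount)
	if interval < minIdle {
		interval = minIdle
	}
	return interval
}

func main() {
	// E.g. with 100,000 replicas on a store, the 10ms floor dominates
	// (10m / 100,000 = 6ms), so a full pass takes ~100,000 * 10ms ≈ 16.7
	// minutes before any queue processing time is even counted.
	fmt.Println(paceInterval(10*time.Minute, 10*time.Millisecond, 100_000))
}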

We need to make sure that the replicate queue prioritizes underreplicated ranges and aggressively tries to upreplicate them.

Jira issue: CRDB-14689

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Apr 4, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 6, 2022
mwang1026 commented:

FYI @lidorcarmel @kvoli

kvoli (Collaborator) commented Apr 25, 2022

I'm adding in a roachperf benchmark for this issue - ref #79940 and #80383.

We currently take priority into account, with the actions associated with up-replication ranked reasonably high:

https://github.com/kvoli/cockroach/blob/4380161366180cbab17c1d4c70e2951f08e179fa/pkg/kv/kvserver/allocator.go#L147-L181
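
For illustration, a simplified sketch of priority-ordered allocator actions; the names and numbers below are made up for this example, not the actual allocator.go constants:

package main

import "fmt"

type allocatorAction int

const (
	actionConsiderRebalance allocatorAction = iota
	actionRemoveReplica
	actionAddReplica         // up-replication
	actionReplaceDeadReplica // restore lost redundancy
)

// priority returns a score used to order replicas within the queue; higher
// values are processed first. Names and numbers are illustrative only.
func (a allocatorAction) priority() float64 {
	switch a {
	case actionReplaceDeadReplica:
		return 12000
	case actionAddReplica:
		return 10000
	case actionRemoveReplica:
		return 3000
	default:
		return 0
	}
}

func main() {
	fmt.Println(actionAddReplica.priority() > actionConsiderRebalance.priority()) // true
}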

This affects the next replica to process in the base queue:

https://github.com/kvoli/cockroach/blob/4380161366180cbab17c1d4c70e2951f08e179fa/pkg/kv/kvserver/queue.go#L1219-L1222
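
And a minimal sketch, with assumed types rather than the actual baseQueue implementation, of how a priority max-heap yields the highest-priority replica first:

package main

import (
	"container/heap"
	"fmt"
)

type queuedReplica struct {
	rangeID  int
	priority float64
}

// priorityQueue is a max-heap over queued replicas, so Pop returns the
// highest-priority entry first.
type priorityQueue []queuedReplica

func (pq priorityQueue) Len() int            { return len(pq) }
func (pq priorityQueue) Less(i, j int) bool  { return pq[i].priority > pq[j].priority }
func (pq priorityQueue) Swap(i, j int)       { pq[i], pq[j] = pq[j], pq[i] }
func (pq *priorityQueue) Push(x interface{}) { *pq = append(*pq, x.(queuedReplica)) }
func (pq *priorityQueue) Pop() interface{} {
	old := *pq
	n := len(old)
	it := old[n-1]
	*pq = old[:n-1]
	return it
}

func main() {
	pq := &priorityQueue{
		{rangeID: 10, priority: 0},     // consider-rebalance
		{rangeID: 20, priority: 10000}, // up-replicate
	}
	heap.Init(pq)
	// The up-replication candidate is processed first, but only among
	// replicas that are already in the queue.
	fmt.Println(heap.Pop(pq).(queuedReplica).rangeID) // 20
}

Note that this only reorders replicas that are already in the queue; it does not change when a replica gets enqueued.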

Do you think the issue is the cadence of the replicate queue? i.e., it's a blocking single consumer, where each action may take a while to actually perform.

Are we processing replicas faster than we're queueing them here? It may be that we are limited by the processing rate during up-replication. I'll look further into the queue length on the benchmark.

lidorcarmel (Contributor) commented:

Related: #79453

erikgrinaker (Contributor, Author) commented:

Thanks for looking into this @kvoli. I'm not very familiar with the details here, but I think the priority only applies to replicas that have already been added to the queue. However, replicas are only added to the queue once per scanner pass (roughly every 10 minutes), either in random order or ordered by range ID:

for _, q := range rs.queues {
	q.MaybeAddAsync(ctx, repl, rs.clock.NowAsClockTimestamp())
}

rs.replicas.Visit(func(repl *Replica) bool {
	count++
	shouldStop = rs.waitAndProcess(ctx, start, repl)
	return !shouldStop
})

So I suppose the priority would only come into play when there is a queue backlog, and then only for the replicas that are in the backlog. So I think it could take up to 10 minutes even with those priorities?

kvoli (Collaborator) commented May 13, 2022

> So I suppose the priority would only come into play when there is a queue backlog, and then only for the replicas that are in the backlog. So I think it could take up to 10 minutes even with those priorities?

That seems right. I believe this issue is also faced in decommissioning - as @lidorcarmel linked above.

Aayush's solution to retry replicas (#81005) seems like a promising direction. Even then, the 10-minute scan interval is still an issue.

Is there any reason why we couldn't "push" the leaseholders of ranges needing up-replication into the queue, given that 10 minutes is a long worst-case tail?

I can see how this pattern might devolve into multiple scanner cadences with different objectives if we don't have an event to trigger enqueueing; however, at the moment the indiscriminate store scanner seems too blunt an instrument for different timeliness requirements. A rough sketch of what such an event-driven hook could look like follows.
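
As a sketch only, under the assumption of a push-style hook; all names here are hypothetical and not existing kvserver API:

package main

import (
	"context"
	"fmt"
)

type replica struct {
	rangeID    int
	liveVoters int
	target     int
}

type replicateQueue struct{ pending []*replica }

// maybeAddAsync is a stand-in for the real queue's MaybeAddAsync: it just
// records the replica for later processing.
func (q *replicateQueue) maybeAddAsync(ctx context.Context, r *replica) {
	q.pending = append(q.pending, r)
}

// onDescriptorChanged is a hypothetical push hook: when a range's descriptor
// changes (e.g. a voter is lost), the leaseholder checks whether it is
// under-replicated and enqueues itself immediately instead of waiting for the
// next scanner pass.
func onDescriptorChanged(ctx context.Context, q *replicateQueue, r *replica) {
	if r.liveVoters < r.target {
		q.maybeAddAsync(ctx, r)
	}
}

func main() {
	q := &replicateQueue{}
	onDescriptorChanged(context.Background(), q, &replica{rangeID: 7, liveVoters: 2, target: 3})
	fmt.Println(len(q.pending)) // 1
}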

The stale bot commented:

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
