
kvserver: replicas do not acquire a lease and quiesce after restart #83444

Closed
kvoli opened this issue Jun 27, 2022 · 6 comments · Fixed by #83731
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@kvoli
Collaborator

kvoli commented Jun 27, 2022

Describe the problem

After a restart, the majority of replicas do not acquire a lease and keep ticking rather than quiescing. This is despite there being no foreground traffic that would keep the ranges awake.

The number of leaseholders does still increase steadily, but only at a rate of about 0.02 per second, so a cluster with a large number of ranges will take multiple days to reacquire all of its leases.
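For a rough sense of scale: at 0.02 leases per second, the ~48k ranges created by the repro below would take about 48,000 / 0.02 ≈ 2.4 million seconds, or roughly four weeks, to all acquire a lease.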

To Reproduce

# Note that this only partially reproduces the issue; it stalls between 1-20k.
roachprod create $email -n 1 --gce-machine-type=n1-standard-4
roachprod stage $email release v22.1.1
roachprod start $email
roachprod run $email:1 -- 'for x in {1..48000}; do echo "create table t$x(id uuid primary key);" >> tables; done'
roachprod run $email:1 -- './cockroach sql --file tables --insecure'
sleep 600
roachprod stop $email
sleep 600
roachprod start $email

Expected behavior
Ranges should acquire their leases after a restart and then quiesce.

Environment:

  • CockroachDB version v22.1.1
  • Server OS: Ubuntu v20.04.4 LTS

Additional context

50-70% CPU usage while idle, mostly spent ticking ranges.

[screenshot: idle CPU usage, mostly spent ticking ranges]

Jira issue: CRDB-17073

Jira issue: CRDB-19917

@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. labels Jun 27, 2022
@blathers-crl

blathers-crl bot commented Jun 27, 2022

cc @cockroachdb/replication

@kvoli
Collaborator Author

kvoli commented Jun 27, 2022

This repro is not exactly what we've seen in the wild.

It does stall; however, it reaches a higher number of leases before slowing to a glacial pace:
[screenshots: leaseholder count rising, then stalling with ~13k leases remaining]

It will still take multiple days for all replicas to acquire a lease in this repro; we stall with 13k remaining.

The cases where we have seen this in the wild, stalling with a much lower percentage of leases acquired, involved (1) ranges with lots of data (not the case here) and (2) CPU-constrained environments (1 vCPU for 20k tables).

@aayushshah15
Contributor

We discussed this in the KV weekly meeting and decided that we should first understand why a fraction of the ranges are acquiring leases while the rest aren't.

My intuition here is that it's one of the queues deciding to process some replicas but not the others. In particular, I think it's something like the consistency checker queue, which only processes ranges once every 24 hours, so it won't reprocess the fraction of replicas that have already been checked in the last 24 hours.

@kvoli one way to validate or invalidate that theory would be to disable the consistency checker queue in your reproduction and see if none or very few of the ranges end up getting leases. If so, we'll know this is all copacetic and maybe we can consider one of the approaches we discussed in the team meeting.

@kvoli kvoli self-assigned this Jun 28, 2022
@kvoli
Collaborator Author

kvoli commented Jun 29, 2022

Further Investigations

Ran some more tests with the consistency queue enabled and disabled. The initial rise in leaseholders and quiescence is largely not driven by the consistency queue. However, after 70 minutes this rise ceases, and the cluster with the consistency queue enabled keeps adding leaseholders at a slow pace while the cluster without it adds none.

yellow = rate(raft log queue successes)
red = rate(leaseholders)
blue = rate(consistency queue successes)

With Consistency Queue

[screenshot: queue success and leaseholder rates, consistency queue enabled]

Without Consistency Queue

[screenshot: queue success and leaseholder rates, consistency queue disabled]

The initial increase, before the stall, correlates with raft log queue processing successes. The cluster with the consistency queue enabled has a slightly higher rate of leases added, which can be attributed to that queue's processing.

Looking at the logs, the last SQL stats job completed at 14:39, which corresponds to when the rate of lease acquisition dropped to 0 on the cluster with no consistency queue.

I220629 14:39:58.816215 5628854 jobs/registry.go:1134 ⋮ [n1] 85890  AUTO CREATE STATS job 774777073713840129: stepping through state succeeded with error: <nil>

[screenshot: lease acquisition rate dropping to zero after the last AUTO CREATE STATS job]

It seems apparent, then, that only the SQL stats job and the consistency queue are forcing lease acquisition and allowing replicas to quiesce when there is no foreground activity on those ranges.

Problem

From the above, it appears that CockroachDB has no mechanism specifically for quiescing replicas following a restart. Rather, we rely on foreground traffic, SQL stats collection, and queue processing to force lease acquisition, which in turn enables quiescing when there is no subsequent activity.

We should introduce a mechanism for ranges to quiesce after a restart that is independent of these incidental triggers and targets this specific problem.

At a high level, @aayushshah15 suggested two possible solutions: (1) force lease acquisition on the raft leader, to enable quiescing, or (2) allow quiescing without needing a leaseholder for the range.
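For concreteness, here is a minimal Go sketch of option (1), using hypothetical types and method names (replica, IsRaftLeader, HasValidLease, AcquireLease) rather than the actual kvserver API:

```go
package sketch

import (
	"context"
	"fmt"
)

// replica is a hypothetical, minimal view of a replica for this sketch;
// it is not the real kvserver.Replica API.
type replica interface {
	IsRaftLeader() bool
	HasValidLease() bool
	AcquireLease(ctx context.Context) error
}

// maybeAcquireLeaseForQuiescence forces a lease acquisition on the raft
// leader when its lease is invalid (e.g. expired after a restart), so the
// range can subsequently quiesce once it is idle.
func maybeAcquireLeaseForQuiescence(ctx context.Context, r replica) error {
	if !r.IsRaftLeader() {
		// Only the raft leader initiates the acquisition; this avoids
		// multiple followers racing to grab the lease.
		return nil
	}
	if r.HasValidLease() {
		// Nothing to do: the range can already quiesce on its own.
		return nil
	}
	if err := r.AcquireLease(ctx); err != nil {
		return fmt.Errorf("acquiring lease: %w", err)
	}
	return nil
}
```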

@petermattis
Collaborator

petermattis commented Jun 29, 2022

I recall in the distant past that the replicate queue acquired the lease (if a range didn't hold the lease) in order to determine whether rebalancing/up-replication was required. Does that no longer take place?

@kvoli
Collaborator Author

kvoli commented Jun 29, 2022

I recall in the distant past that the replicate queue acquired the lease (if a range didn't hold the lease) in order to determine whether rebalancing/up-replication was required.

Replicas without leases will currently not be added to the replicate_queue by the replicaScanner. Only manually enqueuing will add them. So I don't think these problem replicas are even being added to begin with.

It will when processing; however, that requires a computed action for the range (store imbalance, decommission, up-replication, etc.).

Does that no longer take place?

In the process loop it does still try to acquire a lease.

refs

Acquire a lease if possible when processing a replica:

if _, pErr := repl.redirectOnOrAcquireLease(ctx); pErr != nil {

Manually enqueuing also seems to try to acquire one if possible:

hasLease, pErr := repl.getLeaseForGossip(ctx)

kvoli added a commit to kvoli/cockroach that referenced this issue Jul 1, 2022
This patch adds a synchronous call to acquire a lease when a replica is
being checked for enqueuing into the replication queue, if it satisfies
the necessary conditions. The necessary conditions are that it is currently
the raft leader and that the lease status is expired.

This ensures that, following a node restart, a replica with a valid
lease will be instated within the replica scanner interval.

resolves cockroachdb#83444

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Jul 8, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None
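A rough sketch of the enqueue-based check this commit describes, again with hypothetical types and method names (leaselessReplica, replicateQueue, MaybeAddAsync) rather than the real kvserver code:

```go
package sketch

import "context"

// leaselessReplica is a hypothetical, minimal view of a replica for this
// sketch, not the actual kvserver API.
type leaselessReplica interface {
	IsRaftLeader() bool
	HasValidLease() bool
}

// replicateQueue is a hypothetical stand-in for the real replicate queue.
type replicateQueue interface {
	MaybeAddAsync(ctx context.Context, r leaselessReplica, priority float64)
}

// maybeEnqueueForLease adds a replica to the replicate queue when it is
// the raft leader but its lease is invalid (e.g. expired after a node
// restart). The queue's existing process loop then attempts to acquire
// the lease (see redirectOnOrAcquireLease above), which in turn lets the
// range quiesce once it is idle.
func maybeEnqueueForLease(ctx context.Context, q replicateQueue, r leaselessReplica) {
	if r.IsRaftLeader() && !r.HasValidLease() {
		// Low priority: this is housekeeping, not a repair action.
		q.MaybeAddAsync(ctx, r, 0 /* priority */)
	}
}
```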
craig bot pushed a commit that referenced this issue Jul 8, 2022
83597: Colocate auth logging with auth metric for consistency r=rafiss a=ecwall

refs #83224

Release note (bug fix): Move the connection OK log and metric to the same location
after auth completes, for consistency. This resolves an inconsistency
(see linked issue) in the DB console where the log and metric did not match.


83731: kvserver: acquire replica lease on queue check r=nvanbenschoten a=kvoli

This patch adds a check within the replication queue for when a replica is the
raft leader and does not have a valid lease. The necessary conditions
are that it is currently the raft leader and that the lease status is
expired.

This ensures that, following a node restart, a replica with a valid
lease will be installed within the replica scanner interval.

**single node, 10k ranges, with change**

![image](https://user-images.githubusercontent.com/39606633/176971656-317c38d3-7103-47a0-a18a-d9f29c49baa5.png)

**5 nodes, 3k ranges**

*without change*
![image](https://user-images.githubusercontent.com/39606633/177620933-56cfe528-c45c-429f-a4d9-9d3ba90fe8e1.png)

*with change*
![image](https://user-images.githubusercontent.com/39606633/177621186-ee467043-47d5-4279-bb69-5478e7ad445a.png)



resolves #83444

Release note: None

84044: ui: option to search exact statement on SQL Activity r=maryliag a=maryliag

Previously, when doing a search on the SQL Activity page,
it returned all statements that contained all terms
from the search, but not necessarily in the same order.
This commit adds an option: when you wrap the search in quotes,
only results that contain the exact match, in order, are returned.

https://www.loom.com/share/442c6eaee84b4c71a1acdef0b63b74bf

Release note (ui change): Ability to search for the exact terms
in order when wrapping the search in quotes.
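Purely to illustrate the two matching behaviors (the SQL Activity page itself is TypeScript; matchesSearch below is a hypothetical helper in Go):

```go
package sketch

import "strings"

// matchesSearch is a hypothetical helper illustrating the behavior above:
// a search wrapped in double quotes must appear in the statement verbatim
// (exact terms, in order), while an unquoted search only requires every
// term to appear somewhere in the statement, in any order.
func matchesSearch(stmt, search string) bool {
	if len(search) >= 2 && strings.HasPrefix(search, `"`) && strings.HasSuffix(search, `"`) {
		// Quoted search: require the exact phrase.
		return strings.Contains(stmt, search[1:len(search)-1])
	}
	// Unquoted search: require every term, in any order.
	for _, term := range strings.Fields(search) {
		if !strings.Contains(stmt, term) {
			return false
		}
	}
	return true
}
```

For example, the unquoted search FROM id matches SELECT id FROM t, but the quoted search "FROM id" does not, since the terms do not appear in that order.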

84047: sql: remove unused error return value in a method of connExecutor r=yuzefovich a=yuzefovich

Found while looking into #83935.

Release note: None

84082: roachtest: skip multitenant/fairness r=cucaroach a=cucaroach

Informs: #83994

Release note: None


84085: roachtest: fix zipping of artifacts to include other zips r=srosenberg a=renatolabs

When artifacts are zipped in preparation for being published to
TeamCity, other zip files are skipped. The idea is that we won't try
to recursively zip artifacts.zip itself, or debug.zip, which is
published separately. However, some tests (notably, `tpchvec`)
download their own zip files in the `logs` directory so that they'll
be available for analysis when a test fails.

While there was an intention to skip only top-level zip files (as
indicated by existing comments), the code itself would skip any zip
files found in the artifacts directory. This commit updates the zipping
logic to skip only top-level zip files, allowing tests to write their
own zip files to the `logs` directory and have them available for
inspection later.

Release note: None.
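A minimal Go sketch of the updated rule, skipping only zip files at the top level of the artifacts directory; the directory walk and the collect callback are illustrative, not the actual roachtest code:

```go
package sketch

import (
	"io/fs"
	"path/filepath"
	"strings"
)

// collectArtifacts walks artifactsDir and invokes add for every file that
// should be included in the published artifacts archive. Zip files at the
// top level (e.g. artifacts.zip, debug.zip) are skipped, while zip files
// nested in subdirectories such as logs/ are kept for later inspection.
func collectArtifacts(artifactsDir string, add func(path string) error) error {
	return filepath.WalkDir(artifactsDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		rel, err := filepath.Rel(artifactsDir, path)
		if err != nil {
			return err
		}
		isTopLevel := !strings.Contains(rel, string(filepath.Separator))
		if isTopLevel && strings.HasSuffix(rel, ".zip") {
			// Skip only top-level zips; nested zips are collected.
			return nil
		}
		return add(path)
	})
}
```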

Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
@craig craig bot closed this as completed in ad0a2dd Jul 8, 2022
nvanbenschoten pushed a commit to nvanbenschoten/cockroach that referenced this issue Jul 13, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 18, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 18, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None