
kvserver: replicas do not acquire a lease and quiesce after restart #83444

Closed
kvoli opened this issue Jun 27, 2022 · 6 comments · Fixed by #83731
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@kvoli
Collaborator

kvoli commented Jun 27, 2022

Describe the problem

After a restart, the majority of replicas do not acquire a lease and keep ticking rather than quiescing. This is despite there being no foreground traffic that would keep the ranges awake.

The number of leaseholders does still increase steadily, but only at a rate of about 0.02 per second, so a cluster with a large number of ranges will take multiple days to reacquire all of its leases.
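For a rough sense of scale: at 0.02 leases per second, the ~48k ranges created by the repro below would take about 48,000 / 0.02 ≈ 2.4 million seconds, or roughly four weeks, to all acquire a lease.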

To Reproduce

# Note that this only partially reproduces the issue; it stalls between 1-20k.
roachprod create $email -n 1 --gce-machine-type=n1-standard-4
roachprod stage $email release v22.1.1
roachprod start $email
roachprod run $email:1 -- 'for x in {1..48000}; do echo "create table t$x(id uuid primary key);" >> tables; done'
roachprod run $email:1 -- './cockroach sql --file tables --insecure'
sleep 600
roachprod stop $email
sleep 600
roachprod start $email

Expected behavior
Ranges should acquire their leases after a restart and then quiesce.

Environment:

  • CockroachDB version v22.1.1
  • Server OS: Ubuntu v20.04.4 LTS

Additional context

50-70% CPU usage while idle, mostly spent ticking ranges.

[screenshot: idle CPU usage, mostly spent ticking ranges]

Jira issue: CRDB-17073

Jira issue: CRDB-19917

@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. labels Jun 27, 2022
@blathers-crl

blathers-crl bot commented Jun 27, 2022

cc @cockroachdb/replication

@kvoli
Collaborator Author

kvoli commented Jun 27, 2022

This repro is not exactly what we've seen in the wild.

It does stall; however, it reaches a higher number of leases before slowing to a glacial pace:
[screenshots: leaseholder count rising, then stalling with ~13k leases remaining]

It will still take multiple days for all replicas to acquire a lease in this repro; we stall with 13k remaining.

The cases where we have seen this in the wild, stalling with a much lower percentage of leases acquired, involved (1) ranges with lots of data (not the case here) and (2) CPU-constrained environments (1 vCPU for 20k tables).

@aayushshah15
Contributor

We discussed this in the KV weekly meeting and decided that we should first understand why a fraction of the ranges are acquiring leases while the rest aren't.

My intuition here is that it's one of the queues deciding to process some replicas but not the others. In particular, I think it's something like the consistency checker queue, which only processes ranges once every 24 hours, so it won't reprocess the fraction of replicas that have already been checked in the last 24 hours.

@kvoli one way to validate or invalidate that theory would be to disable the consistency checker queue in your reproduction and see if none or very few of the ranges end up getting leases. If so, we'll know this is all copacetic and maybe we can consider one of the approaches we discussed in the team meeting.

@kvoli kvoli self-assigned this Jun 28, 2022
@kvoli
Collaborator Author

kvoli commented Jun 29, 2022

Further Investigations

Ran some more tests with the consistency queue enabled and disabled. The initial rise in leaseholders and quiescence is largely not driven by the consistency queue. However, after 70 minutes this rise ceases, and the cluster with the consistency queue enabled keeps adding leaseholders at a slow pace while the cluster without it adds none.

yellow = rate(raft log queue successes)
red = rate(leaseholders)
blue = rate(consistency queue successes)

With Consistency Queue

[screenshot: queue success and leaseholder rates, consistency queue enabled]

Without Consistency Queue

[screenshot: queue success and leaseholder rates, consistency queue disabled]

The initial increase, before the stall, correlates with raft log queue processing successes. The cluster with the consistency queue enabled has a slightly higher rate of leases added, which can be attributed to that queue's processing.

Looking at the logs, the last SQL stats job completed at 14:39, which corresponds to when the rate of lease acquisition dropped to 0 on the cluster with no consistency queue.

I220629 14:39:58.816215 5628854 jobs/registry.go:1134 ⋮ [n1] 85890  AUTO CREATE STATS job 774777073713840129: stepping through state succeeded with error: <nil>

[screenshot: lease acquisition rate dropping to zero after the last AUTO CREATE STATS job]

It seems apparent, then, that only the SQL stats job and the consistency queue are forcing lease acquisition and allowing replicas to quiesce when there is no foreground activity on those ranges.

Problem

From the above, it appears that CockroachDB has no mechanism specifically for quiescing replicas following a restart. Rather, we rely on foreground traffic, SQL stats collection, and queue processing to force lease acquisition, which in turn enables quiescing when there is no subsequent activity.

We should introduce a mechanism for ranges to quiesce after a restart that is independent of these incidental triggers and targets this specific problem.

At a high level, @aayushshah15 suggested two possible solutions: (1) force lease acquisition on the raft leader, to enable quiescing, or (2) allow quiescing without needing a leaseholder for the range.
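For concreteness, here is a minimal Go sketch of option (1), using hypothetical types and method names (replica, IsRaftLeader, HasValidLease, AcquireLease) rather than the actual kvserver API:

```go
package sketch

import (
	"context"
	"fmt"
)

// replica is a hypothetical, minimal view of a replica for this sketch;
// it is not the real kvserver.Replica API.
type replica interface {
	IsRaftLeader() bool
	HasValidLease() bool
	AcquireLease(ctx context.Context) error
}

// maybeAcquireLeaseForQuiescence forces a lease acquisition on the raft
// leader when its lease is invalid (e.g. expired after a restart), so the
// range can subsequently quiesce once it is idle.
func maybeAcquireLeaseForQuiescence(ctx context.Context, r replica) error {
	if !r.IsRaftLeader() {
		// Only the raft leader initiates the acquisition; this avoids
		// multiple followers racing to grab the lease.
		return nil
	}
	if r.HasValidLease() {
		// Nothing to do: the range can already quiesce on its own.
		return nil
	}
	if err := r.AcquireLease(ctx); err != nil {
		return fmt.Errorf("acquiring lease: %w", err)
	}
	return nil
}
```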

@petermattis
Collaborator

petermattis commented Jun 29, 2022

I recall in the distant past that the replicate queue acquired the lease (if a range didn't hold the lease) in order to determine whether rebalancing/up-replication was required. Does that no longer take place?

@kvoli
Collaborator Author

kvoli commented Jun 29, 2022

I recall in the distant past that the replicate queue acquired the lease (if a range didn't hold the lease) in order to determine whether rebalancing/up-replication was required.

Replicas without leases will currently not be added to the replicate_queue by the replicaScanner. Only manually enqueuing will add them. So I don't think these problem replicas are even being added to begin with.

It will when processing; however, that requires a computed action for the range (store imbalance, decommission, up-replication, etc.).

Does that no longer take place?

In the process loop it does still try to acquire a lease.

refs

Acquire a lease if possible when processing a replica:

if _, pErr := repl.redirectOnOrAcquireLease(ctx); pErr != nil {

Manually enqueuing also seems to try to acquire one if possible:

hasLease, pErr := repl.getLeaseForGossip(ctx)

kvoli added a commit to kvoli/cockroach that referenced this issue Jul 1, 2022
This patch adds a synchronous call to acquire a lease when a replica is
being checked for enqueuing into the replication queue, if it satisfies
the necessary conditions. The necessary conditions are that it is currently
the raft leader and that the lease status is expired.

This ensures that, following a node restart, a replica with a valid
lease will be instated within the replica scanner interval.

resolves cockroachdb#83444

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Jul 8, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None
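A rough sketch of the enqueue-based check this commit describes, again with hypothetical types and method names (leaselessReplica, replicateQueue, MaybeAddAsync) rather than the real kvserver code:

```go
package sketch

import "context"

// leaselessReplica is a hypothetical, minimal view of a replica for this
// sketch, not the actual kvserver API.
type leaselessReplica interface {
	IsRaftLeader() bool
	HasValidLease() bool
}

// replicateQueue is a hypothetical stand-in for the real replicate queue.
type replicateQueue interface {
	MaybeAddAsync(ctx context.Context, r leaselessReplica, priority float64)
}

// maybeEnqueueForLease adds a replica to the replicate queue when it is
// the raft leader but its lease is invalid (e.g. expired after a node
// restart). The queue's existing process loop then attempts to acquire
// the lease (see redirectOnOrAcquireLease above), which in turn lets the
// range quiesce once it is idle.
func maybeEnqueueForLease(ctx context.Context, q replicateQueue, r leaselessReplica) {
	if r.IsRaftLeader() && !r.HasValidLease() {
		// Low priority: this is housekeeping, not a repair action.
		q.MaybeAddAsync(ctx, r, 0 /* priority */)
	}
}
```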
craig bot pushed a commit that referenced this issue Jul 8, 2022
83597: Colocate auth logging with auth metric for consistency r=rafiss a=ecwall

refs #83224

Release note (bug fix): Move the connection OK log and metric to the same location
after auth completes, for consistency. This resolves an inconsistency
(see linked issue) in the DB console where the log and metric did not match.


83731: kvserver: acquire replica lease on queue check r=nvanbenschoten a=kvoli

This patch adds a check within the replication queue for when a replica is the
raft leader and does not have a valid lease. The necessary conditions
are that it is currently the raft leader and that the lease status is
expired.

This ensures that, following a node restart, a replica with a valid
lease will be installed within the replica scanner interval.

**single node, 10k ranges, with change**

![image](https://user-images.githubusercontent.com/39606633/176971656-317c38d3-7103-47a0-a18a-d9f29c49baa5.png)

**5 nodes, 3k ranges**

*without change*
![image](https://user-images.githubusercontent.com/39606633/177620933-56cfe528-c45c-429f-a4d9-9d3ba90fe8e1.png)

*with change*
![image](https://user-images.githubusercontent.com/39606633/177621186-ee467043-47d5-4279-bb69-5478e7ad445a.png)



resolves #83444

Release note: None

84044: ui: option to search exact statement on SQL Activity r=maryliag a=maryliag

Previously, when doing a search on the SQL Activity page,
it returned all statements that contained all terms
from the search, but not necessarily in the same order.
This commit adds an option: when you wrap the search in quotes,
only results that contain the exact match, in order, are returned.

https://www.loom.com/share/442c6eaee84b4c71a1acdef0b63b74bf

Release note (ui change): Ability to search for the exact terms
in order when wrapping the search in quotes.
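Purely to illustrate the two matching behaviors (the SQL Activity page itself is TypeScript; matchesSearch below is a hypothetical helper in Go):

```go
package sketch

import "strings"

// matchesSearch is a hypothetical helper illustrating the behavior above:
// a search wrapped in double quotes must appear in the statement verbatim
// (exact terms, in order), while an unquoted search only requires every
// term to appear somewhere in the statement, in any order.
func matchesSearch(stmt, search string) bool {
	if len(search) >= 2 && strings.HasPrefix(search, `"`) && strings.HasSuffix(search, `"`) {
		// Quoted search: require the exact phrase.
		return strings.Contains(stmt, search[1:len(search)-1])
	}
	// Unquoted search: require every term, in any order.
	for _, term := range strings.Fields(search) {
		if !strings.Contains(stmt, term) {
			return false
		}
	}
	return true
}
```

For example, the unquoted search FROM id matches SELECT id FROM t, but the quoted search "FROM id" does not, since the terms do not appear in that order.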

84047: sql: remove unused error return value in a method of connExecutor r=yuzefovich a=yuzefovich

Found while looking into #83935.

Release note: None

84082: roachtest: skip multitenant/fairness r=cucaroach a=cucaroach

Informs: #83994

Release note: None


84085: roachtest: fix zipping of artifacts to include other zips r=srosenberg a=renatolabs

When artifacts are zipped in preparation for being published to
TeamCity, other zip files are skipped. The idea is that we won't try
to recursively zip artifacts.zip itself, or debug.zip, which is
published separately. However, some tests (notably, `tpchvec`)
download their own zip files in the `logs` directory so that they'll
be available for analysis when a test fails.

While there was an intention to skip only top-level zip files (as
indicated by existing comments), the code itself would skip any zip
files found in the artifacts directory. This commit updates the zipping
logic to skip only top-level zip files, allowing tests to write their
own zip files to the `logs` directory and have them available for
inspection later.

Release note: None.
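A minimal Go sketch of the updated rule, skipping only zip files at the top level of the artifacts directory; the directory walk and the collect callback are illustrative, not the actual roachtest code:

```go
package sketch

import (
	"io/fs"
	"path/filepath"
	"strings"
)

// collectArtifacts walks artifactsDir and invokes add for every file that
// should be included in the published artifacts archive. Zip files at the
// top level (e.g. artifacts.zip, debug.zip) are skipped, while zip files
// nested in subdirectories such as logs/ are kept for later inspection.
func collectArtifacts(artifactsDir string, add func(path string) error) error {
	return filepath.WalkDir(artifactsDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		rel, err := filepath.Rel(artifactsDir, path)
		if err != nil {
			return err
		}
		isTopLevel := !strings.Contains(rel, string(filepath.Separator))
		if isTopLevel && strings.HasSuffix(rel, ".zip") {
			// Skip only top-level zips; nested zips are collected.
			return nil
		}
		return add(path)
	})
}
```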

Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
@craig craig bot closed this as completed in ad0a2dd Jul 8, 2022
nvanbenschoten pushed a commit to nvanbenschoten/cockroach that referenced this issue Jul 13, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 18, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None
kvoli added a commit to kvoli/cockroach that referenced this issue Oct 18, 2022
This patch adds a check that will enqueue replicas that do not have a
valid lease into the replicate queue. The replicate and base queues will
already attempt to acquire the lease when processing these replicas.

This ensures that, following a node restart, a replica with a valid lease
will be installed within the replica scanner interval, for all ranges.

resolves cockroachdb#83444

Release note: None