
concurrency,kvserver: limited scans do optimistic latching #66059

Merged 1 commit on Jun 22, 2021

Conversation

sumeerbhola
Collaborator

@sumeerbhola sumeerbhola commented Jun 3, 2021

Latches for the full spans get inserted up front in the
spanlatch.Manager, and the conflict checking happens after
evaluation, only over the spans that were actually read.
If there is a conflict, the existing inserted latches are waited
on, and execution switches to pessimistic latching and locking.
The existing cluster setting for optimistic locking is used to
gate this behavior.
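
The flow described above can be sketched roughly as follows. This is a toy model with illustrative span/manager types, not the actual spanlatch.Manager API: latches for the full declared span are inserted without waiting, and only the spans actually read are checked for conflicts afterwards.

```go
package main

import "fmt"

// span is a simplified key span covering [start, end).
type span struct{ start, end string }

func (a span) overlaps(b span) bool { return a.start < b.end && b.start < a.end }

// manager is a toy stand-in for spanlatch.Manager: held latches are
// tracked in a slice rather than an interval tree.
type manager struct{ held []span }

// acquireOptimistic inserts the full declared span up front but does
// not wait on conflicting latches.
func (m *manager) acquireOptimistic(s span) span {
	m.held = append(m.held, s)
	return s
}

// checkOptimisticNoConflicts reports whether the spans actually read
// (possibly much narrower than the declared span, if the scan hit its
// limit) conflict with any other held latch. On a conflict the caller
// falls back to pessimistic latching and waits.
func (m *manager) checkOptimisticNoConflicts(mine, read span) bool {
	for _, h := range m.held {
		if h == mine {
			continue
		}
		if h.overlaps(read) {
			return false
		}
	}
	return true
}

func main() {
	var m manager
	m.held = append(m.held, span{"c", "e"}) // a concurrent writer's latch

	g := m.acquireOptimistic(span{"a", "z"}) // limited scan declares [a, z)
	// The scan hit its limit after reading only [a, b): no real conflict.
	fmt.Println(m.checkOptimisticNoConflicts(g, span{"a", "b"}))
	// Had it read through [a, d), it would conflict and retry pessimistically.
	fmt.Println(m.checkOptimisticNoConflicts(g, span{"a", "d"}))
}
```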

Numbers for the new OptimisticEval benchmark show improvement
when real-contention=false, compared to master which only had
optimistic locking. There is a modest slowdown in the
real-contention=true case since every optimistic read has to
try twice. The benchmark has concurrent writes of two different
kinds: latches represents 1PC writers, which leave no intents, while
latches-and-locks represents writers that will leave intents.

For the latter, when the optimistic read finds an intent during
evaluation it cannot necessarily add it as a discovered lock,
if there is also a conflicting latch (since it could be racing
with intent resolution). This case is rare in these benchmarks
(latches-and-locks/real-contention=true): 13% found an intent
and had a conflicting latch when num-writers=1 and < 1% for
the same case when num-writers=4. The remainder (the common
case) was to find a conflict when looking for conflicting
latches/locks using CheckOptimisticNoConflicts.

name                                                                     old time/op    new time/op    delta
OptimisticEvalForLocks/real-contention=false-16                            28.2µs ± 4%    28.5µs ± 5%     ~     (p=0.393 n=10+10)
OptimisticEvalForLocks/real-contention=true-16                             5.52ms ± 1%    5.52ms ± 1%     ~     (p=0.796 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16     220µs ±62%      89µs ±38%  -59.37%  (p=0.000 n=10+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16     213µs ±65%     155µs ±82%  -27.33%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16     1.33ms ± 3%    1.27ms ±16%     ~     (p=0.829 n=8+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16     2.02ms ±10%    2.25ms ± 9%  +11.31%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16               175µs ± 2%      45µs ± 5%  -74.05%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16               613µs ± 2%      44µs ± 3%  -92.74%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16                181µs ± 4%     179µs ± 5%     ~     (p=0.315 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16                624µs ± 3%     620µs ± 3%     ~     (p=0.247 n=10+10)

name                                                                     old alloc/op   new alloc/op   delta
OptimisticEvalForLocks/real-contention=false-16                            8.40kB ± 7%    8.33kB ± 3%     ~     (p=1.000 n=10+8)
OptimisticEvalForLocks/real-contention=true-16                             31.8kB ± 7%    32.6kB ± 9%     ~     (p=0.382 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16    23.9kB ±21%    17.8kB ±25%  -25.55%  (p=0.003 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16    24.1kB ±20%    19.4kB ±22%  -19.56%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16      104kB ± 1%     101kB ± 1%   -2.89%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16      197kB ± 6%     217kB ±11%  +10.19%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16              29.9kB ± 0%    13.4kB ± 1%  -55.07%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16              94.9kB ± 0%    14.5kB ± 1%  -84.74%  (p=0.000 n=9+8)
OptimisticEval/latches/real-contention=true/num-writers=1-16               29.9kB ± 0%    31.3kB ± 0%   +4.59%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16               94.8kB ± 0%    96.2kB ± 0%   +1.48%  (p=0.000 n=10+10)

name                                                                     old allocs/op  new allocs/op  delta
OptimisticEvalForLocks/real-contention=false-16                              68.6 ± 1%      69.6 ± 2%   +1.52%  (p=0.005 n=9+10)
OptimisticEvalForLocks/real-contention=true-16                                271 ± 2%       272 ± 2%     ~     (p=0.336 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16       149 ±17%       117 ±18%  -21.00%  (p=0.002 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16       151 ±16%       126 ±18%  -16.31%  (p=0.013 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16        610 ± 0%       559 ± 1%   -8.32%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16      1.12k ± 5%     1.19k ±12%     ~     (p=0.053 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16                 162 ± 0%        91 ± 0%  -43.83%  (p=0.001 n=8+9)
OptimisticEval/latches/real-contention=false/num-writers=4-16                 445 ± 0%        96 ± 0%  -78.44%  (p=0.000 n=9+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16                  163 ± 0%       184 ± 0%  +13.07%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16                  445 ± 0%       467 ± 0%   +4.94%  (p=0.000 n=10+10)

Fixes #9521

Release note (performance improvement): A limited scan now checks
for conflicting latches optimistically, which means it will not
conflict with latches held over its full declared spans as long as
those latches do not overlap the spans it actually scanned before
reaching its limit. This behavior can be turned off (along with
optimistic locking) by setting the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
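
For reference, the setting named above can be changed from a SQL session with the standard cluster-setting statement:

```sql
-- Disables optimistic evaluation of limited scans (and optimistic locking).
SET CLUSTER SETTING kv.concurrency.optimistic_eval_limited_scans.enabled = false;
```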

@cockroach-teamcity
Member

This change is Reviewable

Collaborator Author

@sumeerbhola sumeerbhola left a comment


I owe benchmarks and testing for spanlatch.Manager, but wanted a quick opinion first.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

	// Note that we are letting locking readers be considered for optimistic
	// evaluation. This is correct, though not necessarily beneficial.

I couldn't convince myself of any incorrectness because of this, though we could tighten this to exclude locking readers.
There may be a bug here or elsewhere, which I've not been able to reproduce when re-running kvserver tests multiple times: there was one case where a node failed with "existing lock cannot be acquired by different transaction" -- it was a writer since it came via a stack that included raft and replicaStateMachine.ApplySideEffects. So it is possible that something is allowing a locking reader to add locks concurrently, which shouldn't happen since a locking reader declares its spans as SpanReadWrite.


pkg/kv/kvserver/replica_test.go, line 2885 at r1 (raw file):

// evaluate optimistically without waiting for latches to be acquired. In some
// cases, this allows them to avoid waiting on writes that their
// over-estimated declared spans overlapped with.

I lifted this from your PR and just tweaked with some additional comments.

Member

@nvanbenschoten nvanbenschoten left a comment


I owe benchmarks and testing for spanlatch.Manager, but wanted a quick opinion first.

This all looks great! I'll close #33373, since it's no longer needed.

Reviewed 8 of 9 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

Previously, sumeerbhola wrote…

I couldn't convince myself of any incorrectness because of this, though we could tighten this to exclude locking readers.
There may be a bug here or elsewhere, which I've not been able to reproduce when re-running kvserver tests multiple times: there was one case where a node failed with "existing lock cannot be acquired by different transaction" -- it was a writer since it came via a stack that included raft and replicaStateMachine.ApplySideEffects. So it is possible that something is allowing a locking reader to add locks concurrently, which shouldn't happen since a locking reader declares its spans as SpanReadWrite.

This is interesting. I also don't see why this wouldn't be safe. We check for latch conflicts before calling handleReadOnlyLocalEvalResult on the read path, so we should detect conflicting writes that are ahead of the locking read in terms of latchManager sequencing. And we are holding write latches across the entire read's span, so the writer shouldn't be able to slip in from behind. Which test was it that you saw the failure on?


pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):

	AcquireOptimistic(req Request) latchGuard

	// CheckOptimisticNoConflicts returns true iff the spans in the provided

nit: there's some asymmetry here between the latchManager and the lockTable. Specifically, the latchManager has a CheckOptimisticNoConflicts method that accepts a latchGuard, but the lockTableGuard itself has a CheckOptimisticNoConflicts method.

Did you consider adding the CheckOptimisticNoConflicts and WaitUntilAcquired methods on to the latchGuard interface? Is that awkward because we have spanlatch.Guard implement the interface directly?


pkg/kv/kvserver/concurrency/concurrency_manager.go, line 146 at r2 (raw file):

		}
		g = newGuard(req)
		g.lm = m.lm

Would it make more sense to assign this at the same place that we assign the latchGuard?


pkg/kv/kvserver/concurrency/concurrency_manager.go, line 195 at r2 (raw file):

	firstIteration := true
	for {
		if !firstIteration {

Should we pull this assertion into the else branch of the following conditional block?


pkg/kv/kvserver/spanlatch/manager.go, line 289 at r2 (raw file):

func overlaps(it *iterator, search *latch, ignore ignoreFn) bool {
	for it.FirstOverlap(search); it.Valid(); it.NextOverlap(search) {
		held := it.Cur()

Make a note here that the held latch may have already been signalled, but that still doesn't mean that we can ignore it, because that doesn't prove that our request observed the result of the conflicting latch holder.


pkg/kv/kvserver/spanlatch/manager.go, line 290 at r2 (raw file):

	for it.FirstOverlap(search); it.Valid(); it.NextOverlap(search) {
		held := it.Cur()
		if ignore(search.ts, held.ts) {

minor nit: if !ignore(...) { return true } is a little easier to read.
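
In a simplified form (a slice as a stand-in for the interval-tree iterator, with illustrative types), the suggested inversion might look like:

```go
package main

import "fmt"

// latch is a toy stand-in for a held spanlatch keyed by a timestamp.
type latch struct{ ts int64 }

// ignoreFn decides whether a held latch can be skipped when checking
// the search latch for conflicts.
type ignoreFn func(searchTS, heldTS int64) bool

// overlaps walks the held latches that overlap the search latch and
// reports whether any non-ignorable conflict exists. Inverting the
// condition lets the conflicting case return early instead of
// continuing the loop.
func overlaps(held []latch, search latch, ignore ignoreFn) bool {
	for _, h := range held {
		if !ignore(search.ts, h.ts) {
			return true
		}
	}
	return false
}

func main() {
	// Example policy: held latches newer than the search latch are ignorable.
	ignoreNewer := func(searchTS, heldTS int64) bool { return heldTS > searchTS }
	fmt.Println(overlaps([]latch{{ts: 5}}, latch{ts: 3}, ignoreNewer)) // newer holder: ignored
	fmt.Println(overlaps([]latch{{ts: 2}}, latch{ts: 3}, ignoreNewer)) // older holder: conflict
}
```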

Collaborator Author

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is interesting. I also don't see why this wouldn't be safe. We check for latch conflicts before calling handleReadOnlyLocalEvalResult on the read path, so we should detect conflicting writes that are ahead of the locking read in terms of latchManager sequencing. And we are holding write latches across the entire read's span, so the writer shouldn't be able to slip in from behind. Which test was it that you saw the failure on?

I can reproduce this on TestLargeUnsplittableRangeReplicate. Excluding locking readers still causes the failure. This looks like a range descriptor lock. Looks like the admin commands like AdminSplitRequest only specify isAdmin and not isLocking or isWrite. I can't quite tell what keys are specified in the RequestHeader for such commands, or where they are executed (I couldn't find cmd_.*.go files for them) -- where should I look? Though I can reproduce this even after excluding isAdmin commands from optimistic execution. It's possible there is just a bug somewhere, say in the spanlatch/manager.go changes and not a logic issue -- but let me know if something obvious leaps out at you.

F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580  existing lock at /Meta2/Table/18 held by id=fc0ce145 key=/Local/Range/Table/17/RangeDescriptor pri=0.01707394 epo=0 ts=1623172221.959241000,0 min=1623172221.959241000,0 seq=7 cannot be acquired by different transaction id=a2b9108e key=/Local/Range/Table/17/RangeDescriptor pri=0.00223317 epo=0 ts=1623172221.969124000,0 min=1623172221.969124000,0 seq=7
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !goroutine 951792 [running]:
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xafaa100, 0x0, 0x10000012a, 0xffffffff00203001)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc00a03a560, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1686aa8933f28218, 0x400000000, 0x0, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:277 +0xc67
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(0x8ef2920, 0xc009464cf0, 0x1, 0x4, 0x8193f2a, 0x2, 0xc003918828, 0x1, 0x1)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:58 +0x198
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:834
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency.(*managerImpl).OnLockAcquired(0xc009617770, 0x8ef2920, 0xc009464cf0, 0xc00a3b3c20)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:397 +0xf9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleReadWriteLocalEvalResult(0xc0013db500, 0x8ef2920, 0xc009464cf0, 0x0, 0x0, 0x0, 0x0, 0xc00a3b3c20, 0x1, 0x1, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:662 +0xed
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaStateMachine).ApplySideEffects(0xc0013db5d8, 0x8f47240, 0xc0028a9008, 0x0, 0x0, 0x0, 0x0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_application_state_machine.go:1209 +0x8c5
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.mapCheckedCmdIter(0x357969b8, 0xc0013db848, 0xc003919350, 0x0, 0x0, 0x0, 0x0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/cmd.go:196 +0x131
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).applyOneBatch(0xc003919898, 0x8ef2920, 0xc006b68030, 0x8f39980, 0xc0013db7e8, 0x0, 0x0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:291 +0x245
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).ApplyCommittedEntries(0xc003919898, 0x8ef2920, 0xc006b68030, 0x1, 0x82178fa)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:247 +0xb9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(0xc0013db500, 0x8ef2920, 0xc006b68030, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:839 +0xf91
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady(0xc0013db500, 0x8ef2920, 0xc006b68030, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:487 +0x11c
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc005405000, 0x8ef2920, 0xc003b3e090, 0x1)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:515 +0x145
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker(0xc009bbd2c0, 0x8ef2920, 0xc003b3e090)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:284 +0x2c2
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00756c900, 0x8ef2920, 0xc003b3e090, 0x0, 0xc00781c2f0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397  [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc

Collaborator Author

@sumeerbhola sumeerbhola left a comment


TFTR!
I'll start working on more tests.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: there's some asymmetry here between the latchManager and the lockTable. Specifically, the latchManager has a CheckOptimisticNoConflicts method that accepts a latchGuard, but the lockTableGuard itself has a CheckOptimisticNoConflicts method.

Did you consider adding the CheckOptimisticNoConflicts and WaitUntilAcquired methods on to the latchGuard interface? Is that awkward because we have spanlatch.Guard implement the interface directly?

Yes, exactly. It didn't seem worth the fiddliness with an additional struct. It's not even clear that we need these methods to be public (same for lockTable) since they are only called from within the package. Can you remind me why the tests use concurrency_test as their package name?


pkg/kv/kvserver/concurrency/concurrency_manager.go, line 146 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Would it make more sense to assign this at the same place that we assign the latchGuard?

Done


pkg/kv/kvserver/concurrency/concurrency_manager.go, line 195 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we pull this assertion into the else branch of the following conditional block?

Done


pkg/kv/kvserver/spanlatch/manager.go, line 289 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Make a note here that the held latch may have already been signalled, but that still doesn't mean that we can ignore it, because that doesn't prove that our request observed the result of the conflicting latch holder.

Done


pkg/kv/kvserver/spanlatch/manager.go, line 290 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

minor nit: if !ignore(...) { return true } is a little easier to read.

Done

@sumeerbhola sumeerbhola force-pushed the opt_latch branch 2 times, most recently from b183428 to 1357016 on June 8, 2021 22:45
Collaborator Author

@sumeerbhola sumeerbhola left a comment


Added tests and a microbenchmark.

I did not investigate the reason for the small speedup in OptimisticEvalForLatches/real-contention=true.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)

Member

@nvanbenschoten nvanbenschoten left a comment


Added tests and a microbenchmark.

These microbenchmark numbers are very encouraging. Thanks for gathering them.

Reviewed 4 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/client_replica_test.go, line 4025 at r3 (raw file):

							for {
								txn := db.NewTxn(ctx, "locking txn")
								_, err = txn.ScanForUpdate(ctx, lockStart, "c", 0)

I'm curious why you're using a locking read instead of a 1PC write. Is it so that a lock gets added to the lock table?

I ask because conflicting with a write's latches is likely more detrimental than conflicting with a locking read's latches, because writes need to go through consensus.


pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

Previously, sumeerbhola wrote…

I can reproduce this on TestLargeUnsplittableRangeReplicate. Excluding locking readers still causes the failure. This looks like a range descriptor lock. Looks like the admin commands like AdminSplitRequest only specify isAdmin and not isLocking or isWrite. I can't quite tell what keys are specified in the RequestHeader for such commands, or where they are executed (I couldn't find cmd_.*.go files for them) -- where should I look? Though I can reproduce this even after excluding isAdmin commands from optimistic execution. It's possible there is just a bug somewhere, say in the spanlatch/manager.go changes and not a logic issue -- but let me know if something obvious leaps out at you.


AdminSplitRequest shouldn't directly touch this key though. Admin requests delegate to standard transactions, which run Puts, Gets, and an EndTxn with a special commit trigger. Based on the lock's key and the key of the lock holder transaction, I wonder if the existing lock holder is a Merge transaction that just performed a locking Get.

Were you able to reproduce this when you disable optimistic evaluation for locking reads?


pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):

Can you remind me why the tests use concurrency_test as their package name?

The idea with these _test packages is that they force you to test the external interface of a package. But you aren't even calling these methods from outside of this package, are you? Aren't you calling (*concurrency.Guard).CheckOptimisticNoConflicts?

Collaborator Author

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

Were you able to reproduce this when you disable optimistic evaluation for locking reads?

I can reproduce even when disallowing both locking reads and admin commands via:
considerOptEval := ba.IsReadOnly() && !ba.IsLocking() && !ba.IsAdmin() && ba.Header.MaxSpanRequestKeys > 0 && optimisticEvalLimitedScans.Get(&r.ClusterSettings().SV)

The key keeps shifting:

existing lock at /Meta2/Table/41 held by id=a3667c84 key=/Local/Range/Table/40/RangeDescriptor pri=0.03801826 epo=0 ts=1623700863.597718000,0 min=1623700863.597718000,0 seq=7 cannot be acquired by different transaction id=b15c8cd1 key=/Local/Range/Table/40/RangeDescriptor pri=0.01175227 epo=0 ts=1623700863.614196000,0
existing lock at /Meta2/System/tsd held by id=50b55b3f key=/Local/Range/System/NodeLivenessMax/RangeDescriptor pri=0.00108075 epo=0 ts=1623699918.758514000,1 min=1623699918.757972000,0 seq=7 cannot be acquired by different transaction id=efa59995 key=/Local/Range/System/NodeLivenessMax/RangeDescriptor pri=0.00103256 epo=0 ts=1623699918.768336000,0 min=1623699918.768336000,0 seq=7

I printed the stack trace for each txn being created and its ID to see where the conflicting txns originated. These look like change-replicas. Can you shed some light on the latching/locking that these do?

Lock holder txn:
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649  txn 1eb0e3d6, created by goroutine 3966204 [running]:
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +runtime/debug.Stack(0x0, 0xc00b49e664, 0x24)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/usr/local/opt/[email protected]/libexec/src/runtime/debug/stack.go:24 +0x9f
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv.NewTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc00bf04200, 0x1, 0xc007182bf0)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:131 +0x289
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn(0xc00bf04200, 0x8f1d0a0, 0xc00b9f8c60, 0xc00b49e950, 0x4, 0x4)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:778 +0x87
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.execChangeReplicasTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc00b49eac4, 0x1, 0x1, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:2093 +0x265
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).initializeRaftLearners(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x1, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182ba8, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1595 +0x425
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).changeReplicasImpl(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x1, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182b70, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1041 +0xaed
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).changeReplicas(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc007182b70, 0x1, 0x1, 0xc003ada5b0, 0x1, 0x81f1aca, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:1373 +0xdd
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).addOrReplaceVoters(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00c5b7da0, 0x3, 0x4, 0x0, 0x0, 0x0, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:644 +0x874
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).processOneChange(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00b49f6c8, 0x81d2200, 0xd, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:401 +0x736
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).process(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc0083ae550, 0x81d2249, 0xd, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:313 +0x1d4
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1(0x8f1d0a0, 0xc00b9f8c60, 0xdf8475800, 0x8f1d0a0)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:964 +0x2ac
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x8f1d0a0, 0xc00b9f8c60, 0xc00b998420, 0x21, 0xdf8475800, 0xc00b49f9b0, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:140 +0x9e
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc00c1f3900, 0x8f1d0e0, 0xc00c8226c0, 0x8f905a0, 0xc00d3cea80, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:923 +0x2b3
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1.2(0x8f1d0e0, 0xc00c822690)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1182 +0x6c
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask(0xc00e2e8300, 0x8f1d0e0, 0xc00c822690, 0xc00b9983f0, 0x2f, 0xc00b49fba0, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:309 +0xb2
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1(0xc00c1f3900, 0x8f1d0e0, 0xc00e7f0510, 0xc00e2e8300)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1179 +0x5aa
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2(0x8f1d0e0, 0xc00e7f0510)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1188 +0x4cf
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00e2e8300, 0x8f1d0e0, 0xc00e7f0510, 0x0, 0xc002c04860)
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
I210614 20:15:09.978401 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc
I210614 20:15:09.978695 3966204 kv/kvserver/replica_command.go:2112  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5650  change replicas (add [(n5,s5):4LEARNER] remove []): existing descriptor r8:/Table/1{2-3} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=4]
Txn that tries to acquire over the existing lock:
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677  txn 8d255e03, created by goroutine 3966204 [running]:
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +runtime/debug.Stack(0x0, 0xc00b49e7e4, 0x24)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/usr/local/opt/[email protected]/libexec/src/runtime/debug/stack.go:24 +0x9f
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv.NewTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc00bf04200, 0x1, 0xc007183b30)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:131 +0x289
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn(0xc00bf04200, 0x8f1d0a0, 0xc00b9f8c60, 0xc00b49ead0, 0x4, 0x4)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:778 +0x87
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.execChangeReplicasTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc003ada930, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007183af4, 0x1, 0x1, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:2093 +0x265
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).execReplicationChangesForVoters(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada930, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182ba8, 0x1, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1751 +0x44f
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).changeReplicasImpl(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x1, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182b70, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1054 +0x705
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).changeReplicas(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc007182b70, 0x1, 0x1, 0xc003ada5b0, 0x1, 0x81f1aca, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:1373 +0xdd
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).addOrReplaceVoters(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00c5b7da0, 0x3, 0x4, 0x0, 0x0, 0x0, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:644 +0x874
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).processOneChange(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00b49f6c8, 0x81d2200, 0xd, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:401 +0x736
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).process(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc0083ae550, 0x81d2249, 0xd, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:313 +0x1d4
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1(0x8f1d0a0, 0xc00b9f8c60, 0xdf8475800, 0x8f1d0a0)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:964 +0x2ac
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x8f1d0a0, 0xc00b9f8c60, 0xc00b998420, 0x21, 0xdf8475800, 0xc00b49f9b0, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:140 +0x9e
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc00c1f3900, 0x8f1d0e0, 0xc00c8226c0, 0x8f905a0, 0xc00d3cea80, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:923 +0x2b3
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1.2(0x8f1d0e0, 0xc00c822690)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1182 +0x6c
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask(0xc00e2e8300, 0x8f1d0e0, 0xc00c822690, 0xc00b9983f0, 0x2f, 0xc00b49fba0, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:309 +0xb2
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1(0xc00c1f3900, 0x8f1d0e0, 0xc00e7f0510, 0xc00e2e8300)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1179 +0x5aa
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2(0x8f1d0e0, 0xc00e7f0510)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1188 +0x4cf
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00e2e8300, 0x8f1d0e0, 0xc00e7f0510, 0x0, 0xc002c04860)
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
I210614 20:15:09.988687 3966204 kv/txn.go:131  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc
I210614 20:15:09.988995 3966204 kv/kvserver/replica_command.go:2112  [n1,replicate,s1,r8/1:/Table/1{2-3}] 5678  change replicas (add [(n5,s5):4] remove []): existing descriptor r8:/Table/1{2-3} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n5,s5):4LEARNER, next=5, gen=5]
Node failure when trying to acquire an already held lock:
I210614 20:15:09.992247 3965597 1@kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5679  the server is terminating due to a fatal error (see the DEV channel for details)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680  existing lock at /Meta2/Table/13 held by id=1eb0e3d6 key=/Local/Range/Table/12/RangeDescriptor pri=0.00527678 epo=0 ts=1623701709.978252000,0 min=1623701709.978252000,0 seq=7 cannot be acquired by different transaction id=8d255e03 key=/Local/Range/Table/12/RangeDescriptor pri=0.01443145 epo=0 ts=1623701709.988545000,0 min=1623701709.988545000,0 seq=7
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !goroutine 3965597 [running]:
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xafe6a00, 0xe6a69e, 0x1, 0xffffffff00203004)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc0074c4620, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x16888c1a3d498ec8, 0x400000000, 0x0, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:277 +0xc67
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(0x8f1d0e0, 0xc00ca80300, 0x1, 0x4, 0x81b910a, 0x2, 0xc0043d4828, 0x1, 0x1)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:58 +0x198
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:834
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency.(*managerImpl).OnLockAcquired(0xc00d2b7450, 0x8f1d0e0, 0xc00ca80300, 0xc00b3d19e0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:417 +0xf9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleReadWriteLocalEvalResult(0xc006804000, 0x8f1d0e0, 0xc00ca80300, 0x0, 0x0, 0x0, 0x0, 0xc00b3d19e0, 0x1, 0x1, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:662 +0xed
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaStateMachine).ApplySideEffects(0xc0068040d8, 0x8f71ce0, 0xc004ed1008, 0x0, 0x0, 0x0, 0x0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_application_state_machine.go:1209 +0x8c5
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.mapCheckedCmdIter(0x3458aab8, 0xc006804348, 0xc0043d5350, 0x0, 0x0, 0x0, 0x0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/cmd.go:196 +0x131
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).applyOneBatch(0xc0043d5898, 0x8f1d0e0, 0xc00a6368a0, 0x8f642a0, 0xc0068042e8, 0x0, 0x0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:291 +0x245
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).ApplyCommittedEntries(0xc0043d5898, 0x8f1d0e0, 0xc00a6368a0, 0x1, 0x823ccbe)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:247 +0xb9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(0xc006804000, 0x8f1d0e0, 0xc00a6368a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:839 +0xf91
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady(0xc006804000, 0x8f1d0e0, 0xc00a6368a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:487 +0x11c
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc00c240700, 0x8f1d0e0, 0xc00e7f08a0, 0x1)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:515 +0x145
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker(0xc008fda820, 0x8f1d0e0, 0xc00e7f08a0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:284 +0x2c2
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00e2e8300, 0x8f1d0e0, 0xc00e7f08a0, 0x0, 0xc00dddc0e0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417  [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !	/Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc

pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can you remind me why the tests use concurrency_test as their package name?

The idea with these _test packages is that they force you to test the external interface of a package. But you aren't even calling these methods from outside of this package, are you? Aren't you calling (*concurrency.Guard).CheckOptimisticNoConflicts?

Do you care whether these are exported from the package or not? As you partially said, we never call any of the latchManager methods directly from the tests, and the same goes for lockTable, since the interface names themselves are not exported.
(Though I'm inclined not to do this trivial refactor in this PR.)

Member

@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

Can you shed some light on the latching/locking that these do?

execChangeReplicasTxn doesn't perform any latching or locking itself. Instead, all of this is done through the batches that it issues. It looks like the write to the meta key is happening through a PutRequest here; notice the call to updateRangeAddressing just before it.

One possibility is that ba.IsReadOnly() isn't selective enough and that you're running into issues with "non-transactional" operations. I don't know exactly what is going wrong, but I'm thinking that this filter could allow requests like SubsumeRequest through, which would be very bad. Can you try to reproduce with the additional condition of ba.IsAllTransactional()?
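As a toy model of the gating predicate under discussion (the struct and field names here are illustrative assumptions standing in for roachpb.BatchRequest, not the real CockroachDB API), the narrowed condition might be sketched as:

```go
package main

import "fmt"

// batchReq is a stand-in for roachpb.BatchRequest; field names are
// assumptions for illustration only.
type batchReq struct {
	readOnly, locking, admin, allTransactional bool
	maxSpanRequestKeys                         int64
}

// considerOptEval sketches the narrowed gate: read-only, non-locking,
// non-admin, all-transactional limited scans, with the cluster setting
// enabled. The allTransactional check is what would filter out
// non-transactional requests like Subsume.
func considerOptEval(ba batchReq, settingEnabled bool) bool {
	return ba.readOnly && !ba.locking && !ba.admin &&
		ba.allTransactional && ba.maxSpanRequestKeys > 0 && settingEnabled
}

func main() {
	limited := batchReq{readOnly: true, allTransactional: true, maxSpanRequestKeys: 10}
	fmt.Println(considerOptEval(limited, true)) // true: eligible for optimistic eval

	// A Subsume-like non-transactional read-only request is excluded.
	subsume := batchReq{readOnly: true, maxSpanRequestKeys: 10}
	fmt.Println(considerOptEval(subsume, true)) // false
}
```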


pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):

Previously, sumeerbhola wrote…

Do you care whether these are exported from the package or not? As you partially said, we never call any of the latchManager methods directly from the tests, and same for lockTable since the interface names themselves are not exported.
(Though I'm inclined not to do this trivial refactor in this PR)

I don't think these need to be exported. The only other reason for that is if we were implementing the interface in a separate package, like we do with the latchManager.


@sumeerbhola sumeerbhola left a comment



pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

execChangeReplicasTxn doesn't perform any latching or locking itself. Instead, all of this is done through the batches that it issues.

yep, I meant that and was not being clear in stating my question.

One possibility is that ba.IsReadOnly() isn't selective enough and that you're running into issues with "non-transactional" operations. I don't know exactly what is going wrong, but I'm thinking that this filter could allow requests like SubsumeRequest through, which would be very bad. Can you try to reproduce with the additional condition of ba.IsAllTransactional()?

It still fails in the same way.

It turns out to be a silly oversight (which took much longer to find than it should have) -- an optimistically evaluating request that discovers a lock and encounters a conflicting latch cannot add that discovered lock, since the conflicting latch may be held by intent resolution that the request is racing with. If it adds the discovered lock anyway, it may insert an already-resolved intent into the lock table.

Also, I'll narrow this down to IsReadOnly && isAllTransactional, but I need some help understanding and writing the code comment justifying the isAllTransactional part. Is there something I can read to understand what is going on with these flags? For instance, I see SubsumeRequest has flags isRead | isAlone | updatesTSCache, but declareKeysSubsume says "non-MVCC write over every addressable key in the range" -- why don't the flags include isWrite?

@sumeerbhola sumeerbhola force-pushed the opt_latch branch 4 times, most recently from 39ce60f to 40ac09c Compare June 16, 2021 14:34

@sumeerbhola sumeerbhola left a comment



pkg/kv/kvserver/client_replica_test.go, line 4025 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm curious why you're using a locking read instead of a 1PC write. Is it so that a lock gets added to the lock table?

I ask because conflicting with a write's latches is likely more detrimental than conflicting with a locking read's latches, because writes need to go through consensus.

Good point. I've changed this to writes, and added two kinds of writing modes: the 1PC writes only result in latch conflicts, while the 2PC writes can also sometimes have a conflicting intent discovered during evaluation (though this is not common). The PR description has the new numbers and some descriptive text.


pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think these need to be exported. The only other reason for that is if we were implementing the interface in a separate package, like we do with the latchManager.

latchManagerImpl is in the same package, so I may not be understanding your comment. I've left all latchManager methods as exported for now.


@nvanbenschoten nvanbenschoten left a comment

:lgtm:



pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):

Previously, sumeerbhola wrote…

execChangeReplicasTxn doesn't perform any latching or locking itself. Instead, all of this is done through the batches that it issues.

yep, I meant that and was not being clear in stating my question.

One possibility is that ba.IsReadOnly() isn't selective enough and that you're running into issues with "non-transactional" operations. I don't know exactly what is going wrong, but I'm thinking that this filter could allow requests like SubsumeRequest through, which would be very bad. Can you try to reproduce with the additional condition of ba.IsAllTransactional()?

It still fails in the same way.

It turns out to be a silly oversight (which took much longer to find than it should have) -- an optimistically evaluating request that discovers a lock and encounters a conflicting latch cannot add that discovered lock, since the conflicting latch may be held by intent resolution that the request is racing with. If it adds the discovered lock anyway, it may insert an already-resolved intent into the lock table.

Also, I'll narrow this down to IsReadOnly && isAllTransactional, but I need some help understanding and writing the code comment justifying the isAllTransactional part. Is there something I can read to understand what is going on with these flags? For instance, I see SubsumeRequest has flags isRead | isAlone | updatesTSCache, but declareKeysSubsume says "non-MVCC write over every addressable key in the range" -- why don't the flags include isWrite?

The isRead and isWrite flags aren't the best names. What they refer to is whether or not a request goes through Raft, not how it synchronizes with other requests. So a locking read is also isRead.

I don't see much documentation on these flags. Everything seems to be in roachpb/api.go.

@sumeerbhola

TFTR!

@sumeerbhola

bors r+

@craig

craig bot commented Jun 21, 2021

Build failed:

Latches for the full spans get inserted up front in the
spanlatch.Manager, and the conflict checking happens after
evaluation, only over the spans that were actually read.
If there is a conflict, the existing inserted latches are waited
on, and execution switches to pessimistic latching and locking.
The existing cluster setting for optimistic locking is used to
gate this behavior.

Numbers for the new OptimisticEval benchmark show improvement
when real-contention=false, compared to master which only had
optimistic locking. There is a modest slowdown in the
real-contention=true case since every optimistic read has to
try twice. The benchmark has concurrent writes of two different
kinds: latches represents 1PC writers, so no intents, while
latches-and-locks represents writers that will leave intents.

For the latter, when the optimistic read finds an intent during
evaluation it cannot necessarily add it as a discovered lock,
if there is also a conflicting latch (since it could be racing
with intent resolution). This case is rare in these benchmarks
(latches-and-locks/real-contention=true): 13% found an intent
and had a conflicting latch when num-writers=1 and < 1% for
the same case when num-writers=4. The remainder (the common
case) was to find a conflict when looking for conflicting
latches/locks using CheckOptimisticNoConflicts.

name                                                                     old time/op    new time/op    delta
OptimisticEvalForLocks/real-contention=false-16                            28.2µs ± 4%    28.5µs ± 5%     ~     (p=0.393 n=10+10)
OptimisticEvalForLocks/real-contention=true-16                             5.52ms ± 1%    5.52ms ± 1%     ~     (p=0.796 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16     220µs ±62%      89µs ±38%  -59.37%  (p=0.000 n=10+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16     213µs ±65%     155µs ±82%  -27.33%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16     1.33ms ± 3%    1.27ms ±16%     ~     (p=0.829 n=8+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16     2.02ms ±10%    2.25ms ± 9%  +11.31%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16               175µs ± 2%      45µs ± 5%  -74.05%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16               613µs ± 2%      44µs ± 3%  -92.74%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16                181µs ± 4%     179µs ± 5%     ~     (p=0.315 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16                624µs ± 3%     620µs ± 3%     ~     (p=0.247 n=10+10)

name                                                                     old alloc/op   new alloc/op   delta
OptimisticEvalForLocks/real-contention=false-16                            8.40kB ± 7%    8.33kB ± 3%     ~     (p=1.000 n=10+8)
OptimisticEvalForLocks/real-contention=true-16                             31.8kB ± 7%    32.6kB ± 9%     ~     (p=0.382 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16    23.9kB ±21%    17.8kB ±25%  -25.55%  (p=0.003 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16    24.1kB ±20%    19.4kB ±22%  -19.56%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16      104kB ± 1%     101kB ± 1%   -2.89%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16      197kB ± 6%     217kB ±11%  +10.19%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16              29.9kB ± 0%    13.4kB ± 1%  -55.07%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16              94.9kB ± 0%    14.5kB ± 1%  -84.74%  (p=0.000 n=9+8)
OptimisticEval/latches/real-contention=true/num-writers=1-16               29.9kB ± 0%    31.3kB ± 0%   +4.59%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16               94.8kB ± 0%    96.2kB ± 0%   +1.48%  (p=0.000 n=10+10)

name                                                                     old allocs/op  new allocs/op  delta
OptimisticEvalForLocks/real-contention=false-16                              68.6 ± 1%      69.6 ± 2%   +1.52%  (p=0.005 n=9+10)
OptimisticEvalForLocks/real-contention=true-16                                271 ± 2%       272 ± 2%     ~     (p=0.336 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16       149 ±17%       117 ±18%  -21.00%  (p=0.002 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16       151 ±16%       126 ±18%  -16.31%  (p=0.013 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16        610 ± 0%       559 ± 1%   -8.32%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16      1.12k ± 5%     1.19k ±12%     ~     (p=0.053 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16                 162 ± 0%        91 ± 0%  -43.83%  (p=0.001 n=8+9)
OptimisticEval/latches/real-contention=false/num-writers=4-16                 445 ± 0%        96 ± 0%  -78.44%  (p=0.000 n=9+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16                  163 ± 0%       184 ± 0%  +13.07%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16                  445 ± 0%       467 ± 0%   +4.94%  (p=0.000 n=10+10)

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting latches in an optimistic manner, which means it will
not conflict with latches that were held in the scan's full spans,
but were not in the spans that were scanned until the limit was
reached. This behavior can be turned off (along with optimistic
locking) by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
@sumeerbhola

I can't reproduce the TestKVFeed failure on the previous bors run.

@sumeerbhola

bors r+

@craig

craig bot commented Jun 22, 2021

Build succeeded:

Successfully merging this pull request may close these issues.

kv: Scans with limit acquire excessively large latches