concurrency,kvserver: limited scans do optimistic latching #66059
Conversation
I owe benchmarks and testing for spanlatch.Manager, but wanted a quick opinion first.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
// Note that we are letting locking readers be considered for optimistic
// evaluation. This is correct, though not necessarily beneficial.
I couldn't convince myself of any incorrectness because of this, though we could tighten this to exclude locking readers.
There may be a bug here or elsewhere, which I've not been able to reproduce when re-running kvserver tests multiple times: there was one case where a node failed with existing lock cannot be acquired by different transaction -- it was a writer, since it came via a stack that included raft and replicaStateMachine.ApplySideEffects. So it is possible that something is allowing a locking reader to add locks concurrently, which shouldn't happen, since a locking reader declares its spans as SpanReadWrite.
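For context, the optimistic flow under discussion can be sketched roughly as follows. All names below are illustrative stand-ins, not the real spanlatch.Manager API:

```go
package main

import "fmt"

// Rough sketch of optimistic latching for limited scans: insert latches
// without waiting, evaluate, then check whether the keys actually touched
// conflicted with any held write latch. These types are hypothetical
// simplifications, not the real spanlatch package.

type manager struct{ heldWrites map[string]bool }

type guard struct{ m *manager }

// AcquireOptimistic inserts the request's latches without waiting on
// conflicting holders.
func (m *manager) AcquireOptimistic() *guard { return &guard{m: m} }

// CheckOptimisticNoConflicts reports whether the keys actually touched during
// evaluation overlap any held write latch.
func (g *guard) CheckOptimisticNoConflicts(touched []string) bool {
	for _, k := range touched {
		if g.m.heldWrites[k] {
			return false
		}
	}
	return true
}

// evalLimitedScan tries optimistic evaluation first and falls back to normal
// (pessimistic) latch acquisition when a conflict is detected.
func evalLimitedScan(m *manager, touched []string) string {
	g := m.AcquireOptimistic()
	if g.CheckOptimisticNoConflicts(touched) {
		return "optimistic success"
	}
	// Conflict: wait for latches the usual way and re-evaluate.
	return "pessimistic retry"
}

func main() {
	m := &manager{heldWrites: map[string]bool{"c": true}}
	fmt.Println(evalLimitedScan(m, []string{"a", "b"})) // no overlap with held writes
	fmt.Println(evalLimitedScan(m, []string{"b", "c"})) // overlaps the write latch on "c"
}
```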
pkg/kv/kvserver/replica_test.go, line 2885 at r1 (raw file):
// evaluate optimistically without waiting for latches to be acquired. In some
// cases, this allows them to avoid waiting on writes that their
// over-estimated declared spans overlapped with.
I lifted this from your PR and just tweaked with some additional comments.
I owe benchmarks and testing for spanlatch.Manager, but wanted a quick opinion first.
This all looks great! I'll close #33373, since it's no longer needed.
Reviewed 8 of 9 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
Previously, sumeerbhola wrote…
I couldn't convince myself of any incorrectness because of this, though we could tighten this to exclude locking readers.
There may be a bug here or elsewhere, which I've not been able to reproduce when re-running kvserver tests multiple times: there was one case where a node failed with existing lock cannot be acquired by different transaction -- it was a writer, since it came via a stack that included raft and replicaStateMachine.ApplySideEffects. So it is possible that something is allowing a locking reader to add locks concurrently, which shouldn't happen, since a locking reader declares its spans as SpanReadWrite.
This is interesting. I also don't see why this wouldn't be safe. We check for latch conflicts before calling handleReadOnlyLocalEvalResult on the read path, so we should detect conflicting writes that are ahead of the locking read in terms of latchManager sequencing. And we are holding write latches across the entire read's span, so the writer shouldn't be able to slip in from behind. Which test was it that you saw the failure on?
pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):
AcquireOptimistic(req Request) latchGuard
// CheckOptimisticNoConflicts returns true iff the spans in the provided
nit: there's some asymmetry here between the latchManager and the lockTable. Specifically, the latchManager has a CheckOptimisticNoConflicts method that accepts a latchGuard, but the lockTableGuard itself has a CheckOptimisticNoConflicts method.
Did you consider adding the CheckOptimisticNoConflicts and WaitUntilAcquired methods on to the latchGuard interface? Is that awkward because we have spanlatch.Guard implement the interface directly?
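The alternative shape being suggested might look roughly like this. The types and method bodies here are hypothetical simplifications, purely to show the interface shape, not the real concurrency package:

```go
package main

import "fmt"

// Sketch of folding CheckOptimisticNoConflicts and WaitUntilAcquired into the
// latchGuard interface itself, so callers only ever hold the guard.

type latchGuard interface {
	CheckOptimisticNoConflicts() bool
	WaitUntilAcquired()
}

// spanGuard stands in for spanlatch.Guard implementing the interface
// directly, which is the potentially awkward option mentioned above.
type spanGuard struct{ conflicts bool }

func (g *spanGuard) CheckOptimisticNoConflicts() bool { return !g.conflicts }

// WaitUntilAcquired would block until all of the guard's latches are held; it
// is a no-op in this sketch.
func (g *spanGuard) WaitUntilAcquired() {}

func main() {
	var g latchGuard = &spanGuard{conflicts: false}
	if g.CheckOptimisticNoConflicts() {
		fmt.Println("no conflicts: skip waiting")
	} else {
		g.WaitUntilAcquired()
		fmt.Println("conflicts: waited for latches")
	}
}
```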
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 146 at r2 (raw file):
}
g = newGuard(req)
g.lm = m.lm
Would it make more sense to assign this at the same place that we assign the latchGuard?
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 195 at r2 (raw file):
firstIteration := true
for {
	if !firstIteration {
Should we pull this assertion into the else branch of the following conditional block?
pkg/kv/kvserver/spanlatch/manager.go, line 289 at r2 (raw file):
func overlaps(it *iterator, search *latch, ignore ignoreFn) bool {
	for it.FirstOverlap(search); it.Valid(); it.NextOverlap(search) {
		held := it.Cur()
Make a note here that the held latch may have already been signalled, but that still doesn't mean that we can ignore it, because that doesn't prove that our request observed the result of the conflicting latch holder.
pkg/kv/kvserver/spanlatch/manager.go, line 290 at r2 (raw file):
for it.FirstOverlap(search); it.Valid(); it.NextOverlap(search) {
	held := it.Cur()
	if ignore(search.ts, held.ts) {
minor nit: if !ignore(...) { return true } is a little easier to read.
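The suggested restructuring might look roughly like this, with the iterator replaced by a plain slice and latch/ignoreFn reduced to simplified stand-ins for the spanlatch types:

```go
package main

import "fmt"

type latch struct{ ts int }

type ignoreFn func(searchTS, heldTS int) bool

// overlaps reports whether any held latch conflicts with the search latch.
// Returning true on the first non-ignorable overlap reads more directly than
// branching on ignore(...) and continuing the loop.
func overlaps(held []latch, search latch, ignore ignoreFn) bool {
	for _, h := range held {
		if !ignore(search.ts, h.ts) {
			return true
		}
	}
	return false
}

func main() {
	// Ignore held latches with timestamps below the search timestamp.
	ignoreOlder := func(searchTS, heldTS int) bool { return heldTS < searchTS }
	fmt.Println(overlaps([]latch{{ts: 5}}, latch{ts: 10}, ignoreOlder))  // held latch ignored
	fmt.Println(overlaps([]latch{{ts: 20}}, latch{ts: 10}, ignoreOlder)) // conflict
}
```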
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is interesting. I also don't see why this wouldn't be safe. We check for latch conflicts before calling handleReadOnlyLocalEvalResult on the read path, so we should detect conflicting writes that are ahead of the locking read in terms of latchManager sequencing. And we are holding write latches across the entire read's span, so the writer shouldn't be able to slip in from behind. Which test was it that you saw the failure on?
I can reproduce this on TestLargeUnsplittableRangeReplicate. Excluding locking readers still causes the failure. This looks like a range descriptor lock. Looks like the admin commands like AdminSplitRequest only specify isAdmin and not isLocking or isWrite. I can't quite tell what keys are specified in the RequestHeader for such commands, or where they are executed (I couldn't find cmd_.*.go files for them) -- where should I look? Though I can reproduce this even after excluding isAdmin commands from optimistic execution. It's possible there is just a bug somewhere, say in the spanlatch/manager.go changes and not a logic issue -- but let me know if something obvious leaps out at you.
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 existing lock at /Meta2/Table/18 held by id=fc0ce145 key=/Local/Range/Table/17/RangeDescriptor pri=0.01707394 epo=0 ts=1623172221.959241000,0 min=1623172221.959241000,0 seq=7 cannot be acquired by different transaction id=a2b9108e key=/Local/Range/Table/17/RangeDescriptor pri=0.00223317 epo=0 ts=1623172221.969124000,0 min=1623172221.969124000,0 seq=7
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !goroutine 951792 [running]:
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xafaa100, 0x0, 0x10000012a, 0xffffffff00203001)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc00a03a560, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1686aa8933f28218, 0x400000000, 0x0, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:277 +0xc67
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(0x8ef2920, 0xc009464cf0, 0x1, 0x4, 0x8193f2a, 0x2, 0xc003918828, 0x1, 0x1)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:58 +0x198
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:834
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency.(*managerImpl).OnLockAcquired(0xc009617770, 0x8ef2920, 0xc009464cf0, 0xc00a3b3c20)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:397 +0xf9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleReadWriteLocalEvalResult(0xc0013db500, 0x8ef2920, 0xc009464cf0, 0x0, 0x0, 0x0, 0x0, 0xc00a3b3c20, 0x1, 0x1, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:662 +0xed
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaStateMachine).ApplySideEffects(0xc0013db5d8, 0x8f47240, 0xc0028a9008, 0x0, 0x0, 0x0, 0x0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_application_state_machine.go:1209 +0x8c5
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.mapCheckedCmdIter(0x357969b8, 0xc0013db848, 0xc003919350, 0x0, 0x0, 0x0, 0x0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/cmd.go:196 +0x131
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).applyOneBatch(0xc003919898, 0x8ef2920, 0xc006b68030, 0x8f39980, 0xc0013db7e8, 0x0, 0x0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:291 +0x245
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).ApplyCommittedEntries(0xc003919898, 0x8ef2920, 0xc006b68030, 0x1, 0x82178fa)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:247 +0xb9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(0xc0013db500, 0x8ef2920, 0xc006b68030, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:839 +0xf91
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady(0xc0013db500, 0x8ef2920, 0xc006b68030, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:487 +0x11c
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc005405000, 0x8ef2920, 0xc003b3e090, 0x1)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:515 +0x145
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker(0xc009bbd2c0, 0x8ef2920, 0xc003b3e090)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:284 +0x2c2
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00756c900, 0x8ef2920, 0xc003b3e090, 0x0, 0xc00781c2f0)
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
F210608 17:10:21.972415 951792 kv/kvserver/concurrency/concurrency_manager.go:397 [n1,s1,r1/1:/{Min-System/NodeL…}] 580 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc
TFTR!
I'll start working on more tests.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: there's some asymmetry here between the latchManager and the lockTable. Specifically, the latchManager has a CheckOptimisticNoConflicts method that accepts a latchGuard, but the lockTableGuard itself has a CheckOptimisticNoConflicts method. Did you consider adding the CheckOptimisticNoConflicts and WaitUntilAcquired methods on to the latchGuard interface? Is that awkward because we have spanlatch.Guard implement the interface directly?
Yes, exactly. It didn't seem worth the fiddliness with an additional struct. It's not even clear that we need these methods to be public (same for lockTable) since they are only called from within the package. Can you remind me why the tests use concurrency_test as their package name?
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 146 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Would it make more sense to assign this at the same place that we assign the latchGuard?
Done
pkg/kv/kvserver/concurrency/concurrency_manager.go, line 195 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we pull this assertion into the else branch of the following conditional block?
Done
pkg/kv/kvserver/spanlatch/manager.go, line 289 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Make a note here that the held latch may have already been signalled, but that still doesn't mean that we can ignore it, because that doesn't prove that our request observed the result of the conflicting latch holder.
Done
pkg/kv/kvserver/spanlatch/manager.go, line 290 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
minor nit: if !ignore(...) { return true } is a little easier to read.
Done
Force-pushed from b183428 to 1357016.
Added tests and a microbenchmark.
I did not investigate the reason for the small speedup in OptimisticEvalForLatches/real-contention=true.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
Added tests and a microbenchmark.
These microbenchmark numbers are very encouraging. Thanks for gathering them.
Reviewed 4 of 4 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/client_replica_test.go, line 4025 at r3 (raw file):
for {
	txn := db.NewTxn(ctx, "locking txn")
	_, err = txn.ScanForUpdate(ctx, lockStart, "c", 0)
I'm curious why you're using a locking read instead of a 1PC write. Is it so that a lock gets added to the lock table?
I ask because conflicting with a write's latches is likely more detrimental than conflicting with a locking read's latches, because writes need to go through consensus.
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
Previously, sumeerbhola wrote…
I can reproduce this on TestLargeUnsplittableRangeReplicate. Excluding locking readers still causes the failure. This looks like a range descriptor lock. Looks like the admin commands like AdminSplitRequest only specify isAdmin and not isLocking or isWrite. I can't quite tell what keys are specified in the RequestHeader for such commands, or where they are executed (I couldn't find cmd_.*.go files for them) -- where should I look? Though I can reproduce this even after excluding isAdmin commands from optimistic execution. It's possible there is just a bug somewhere, say in the spanlatch/manager.go changes and not a logic issue -- but let me know if something obvious leaps out at you. [fatal log and stack trace quoted in the previous comment]
AdminSplitRequest shouldn't directly touch this key though. Admin requests delegate to standard transactions, which run Puts, Gets, and an EndTxn with a special commit trigger. Based on the lock's key and the key of the lock holder transaction, I wonder if the existing lock holder is a Merge transaction that just performed a locking Get.
Were you able to reproduce this when you disable optimistic evaluation for locking reads?
pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):
Can you remind me why the tests use concurrency_test as their package name?
The idea with these _test packages is that they force you to test the external interface of a package. But you aren't even calling these methods from outside of this package, are you? Aren't you calling (*concurrency.Guard).CheckOptimisticNoConflicts?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
Were you able to reproduce this when you disable optimistic evaluation for locking reads?
I can reproduce even when disallowing both locking reads and admin commands via:
considerOptEval := ba.IsReadOnly() && !ba.IsLocking() && !ba.IsAdmin() && ba.Header.MaxSpanRequestKeys > 0 && optimisticEvalLimitedScans.Get(&r.ClusterSettings().SV)
The key keeps shifting,
existing lock at /Meta2/Table/41 held by id=a3667c84 key=/Local/Range/Table/40/RangeDescriptor pri=0.03801826 epo=0 ts=1623700863.597718000,0 min=1623700863.597718000,0 seq=7 cannot be acquired by different transaction id=b15c8cd1 key=/Local/Range/Table/40/RangeDescriptor pri=0.01175227 epo=0 ts=1623700863.614196000,0
existing lock at /Meta2/System/tsd held by id=50b55b3f key=/Local/Range/System/NodeLivenessMax/RangeDescriptor pri=0.00108075 epo=0 ts=1623699918.758514000,1 min=1623699918.757972000,0 seq=7 cannot be acquired by different transaction id=efa59995 key=/Local/Range/System/NodeLivenessMax/RangeDescriptor pri=0.00103256 epo=0 ts=1623699918.768336000,0 min=1623699918.768336000,0 seq=7
I printed the stack trace for each txn being created and its ID to see where the conflicting txns originated. These look like change-replicas. Can you shed some light on the latching/locking that these do?
Lock holder txn:
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 txn 1eb0e3d6, created by goroutine 3966204 [running]:
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +runtime/debug.Stack(0x0, 0xc00b49e664, 0x24)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /usr/local/opt/[email protected]/libexec/src/runtime/debug/stack.go:24 +0x9f
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv.NewTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc00bf04200, 0x1, 0xc007182bf0)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:131 +0x289
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn(0xc00bf04200, 0x8f1d0a0, 0xc00b9f8c60, 0xc00b49e950, 0x4, 0x4)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:778 +0x87
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.execChangeReplicasTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc00b49eac4, 0x1, 0x1, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:2093 +0x265
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).initializeRaftLearners(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x1, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182ba8, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1595 +0x425
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).changeReplicasImpl(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x1, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182b70, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1041 +0xaed
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).changeReplicas(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc007182b70, 0x1, 0x1, 0xc003ada5b0, 0x1, 0x81f1aca, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:1373 +0xdd
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).addOrReplaceVoters(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00c5b7da0, 0x3, 0x4, 0x0, 0x0, 0x0, ...)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:644 +0x874
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).processOneChange(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00b49f6c8, 0x81d2200, 0xd, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:401 +0x736
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).process(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc0083ae550, 0x81d2249, 0xd, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:313 +0x1d4
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1(0x8f1d0a0, 0xc00b9f8c60, 0xdf8475800, 0x8f1d0a0)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:964 +0x2ac
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x8f1d0a0, 0xc00b9f8c60, 0xc00b998420, 0x21, 0xdf8475800, 0xc00b49f9b0, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:140 +0x9e
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc00c1f3900, 0x8f1d0e0, 0xc00c8226c0, 0x8f905a0, 0xc00d3cea80, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:923 +0x2b3
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1.2(0x8f1d0e0, 0xc00c822690)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1182 +0x6c
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask(0xc00e2e8300, 0x8f1d0e0, 0xc00c822690, 0xc00b9983f0, 0x2f, 0xc00b49fba0, 0x0, 0x0)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:309 +0xb2
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1(0xc00c1f3900, 0x8f1d0e0, 0xc00e7f0510, 0xc00e2e8300)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1179 +0x5aa
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2(0x8f1d0e0, 0xc00e7f0510)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1188 +0x4cf
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00e2e8300, 0x8f1d0e0, 0xc00e7f0510, 0x0, 0xc002c04860)
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 +created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
I210614 20:15:09.978401 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5649 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc
I210614 20:15:09.978695 3966204 kv/kvserver/replica_command.go:2112 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5650 change replicas (add [(n5,s5):4LEARNER] remove []): existing descriptor r8:/Table/1{2-3} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=4]
Txn that tries to acquire over the existing lock:
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 txn 8d255e03, created by goroutine 3966204 [running]:
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +runtime/debug.Stack(0x0, 0xc00b49e7e4, 0x24)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /usr/local/opt/[email protected]/libexec/src/runtime/debug/stack.go:24 +0x9f
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv.NewTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc00bf04200, 0x1, 0xc007183b30)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:131 +0x289
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn(0xc00bf04200, 0x8f1d0a0, 0xc00b9f8c60, 0xc00b49ead0, 0x4, 0x4)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:778 +0x87
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.execChangeReplicasTxn(0x8f1d0a0, 0xc00b9f8c60, 0xc003ada930, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007183af4, 0x1, 0x1, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:2093 +0x265
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).execReplicationChangesForVoters(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada930, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182ba8, 0x1, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1751 +0x44f
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).changeReplicasImpl(0xc00d3cea80, 0x8f1d0a0, 0xc00b9f8c60, 0xc003ada5b0, 0x1, 0x81f1aca, 0x16, 0xc00e0f93b0, 0x46, 0xc007182b70, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_command.go:1054 +0x705
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).changeReplicas(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc007182b70, 0x1, 0x1, 0xc003ada5b0, 0x1, 0x81f1aca, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:1373 +0xdd
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).addOrReplaceVoters(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00c5b7da0, 0x3, 0x4, 0x0, 0x0, 0x0, ...)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:644 +0x874
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).processOneChange(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc00b49f6c8, 0x81d2200, 0xd, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:401 +0x736
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).process(0xc0069c9e00, 0x8f1d0a0, 0xc00b9f8c60, 0xc00d3cea80, 0xc0083ae550, 0x81d2249, 0xd, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicate_queue.go:313 +0x1d4
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1(0x8f1d0a0, 0xc00b9f8c60, 0xdf8475800, 0x8f1d0a0)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:964 +0x2ac
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x8f1d0a0, 0xc00b9f8c60, 0xc00b998420, 0x21, 0xdf8475800, 0xc00b49f9b0, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:140 +0x9e
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc00c1f3900, 0x8f1d0e0, 0xc00c8226c0, 0x8f905a0, 0xc00d3cea80, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:923 +0x2b3
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1.2(0x8f1d0e0, 0xc00c822690)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1182 +0x6c
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask(0xc00e2e8300, 0x8f1d0e0, 0xc00c822690, 0xc00b9983f0, 0x2f, 0xc00b49fba0, 0x0, 0x0)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:309 +0xb2
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2.1(0xc00c1f3900, 0x8f1d0e0, 0xc00e7f0510, 0xc00e2e8300)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1179 +0x5aa
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2(0x8f1d0e0, 0xc00e7f0510)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/queue.go:1188 +0x4cf
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00e2e8300, 0x8f1d0e0, 0xc00e7f0510, 0x0, 0xc002c04860)
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 +created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
I210614 20:15:09.988687 3966204 kv/txn.go:131 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5677 + /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc
I210614 20:15:09.988995 3966204 kv/kvserver/replica_command.go:2112 [n1,replicate,s1,r8/1:/Table/1{2-3}] 5678 change replicas (add [(n5,s5):4] remove []): existing descriptor r8:/Table/1{2-3} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n5,s5):4LEARNER, next=5, gen=5]
Node failure when trying to acquire an already held lock:
I210614 20:15:09.992247 3965597 1@kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5679 the server is terminating due to a fatal error (see the DEV channel for details)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 existing lock at /Meta2/Table/13 held by id=1eb0e3d6 key=/Local/Range/Table/12/RangeDescriptor pri=0.00527678 epo=0 ts=1623701709.978252000,0 min=1623701709.978252000,0 seq=7 cannot be acquired by different transaction id=8d255e03 key=/Local/Range/Table/12/RangeDescriptor pri=0.01443145 epo=0 ts=1623701709.988545000,0 min=1623701709.988545000,0 seq=7
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !goroutine 3965597 [running]:
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xafe6a00, 0xe6a69e, 0x1, 0xffffffff00203004)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc0074c4620, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x16888c1a3d498ec8, 0x400000000, 0x0, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:277 +0xc67
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(0x8f1d0e0, 0xc00ca80300, 0x1, 0x4, 0x81b910a, 0x2, 0xc0043d4828, 0x1, 0x1)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:58 +0x198
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:834
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency.(*managerImpl).OnLockAcquired(0xc00d2b7450, 0x8f1d0e0, 0xc00ca80300, 0xc00b3d19e0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:417 +0xf9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleReadWriteLocalEvalResult(0xc006804000, 0x8f1d0e0, 0xc00ca80300, 0x0, 0x0, 0x0, 0x0, 0xc00b3d19e0, 0x1, 0x1, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_proposal.go:662 +0xed
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaStateMachine).ApplySideEffects(0xc0068040d8, 0x8f71ce0, 0xc004ed1008, 0x0, 0x0, 0x0, 0x0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_application_state_machine.go:1209 +0x8c5
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.mapCheckedCmdIter(0x3458aab8, 0xc006804348, 0xc0043d5350, 0x0, 0x0, 0x0, 0x0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/cmd.go:196 +0x131
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).applyOneBatch(0xc0043d5898, 0x8f1d0e0, 0xc00a6368a0, 0x8f642a0, 0xc0068042e8, 0x0, 0x0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:291 +0x245
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).ApplyCommittedEntries(0xc0043d5898, 0x8f1d0e0, 0xc00a6368a0, 0x1, 0x823ccbe)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:247 +0xb9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(0xc006804000, 0x8f1d0e0, 0xc00a6368a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:839 +0xf91
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady(0xc006804000, 0x8f1d0e0, 0xc00a6368a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:487 +0x11c
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc00c240700, 0x8f1d0e0, 0xc00e7f08a0, 0x1)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:515 +0x145
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker(0xc008fda820, 0x8f1d0e0, 0xc00e7f08a0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:284 +0x2c2
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00e2e8300, 0x8f1d0e0, 0xc00e7f08a0, 0x0, 0xc00dddc0e0)
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xb9
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
F210614 20:15:09.992333 3965597 kv/kvserver/concurrency/concurrency_manager.go:417 [n1,s1,r1/1:/{Min-System/NodeL…}] 5680 ! /Users/sumeer/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:342 +0xfc
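The fatal error above comes from a sanity check on lock acquisition: a lock that is still held by one transaction cannot be acquired by a different transaction. A minimal sketch of that invariant, using simplified stand-ins rather than the real kvserver/concurrency API (the txn IDs are taken from the log above):

```go
package main

import "fmt"

type txnID string

type lockState struct {
	holder txnID // empty string means unheld
}

// acquire succeeds for an unheld lock or a re-acquisition by the current
// holder, and fails for any other transaction -- the invariant that tripped
// the fatal error in the log above.
func (l *lockState) acquire(txn txnID) error {
	if l.holder != "" && l.holder != txn {
		return fmt.Errorf("existing lock held by %s cannot be acquired by different transaction %s",
			l.holder, txn)
	}
	l.holder = txn
	return nil
}

func main() {
	l := &lockState{}
	fmt.Println(l.acquire("1eb0e3d6") == nil) // true: first acquisition
	fmt.Println(l.acquire("8d255e03") == nil) // false: different transaction
}
```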
pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can you remind me why the tests use concurrency_test as their package name?
The idea with these _test packages is that they force you to test the external interface of a package. But you aren't even calling these methods from outside of this package, are you? Aren't you calling (*concurrency.Guard).CheckOptimisticNoConflicts?
Do you care whether these are exported from the package or not? As you partially said, we never call any of the latchManager methods directly from the tests, and same for lockTable, since the interface names themselves are not exported.
(Though I'm inclined not to do this trivial refactor in this PR)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
Can you shed some light on the latching/locking that these do? execChangeReplicasTxn doesn't perform any latching or locking itself. Instead, all of this is done through the batches that it issues. It looks like the write to the meta key is happening through a PutRequest here - notice the call to updateRangeAddressing before.
One possibility is that ba.IsReadOnly() isn't selective enough and that you're running into issues with "non-transactional" operations. I don't know exactly what is going wrong, but I'm thinking that this filter could allow requests like SubsumeRequest through, which would be very bad. Can you try to reproduce with the additional condition of ba.IsAllTransactional()?
pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):
Previously, sumeerbhola wrote…
Do you care whether these are exported from the package or not? As you partially said, we never call any of the latchManager methods directly from the tests, and same for lockTable, since the interface names themselves are not exported.
(Though I'm inclined not to do this trivial refactor in this PR)
I don't think these need to be exported. The only other reason for that is if we were implementing the interface in a separate package, like we do with the latchManager.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
execChangeReplicasTxn doesn't perform any latching or locking itself. Instead, all of this is done through the batches that it issues.
yep, I meant that and was not being clear in stating my question.
One possibility is that ba.IsReadOnly() isn't selective enough and that you're running into issues with "non-transactional" operations. I don't know exactly what is going wrong, but I'm thinking that this filter could allow requests like SubsumeRequest through, which would be very bad. Can you try to reproduce with the additional condition of ba.IsAllTransactional()?
It still fails in the same way.
It turns out to be a silly oversight (which took much longer to find than it should have) -- an optimistically evaluating request that discovers a lock while holding a conflicting latch cannot add that discovered lock, since the conflicting latch may be due to intent resolution that it is racing with. If it adds that discovered lock, we've added something to the lock table that is already resolved.
Also, I'll narrow this down to IsReadOnly && isAllTransactional, but I need some help understanding and writing the code comment justifying the isAllTransactional part. Is there something I can read to understand what is going on with these flags? For instance, I see SubsumeRequest has flags isRead | isAlone | updatesTSCache, but declareKeysSubsume says "non-MVCC write over every addressable key in the range" -- why don't the flags say isWrite?
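The rule behind the fix described in this thread can be sketched with made-up names (the real logic lives in the concurrency package): an optimistic request that discovered an intent may only report it as a discovered lock if its latches did not conflict, because a conflicting latch may belong to intent resolution that is concurrently removing that very intent.

```go
package main

import "fmt"

// Hedged sketch of the oversight described above. An optimistically
// evaluating request that discovers an intent may only add it to the lock
// table if its latches did not conflict: a conflicting latch may belong to
// intent resolution the request raced with, and adding the intent would
// insert an already-resolved lock into the lock table.
func canAddDiscoveredLock(optimisticEval, conflictingLatch bool) bool {
	if optimisticEval && conflictingLatch {
		// Must instead wait on latches and retry pessimistically.
		return false
	}
	return true
}

func main() {
	fmt.Println(canAddDiscoveredLock(true, true))  // false: possible race with intent resolution
	fmt.Println(canAddDiscoveredLock(true, false)) // true: safe to report the discovered lock
}
```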
Force-pushed 39ce60f to 40ac09c
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/client_replica_test.go, line 4025 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm curious why you're using a locking read instead of a 1PC write. Is it so that a lock gets added to the lock table?
I ask because conflicting with a write's latches is likely more detrimental than conflicting with a locking read's latches, because writes need to go through consensus.
Good point. I've changed this to writes. And added two kinds of writing modes: the 1PC writes only results in latch conflicts while the 2PC writes can also sometimes have a conflicting intent discovered during evaluation (though this is not common). The PR description has the new numbers and some description text.
pkg/kv/kvserver/concurrency/concurrency_control.go, line 430 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't think these need to be exported. The only other reason for that is if we were implementing the interface in a separate package, like we do with the latchManager.
latchManagerImpl is in the same package, so I may not be understanding your comment. I've left all latchManager methods as exported for now.
Reviewed 4 of 4 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/replica_send.go, line 815 at r1 (raw file):
Previously, sumeerbhola wrote…
execChangeReplicasTxn doesn't perform any latching or locking itself. Instead, all of this is done through the batches that it issues.
yep, I meant that and was not being clear in stating my question.
One possibility is that ba.IsReadOnly() isn't selective enough and that you're running into issues with "non-transactional" operations. I don't know exactly what is going wrong, but I'm thinking that this filter could allow requests like SubsumeRequest through, which would be very bad. Can you try to reproduce with the additional condition of ba.IsAllTransactional()?
It still fails in the same way.
It turns out to be a silly oversight (which took much longer to find than it should have) -- an optimistically evaluating request that discovers a lock and has a conflicting latch cannot add that discovered lock since that conflicting latch may be due to intent resolution that it is racing with. If it adds that discovered lock we've added something that is already resolved to the lock table.
Also, I'll narrow this down to IsReadOnly && isAllTransactional, but I need some help understanding and writing the code comment justifying the isAllTransactional part. Is there something I can read to understand what is going on with these flags? For instance, I see SubsumeRequest has flags isRead | isAlone | updatesTSCache, but declareKeysSubsume says "non-MVCC write over every addressable key in the range" -- why don't the flags say isWrite?
The isRead and isWrite flags aren't the best names. What they are referring to is whether or not a request goes through raft, not the way in which it synchronizes with other requests. So a locking read is also isRead.
I don't see much documentation on these flags. Everything seems to be in roachpb/api.go.
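The flag scheme being discussed can be illustrated with the usual Go bitmask pattern. The flag names below mirror the style of pkg/roachpb/api.go, but the values and helpers are hypothetical, not the actual CockroachDB definitions; the SubsumeRequest flag combination is the one quoted in the comment above.

```go
package main

import "fmt"

type flag int

const (
	isRead flag = 1 << iota // evaluated without going through Raft
	isWrite                 // proposed through Raft
	isTxn                   // may be part of a transaction
	isAlone                 // must be the only request in its batch
	updatesTSCache          // leaves an entry in the timestamp cache
)

// hasAll reports whether every flag in want is set in f.
func hasAll(f, want flag) bool { return f&want == want }

func main() {
	subsume := isRead | isAlone | updatesTSCache // flags noted for SubsumeRequest
	lockingRead := isRead | isTxn                // hypothetical locking read

	// Per the comment above, isRead/isWrite describe whether a request goes
	// through Raft, not how it synchronizes: a locking read is still isRead
	// even though it declares read-write latch spans.
	fmt.Println(hasAll(subsume, isWrite))     // false
	fmt.Println(hasAll(lockingRead, isRead))  // true
}
```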
TFTR!
bors r+
Build failed:
Latches for the full spans get inserted up front in the
spanlatch.Manager, and the conflict checking happens after
evaluation, only over the spans that were actually read. If there
is a conflict, the existing inserted latches are waited on, and
execution switches to pessimistic latching and locking. The
existing cluster setting for optimistic locking is used to gate
this behavior.

Numbers for the new OptimisticEval benchmark show improvement when
real-contention=false, compared to master, which only had
optimistic locking. There is a modest slowdown in the
real-contention=true case since every optimistic read has to try
twice. The benchmark has concurrent writes of two different kinds:
the latches variant represents 1PC writers, so no intents, while
latches-and-locks represents writers that will leave intents. For
the latter, when the optimistic read finds an intent during
evaluation it cannot necessarily add it as a discovered lock, if
there is also a conflicting latch (since it could be racing with
intent resolution). This case is rare in these benchmarks
(latches-and-locks/real-contention=true): 13% found an intent and
had a conflicting latch when num-writers=1, and < 1% for the same
case when num-writers=4. The remainder (the common case) was to
find a conflict when looking for conflicting latches/locks using
CheckOptimisticNoConflicts.

name                                                                      old time/op    new time/op    delta
OptimisticEvalForLocks/real-contention=false-16                           28.2µs ± 4%    28.5µs ± 5%      ~     (p=0.393 n=10+10)
OptimisticEvalForLocks/real-contention=true-16                            5.52ms ± 1%    5.52ms ± 1%      ~     (p=0.796 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16    220µs ±62%      89µs ±38%   -59.37%  (p=0.000 n=10+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16    213µs ±65%     155µs ±82%   -27.33%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16    1.33ms ± 3%    1.27ms ±16%      ~     (p=0.829 n=8+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16    2.02ms ±10%    2.25ms ± 9%   +11.31%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16              175µs ± 2%      45µs ± 5%   -74.05%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16              613µs ± 2%      44µs ± 3%   -92.74%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16               181µs ± 4%     179µs ± 5%      ~     (p=0.315 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16               624µs ± 3%     620µs ± 3%      ~     (p=0.247 n=10+10)

name                                                                      old alloc/op   new alloc/op   delta
OptimisticEvalForLocks/real-contention=false-16                           8.40kB ± 7%    8.33kB ± 3%      ~     (p=1.000 n=10+8)
OptimisticEvalForLocks/real-contention=true-16                            31.8kB ± 7%    32.6kB ± 9%      ~     (p=0.382 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16   23.9kB ±21%    17.8kB ±25%   -25.55%  (p=0.003 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16   24.1kB ±20%    19.4kB ±22%   -19.56%  (p=0.015 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16     104kB ± 1%     101kB ± 1%    -2.89%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16     197kB ± 6%     217kB ±11%   +10.19%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16             29.9kB ± 0%    13.4kB ± 1%   -55.07%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-16             94.9kB ± 0%    14.5kB ± 1%   -84.74%  (p=0.000 n=9+8)
OptimisticEval/latches/real-contention=true/num-writers=1-16              29.9kB ± 0%    31.3kB ± 0%    +4.59%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16              94.8kB ± 0%    96.2kB ± 0%    +1.48%  (p=0.000 n=10+10)

name                                                                      old allocs/op  new allocs/op  delta
OptimisticEvalForLocks/real-contention=false-16                             68.6 ± 1%      69.6 ± 2%    +1.52%  (p=0.005 n=9+10)
OptimisticEvalForLocks/real-contention=true-16                               271 ± 2%       272 ± 2%      ~     (p=0.336 n=8+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-16      149 ±17%       117 ±18%   -21.00%  (p=0.002 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-16      151 ±16%       126 ±18%   -16.31%  (p=0.013 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-16       610 ± 0%       559 ± 1%    -8.32%  (p=0.000 n=8+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-16     1.12k ± 5%     1.19k ±12%      ~     (p=0.053 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-16                162 ± 0%        91 ± 0%   -43.83%  (p=0.001 n=8+9)
OptimisticEval/latches/real-contention=false/num-writers=4-16                445 ± 0%        96 ± 0%   -78.44%  (p=0.000 n=9+9)
OptimisticEval/latches/real-contention=true/num-writers=1-16                 163 ± 0%       184 ± 0%   +13.07%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-16                 445 ± 0%       467 ± 0%    +4.94%  (p=0.000 n=10+10)

Fixes cockroachdb#9521

Release note (performance improvement): A limited scan now checks
for conflicting latches in an optimistic manner, which means it
will not conflict with latches that were held in the scan's full
spans but were not in the spans that were scanned until the limit
was reached. This behavior can be turned off (along with
optimistic locking) by setting the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.
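The post-evaluation conflict check amounts to an interval-overlap test
between the spans the limited scan actually read and the latches held
by other requests. The following is a minimal sketch of that idea only:
`span`, `overlaps`, and `checkOptimisticNoConflicts` are simplified
stand-ins for roachpb.Span and spanlatch.Manager's
CheckOptimisticNoConflicts, not the PR's actual code.

```go
package main

import "fmt"

// span is a simplified [start, end) key interval (hypothetical type;
// the real code uses roachpb.Span and an interval tree in spanlatch).
type span struct{ start, end string }

// overlaps reports whether two half-open key intervals intersect.
func overlaps(a, b span) bool {
	return a.start < b.end && b.start < a.end
}

// checkOptimisticNoConflicts stands in for the post-evaluation check:
// it compares only the spans the scan actually read against latches
// currently held by other requests, rather than the full declared spans.
func checkOptimisticNoConflicts(read, held []span) bool {
	for _, r := range read {
		for _, h := range held {
			if overlaps(r, h) {
				return false
			}
		}
	}
	return true
}

func main() {
	// A limited scan declares the full span [a, z) but stops after
	// reading [a, c). A writer holds a latch on [m, n): the declared
	// spans overlap, but the spans actually read do not.
	held := []span{{"m", "n"}}
	actuallyRead := []span{{"a", "c"}}
	if checkOptimisticNoConflicts(actuallyRead, held) {
		fmt.Println("optimistic evaluation succeeds")
	} else {
		fmt.Println("conflict: wait on latches, retry pessimistically")
	}
}
```

Under pessimistic latching the scan above would have waited on the
[m, n) latch even though it never read past "c"; the optimistic check
lets it commit to its result without waiting.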
I can't reproduce the TestKVFeed failure on the previous bors run.

bors r+

Build succeeded:
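The rare case called out in the commit message — an optimistic read
that discovers an intent while a conflicting latch is also held —
reduces to a small decision table. The sketch below is illustrative
only; `handleOptimisticResult` and its string outcomes are hypothetical
names, not the PR's actual control flow.

```go
package main

import "fmt"

// handleOptimisticResult maps the two observations an optimistic read
// can make to the follow-up action described in the commit message.
// foundIntent: evaluation ran into a provisional write (intent).
// latchConflict: the spans actually read overlap a held write latch.
func handleOptimisticResult(foundIntent, latchConflict bool) string {
	switch {
	case foundIntent && latchConflict:
		// Rare case (13% with num-writers=1 in the benchmark): the
		// intent may be racing with its own resolution, so it cannot
		// be recorded as a discovered lock; just retry pessimistically.
		return "retry pessimistically without recording lock"
	case foundIntent:
		// No conflicting latch: safe to hand the intent to the lock
		// table as a discovered lock before retrying.
		return "record discovered lock, then retry"
	case latchConflict:
		// Common case under real contention: CheckOptimisticNoConflicts
		// fails, so wait on the already-inserted latches.
		return "wait on latches, retry pessimistically"
	default:
		return "optimistic evaluation succeeds"
	}
}

func main() {
	fmt.Println(handleOptimisticResult(false, false))
	fmt.Println(handleOptimisticResult(true, true))
}
```

The first branch is why the latches-and-locks/real-contention=true
variants see less benefit than the pure latches variants: an intent
found under a conflicting latch forces a full pessimistic retry.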