concurrency: move finalizedTxnCache into lock table #57947
Conversation
Reviewed 6 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 534 at r1 (raw file):
// For COMMITTED or ABORTED transactions, all locks are released.
//
// For PENDING or STAGING transactions, the behavior is:
Is the eventual plan for UpdateLocks to only deal with updating PENDING/STAGING locks and to never deal with the removal of locks?
pkg/kv/kvserver/concurrency/lock_table.go, line 515 at r1 (raw file):
The first state it will see after releasing latches is doneWaiting, which will cause it to resolve intents.
This is only true if g.mu.startWait had previously been false, right?
pkg/kv/kvserver/concurrency/lock_table.go, line 1100 at r1 (raw file):
// need to do the resolution.
//
// Ideally, we would strengthen B and C -- a request should make a claim on
I think it would be worth running the YCSB suite before and after this change, just to verify that we're not triggering one of these races reliably and causing some pathological behavior that dramatically impacts performance. I've found with YCSB that high levels of concurrency + contention can lead to some unexpected behavior.
pkg/kv/kvserver/concurrency/lock_table.go, line 1107 at r1 (raw file):
// TODO(sumeer): do this cleaner solution for batched intent resolution.
//
// In the future we'd like to augment the lockTable with an understanding of
In such a world, would we need the finalizedTxnCache? It seems like it may still serve a purpose if it was shared across Ranges, but I'm not sure that it would if we weren't able to lift it up, because a transaction should (almost always) resolve all of its intents on a single range atomically.
But actually, that may not be true for abandoned intents, where the lack of an inverted txnID -> []intent index prevents us from resolving all of an abandoned transaction's intents when we find one to be stale. In those cases, some amount of memory, even on a per-range level, can still be helpful.
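(For illustration, a minimal self-contained Go sketch of the inverted index idea mentioned above; the names intentIndex, record, and intentsOf are hypothetical, not from the PR:)

package main

import "fmt"

// intentIndex is a toy sketch of the hypothetical inverted index discussed
// above: it maps a transaction ID to the keys of its known intents, so that
// discovering one stale intent would let us resolve all of that abandoned
// transaction's intents at once.
type intentIndex struct {
	byTxn map[string][]string // txn ID -> intent keys
}

func (ix *intentIndex) record(txnID, key string) {
	if ix.byTxn == nil {
		ix.byTxn = map[string][]string{}
	}
	ix.byTxn[txnID] = append(ix.byTxn[txnID], key)
}

// intentsOf returns every known intent key for txnID, e.g. to batch-resolve
// them after finding the transaction abandoned.
func (ix *intentIndex) intentsOf(txnID string) []string {
	return ix.byTxn[txnID]
}

func main() {
	var ix intentIndex
	ix.record("txn-1", "a")
	ix.record("txn-1", "c")
	fmt.Println(ix.intentsOf("txn-1")) // [a c]
}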
pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):
// Also, resolving these locks/intents would proceed without latching, so we
// would not rely on MVCC scanning to add discovered locks to the lock table,
// since the discovered locks may be stale.
Maybe also mention ideas around coalescing resolution and replacement into the same Raft entry. I don't know if we've ever thought all the way through how such a mechanism would work, but you can imagine that such a mechanism would be very beneficial for contended workloads.
pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):
We don't expect the caller to GC this lockState and instead will GC it in tryUpdateLock.
How is this supposed to work? Who will end up calling tryUpdateLock? Is that what the call to g.lt.UpdateLocks in findNextLockAfter is?
Note that there may be other waiters, so the caller may have to wait behind them.
Is this why we can't just return false here?
pkg/kv/kvserver/concurrency/lock_table.go, line 1237 at r1 (raw file):
}
active := true
if replicatedLockFinalizedTxn != nil && l.queuedWriters.Front().Value.(*queuedGuard) == qg {
nit: this would probably be easier to understand if we dropped this check all the way down to the end of the if sa == spanset.SpanReadWrite { block, so it was common among all cases.
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 97 at r1 (raw file):
stopper *stop.Stopper
ir      IntentResolver
lm      LockManager
I think we can remove this dependency.
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):
// evaluation.
toResolve := guard.ResolveBeforeScanning()
w.resolveDeferredIntents(ctx, &err, &toResolve)
We can get rid of some of the reference funkiness here now that it's not a deferred function. The method's signature should probably end up looking something like:
func (w *lockTableWaiterImpl) resolveDeferredIntents(ctx context.Context, toResolve []roachpb.LockUpdate) *Error
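(For readers outside the review, the "reference funkiness" refers to the pointer arguments the old deferred call needed. A standalone sketch of the difference, with hypothetical names, assuming only that deferred code must observe values set later in the function:)

package main

import "fmt"

// resolve stands in for the hypothetical intent-resolution call.
func resolve(toResolve []string) error {
	fmt.Println("resolving", toResolve)
	return nil
}

// deferredStyle mimics the old shape: a deferred closure has to capture the
// slice (or take a pointer) to observe the value it holds at return time.
func deferredStyle() (err error) {
	var toResolve []string
	defer func() { err = resolve(toResolve) }()
	toResolve = append(toResolve, "intent-a")
	return nil
}

// directStyle mimics the suggested shape: a plain call at the end of the
// function takes the slice by value and returns the error normally.
func directStyle() error {
	toResolve := []string{"intent-a"}
	return resolve(toResolve)
}

func main() {
	_ = deferredStyle()
	_ = directStyle()
}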
pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):
[1] sequence req1: sequencing complete, returned guard
# Normally req1 will not discover write intents for c, d, e in one shot, but
It won't? I thought we accumulate multiple intents in pebbleMVCCScanner.
TFTR!
I've addressed the comments, added tests, and fixed existing tests.
I owe the ycsb measurements. Should I run all the roachtests with the ycsb prefix? And should I be running all of them with one roachprod cluster to have the same machines for stable measurements?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 534 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is the eventual plan for UpdateLocks to only deal with updating PENDING/STAGING locks and to never deal with the removal of locks?
I wasn't thinking of it changing when UpdateLocks is called in the future. The call from handleReadWriteLocalEvalResult is authoritative, unlike the call to AddDiscoveredLock, which requires latches to avoid staleness. Is there a possibility that multiple actions from the same transaction can race, or are the calls from replicaStateMachine.ApplySideEffects single-threaded for a range (due to the raftScheduler)?
Maybe you are thinking of some idea that I missed?
pkg/kv/kvserver/concurrency/lock_table.go, line 515 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The first state it will see after releasing latches is doneWaiting, which will cause it to resolve intents.
This is only true if g.mu.startWait had previously been false, right?
Correct. I have added a longer comment.
pkg/kv/kvserver/concurrency/lock_table.go, line 1100 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think it would be worth running the YCSB suite before and after this change, just to verify that we're not triggering one of these races reliably and causing some pathological behavior that dramatically impacts performance. I've found with YCSB that high levels of concurrency + contention can lead to some unexpected behavior.
Ack
pkg/kv/kvserver/concurrency/lock_table.go, line 1107 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
In such a world, would we need the finalizedTxnCache? It seems like it may still serve a purpose if it was shared across Ranges, but I'm not sure that it would if we weren't able to lift it up, because a transaction should (almost always) resolve all of its intents on a single range atomically.
But actually, that may not be true for abandoned intents, where the lack of an inverted txnID -> []intent index prevents us from resolving all of an abandoned transaction's intents when we find one to be stale. In those cases, some amount of memory, even on a per-range level, can still be helpful.
IIUC, the original motivation for the finalizedTxnCache in 67c6bdb#diff-b2445fe50a89171ab341cf76c645ef668d8dcccd14143b7c96d07d8b52047941 was transactions with abandoned intents. So yes, it seems it will be relevant even in the future for that case.
For non-abandoned intents, yes, I was imagining some node-level data structure that would keep track of STAGING transactions that had committed from the perspective of the user. I haven't given any thought to where exactly that would be placed.
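(A minimal sketch of what such a cache of finalized transactions might look like; the names here and the use of string transaction IDs are made up for illustration, and the real finalizedTxnCache in CockroachDB is more involved:)

package main

import "fmt"

type txnStatus int

const (
	pending txnStatus = iota
	staging
	committed
	aborted
)

// finalizedTxnCacheSketch remembers transactions known to be finalized
// (COMMITTED or ABORTED), so a waiter that encounters one of their intents
// can treat the lock as released without a fresh status lookup.
type finalizedTxnCacheSketch struct {
	txns map[string]txnStatus // txn ID -> finalized status
}

func (c *finalizedTxnCacheSketch) add(id string, s txnStatus) {
	if s != committed && s != aborted {
		return // only finalized transactions belong in the cache
	}
	if c.txns == nil {
		c.txns = map[string]txnStatus{}
	}
	c.txns[id] = s
}

func (c *finalizedTxnCacheSketch) isFinalized(id string) bool {
	_, ok := c.txns[id]
	return ok
}

func main() {
	var c finalizedTxnCacheSketch
	c.add("txn-1", committed)
	c.add("txn-2", staging) // ignored: not finalized
	fmt.Println(c.isFinalized("txn-1"), c.isFinalized("txn-2")) // true false
}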
pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Maybe also mention ideas around coalescing resolution and replacement into the same Raft entry. I don't know if we've ever thought all the way through how such a mechanism would work, but you can imagine that such a mechanism would be very beneficial for contended workloads.
Replacement of what? Do you have a pointer to an earlier discussion that I can read?
pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):
This comment was confusing, so I've rewritten it. Also, a piece of code was missing.
The tryUpdateLock should happen when doing resolution of all locks for the transaction. The code there to do GC of an empty lock was missing (I forgot about it), and I've added it now.
The call to g.lt.UpdateLocks in findNextLockAfter is only for replicated locks, and that will do GC since the calling path allows for it. The behavior here is just a peculiarity of the code -- we are not holding treeMu.mu.
Is this why we can't just return false here?
Correct. Added to the comment.
pkg/kv/kvserver/concurrency/lock_table.go, line 1237 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this would probably be easier to understand if we dropped this check all the way down to the end of the if sa == spanset.SpanReadWrite { block, so it was common among all cases.
Done.
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 97 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think we can remove this dependency.
Done
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We can get rid of some of the reference funkiness here now that it's not a deferred function. The method's signature should probably end up looking something like:
func (w *lockTableWaiterImpl) resolveDeferredIntents(ctx context.Context, toResolve []roachpb.LockUpdate) *Error
Done
pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It won't? I thought we accumulate multiple intents in pebbleMVCCScanner.
My understanding was that since these are 3 requests, when the first one evaluates and finds the interleaved intent it will come back, do AddDiscoveredLock, and start waiting.
Is that correct?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):
Previously, sumeerbhola wrote…
This comment was confusing, so I've rewritten it. Also, a piece of code was missing. The tryUpdateLock should happen when doing resolution of all locks for the transaction. The code there to do GC of an empty lock was missing (I forgot about it), and I've added it now. The call to g.lt.UpdateLocks in findNextLockAfter is only for replicated locks, and that will do GC since the calling path allows for it. The behavior here is just a peculiarity of the code -- we are not holding treeMu.mu.
Is this why we can't just return false here?
Correct. Added to the comment.
I've cleaned this up to do the GC.
I owe the ycsb measurements. Should I run all the roachtests with the ycsb prefix? And should I be running all of them with one roachprod cluster to have the same machines for stable measurements?
There is a lot of variability.
name old ops/sec new ops/sec delta
A/nodes=3 17.5k ± 5% 17.3k ± 3% ~ (p=0.421 n=5+5)
A/nodes=3/cpu=32 27.4k ± 2% 28.1k ± 4% ~ (p=0.393 n=3+5)
F/nodes=3 7.99k ± 8% 8.10k ± 4% ~ (p=0.841 n=5+5)
F/nodes=3/cpu=32 15.3k ± 5% 15.2k ± 2% ~ (p=0.786 n=3+5)
C/nodes=3 43.5k ± 5% 44.4k ± 3% ~ (p=0.151 n=5+5)
C/nodes=3/cpu=32 119k ± 1% 121k ± 2% ~ (p=0.143 n=3+5)
D/nodes=3 35.2k ± 7% 36.3k ± 4% ~ (p=0.690 n=5+5)
D/nodes=3/cpu=32 97.2k ± 0% 94.9k ± 2% -2.33% (p=0.036 n=3+5)
E/nodes=3 777 ± 5% 795 ± 5% ~ (p=0.310 n=5+5)
E/nodes=3/cpu=32 1.06k ± 6% 1.15k ± 2% +7.95% (p=0.036 n=3+5)
B/nodes=3 32.1k ± 6% 31.9k ± 2% ~ (p=0.421 n=5+5)
B/nodes=3/cpu=32 93.6k ± 1% 92.0k ± 3% ~ (p=0.250 n=3+5)
name old p50 new p50 delta
A/nodes=3 3.80 ± 0% 3.80 ± 3% ~ (p=1.000 n=4+5)
A/nodes=3/cpu=32 2.53 ± 3% 2.56 ± 2% ~ (p=1.000 n=3+5)
F/nodes=3 8.34 ± 9% 8.12 ±10% ~ (p=0.762 n=5+5)
F/nodes=3/cpu=32 5.87 ± 2% 6.02 ± 5% ~ (p=0.464 n=3+5)
C/nodes=3 2.90 ± 0% 2.84 ± 2% ~ (p=0.095 n=4+5)
C/nodes=3/cpu=32 1.20 ± 0% 1.20 ± 0% ~ (all equal)
D/nodes=3 2.10 ± 5% 2.08 ± 6% ~ (p=1.000 n=5+5)
D/nodes=3/cpu=32 1.00 ± 0% 1.10 ± 0% ~ (p=0.057 n=3+4)
E/nodes=3 60.4 ±13% 61.6 ± 5% ~ (p=1.000 n=5+5)
E/nodes=3/cpu=32 23.4 ± 3% 27.9 ± 9% +19.06% (p=0.036 n=3+5)
B/nodes=3 3.02 ± 4% 3.00 ± 3% ~ (p=1.000 n=5+5)
B/nodes=3/cpu=32 1.23 ± 5% 1.30 ± 0% ~ (p=0.214 n=3+5)
name old p95 new p95 delta
A/nodes=3 15.7 ± 7% 15.9 ± 2% ~ (p=0.397 n=5+5)
A/nodes=3/cpu=32 9.07 ± 4% 9.10 ± 3% ~ (p=1.429 n=3+5)
F/nodes=3 37.1 ±13% 34.4 ± 6% ~ (p=0.397 n=5+5)
F/nodes=3/cpu=32 19.6 ± 3% 20.1 ± 6% ~ (p=0.643 n=3+5)
C/nodes=3 8.40 ± 0% 8.44 ± 5% ~ (p=1.000 n=4+5)
C/nodes=3/cpu=32 5.00 ± 0% 5.00 ± 0% ~ (all equal)
D/nodes=3 7.68 ± 9% 7.30 ± 0% ~ (p=0.333 n=5+4)
D/nodes=3/cpu=32 4.70 ± 0% 4.82 ± 4% ~ (p=0.714 n=3+5)
E/nodes=3 453 ± 7% 440 ± 7% ~ (p=0.492 n=5+5)
E/nodes=3/cpu=32 582 ± 4% 550 ± 4% -5.38% (p=0.036 n=3+5)
B/nodes=3 14.7 ± 7% 14.7 ± 0% ~ (p=0.556 n=5+4)
B/nodes=3/cpu=32 7.00 ± 3% 7.10 ± 0% ~ (p=0.857 n=3+4)
name old p99 new p99 delta
A/nodes=3 38.0 ±44% 37.5 ±17% ~ (p=0.587 n=5+5)
A/nodes=3/cpu=32 74.1 ± 4% 71.3 ± 6% ~ (p=0.286 n=3+5)
F/nodes=3 84.3 ±44% 106.5 ±42% ~ (p=0.135 n=5+5)
F/nodes=3/cpu=32 148 ±15% 143 ± 9% ~ (p=0.750 n=3+5)
C/nodes=3 13.7 ± 7% 13.1 ± 0% -4.52% (p=0.016 n=5+4)
C/nodes=3/cpu=32 7.80 ± 3% 7.60 ± 0% ~ (p=0.571 n=3+4)
D/nodes=3 13.4 ± 9% 12.6 ± 0% ~ (p=0.333 n=5+4)
D/nodes=3/cpu=32 7.90 ± 0% 8.10 ± 0% ~ (p=0.057 n=3+4)
E/nodes=3 611 ± 7% 604 ± 0% ~ (p=0.968 n=5+4)
E/nodes=3/cpu=32 705 ± 0% 671 ± 0% ~ (p=0.357 n=3+5)
B/nodes=3 29.2 ±22% 30.2 ±10% ~ (p=0.421 n=5+5)
B/nodes=3/cpu=32 14.0 ± 5% 14.0 ± 9% ~ (p=1.000 n=3+5)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
Thanks for running the YCSB suite. Those results look good.
Reviewed 8 of 8 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 534 at r1 (raw file):
or are the calls from replicaStateMachine.ApplySideEffects single-threaded for a range (due to raftScheduler?)
Right, they're all single-threaded.
But no, I think I was just confused. It makes sense that we'll still want some authoritative hook that keeps the persistent and in-memory state in the lockTable in sync.
pkg/kv/kvserver/concurrency/lock_table.go, line 1107 at r1 (raw file):
Previously, sumeerbhola wrote…
IIUC, the original motivation for the finalizedTxnCache in 67c6bdb#diff-b2445fe50a89171ab341cf76c645ef668d8dcccd14143b7c96d07d8b52047941 was transactions with abandoned intents. So yes, it seems it will be relevant even in the future for that case.
For non-abandoned intents, yes, I was imagining some node-level data structure that would keep track of STAGING transactions that had committed from the perspective of the user. I haven't given any thought to where exactly that would be placed.
Ack.
pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):
Previously, sumeerbhola wrote…
replacement of what? Do you have a pointer to an earlier discussion that I can read?
Replacement of one intent with another — instead of performing intent resolution and intent creation in two separate Raft operations. I don't think we've ever really discussed the idea concretely in terms of how it would fit into the lock table plans.
But upon further consideration, it may also be much less important now that you've structured things so that we can store multiple locks on the same key (as long as only one is active). Once inactive locks no longer block new lock acquisition, it is probably easier to eliminate/amortize the Raft operations used for intent resolution by batching resolution aggressively.
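(A toy sketch of the aggressive-batching idea, with hypothetical names; real batched intent resolution would issue one Raft proposal per flushed batch rather than printing:)

package main

import "fmt"

// resolutionBatcher accumulates keys whose intents await resolution and
// flushes them in one operation instead of one per intent.
type resolutionBatcher struct {
	pending []string
	limit   int
}

func (b *resolutionBatcher) add(key string) {
	b.pending = append(b.pending, key)
	if len(b.pending) >= b.limit {
		b.flush()
	}
}

func (b *resolutionBatcher) flush() {
	if len(b.pending) == 0 {
		return
	}
	// In a real system this would be a single batched resolution request
	// covering all pending keys.
	fmt.Println("resolving batch:", b.pending)
	b.pending = b.pending[:0]
}

func main() {
	b := resolutionBatcher{limit: 3}
	for _, k := range []string{"a", "b", "c", "d"} {
		b.add(k)
	}
	b.flush() // flush the remainder
}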
pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):
Previously, sumeerbhola wrote…
I've cleaned this up to do the GC.
Ah, I see. This might be a bit easier to understand in the context of other code if you restructure it slightly to consult the return value of lockIsFree. Something like:
gc := l.lockIsFree()
if gc {
  // Empty lock.
  return false, true
}
lockHolderTxn = nil
// There is now a reservation holder, which may be the caller itself,
// so fall through to the processing below.
pkg/kv/kvserver/concurrency/lock_table.go, line 518 at r2 (raw file):
j := 0
for i := range g.toResolve {
  if heldByTxn := g.lt.updateLockInternal(&g.toResolve[i]); heldByTxn {
Could you give this a comment? Something about how we're attempting to update the lock table and filtering toResolve by which updates were actually needed.
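(The snippet uses the standard Go filter-in-place idiom. A self-contained sketch of the idiom itself, generic over the element type; updateLockInternal is replaced here by an arbitrary keep predicate:)

package main

import "fmt"

// filterInPlace keeps only the elements for which keep returns true,
// reusing the slice's backing array -- the same idiom as the lock-table
// snippet, where only updates whose lock was actually held still need
// replicated intent resolution.
func filterInPlace[T any](s []T, keep func(*T) bool) []T {
	j := 0
	for i := range s {
		if keep(&s[i]) {
			s[j] = s[i]
			j++
		}
	}
	return s[:j]
}

func main() {
	updates := []int{1, 2, 3, 4, 5}
	// Odd numbers stand in for updates that still need resolution.
	updates = filterInPlace(updates, func(u *int) bool { return *u%2 == 1 })
	fmt.Println(updates) // [1 3 5]
}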
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):
Previously, sumeerbhola wrote…
Done
s/*[]roachpb.LockUpdate/[]roachpb.LockUpdate/
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 269 at r2 (raw file):
// evaluation.
toResolve := guard.ResolveBeforeScanning()
return w.resolveDeferredIntents(ctx, &toResolve)
One thing that's not totally clear to me is what happens if this operation is canceled and how that impacts other waiters. Will the waiter who was assigned to resolve the deferred intents hand the responsibility to the next in line when it exits the lock table?
pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):
Previously, sumeerbhola wrote…
My understanding was that since these are 3 requests, when the first one evaluates and finds the interleaved intent it will come back, do AddDiscoveredLock, and start waiting.
Is that correct?
I see. We're both correct, we're just saying different things. A single request will accumulate multiple intents, but intents won't be accumulated across multiple requests in the same batch. So in the case of req1, we will not discover the write intents in one shot.
pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 187 at r2 (raw file):
global: num=5
 lock: "a"
  holder: txn: 00000000-0000-0000-0000-000000000002, ts: 0.000000010,1, info: repl epoch: 0, seqs: [0]
Do you think it would make things any clearer to indicate here which locks are known to be finalized?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Replacement of one intent with another — instead of performing intent resolution and intent creation in two separate Raft operations. I don't think we've ever really discussed the idea concretely in terms of how it would fit into the lock table plans.
But upon further consideration, it may also be much less important now that you've structured things so that we can store multiple locks on the same key (as long as only one is active). Once inactive locks no longer block new lock acquisition, it is probably easier to eliminate/amortize the Raft operations used for intent resolution by batching resolution aggressively.
Got it -- thanks for the clarification.
pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Ah, I see. This might be a bit easier to understand in the context of other code if you restructure it slightly to consult the return value of lockIsFree. Something like:
gc := l.lockIsFree()
if gc {
  // Empty lock.
  return false, true
}
lockHolderTxn = nil
// There is now a reservation holder, which may be the caller itself,
// so fall through to the processing below.
Done
pkg/kv/kvserver/concurrency/lock_table.go, line 518 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you give this a comment? Something about how we're attempting to update the lock table and filtering toResolve by which updates were actually needed.
Done
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/*[]roachpb.LockUpdate/[]roachpb.LockUpdate/
Done
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 269 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
One thing that's not totally clear to me is what happens if this operation is canceled and how that impacts other waiters. Will the waiter who was assigned to resolve the deferred intents hand the responsibility to the next in line when it exits the lock table?
No, it doesn't hand off. The assignment to do the resolution happens when transitioning to doneWaiting, so this bad situation will only occur when the transition to doneWaiting races with the cancellation. I've added a longer comment here and referred to the comment in tryActiveWait. Doing handoff would cleanly solve these races -- we really shouldn't remove these locks from the data-structure before doing intent resolution.
I was somewhat worried about bugs caused by a dropped hand-off. But since the requester has to call lockTable.Dequeue() in a timely manner, we could use that to release the "claim" on doing intent resolution. I would do that in a later PR. Thoughts?
pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I see. We're both correct, we're just saying different things. A single request will accumulate multiple intents, but intents won't be accumulated across multiple requests in the same batch. So in the case of req1, we will not discover the write intents in one shot.
Ack. I've updated the comment.
pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 187 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you think it would make things any clearer to indicate here which locks are known to be finalized?
Good point. Added.
Reviewed 5 of 5 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 269 at r2 (raw file):
I was somewhat worried about bugs caused by a dropped hand-off. But since the requester has to call lockTable.Dequeue() in a timely manner we could use that to release the "claim" on doing intent resolution. I would do that in a later PR.
This sounds very reasonable to me. I had forgotten that you already discussed this in the comment/TODO on lockTableImpl.tryActiveWait.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table.go, line 564 at r5 (raw file):
// The first case above (breaking reservations) only occurs for transactional
// requests, but the other cases can happen for both transactional and
// non-transactional requests.
This comment was out of date, so I fixed it.
pkg/kv/kvserver/concurrency/lock_table.go, line 1321 at r5 (raw file):
if replicatedLockFinalizedTxn != nil && l.queuedWriters.Front().Value.(*queuedGuard) == qg {
  // First waiter, so should not wait. NB: this inactive waiter can be
  // non-transactional.
Added some comments here and below.
pkg/kv/kvserver/concurrency/lock_table.go, line 1917 at r5 (raw file):
e = e.Next()
l.queuedWriters.Remove(curr)
if qg.active {
This was a bug.
It somewhat existed even prior to this PR, since discovered locks could cause a non-transactional request to be an inactive waiter, but this PR made it worse (it was masked earlier since the request would scan again). The non-transactional inactive waiter would have mustFindNextLockAfter set to true even though it had gone through all the locks and accumulated locks to resolve. That would cause the call to lockTableGuardImpl.CurState to resume searching, which would cause a panic in findNextLockAfter when doing g.spans.GetSpans(g.sa, g.ss), since the indices were out-of-bounds.
There is now a test that exercises this.
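(To make the failure mode concrete, a standalone sketch with hypothetical names: an iterator over per-scope span lists whose indices run past the end; resuming the search with those stale indices is the described out-of-bounds panic:)

package main

import "fmt"

// spanIter iterates over spans grouped by scope, advancing (ss, i) indices
// much like the guard's position in the lock table search.
type spanIter struct {
	spans [][]string // spans[ss] = spans for one scope
	ss, i int
}

func (it *spanIter) next() (string, bool) {
	for it.ss < len(it.spans) {
		if it.i < len(it.spans[it.ss]) {
			s := it.spans[it.ss][it.i]
			it.i++
			return s, true
		}
		it.ss++
		it.i = 0
	}
	// Exhausted: it.ss now equals len(it.spans). A flag like
	// mustFindNextLockAfter must not trigger another spans[it.ss] access.
	return "", false
}

func main() {
	it := spanIter{spans: [][]string{{"a", "b"}, {"c"}}}
	for s, ok := it.next(); ok; s, ok = it.next() {
		fmt.Println(s)
	}
	// Resuming the search here with a naive it.spans[it.ss][it.i] access
	// would panic with an index out of range -- the bug described above.
}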
pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 739 at r5 (raw file):
# waiters.
# -----------------------------------------------------------------------------
This tickles the bug.
pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 786 at r5 (raw file):
# -----------------------------------------------------------------------------
# req12 is a read request that finds a lock from a finalized txn
# when scanning.
This is just thrown in for good measure -- it wasn't a bug. I wanted to be sure, despite code inspection and documented invariants, that reads were never inactive waiters.
bors r+
Build succeeded
This is a cleanup in preparation for the future, and also
has some, possibly minor, immediate benefits.
In the future, the lock table will support multiple intents for
the same key if all but one are known to be finalized. So the
finalizedTxnCache belongs in the lock table data-structure.
Additionally, we will support intent resolution without holding
latches, which has some implications on data-structure
consistency: request evaluation will not be allowed to add
discovered intents to the lock table since the discovery may be
stale. This PR is not changing this discovery behavior since we
need it for now (due to interleaved intents), but it moves us
along the path towards the lock table data-structure not
relying on external behavior for maintaining its in-memory
"cache" of locks. Specifically, removing intents from the lock
table when the intent is still present in the engine is not
principled. We currently do this in two places:
- for optimizing limited scans: a later PR will fix this properly
  by checking the lock table after request evaluation, as
  outlined in #49973 (kv: Scans with limit scan too much of lockTable).
- using the finalizedTxnCache in the lockTableWaiterImpl: this
  use is changed in this PR. The code in the lock table also does
  removal of intents before resolution, but there is a TODO to
  fix that in the future. It should be easier to do this with the
  behavior contained in the lock table.
The immediate benefits, which may not have any practical
significance, are:
- We no longer resolve unreplicated locks -- they are simply
  removed.
- A replicated lock is removed from the lock table data-structure
  only when the requester has finished a scan and is in a
  position to do resolution. Earlier one could remove the lock
  but block on another lock, and not do intent resolution on
  the first lock. This would cause wasteful evaluation of other
  requests.
Informs #41720
Release note: None