
concurrency: move finalizedTxnCache into lock table #57947

Merged 1 commit on Jan 6, 2021

Conversation

@sumeerbhola (Collaborator) commented Dec 15, 2020

This is a cleanup in preparation for the future, and also
has some, possibly minor, immediate benefits.

In the future, the lock table will support multiple intents for
the same key if all but one are known to be finalized. So the
finalizedTxnCache belongs in the lock table data-structure.
Additionally, we will support intent resolution without holding
latches, which has implications for data-structure
consistency: request evaluation will not be allowed to add
discovered intents to the lock table since the discovery may be
stale. This PR is not changing this discovery behavior since we
need it for now (due to interleaved intents), but it moves us
along the path towards the lock table data-structure not
relying on external behavior for maintaining its in-memory
"cache" of locks. Specifically, removing intents from the lock
table when the intent is still present in the engine is not
principled. We currently do this in two places:

  • for optimizing limited scans: a later PR will fix this properly
    by checking the lock table after request evaluation, as
    outlined in #49973 (kv: Scans with limit scan too much of lockTable).
  • using the finalizedTxnCache in the lockTableWaiterImpl: this
    use is changed in this PR. The code in the lock table also does
    removal of intents before resolution, but there is a TODO to
    fix that in the future. It should be easier to do this with the
    behavior contained in the lock table.
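
For concreteness, a minimal sketch of a finalized-transaction cache owned by the lock table. This is illustrative only: the names and the map-based representation are assumptions, not the actual lockTableImpl code.

package concurrency

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/uuid"
)

// finalizedTxnCache records transactions known to be finalized, so that a
// request scanning the lock table can treat their locks as released.
type finalizedTxnCache struct {
	txns map[uuid.UUID]*roachpb.Transaction
}

func (c *finalizedTxnCache) add(txn *roachpb.Transaction) {
	// Only COMMITTED or ABORTED transactions belong in the cache.
	if txn.Status != roachpb.COMMITTED && txn.Status != roachpb.ABORTED {
		return
	}
	if c.txns == nil {
		c.txns = make(map[uuid.UUID]*roachpb.Transaction)
	}
	c.txns[txn.ID] = txn
}

func (c *finalizedTxnCache) get(id uuid.UUID) (*roachpb.Transaction, bool) {
	txn, ok := c.txns[id]
	return txn, ok
}

With the cache inside the lock table, a scanning request that encounters a lock held by a cached finalized transaction can drop an unreplicated lock immediately and accumulate a replicated lock for resolution once its scan completes, per the behavior described above.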

The immediate benefits, which may not have any practical
significance, are:

  • We no longer resolve unreplicated locks -- they are simply
    removed.
  • A replicated lock is removed from the lock table data-structure
    only when the requester has finished a scan and is in a
    position to do resolution. Earlier one could remove the lock
    but block on another lock, and not do intent resolution on
    the first lock. This would cause wasteful evaluation of other
    requests.

Informs #41720

Release note: None

@cockroach-teamcity (Member): This change is Reviewable

@nvanbenschoten (Member) left a comment

Reviewed 6 of 6 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/concurrency/concurrency_control.go, line 534 at r1 (raw file):

	// For COMMITTED or ABORTED transactions, all locks are released.
	//
	// For PENDING or STAGING transactions, the behavior is:

Is the eventual plan for UpdateLocks to only deal with updating PENDING/STAGING locks and to never deal with the removal of locks?


pkg/kv/kvserver/concurrency/lock_table.go, line 515 at r1 (raw file):

The first state it will see after releasing latches is doneWaiting, which will cause it to resolve intents.

This is only true if g.mu.startWait had previously been false, right?


pkg/kv/kvserver/concurrency/lock_table.go, line 1100 at r1 (raw file):

//  need to do the resolution.
//
//  Ideally, we would strengthen B and C -- a request should make a claim on

I think it would be worth running the YCSB suite before and after this change, just to verify that we're not triggering one of these races reliably and causing some pathological behavior that dramatically impacts performance. I've found with YCSB that high levels of concurrency + contention can lead to some unexpected behavior.


pkg/kv/kvserver/concurrency/lock_table.go, line 1107 at r1 (raw file):

//  TODO(sumeer): do this cleaner solution for batched intent resolution.
//
//  In the future we'd like to augment the lockTable with an understanding of

In such a world, would we need the finalizedTxnCache? It seems like it may still serve a purpose if it was shared across Ranges, but I'm not sure that it would if we weren't able to lift it up, because a transaction should (almost always) resolve all of its intents on a single range atomically.

But actually, that may not be true for abandoned intents, where the lack of an inverted txnID -> []intent index prevents us from resolving all of an abandoned transaction's intents when we find one to be stale. In those cases, some amount of memory, even on a per-range level, can still be helpful.
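
For illustration, the inverted index whose absence is noted here could be as simple as the following (hypothetical names, using the roachpb and uuid types already imported in this package; nothing like this exists in the code under review):

// txnIntentIndex would map a transaction ID to the keys on which that
// transaction is known to hold intents, allowing all of an abandoned
// transaction's intents on a range to be resolved once one is found stale.
type txnIntentIndex map[uuid.UUID][]roachpb.Key

func (idx txnIntentIndex) record(txnID uuid.UUID, key roachpb.Key) {
	idx[txnID] = append(idx[txnID], key)
}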


pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):

//  Also, resolving these locks/intents would proceed without latching, so we
//  would not rely on MVCC scanning to add discovered locks to the lock table,
//  since the discovered locks may be stale.

Maybe also mention ideas around coalescing resolution and replacement into the same Raft entry. I don't know if we've ever thought all the way through how such a mechanism would work, but you can imagine that such a mechanism would be very beneficial for contended workloads.


pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):

We don't expect the caller to GC this lockState and instead will GC it in tryUpdateLock.

How is this supposed to work? Who will end up calling tryUpdateLock? Is that what the call to g.lt.UpdateLocks is in findNextLockAfter?

Note that there may be other waiters, so the caller may have to wait behind them.

Is this why we can't just return false here?


pkg/kv/kvserver/concurrency/lock_table.go, line 1237 at r1 (raw file):

			}
			active := true
			if replicatedLockFinalizedTxn != nil && l.queuedWriters.Front().Value.(*queuedGuard) == qg {

nit: this would probably be easier to understand if we dropped this check all the way down to the end of the if sa == spanset.SpanReadWrite { block, so it was common among all cases.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 97 at r1 (raw file):

	stopper *stop.Stopper
	ir      IntentResolver
	lm      LockManager

I think we can remove this dependency.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):

				// evaluation.
				toResolve := guard.ResolveBeforeScanning()
				w.resolveDeferredIntents(ctx, &err, &toResolve)

We can get rid of some of the reference funkiness here now that it's not a deferred function. The method's signature should probably end up looking something like:

func (w *lockTableWaiterImpl) resolveDeferredIntents(ctx context.Context, toResolve []roachpb.LockUpdate) *Error
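
A rough sketch of a method with that signature, under assumptions: it relies on the surrounding package's lockTableWaiterImpl and Error types, and on an IntentResolver exposing a batched ResolveIntents call that takes intentresolver.ResolveOptions (the actual interface may differ).

func (w *lockTableWaiterImpl) resolveDeferredIntents(
	ctx context.Context, toResolve []roachpb.LockUpdate,
) *Error {
	if len(toResolve) == 0 {
		return nil
	}
	// Resolve the accumulated intents of finalized transactions in a single
	// batch before the caller re-scans the lock table. The Poison choice is
	// illustrative; the transactions involved are already finalized.
	return w.ir.ResolveIntents(ctx, toResolve, intentresolver.ResolveOptions{Poison: true})
}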

pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):

[1] sequence req1: sequencing complete, returned guard

# Normally req1 will not discover write intents for c, d, e in one shot, but

It won't? I thought we accumulate multiple intents in pebbleMVCCScanner.

@sumeerbhola force-pushed the lt_txn_cache branch 3 times, most recently from b710ae8 to d1a4d19 on December 22, 2020 00:17
@sumeerbhola changed the title from "[WIP] concurrency: move finalizedTxnCache into lock table" to "concurrency: move finalizedTxnCache into lock table" on Dec 22, 2020
@sumeerbhola (Collaborator, Author) left a comment

TFTR!

I've addressed the comments, added tests, and fixed existing tests.

I owe the YCSB measurements. Should I run all the roachtests with the ycsb prefix? And should I be running all of them with one roachprod cluster to have the same machines for stable measurements?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/concurrency_control.go, line 534 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is the eventual plan for UpdateLocks to only deal with updating PENDING/STAGING locks and to never deal with the removal of locks?

I wasn't thinking of it changing when UpdateLocks is called in the future. The call from handleReadWriteLocalEvalResult is authoritative, unlike the call to AddDiscoveredLock, which requires latches to avoid staleness. Is there a possibility that multiple actions from the same transaction can race, or are the calls from replicaStateMachine.ApplySideEffects single-threaded for a range (due to raftScheduler)?

Maybe you are thinking of some idea that I missed?


pkg/kv/kvserver/concurrency/lock_table.go, line 515 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The first state it will see after releasing latches is doneWaiting, which will cause it to resolve intents.

This is only true if g.mu.startWait had previously been false, right?

Correct. I have added a longer comment.


pkg/kv/kvserver/concurrency/lock_table.go, line 1100 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think it would be worth running the YCSB suite before and after this change, just to verify that we're not triggering one of these races reliably and causing some pathological behavior that dramatically impacts performance. I've found with YCSB that high levels of concurrency + contention can lead to some unexpected behavior.

Ack


pkg/kv/kvserver/concurrency/lock_table.go, line 1107 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

In such a world, would we need the finalizedTxnCache? It seems like it may still serve a purpose if it was shared across Ranges, but I'm not sure that it would if we weren't able to lift it up, because a transaction should (almost always) resolve all of its intents on a single range atomically.

But actually, that may not be true for abandoned intents, where the lack of an inverted txnID -> []intent index prevents us from resolving all of an abandoned transaction's intents when we find one to be stale. In those cases, some amount of memory, even on a per-range level, can still be helpful.

IIUC, the original motivation for the finalizedTxnCache in 67c6bdb#diff-b2445fe50a89171ab341cf76c645ef668d8dcccd14143b7c96d07d8b52047941 was transactions with abandoned intents. So yes, it seems it will be relevant even in the future for that case.

For non-abandoned intents, yes I was imagining some node level data-structure that would keep track of STAGED transactions that had committed from the perspective of the user. I haven't given any thought on where exactly that would be placed.


pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Maybe also mention ideas around coalescing resolution and replacement into the same Raft entry. I don't know if we've ever thought all the way through how such a mechanism would work, but you can imagine that such a mechanism would be very beneficial for contended workloads.

replacement of what? Do you have a pointer to an earlier discussion that I can read?


pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):
This comment was confusing and I've rewritten it. And a piece of code was missing.

The tryUpdateLock should happen when doing resolution of all locks for the transaction. The code there to do GC of an empty lock was missing (I forgot about it), and I've added it now.

The call to g.lt.UpdateLocks in findNextLockAfter is only for replicated locks, and that will do GC since the calling path allows for that. The behavior here is just a peculiarity of the code -- we are not holding treeMu.mu.

Is this why we can't just return false here?

Correct. Added to the comment.


pkg/kv/kvserver/concurrency/lock_table.go, line 1237 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: this would probably be easier to understand if we dropped this check all the way down to the end of the if sa == spanset.SpanReadWrite { block, so it was common among all cases.

Done.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 97 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think we can remove this dependency.

Done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We can get rid of some of the reference funkiness here now that it's not a deferred function. The method's signature should probably end up looking something like:

func (w *lockTableWaiterImpl) resolveDeferredIntents(ctx context.Context, toResolve []roachpb.LockUpdate) *Error

Done


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It won't? I thought we accumulate multiple intents in pebbleMVCCScanner.

My understanding was that since these are 3 requests, when the first one evaluates and finds the interleaved intent it will come back and do AddDiscoveredLock and start waiting.
Is that correct?

@sumeerbhola force-pushed the lt_txn_cache branch 3 times, most recently from 79f3bd7 to a9df0cd on December 22, 2020 02:43
@sumeerbhola (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):

Previously, sumeerbhola wrote…

This comment was confusing and I've rewritten it. And a piece of code was missing.

The tryUpdateLock should happen when doing resolution of all locks for the transaction. The code there to do GC of an empty lock was missing (I forgot about it), and I've added it now.

The call to g.lt.UpdateLocks in findNextLockAfter is only for replicated locks, and that will do GC since the calling path allows for that. The behavior here is just a peculiarity of the code -- we are not holding treeMu.mu.

Is this why we can't just return false here?

Correct. Added to the comment.

I've cleaned this up to do the GC.

@sumeerbhola (Collaborator, Author) left a comment

I owe the YCSB measurements. Should I run all the roachtests with the ycsb prefix? And should I be running all of them with one roachprod cluster to have the same machines for stable measurements?

There is a lot of variability.

name              old ops/sec  new ops/sec  delta
A/nodes=3          17.5k ± 5%   17.3k ± 3%     ~     (p=0.421 n=5+5)
A/nodes=3/cpu=32   27.4k ± 2%   28.1k ± 4%     ~     (p=0.393 n=3+5)
F/nodes=3          7.99k ± 8%   8.10k ± 4%     ~     (p=0.841 n=5+5)
F/nodes=3/cpu=32   15.3k ± 5%   15.2k ± 2%     ~     (p=0.786 n=3+5)
C/nodes=3          43.5k ± 5%   44.4k ± 3%     ~     (p=0.151 n=5+5)
C/nodes=3/cpu=32    119k ± 1%    121k ± 2%     ~     (p=0.143 n=3+5)
D/nodes=3          35.2k ± 7%   36.3k ± 4%     ~     (p=0.690 n=5+5)
D/nodes=3/cpu=32   97.2k ± 0%   94.9k ± 2%   -2.33%  (p=0.036 n=3+5)
E/nodes=3            777 ± 5%     795 ± 5%     ~     (p=0.310 n=5+5)
E/nodes=3/cpu=32   1.06k ± 6%   1.15k ± 2%   +7.95%  (p=0.036 n=3+5)
B/nodes=3          32.1k ± 6%   31.9k ± 2%     ~     (p=0.421 n=5+5)
B/nodes=3/cpu=32   93.6k ± 1%   92.0k ± 3%     ~     (p=0.250 n=3+5)

name              old p50      new p50      delta
A/nodes=3           3.80 ± 0%    3.80 ± 3%     ~     (p=1.000 n=4+5)
A/nodes=3/cpu=32    2.53 ± 3%    2.56 ± 2%     ~     (p=1.000 n=3+5)
F/nodes=3           8.34 ± 9%    8.12 ±10%     ~     (p=0.762 n=5+5)
F/nodes=3/cpu=32    5.87 ± 2%    6.02 ± 5%     ~     (p=0.464 n=3+5)
C/nodes=3           2.90 ± 0%    2.84 ± 2%     ~     (p=0.095 n=4+5)
C/nodes=3/cpu=32    1.20 ± 0%    1.20 ± 0%     ~     (all equal)
D/nodes=3           2.10 ± 5%    2.08 ± 6%     ~     (p=1.000 n=5+5)
D/nodes=3/cpu=32    1.00 ± 0%    1.10 ± 0%     ~     (p=0.057 n=3+4)
E/nodes=3           60.4 ±13%    61.6 ± 5%     ~     (p=1.000 n=5+5)
E/nodes=3/cpu=32    23.4 ± 3%    27.9 ± 9%  +19.06%  (p=0.036 n=3+5)
B/nodes=3           3.02 ± 4%    3.00 ± 3%     ~     (p=1.000 n=5+5)
B/nodes=3/cpu=32    1.23 ± 5%    1.30 ± 0%     ~     (p=0.214 n=3+5)

name              old p95      new p95      delta
A/nodes=3           15.7 ± 7%    15.9 ± 2%     ~     (p=0.397 n=5+5)
A/nodes=3/cpu=32    9.07 ± 4%    9.10 ± 3%     ~     (p=1.429 n=3+5)
F/nodes=3           37.1 ±13%    34.4 ± 6%     ~     (p=0.397 n=5+5)
F/nodes=3/cpu=32    19.6 ± 3%    20.1 ± 6%     ~     (p=0.643 n=3+5)
C/nodes=3           8.40 ± 0%    8.44 ± 5%     ~     (p=1.000 n=4+5)
C/nodes=3/cpu=32    5.00 ± 0%    5.00 ± 0%     ~     (all equal)
D/nodes=3           7.68 ± 9%    7.30 ± 0%     ~     (p=0.333 n=5+4)
D/nodes=3/cpu=32    4.70 ± 0%    4.82 ± 4%     ~     (p=0.714 n=3+5)
E/nodes=3            453 ± 7%     440 ± 7%     ~     (p=0.492 n=5+5)
E/nodes=3/cpu=32     582 ± 4%     550 ± 4%   -5.38%  (p=0.036 n=3+5)
B/nodes=3           14.7 ± 7%    14.7 ± 0%     ~     (p=0.556 n=5+4)
B/nodes=3/cpu=32    7.00 ± 3%    7.10 ± 0%     ~     (p=0.857 n=3+4)

name              old p99      new p99      delta
A/nodes=3           38.0 ±44%    37.5 ±17%     ~     (p=0.587 n=5+5)
A/nodes=3/cpu=32    74.1 ± 4%    71.3 ± 6%     ~     (p=0.286 n=3+5)
F/nodes=3           84.3 ±44%   106.5 ±42%     ~     (p=0.135 n=5+5)
F/nodes=3/cpu=32     148 ±15%     143 ± 9%     ~     (p=0.750 n=3+5)
C/nodes=3           13.7 ± 7%    13.1 ± 0%   -4.52%  (p=0.016 n=5+4)
C/nodes=3/cpu=32    7.80 ± 3%    7.60 ± 0%     ~     (p=0.571 n=3+4)
D/nodes=3           13.4 ± 9%    12.6 ± 0%     ~     (p=0.333 n=5+4)
D/nodes=3/cpu=32    7.90 ± 0%    8.10 ± 0%     ~     (p=0.057 n=3+4)
E/nodes=3            611 ± 7%     604 ± 0%     ~     (p=0.968 n=5+4)
E/nodes=3/cpu=32     705 ± 0%     671 ± 0%     ~     (p=0.357 n=3+5)
B/nodes=3           29.2 ±22%    30.2 ±10%     ~     (p=0.421 n=5+5)
B/nodes=3/cpu=32    14.0 ± 5%    14.0 ± 9%     ~     (p=1.000 n=3+5)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@nvanbenschoten (Member) left a comment

:lgtm: thanks for running the YCSB suite. Those results look good.

Reviewed 8 of 8 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/concurrency/concurrency_control.go, line 534 at r1 (raw file):

or are the calls from replicaStateMachine.ApplySideEffects single-threaded for a range (due to raftScheduler?)

Right, they're all single-threaded.

But no, I think I was just confused. It makes sense that we'll still want some authoritative hook that keeps the persistent and in-memory state in the lockTable in sync.


pkg/kv/kvserver/concurrency/lock_table.go, line 1107 at r1 (raw file):

Previously, sumeerbhola wrote…

IIUC, the original motivation for the finalizedTxnCache in 67c6bdb#diff-b2445fe50a89171ab341cf76c645ef668d8dcccd14143b7c96d07d8b52047941 was transactions with abandoned intents. So yes, it seems it will be relevant even in the future for that case.

For non-abandoned intents, yes I was imagining some node level data-structure that would keep track of STAGED transactions that had committed from the perspective of the user. I haven't given any thought on where exactly that would be placed.

Ack.


pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):

Previously, sumeerbhola wrote…

replacement of what? Do you have a pointer to an earlier discussion that I can read?

Replacement of one intent with another — instead of performing intent resolution and intent creation in two separate Raft operations. I don't think we've ever really discussed the idea concretely in terms of how it would fit into the lock table plans.

But upon further consideration, it may also be much less important now that you've structured things so that we can store multiple locks on the same key (as long as only one is active). Once inactive locks no longer block new lock acquisition, it is probably easier to eliminate/amortize the Raft operations used for intent resolution by batching resolution aggressively.
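
As a purely illustrative sketch of that kind of aggressive batching (an assumed helper, not actual code, using the roachpb and uuid types already present in this package), deferred lock updates could be grouped by transaction so that each finalized transaction's intents on a range are resolved together:

func groupByTxn(toResolve []roachpb.LockUpdate) map[uuid.UUID][]roachpb.LockUpdate {
	// One batch (and ideally one Raft proposal) per finalized transaction,
	// rather than one resolution operation per intent.
	byTxn := make(map[uuid.UUID][]roachpb.LockUpdate, len(toResolve))
	for _, up := range toResolve {
		byTxn[up.Txn.ID] = append(byTxn[up.Txn.ID], up)
	}
	return byTxn
}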


pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):

Previously, sumeerbhola wrote…

I've cleaned this up to do the GC.

Ah, I see. This might be a bit easier to understand in the context of other code if you restructure it slightly to consult the return value of lockIsFree. Something like:

gc := l.lockIsFree()
if gc {
    // Empty lock.
    return false, true
}
lockHolderTxn = nil
// There is now a reservation holder, which may be the caller itself,
// so fall through to the processing below.

pkg/kv/kvserver/concurrency/lock_table.go, line 518 at r2 (raw file):

		j := 0
		for i := range g.toResolve {
			if heldByTxn := g.lt.updateLockInternal(&g.toResolve[i]); heldByTxn {

Could you give this a comment? Something about how we're attempting to update the lock table and filtering toResolve by which updates were actually needed.
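
For reference, the standard in-place filter that this truncated snippet appears to begin would read roughly as follows; the lines after the updateLockInternal call are an assumption completing the quote, with comments describing the filtering being asked about:

j := 0
for i := range g.toResolve {
	if heldByTxn := g.lt.updateLockInternal(&g.toResolve[i]); heldByTxn {
		// The lock table still considered this (replicated) lock held by the
		// finalized transaction, so intent resolution is actually needed:
		// keep the update.
		g.toResolve[j] = g.toResolve[i]
		j++
	}
	// Otherwise the lock was only tracked in memory and has simply been
	// removed, so the update can be dropped.
}
g.toResolve = g.toResolve[:j]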


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):

Previously, sumeerbhola wrote…

Done

s/*[]roachpb.LockUpdate/[]roachpb.LockUpdate/


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 269 at r2 (raw file):

				// evaluation.
				toResolve := guard.ResolveBeforeScanning()
				return w.resolveDeferredIntents(ctx, &toResolve)

One thing that's not totally clear to me is what happens if this operation is canceled and how that impacts other waiters. Will the waiter who was assigned to resolve the deferred intents hand the responsibility to the next in line when it exits the lock table?


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):

Previously, sumeerbhola wrote…

My understanding was that since these are 3 requests, when the first one evaluates and finds the interleaved intent it will come back and do AddDiscoveredLock and start waiting.
Is that correct?

I see. We're both correct, we're just saying different things. A single request will accumulate multiple intents but intents won't be accumulated across multiple requests in the same batch. So in the case of req1, we will not discover the write intents in one shot.


pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 187 at r2 (raw file):

global: num=5
 lock: "a"
  holder: txn: 00000000-0000-0000-0000-000000000002, ts: 0.000000010,1, info: repl epoch: 0, seqs: [0]

Do you think it would make things any clearer to indicate here which locks are known to be finalized?

@sumeerbhola (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/concurrency/lock_table.go, line 1115 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Replacement of one intent with another — instead of performing intent resolution and intent creation in two separate Raft operations. I don't think we've ever really discussed the idea concretely in terms of how it would fit into the lock table plans.

But upon further consideration, it may also be much less important now that you've structured things so that we can store multiple locks on the same key (as long as only one is active). Once inactive locks no longer block new lock acquisition, it is probably easier to eliminate/amortize the Raft operations used for intent resolution by batching resolution aggressively.

Got it -- thanks for the clarification.


pkg/kv/kvserver/concurrency/lock_table.go, line 1142 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Ah, I see. This might be a bit easier to understand in the context of other code if you restructure it slightly to consult the return value of lockIsFree. Something like:

gc := l.lockIsFree()
if gc {
    // Empty lock.
    return false, true
}
lockHolderTxn = nil
// There is now a reservation holder, which may be the caller itself,
// so fall through to the processing below.

Done


pkg/kv/kvserver/concurrency/lock_table.go, line 518 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you give this a comment? Something about how we're attempting to update the lock table and filtering toResolve by which updates were actually needed.

Done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 270 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/*[]roachpb.LockUpdate/[]roachpb.LockUpdate/

Done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 269 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

One thing that's not totally clear to me is what happens if this operation is canceled and how that impacts other waiters. Will the waiter who was assigned to resolve the deferred intents hand the responsibility to the next in line when it exits the lock table?

No, it doesn't hand off. The assignment to do the resolution happens when transitioning to doneWaiting, so this bad situation will only occur when the transition to doneWaiting races with the cancellation. I've added a longer comment here and referred to the comment in tryActiveWait. Doing handoff would cleanly solve these races -- we really shouldn't remove these locks from the data-structure before doing intent resolution.

I was somewhat worried about bugs caused by a dropped hand-off. But since the requester has to call lockTable.Dequeue() in a timely manner we could use that to release the "claim" on doing intent resolution. I would do that in a later PR. Thoughts?


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/clear_abandoned_intents, line 417 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I see. We're both correct, we're just saying different things. A single request will accumulate multiple intents but intents won't be accumulated across multiple requests in the same batch. So in the case of req1, we will not discover the write intents in one shot.

Ack. I've updated the comment.


pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 187 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do you think it would make things any clearer to indicate here which locks are known to be finalized?

Good point. Added

@nvanbenschoten (Member) left a comment

Still :lgtm:

Reviewed 5 of 5 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 269 at r2 (raw file):

I was somewhat worried about bugs caused by a dropped hand-off. But since the requester has to call lockTable.Dequeue() in a timely manner we could use that to release the "claim" on doing intent resolution. I would do that in a later PR.

This sounds very reasonable to me. I had forgotten that you already discussed this in the comment/TODO on lockTableImpl.tryActiveWait.

@sumeerbhola force-pushed the lt_txn_cache branch 2 times, most recently from 5480df2 to 044ddfa on January 5, 2021 22:23
@sumeerbhola (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/concurrency/lock_table.go, line 564 at r5 (raw file):

// The first case above (breaking reservations) only occurs for transactional
// requests, but the other cases can happen for both transactional and
// non-transactional requests.

This comment was out of date, so I fixed it.


pkg/kv/kvserver/concurrency/lock_table.go, line 1321 at r5 (raw file):

		if replicatedLockFinalizedTxn != nil && l.queuedWriters.Front().Value.(*queuedGuard) == qg {
			// First waiter, so should not wait. NB: this inactive waiter can be
			// non-transactional.

Added some comments here and below.


pkg/kv/kvserver/concurrency/lock_table.go, line 1917 at r5 (raw file):

			e = e.Next()
			l.queuedWriters.Remove(curr)
			if qg.active {

This was a bug.
It somewhat existed even prior to this PR since discovered locks could cause a non-transactional request to be an inactive waiter, but this PR made it worse (it was masked earlier since the request would scan again). The non-transactional inactive waiter would have mustFindNextLockAfter set to true even though it had gone through all the locks and accumulated locks to resolve. That would cause the call to lockTableGuardImpl.CurState to resume searching, which would cause a panic in findNextLockAfter when doing g.spans.GetSpans(g.sa, g.ss) since the indices were out-of-bounds.

There is now a test that exercises this.


pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 739 at r5 (raw file):

# waiters.
# -----------------------------------------------------------------------------

This tickles the bug.


pkg/kv/kvserver/concurrency/testdata/lock_table/clear_finalized_txn_locks, line 786 at r5 (raw file):

# -----------------------------------------------------------------------------
# req12 is a read request that finds a lock from a finalized txn
# when scanning.

This is just thrown in for good measure -- it wasn't a bug. I wanted to be sure, despite code inspection and documented invariants, that reads are never inactive waiters.

@sumeerbhola (Collaborator, Author) commented:

bors r+

craig bot commented Jan 6, 2021

Build succeeded.

craig bot merged commit 614b0f1 into cockroachdb:master on Jan 6, 2021
@sumeerbhola deleted the lt_txn_cache branch on February 11, 2021 15:08