concurrency: always listen for lock state transitions when pushing #112768

arulajmani · 2023-10-20T17:44:29Z

Prior to this patch, the lockTableWaiter would only listen for lock state transitions if it was pushing a transaction while waiting for an unheld lock. The reasoning was that if the lock was held, the pusher would not be able to proceed until the push returned. This isn't quite true -- there's a few cases where the request may no longer conflict with what's being tracked in the lock table:

The lock may have been rolled back because of savepoints.
The lock may have been forgotten by the lock table (replicated locks are forgotten when they're updated).

This patch changes the lockTableWaiter to also listen for state transitions when pushing the a held lock's transaction. Cases where the pusher no longer conflicts with the lock state are detected and the push is cancelled.

Conveniently, the updates to resolve_pushed_intents show the effect of making this change.

Fixes #111596

Release note: None

cockroach-teamcity · 2023-10-20T17:44:38Z

This change is

nvanbenschoten

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)

-- commits line 2 at r1:
"listen"

-- commits line 13 at r1:
The most interesting case here is when the pusher (shared lock) doesn't directly conflict with the lock holder (shared lock), but it conflicts with a waiter (exclusive lock) in front of it in the queue, so it pushes the lock holder for deadlock detection purposes. In such cases, that waiter dropping out the queue should permit the pusher to proceed.

-- commits line 16 at r1:
"the a"

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 368 at r1 (raw file):

				// wait-queue without leaving behind a lock. In this case, the request
				// has a dependency on the conflicting request but not necessarily the
				// entire conflicting transaction.

I think retaining some mention of "dependency" in the new comments would be good, as this is one way to look at the situation.

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 377 at r1 (raw file):

				// such cases, the lock table will simply forget the lock when the
				// intent is resolved. Note that in such cases, the pusher may still
				// conflict with the intent and rediscover it -- that's okay.

It would be worth including the additional case I mentioned above.

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 378 at r1 (raw file):

				// intent is resolved. Note that in such cases, the pusher may still
				// conflict with the intent and rediscover it -- that's okay.
				if timerWaitingState.held {

nit: should we share the error handling logic to avoid duplication? Something like:

var err error
if timerWaitingState.held {
    err = w.pushLockTxn(pushCtx, req, timerWaitingState)
} else {
    err = w.pushRequestTxn(pushCtx, req, timerWaitingState)    
}
// Ignore the context canceled error. If this was for the
// parent context then we'll notice on the next select.
//
// NOTE: we look at pushCtx.Err() and not err to avoid the
// potential for bugs if context cancellation is not
// propagated correctly on some error paths.
if errors.Is(pushCtx.Err(), context.Canceled) {
    err = nil
}
return err

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 379 at r1 (raw file):

				// conflict with the intent and rediscover it -- that's okay.
				if timerWaitingState.held {
					err = w.pushLockTxn(pushCtx, req, timerWaitingState)

This err is referencing the function's named return variable. The fact that this is in scope is a holdover from 67c6bdb and feels moderately risky. Is there any risk to this aliasing? I think it would be better to un-name this return variable and declare locally scoped err variables where needed.

pkg/kv/kvserver/concurrency/testdata/concurrency_manager/wait_elsewhere line 98 at r1 (raw file):

----

# Abort the writing txn. This will cause the blocked request to unblock. Note

Looks like this comment needs an update.

arulajmani

TFTR, RFAL!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

-- commits line 13 at r1:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The most interesting case here is when the pusher (shared lock) doesn't directly conflict with the lock holder (shared lock), but it conflicts with a waiter (exclusive lock) in front of it in the queue, so it pushes the lock holder for deadlock detection purposes. In such cases, that waiter dropping out the queue should permit the pusher to proceed.

Added it here and to the comment below.

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 368 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think retaining some mention of "dependency" in the new comments would be good, as this is one way to look at the situation.

Good point. Reworded to make this clearer.

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 378 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: should we share the error handling logic to avoid duplication? Something like:

var err error
if timerWaitingState.held {
    err = w.pushLockTxn(pushCtx, req, timerWaitingState)
} else {
    err = w.pushRequestTxn(pushCtx, req, timerWaitingState)    
}
// Ignore the context canceled error. If this was for the
// parent context then we'll notice on the next select.
//
// NOTE: we look at pushCtx.Err() and not err to avoid the
// potential for bugs if context cancellation is not
// propagated correctly on some error paths.
if errors.Is(pushCtx.Err(), context.Canceled) {
    err = nil
}
return err

Done.

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 379 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This err is referencing the function's named return variable. The fact that this is in scope is a holdover from 67c6bdb and feels moderately risky. Is there any risk to this aliasing? I think it would be better to un-name this return variable and declare locally scoped err variables where needed.

Done.

nvanbenschoten

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 379 at r2 (raw file):

					// Concretely, a construction like follows:
					//   - holder: shared
					//	  - wait-queue: exclusive, shared

nit: odd indentation here.

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 380 at r2 (raw file):

					//   - holder: shared
					//	  - wait-queue: exclusive, shared
					// In this case, the waiting shared lock request will push the holder.

Let's add one sentence about why the shared lock request is pushing the shared lock holder instead of the exclusive lock request.

arulajmani

TFTR!

bors r=nvanbenschoten

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 380 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's add one sentence about why the shared lock request is pushing the shared lock holder instead of the exclusive lock request.

Done.

craig · 2023-10-30T23:29:28Z

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

arulajmani · 2023-10-31T13:50:06Z

bors r-

arulajmani · 2023-10-31T13:50:44Z

bors r=nvanbenschoten

craig · 2023-10-31T13:50:48Z

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

Prior to this patch, the lockTableWaiter would only listen for lock state transitions if it was pushing a transaction while waiting for an unheld lock. The reasoning was that if the lock was held, the pusher would not be able to proceed until the push returned. This isn't quite true -- there's a few cases where the request may no longer conflict with what's being tracked in the lock table: - A waiting request may not conflict with the lock holder, but a waiting request instead. In such cases, it'll push the lock holder even though it is compatible with it. However, if the conflicting waiting request drops out of the lock's wait queues, the pusher may proceed. - The lock may have been rolled back because of savepoints. - The lock may have been forgotten by the lock table (replicated locks are forgotten when they're updated). This patch changes the lockTableWaiter to also listen for state transitions when pushing a held lock's transaction. Cases where the pusher no longer conflicts with the lock state are detected and the push is cancelled. Conveniently, the updates to `resolve_pushed_intents` show the effect of making this change. Fixes cockroachdb#111596 Release note: None

arulajmani · 2023-10-31T14:23:10Z

bors r-

arulajmani · 2023-10-31T14:23:16Z

bors r=nvanbenschoten

craig · 2023-10-31T15:07:57Z

Build succeeded:

Bazel Essential CI (Cockroach)

arulajmani · 2023-10-31T19:26:49Z

blathers backport 23.2

arulajmani requested a review from nvanbenschoten October 20, 2023 17:44

arulajmani requested a review from a team as a code owner October 20, 2023 17:44

arulajmani changed the title ~~concurrency: always list for lock state transitions when pushing~~ concurrency: always listen for lock state transitions when pushing Oct 23, 2023

nvanbenschoten reviewed Oct 23, 2023

View reviewed changes

arulajmani force-pushed the sl-111596 branch from eb33878 to 9629393 Compare October 30, 2023 18:54

arulajmani commented Oct 30, 2023

View reviewed changes

arulajmani requested a review from nvanbenschoten October 30, 2023 19:59

nvanbenschoten approved these changes Oct 30, 2023

View reviewed changes

arulajmani force-pushed the sl-111596 branch from 9629393 to a71a1a3 Compare October 30, 2023 23:29

arulajmani commented Oct 30, 2023

View reviewed changes

arulajmani force-pushed the sl-111596 branch 2 times, most recently from 95e69bb to 813e78a Compare October 31, 2023 13:49

arulajmani force-pushed the sl-111596 branch from 813e78a to 7eecfdc Compare October 31, 2023 13:50

arulajmani force-pushed the sl-111596 branch from 7eecfdc to 51a3ce4 Compare October 31, 2023 14:21

craig bot merged commit 8b34755 into cockroachdb:master Oct 31, 2023
3 checks passed

blathers-crl bot mentioned this pull request Oct 31, 2023

release-23.2: concurrency: always listen for lock state transitions when pushing #113528

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

concurrency: always listen for lock state transitions when pushing #112768

concurrency: always listen for lock state transitions when pushing #112768

arulajmani commented Oct 20, 2023 •

edited

Loading

cockroach-teamcity commented Oct 20, 2023

nvanbenschoten left a comment

arulajmani left a comment

nvanbenschoten left a comment

arulajmani left a comment

craig bot commented Oct 30, 2023

arulajmani commented Oct 31, 2023

arulajmani commented Oct 31, 2023

craig bot commented Oct 31, 2023

arulajmani commented Oct 31, 2023

arulajmani commented Oct 31, 2023

craig bot commented Oct 31, 2023

arulajmani commented Oct 31, 2023

concurrency: always listen for lock state transitions when pushing #112768

concurrency: always listen for lock state transitions when pushing #112768

Conversation

arulajmani commented Oct 20, 2023 • edited Loading

cockroach-teamcity commented Oct 20, 2023

nvanbenschoten left a comment

Choose a reason for hiding this comment

arulajmani left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

arulajmani left a comment

Choose a reason for hiding this comment

craig bot commented Oct 30, 2023

arulajmani commented Oct 31, 2023

arulajmani commented Oct 31, 2023

craig bot commented Oct 31, 2023

arulajmani commented Oct 31, 2023

arulajmani commented Oct 31, 2023

craig bot commented Oct 31, 2023

arulajmani commented Oct 31, 2023

arulajmani commented Oct 20, 2023 •

edited

Loading