Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrency: always listen for lock state transitions when pushing #112768

Merged
merged 1 commit into from
Oct 31, 2023

Conversation

arulajmani
Copy link
Collaborator

@arulajmani arulajmani commented Oct 20, 2023

Prior to this patch, the lockTableWaiter would only listen for lock state transitions if it was pushing a transaction while waiting for an unheld lock. The reasoning was that if the lock was held, the pusher would not be able to proceed until the push returned. This isn't quite true -- there's a few cases where the request may no longer conflict with what's being tracked in the lock table:

  • The lock may have been rolled back because of savepoints.
  • The lock may have been forgotten by the lock table (replicated locks are forgotten when they're updated).

This patch changes the lockTableWaiter to also listen for state transitions when pushing the a held lock's transaction. Cases where the pusher no longer conflicts with the lock state are detected and the push is cancelled.

Conveniently, the updates to resolve_pushed_intents show the effect of making this change.

Fixes #111596

Release note: None

@arulajmani arulajmani requested a review from a team as a code owner October 20, 2023 17:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@arulajmani arulajmani changed the title concurrency: always list for lock state transitions when pushing concurrency: always listen for lock state transitions when pushing Oct 23, 2023
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


-- commits line 2 at r1:
"listen"


-- commits line 13 at r1:
The most interesting case here is when the pusher (shared lock) doesn't directly conflict with the lock holder (shared lock), but it conflicts with a waiter (exclusive lock) in front of it in the queue, so it pushes the lock holder for deadlock detection purposes. In such cases, that waiter dropping out the queue should permit the pusher to proceed.


-- commits line 16 at r1:
"the a"


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 368 at r1 (raw file):

				// wait-queue without leaving behind a lock. In this case, the request
				// has a dependency on the conflicting request but not necessarily the
				// entire conflicting transaction.

I think retaining some mention of "dependency" in the new comments would be good, as this is one way to look at the situation.


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 377 at r1 (raw file):

				// such cases, the lock table will simply forget the lock when the
				// intent is resolved. Note that in such cases, the pusher may still
				// conflict with the intent and rediscover it -- that's okay.

It would be worth including the additional case I mentioned above.


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 378 at r1 (raw file):

				// intent is resolved. Note that in such cases, the pusher may still
				// conflict with the intent and rediscover it -- that's okay.
				if timerWaitingState.held {

nit: should we share the error handling logic to avoid duplication? Something like:

var err error
if timerWaitingState.held {
    err = w.pushLockTxn(pushCtx, req, timerWaitingState)
} else {
    err = w.pushRequestTxn(pushCtx, req, timerWaitingState)    
}
// Ignore the context canceled error. If this was for the
// parent context then we'll notice on the next select.
//
// NOTE: we look at pushCtx.Err() and not err to avoid the
// potential for bugs if context cancellation is not
// propagated correctly on some error paths.
if errors.Is(pushCtx.Err(), context.Canceled) {
    err = nil
}
return err

pkg/kv/kvserver/concurrency/lock_table_waiter.go line 379 at r1 (raw file):

				// conflict with the intent and rediscover it -- that's okay.
				if timerWaitingState.held {
					err = w.pushLockTxn(pushCtx, req, timerWaitingState)

This err is referencing the function's named return variable. The fact that this is in scope is a holdover from 67c6bdb and feels moderately risky. Is there any risk to this aliasing? I think it would be better to un-name this return variable and declare locally scoped err variables where needed.


pkg/kv/kvserver/concurrency/testdata/concurrency_manager/wait_elsewhere line 98 at r1 (raw file):

----

# Abort the writing txn. This will cause the blocked request to unblock. Note

Looks like this comment needs an update.

Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR, RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


-- commits line 13 at r1:

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The most interesting case here is when the pusher (shared lock) doesn't directly conflict with the lock holder (shared lock), but it conflicts with a waiter (exclusive lock) in front of it in the queue, so it pushes the lock holder for deadlock detection purposes. In such cases, that waiter dropping out the queue should permit the pusher to proceed.

Added it here and to the comment below.


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 368 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think retaining some mention of "dependency" in the new comments would be good, as this is one way to look at the situation.

Good point. Reworded to make this clearer.


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 378 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: should we share the error handling logic to avoid duplication? Something like:

var err error
if timerWaitingState.held {
    err = w.pushLockTxn(pushCtx, req, timerWaitingState)
} else {
    err = w.pushRequestTxn(pushCtx, req, timerWaitingState)    
}
// Ignore the context canceled error. If this was for the
// parent context then we'll notice on the next select.
//
// NOTE: we look at pushCtx.Err() and not err to avoid the
// potential for bugs if context cancellation is not
// propagated correctly on some error paths.
if errors.Is(pushCtx.Err(), context.Canceled) {
    err = nil
}
return err

Done.


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 379 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This err is referencing the function's named return variable. The fact that this is in scope is a holdover from 67c6bdb and feels moderately risky. Is there any risk to this aliasing? I think it would be better to un-name this return variable and declare locally scoped err variables where needed.

Done.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 379 at r2 (raw file):

					// Concretely, a construction like follows:
					//   - holder: shared
					//	  - wait-queue: exclusive, shared

nit: odd indentation here.


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 380 at r2 (raw file):

					//   - holder: shared
					//	  - wait-queue: exclusive, shared
					// In this case, the waiting shared lock request will push the holder.

Let's add one sentence about why the shared lock request is pushing the shared lock holder instead of the exclusive lock request.

Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table_waiter.go line 380 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's add one sentence about why the shared lock request is pushing the shared lock holder instead of the exclusive lock request.

Done.

@craig
Copy link
Contributor

craig bot commented Oct 30, 2023

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@arulajmani arulajmani force-pushed the sl-111596 branch 2 times, most recently from 95e69bb to 813e78a Compare October 31, 2023 13:49
@arulajmani
Copy link
Collaborator Author

bors r-

@arulajmani
Copy link
Collaborator Author

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Oct 31, 2023

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

Prior to this patch, the lockTableWaiter would only listen for lock
state transitions if it was pushing a transaction while waiting for
an unheld lock. The reasoning was that if the lock was held, the pusher
would not be able to proceed until the push returned. This isn't quite
true -- there's a few cases where the request may no longer conflict
with what's being tracked in the lock table:

- A waiting request may not conflict with the lock holder, but a waiting
request instead. In such cases, it'll push the lock holder even though
it is compatible with it. However, if the conflicting waiting request
drops out of the lock's wait queues, the pusher may proceed.
- The lock may have been rolled back because of savepoints.
- The lock may have been forgotten by the lock table (replicated locks
are forgotten when they're updated).

This patch changes the lockTableWaiter to also listen for state
transitions when pushing a held lock's transaction. Cases where the
pusher no longer conflicts with the lock state are detected and the push
is cancelled.

Conveniently, the updates to `resolve_pushed_intents` show the effect
of making this change.

Fixes cockroachdb#111596

Release note: None
@arulajmani
Copy link
Collaborator Author

bors r-

@arulajmani
Copy link
Collaborator Author

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Oct 31, 2023

Build succeeded:

@craig craig bot merged commit 8b34755 into cockroachdb:master Oct 31, 2023
3 checks passed
@arulajmani
Copy link
Collaborator Author

blathers backport 23.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv: requests waiting on locks should listen to local state transitions when pushing
3 participants