
concurrency: use lock modes to detect conflicts in SKIP LOCKED #108918

Merged
merged 2 commits into cockroachdb:master on Aug 19, 2023

Conversation

arulajmani
Collaborator

This patch switches isKeyLockedByConflictingTxn to make use of lock modes to detect conflicts. As a result, SKIP LOCKED requests correctly interact with shared locks. We add a few tests to demonstrate this.
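The core idea — deciding conflicts from lock modes rather than treating every held lock as a conflict — can be sketched with a minimal standalone model. Everything below (`Strength`, `conflicts`, the compatibility rules) is a simplified illustration, not CockroachDB's actual `lock` package:

```go
package main

import "fmt"

// Strength is a simplified stand-in for a lock strength. In the real lock
// table, a request's strength and a holder's full lock mode (strength plus
// timestamp and isolation details) feed into the conflict check.
type Strength int

const (
	None Strength = iota // non-locking read
	Shared
	Exclusive
)

// conflicts is an illustrative compatibility check: Shared locks are
// compatible with each other, Exclusive conflicts with any locking
// strength, and (in this simplified model) non-locking reads conflict
// with nothing.
func conflicts(held, req Strength) bool {
	if held == None || req == None {
		return false
	}
	return held == Exclusive || req == Exclusive
}

func main() {
	// A shared-locking SKIP LOCKED read does not conflict with a shared
	// lock held by another transaction, so the key is not skipped.
	fmt.Println(conflicts(Shared, Shared)) // false
	// It does conflict with an exclusive lock, so the key is skipped.
	fmt.Println(conflicts(Exclusive, Shared)) // true
}
```

Under this kind of mode-based check, a SKIP LOCKED request only skips keys whose held locks actually conflict with its requested strength, which is what lets it interact correctly with shared locks.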

Fixes #108715

Release note: None

@arulajmani arulajmani requested review from a team as code owners August 17, 2023 16:51
@arulajmani arulajmani requested a review from RaduBerinde August 17, 2023 16:51

Member

@nvanbenschoten nvanbenschoten left a comment

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @RaduBerinde)


pkg/kv/kvserver/concurrency/concurrency_control.go line 777 at r1 (raw file):

	// IsKeyLockedByConflictingTxn returns whether the specified key is locked by
	// a conflicting transaction in the lock TableGuard's snapshot of the lock

"lockTableGuard"


pkg/kv/kvserver/concurrency/concurrency_control.go line 782 at r1 (raw file):

	// itself, there's no conflict to speak of, so false is returned.
	//
	// This method is use by requests in conjunction with the SkipLocked wait

"used"


pkg/kv/kvserver/concurrency/concurrency_control.go line 790 at r1 (raw file):

	// from starving out regular locking requests. In such cases, true is
	// returned, but so is nil.
	IsKeyLockedByConflictingTxn(roachpb.Key, lock.Strength) (bool, *enginepb.TxnMeta)

Same two comments in mvcc.go and concurrency_manager.go (on IsKeyLockedByConflictingTxn).


pkg/kv/kvserver/concurrency/lock_table.go line 680 at r1 (raw file):

func (g *lockTableGuardImpl) IsKeyLockedByConflictingTxn(
	key roachpb.Key, strength lock.Strength,

nit while we're here: s/strength/str/


pkg/kv/kvserver/concurrency/lock_table.go line 707 at r1 (raw file):

			lock.Conflicts(l.getLockMode(), makeLockMode(strength, g.txn, g.ts), &g.lt.settings.SV) {
			return true, l.holder.txn // they key is locked by some other transaction; return the holder
		}

Consider leaving a comment after this line explaining the two states we could be in. Either the lock is held at only strengths that are non-conflicting with the provided strength (the "compatible lock" case), or it is held by only the same transaction at a lower, conflicting strength (the "lock upgrade" case).
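The two fall-through states described in this comment can be sketched as follows. This is a hypothetical simplified model with a single holder per key; the function and parameter names are illustrative, not the PR's code:

```go
package main

import "fmt"

// nonConflictReason classifies why a held lock did not cause the key to be
// skipped, per the two cases above: either the held strength is compatible
// with the request, or our own transaction holds the lock at a lower,
// conflicting strength (a lock upgrade).
func nonConflictReason(holderTxnID, reqTxnID string, heldConflictsWithReq bool) string {
	if !heldConflictsWithReq {
		return "compatible lock" // held only at non-conflicting strengths
	}
	if holderTxnID == reqTxnID {
		return "lock upgrade" // same txn, lower conflicting strength
	}
	return "conflict" // caught earlier: the key would have been skipped
}

func main() {
	fmt.Println(nonConflictReason("txn1", "txn2", false)) // compatible lock
	fmt.Println(nonConflictReason("txn1", "txn1", true))  // lock upgrade
}
```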


pkg/kv/kvserver/concurrency/lock_table.go line 711 at r1 (raw file):

	// There's no conflict with the lock holder itself. However, there may be
	// other locking requests that came before us that we may conflict with. This

"This ensures"

What exactly is this? "Treating these as conflicts ensures"?


pkg/kv/kvserver/concurrency/lock_table.go line 715 at r1 (raw file):

	// from starving out regular locking requests.
	if strength == lock.None { // [1] we only need to do this checking for locking requests
		// TODO(arul): Is there a hazard for a stream of non-locking requests to

Latches will eventually ensure fairness, right? As in, eventually the writer will acquire a latch, get bumped by the ts cache, but then perform its write. That's not the case for locking SKIP LOCKED, where the writer would fail the write and need to retry (potentially indefinitely).


pkg/kv/kvserver/concurrency/lock_table.go line 725 at r1 (raw file):

		if qqg.guard.seqNum > g.seqNum {
			// We only need to check for conflicts with requests that came before us
			// (read: have lower sequence numbers than us).

Consider reminding the reader that l.queuedWriters is stored in increasing sequence number order.
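Because the queue is kept in increasing sequence-number order, the scan can stop at the first request that arrived after us. A standalone sketch of that loop (illustrative names; the real code also handles lock strengths, and — as resolved later in this thread — returns an error rather than skipping in the same-transaction case):

```go
package main

import "fmt"

type queuedReq struct {
	seqNum uint64
	txnID  string
}

// conflictsWithQueued reports whether any request that arrived before us
// (lower sequence number) belongs to a different transaction. The queue is
// assumed sorted by increasing seqNum, so we break at the first later one.
func conflictsWithQueued(queue []queuedReq, mySeq uint64, myTxnID string) bool {
	for _, q := range queue {
		if q.seqNum > mySeq {
			break // everything from here on arrived after us
		}
		if q.txnID == myTxnID {
			continue // illustrative choice for our own transaction's request
		}
		return true
	}
	return false
}

func main() {
	queue := []queuedReq{{seqNum: 3, txnID: "a"}, {seqNum: 7, txnID: "b"}}
	fmt.Println(conflictsWithQueued(queue, 5, "b")) // true: req 3 from txn "a" came first
	fmt.Println(conflictsWithQueued(queue, 2, "c")) // false: nothing queued before us
}
```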


pkg/kv/kvserver/concurrency/lock_table.go line 728 at r1 (raw file):

			break
		}
		if qqg.guard.txn.ID == g.txn.ID {

isSameTxn?


pkg/kv/kvserver/concurrency/lock_table.go line 729 at r1 (raw file):

		}
		if qqg.guard.txn.ID == g.txn.ID {
			panic(errors.AssertionFailedf(

I feel like we don't want to panic here. Can we consider this to be a non-conflict? I see now where your question came from in Slack.

Collaborator Author

@arulajmani arulajmani left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @RaduBerinde)


pkg/kv/kvserver/concurrency/lock_table.go line 715 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Latches will eventually ensure fairness, right? As in, eventually the writer will acquire a latch, get bumped by the ts cache, but then perform its write. That's not the case for locking SKIP LOCKED, where the writer would fail the write and need to retry (potentially indefinitely).

Yeah, that's true. I'll remove this TODO and update the comment above using a bit of your explanation here.


pkg/kv/kvserver/concurrency/lock_table.go line 728 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

isSameTxn?

Note to self: there's a txnMeta() function on the lockTableGuardImpl that we can use here.


pkg/kv/kvserver/concurrency/lock_table.go line 729 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I feel like we don't want to panic here. Can we consider this to be a non-conflict? I see now where your question came from in Slack.

Yeah, I was initially leaning towards the panic approach because it restricts the testing space we need to account for here.

I'll change this to be non-conflicting and add tests for it.

This patch switches isKeyLockedByConflictingTxn to make use of lock
modes to detect conflicts. As a result, SKIP LOCKED requests correctly
interact with shared locks. We add a few tests to demonstrate this.

Fixes cockroachdb#108715

Release note: None
Collaborator Author

@arulajmani arulajmani left a comment


RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @RaduBerinde)


pkg/kv/kvserver/concurrency/lock_table.go line 707 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider leaving a comment after this line explaining the two states we could be in. Either the lock is held at only strengths that are non-conflicting with the provided strength (the "compatible lock" case), or it is held by only the same transaction at a lower, conflicting strength (the "lock upgrade" case).

Added some words. I made the comment a bit more general (to reference multiple locks on a single key), given that change is coming shortly.


pkg/kv/kvserver/concurrency/lock_table.go line 711 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"This ensures"

What exactly is this? "Treating these as conflicts ensures"?

Clarified!


pkg/kv/kvserver/concurrency/lock_table.go line 715 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Yeah, that's true. I'll remove this TODO and update the comment above using a bit of your explanation here.

Actually, nvm on writing an explanation here. It seems a bit tangential, and I don't want to distract future readers from the meaty commentary that we already have.


pkg/kv/kvserver/concurrency/lock_table.go line 729 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Yeah, I was initially leaning towards the panic approach because it restricts the testing space we need to account for here.

I'll change this to be non-conflicting and add tests for it.

Like we spoke about offline, I switched this to bubble up the error instead of panicking. For posterity: we don't expect this case to happen during normal operation (in how SKIP LOCKED is used by SQL). However, we may hit this branch during a replay, so panicking here is no good.

We could treat this as a non-conflict; however, that expands the testing surface area a fair bit for no good reason. We instead choose to return the error back up to the caller.
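The bubble-up-an-error pattern described here might look roughly like this — a sketch with illustrative names (`errSameTxnQueued`, `isKeyLocked`), not the actual change:

```go
package main

import (
	"errors"
	"fmt"
)

// errSameTxnQueued stands in for the assertion error the lock table could
// return instead of panicking when a SKIP LOCKED request finds a queued
// request from its own transaction (possible under replays).
var errSameTxnQueued = errors.New(
	"SKIP LOCKED request found queued request from its own transaction")

// isKeyLocked sketches a conflict check that returns an error to the caller
// rather than crashing the node on an unexpected-but-possible state.
func isKeyLocked(queuedTxnID, myTxnID string) (bool, error) {
	if queuedTxnID == myTxnID {
		return false, errSameTxnQueued
	}
	return true, nil
}

func main() {
	if _, err := isKeyLocked("txn1", "txn1"); err != nil {
		// The caller surfaces this as a request error; the node keeps running.
		fmt.Println("error:", err)
	}
}
```

Returning the error keeps the unexpected state observable (it fails the request) without expanding the set of behaviors that need test coverage the way a silent non-conflict would.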

Avoid a panic that would have previously crashed the node.

We don't expect locking skip locked requests to encounter waiting
requests from their own transactions during normal operation. However,
this could happen if they're being replayed for some reason. We
shouldn't crash the node in such cases.

Epic: none

Release note: None
Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm:

Reviewed 4 of 4 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)

@arulajmani
Collaborator Author

TFTR!

bors r=nvanbenschoten

@craig
Contributor

craig bot commented Aug 19, 2023

Build succeeded.

@craig craig bot merged commit a4a7758 into cockroachdb:master Aug 19, 2023
Successfully merging this pull request may close these issues.

concurrency: integrate lock modes with SKIP LOCKED