concurrency: use lock.Modes to determine conflicts in tryActiveWait #102973

arulajmani · 2023-05-09T18:58:37Z

This patch refactors tryActiveWait to use lock.Modes to determine conflicts. In doing so, it splits the logic into three methods:

1. To determine a conflict with a lock holder.
2. To determine a conflict with a reservation.
3. To modify data structures in preparation for active wait.

As a result, we no longer assume that there will only ever be a single reservation or lock holder on a key. In the future, we'll be able to extend tryActiveWait to account for multiple locks/reservations on a key.
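
A minimal sketch of the idea, not the code in this patch: a request is checked against every holder on a key rather than a single assumed holder. The `Strength` type, the toy compatibility rule in `conflicts`, and `conflictsWithAny` are hypothetical simplifications and do not reflect the actual lock.Mode API or its real compatibility matrix.

```go
package main

import "fmt"

// Strength is a hypothetical, simplified stand-in for a lock mode.
type Strength int

const (
	None Strength = iota // non-locking read
	Shared
	Exclusive
)

// conflicts reports whether two strengths are incompatible under the toy
// rule assumed for this sketch: Exclusive conflicts with everything, and
// nothing else conflicts.
func conflicts(a, b Strength) bool {
	return a == Exclusive || b == Exclusive
}

// conflictsWithAny checks the request against every holder on the key,
// instead of assuming there is at most one.
func conflictsWithAny(req Strength, holders []Strength) bool {
	for _, h := range holders {
		if conflicts(req, h) {
			return true
		}
	}
	return false
}

func main() {
	holders := []Strength{Shared, Shared}
	fmt.Println(conflictsWithAny(Shared, holders))    // false
	fmt.Println(conflictsWithAny(Exclusive, holders)) // true
}
```

The same loop shape would apply to reservations; only the set being iterated changes.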

Informs #102210

Release note: None

blathers-crl · 2023-05-09T18:58:41Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

arulajmani · 2023-05-09T18:59:10Z

@nvanbenschoten still a bit rough, but the overall structure is there. I'll ping when it's ready for a detailed look.

arulajmani · 2023-05-11T16:35:48Z

@nvanbenschoten I reworked this a bit after our pair programming session the other day -- I'm quite happy with how things ended up! I still plan on adding a commit here that threads cluster settings into the lock table and uses them when constructing lock modes, but that shouldn't preclude you from having a look.

nvanbenschoten

Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)

-- commits line 2 at r2:
s/confilcts/conflicts/

-- commits line 9 at r2:
Do we need to call out the method to determine a conflict with a waiter? Or were we planning to consider reservations to be a form of waiters? In which case, s/reservation/waiter/.

pkg/kv/kvserver/concurrency/lock_table.go line 1515 at r2 (raw file):

// tryActiveWait decides whether the request, g, should actively wait at this
// key or not. It adjusts the data-structures appropriately if the request
// needs. If the request needs to wait at this key and the supplied notify

"if the request needs"

Consider giving a hint at what this adjustment might look like. You don't need to be precise or exhaustive, but some indication that this function may add g to the lock's wait-queue would be helpful.

pkg/kv/kvserver/concurrency/lock_table.go line 1521 at r2 (raw file):

// with any locks or reservations that may be present at this key. A request
// is only allowed to proceed if it is compatible with all locks and
// reservations.

"all locks and reservations"

And also all other waiters, right?

pkg/kv/kvserver/concurrency/lock_table.go line 1524 at r2 (raw file):

//
// There is one tricky case with tryActiveWait -- when the key is only locked by
// finalized transactions. Such cases are determined by consulting the

"by finalized transactions"

This implies that we always know with confidence whether a transaction is finalized or not. Consider rephrasing to something like "when a key is only locked by transactions that are known to be finalized".

pkg/kv/kvserver/concurrency/lock_table.go line 1528 at r2 (raw file):

// are a few considerations here:
//
// - For unreplicated locks, this method will silently clear the lock. The

Consider generalizing this sentence to work with multiple lock holders. Something like "the method will remove the finalized transaction(s) from the lock, which may leave it empty".

pkg/kv/kvserver/concurrency/lock_table.go line 1569 at r2 (raw file):

// traces of them via a round of replication. This is discussed in more detail
// in #41720. Specifically, see mention of "contention footprint" and
// COMMITTED_BUT_NOT_REMOVABLE. 	Also, resolving these locks/intents would

nit: there's a tab in here. In fact, there's a bit of messed up spacing in these lines.

pkg/kv/kvserver/concurrency/lock_table.go line 1607 at r2 (raw file):

	// after nudging the lock, there's no conflict -- this lock can be gc-ed at
	// the caller.
	if !l.isHeldOrReserved() {

We only hit this on the clearLockHolder path of conflictsWithLockHolder, right? If so, we're not doing any nudging here. It's also not clear that this benefits from being separated from that logic. The split raises questions about whether there are other reasons why we'd find a lock that's not held or reserved.

pkg/kv/kvserver/concurrency/lock_table.go line 1608 at r2 (raw file):

	// the caller.
	if !l.isHeldOrReserved() {
		// TODO(arul): Should a locking request acquire a reservation instead of

It probably should. The current behavior is surprising. This would also allow us to simplify the code, right? We could eliminate the transitionedToFree logic?

pkg/kv/kvserver/concurrency/lock_table.go line 1666 at r2 (raw file):

	var ws waitingState
	// TODO(arul): try and improve the structure of this logic.
	// TODO(arul): What if the request is already in the queue?

Do we need to address this TODO now? Would it help to push the waitQueueMaxLengthExceeded handling into adjustWaitQueues?

pkg/kv/kvserver/concurrency/lock_table.go line 1765 at r2 (raw file):

		waitForState.kind = waitSelf
	} else if l.distinguishedWaiter == nil {
		l.distinguishedWaiter = g

It's surprising to see that constructWaitingState is mutating lockState. Is there a better place for this? For instance, should we be setting the l.distinguishedWaiter in adjustWaitQueues and then just checking whether g is the distinguished waiter here?
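
A rough sketch of the split suggested here, under heavy simplification: `guard` and `lockState` are hypothetical pared-down stand-ins for the real types, and the returned strings only illustrate the two waiting kinds. The point is that the queue-adjusting method owns the mutation while the state-constructing method only reads.

```go
package main

import "fmt"

// guard and lockState are hypothetical, pared-down stand-ins for
// lockTableGuardImpl and lockState.
type guard struct{ id int }

type lockState struct {
	waiters             []*guard
	distinguishedWaiter *guard
}

// adjustWaitQueues is the only method here that mutates the lock: it
// enqueues g and claims the distinguished-waiter slot if it is free.
func (l *lockState) adjustWaitQueues(g *guard) {
	l.waiters = append(l.waiters, g)
	if l.distinguishedWaiter == nil {
		l.distinguishedWaiter = g
	}
}

// constructWaitingState is read-only: it just checks whether g is the
// distinguished waiter and picks the waiting kind accordingly.
func (l *lockState) constructWaitingState(g *guard) string {
	if l.distinguishedWaiter == g {
		return "waitForDistinguished"
	}
	return "waitFor"
}

func main() {
	l := &lockState{}
	g1, g2 := &guard{id: 1}, &guard{id: 2}
	l.adjustWaitQueues(g1)
	l.adjustWaitQueues(g2)
	fmt.Println(l.constructWaitingState(g1), l.constructWaitingState(g2))
}
```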

pkg/kv/kvserver/concurrency/lock_table.go line 1870 at r2 (raw file):

While doing so, it should mark all active waiters it doesn't conflict with as inactive (and nudge them to no longer wait at this lock).

Why should it do this? Is that because we know we'd only reach this point if we don't conflict with the lock holder or any reservations? Does that necessarily mean that waiters that we don't conflict with also don't conflict with the lock holder or any reservation?

pkg/kv/kvserver/concurrency/lock_table.go line 1888 at r2 (raw file):

		return // non-{locking,transactional} requests cannot acquire reservations.
	}
	if l.reservation != nil && l.reservation == g {

Again, didn't we already check this condition?

pkg/kv/kvserver/concurrency/lock_table.go line 1923 at r2 (raw file):

		return false, nil // the lock isn't held; no conflict to speak of
	}
	if g.isSameTxn(lockHolderTxn) {

Didn't we already check this in the call to alreadyLockedOrReservedByRequest?

pkg/kv/kvserver/concurrency/lock_table.go line 1937 at r2 (raw file):

	}

	if g.str == lock.None {

It's subtle that g.str is mutable over the lifetime of a lockTableGuardImpl and yet it looks to the user like it is an immutable attribute of the guard. That impression is reinforced by comments like "if the request, referenced by the supplied lockTableGuardImpl". I think this is why the previous code passed around a SpanAccess alongside the guard. I don't think we need to go that far, but maybe push g.str behind a well-named, well-commented accessor method which references the iteration in stepToNextSpan.
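
Something along these lines, as a sketch only: the `guard` struct, its `spans` and `idx` fields, and the `curStrength` name are all hypothetical, and the real stepToNextSpan does considerably more. The accessor's comment is where the mutability gets documented.

```go
package main

import "fmt"

// Strength is a hypothetical stand-in for lock.Strength.
type Strength int

const (
	None Strength = iota
	Exclusive
)

// guard is a hypothetical, pared-down stand-in for lockTableGuardImpl.
type guard struct {
	// str is the strength of the span currently being scanned. It changes
	// as the guard steps through its spans, so read it via curStrength
	// rather than treating it as a fixed attribute of the request.
	str   Strength
	spans []Strength
	idx   int
}

// curStrength returns the strength for the span the guard is currently
// iterating over; the value is updated by stepToNextSpan.
func (g *guard) curStrength() Strength { return g.str }

// stepToNextSpan advances the iteration and updates str. It returns false
// once no spans remain.
func (g *guard) stepToNextSpan() bool {
	if g.idx >= len(g.spans) {
		return false
	}
	g.str = g.spans[g.idx]
	g.idx++
	return true
}

func main() {
	g := &guard{spans: []Strength{None, Exclusive}}
	for g.stepToNextSpan() {
		fmt.Println(g.curStrength())
	}
}
```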

arulajmani

@nvanbenschoten I've replied in-line to all your comments. Even though a few of them weren't directly applicable in the new structure, I think the spirit still holds -- so thanks for the detailed look on this one! Like we spoke about offline, I opened a fresh slate PR over at #104261.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/lock_table.go line 1515 at r2 (raw file):