concurrency: correctly handle lock updates that decrease its strength #112732
Conversation
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 3378 at r1 (raw file):
// REQUIRES: kl.mu to be locked.
func (kl *keyLocks) onMaybeDecreasedLockStr(st *cluster.Settings) {
I like the overall logic in this function but I wonder if you're concerned at all about the efficiency of looping through all lock holders, all waiting readers, and potentially many queued locking requests. Are there cases where we will end up doing this over and over? E.g. an intent and a bunch of queued shared locking requests where the intent's ts is bumped again and again. Every time we update the intent lock, we remove the replicated lock info, the strength decreases, we iterate over all of the above, and each of the queued requests encounters the intent again when they try to evaluate. Maybe this is not possible, but I wonder if there are other similar situations.
There's a leaked goroutine in the concurrency manager test that I added because of #111596. We'll need to get to it before merging this patch.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva and @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 3378 at r1 (raw file):
Every time we update the intent lock, we remove the replicated lock info, the strength decreases, we iterate over all of the above, and each of the queued requests encounters the intent again when they try to evaluate.
This is a good call out. It's not great that we're potentially in this scheme of discover -> update -> forget -> re-discover. However, it's no different than what would happen right now if there was only an intent on this key (as opposed to there being both an intent and an unreplicated shared lock).
Outside of KVNemesis, I'm also not sure how we can end up with both an unreplicated shared lock and an intent on a key -- so maybe that helps assuage some of the concerns here?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 3378 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Every time we update the intent lock, we remove the replicated lock info, the strength decreases, we iterate over all of the above, and each of the queued requests encounters the intent again when they try to evaluate.
This is a good call out. It's not great that we're potentially in this scheme of discover -> update -> forget -> re-discover. However, it's no different than what would happen right now if there was only an intent on this key (as opposed to there being both an intent and an unreplicated shared lock).
Outside of KVNemesis, I'm also not sure how we can end up with both an unreplicated shared lock and an intent on a key -- so maybe that helps assuage some of the concerns here?
The discover -> update -> forget -> re-discover cycle seems reasonable; my concern is only around adding another dimension of iterating over all waiting requests. As long as we're convinced it's rare, it should be ok. A bad case would need not only an unreplicated shared lock and an intent on a key, but also a long queue of waiters, which would be even less likely to occur.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 3439 at r3 (raw file):
}
if distinguishedRemoved {
	kl.tryMakeNewDistinguished()
Now that this function is a bit more general, this needs to call informActiveWaiters instead.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 3439 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Now that this function is a bit more general, this needs to call informActiveWaiters instead.
Done.
Reviewed 4 of 6 files at r1, 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/lock_table.go
line 3359 at r2 (raw file):
e = e.Next()
if g.ts.Less(newTs) {
	// TODO(arul): Flip this condition; as written, this short circuits --
Nice catch!
pkg/kv/kvserver/concurrency/lock_table.go
line 2963 at r4 (raw file):
// a low watermark timestamp across multiple lock holders by being cute
// here.
if tl.getLockMode().Strength == lock.Exclusive ||
We decided to remove this and replace it with a call to recomputeWaitQueues.
pkg/kv/kvserver/concurrency/lock_table.go
line 3254 at r4 (raw file):
beforeTs := tl.writeTS()
beforeStr := tl.getLockMode().Strength
advancedTs := beforeTs.Less(ts)
nit that's not for this PR: the way you're computing beforeStr and then afterStr below, without pre-computing whether it is expected to change, feels less fragile to me than what we're doing with timestamps. For timestamps, we precompute advancedTs and then assume that the state transition we plan to make if this flag is true will be reflected in the writeTS() method. That feels backwards.
The following pattern (like you have with the strength) would be easier to understand and more obviously correct:
beforeTs := tl.writeTS()
...
if tl.isHeldUnreplicated() {
tl.unreplicatedInfo.ts.Forward(up.Txn.WriteTimestamp)
}
...
afterTs := tl.writeTS()
advancedTs := beforeTs.Less(afterTs)
pkg/kv/kvserver/concurrency/lock_table.go
line 3372 at r4 (raw file):
// locks holders and other waiting requests. Such computation is necessary when
// a lock's strength has decreased[1] or locking requests have dropped out of
// wait queue's[2] without actually acquiring the lock.
Or the lock's timestamp has advanced.
pkg/kv/kvserver/concurrency/lock_table.go
line 3391 at r4 (raw file):
for e := kl.holders.Front(); e != nil; e = e.Next() {
	holder := e.Value
	if strongestMode.Less(holder.getLockMode()) {
nit: avoid the double call to holder.getLockMode():
mode := e.Value.getLockMode()
pkg/kv/kvserver/concurrency/lock_table.go
line 3404 at r4 (raw file):
	}
}
for e := kl.queuedLockingRequests.Front(); e != nil; {
Let's give each of these loops comments explaining what their role is.
pkg/kv/kvserver/concurrency/lock_table.go
line 3408 at r4 (raw file):
curr := e
e = e.Next()
if !strongestMode.Empty() && lock.Conflicts(qlr.mode, strongestMode, &st.SV) {
Should we make lock.Conflicts return false if strongestMode.Empty()? Is it panicking right now?
pkg/kv/kvserver/concurrency/lock_table.go
line 3424 at r4 (raw file):
} else {
	qlr.active = false // mark as inactive
	if qlr.guard == kl.distinguishedWaiter {
Pointing out for future simplification that there is a lot of overlap between this function and maybeReleaseCompatibleLockingRequests.
pkg/kv/kvserver/concurrency/lock_table.go
line 3425 at r4 (raw file):
qlr.active = false // mark as inactive
if qlr.guard == kl.distinguishedWaiter {
	kl.distinguishedWaiter = nil
// A new distinguished waiter will be selected by informActiveWaiters.
pkg/kv/kvserver/concurrency/lock_table.go
line 3432 at r4 (raw file):
	}
}
// NB: Non-transactional locking requests race with transactional locking
I found this comment and the logic below to be unintuitive. Is it trying to say that we only update strongestMode if qlr is still in the queue (i.e. we didn't hit the removeLockingRequest path above)? If so, it would be easier to understand if that's how it was structured. You could either look at curr or maintain a removed bool to make the determination.
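For illustration, here is a rough, self-contained Go sketch of the loop shape under discussion, using toy stand-in types rather than the actual lock table code. It shows the "removed bool" pattern: compute the strongest held mode, release queued locking requests that no longer conflict with it, and only let requests that stay in the queue contribute to the mode later waiters are checked against. It elides waiting readers, the fact that transactional requests are marked inactive rather than removed, and distinguished-waiter bookkeeping.

```go
package main

import "fmt"

// Toy stand-ins for lock.Mode and lock.Conflicts; the real definitions live in
// pkg/kv/kvserver/concurrency/lock. Strength 0 = none, 1 = shared, 2 = exclusive.
type mode struct{ strength int }

func (m mode) empty() bool        { return m.strength == 0 }
func (m mode) weaker(o mode) bool { return m.strength < o.strength }
func conflicts(a, b mode) bool    { return !a.empty() && !b.empty() && a.strength+b.strength > 2 }

// recomputeSketch mirrors the loop shape discussed above: find the strongest
// mode among the lock holders, then walk the queued locking requests in order,
// releasing the ones that no longer conflict and folding the modes of requests
// that stay in the queue back into strongestMode -- the "removed bool" pattern.
func recomputeSketch(holders, queued []mode) (remaining []mode) {
	var strongestMode mode
	for _, h := range holders { // strongest mode among all lock holders
		if strongestMode.weaker(h) {
			strongestMode = h
		}
	}
	for _, q := range queued {
		removed := false
		if !conflicts(q, strongestMode) {
			removed = true // no conflict with holders or earlier waiters: release it
		}
		if !removed {
			remaining = append(remaining, q)
			if strongestMode.weaker(q) { // only requests still in the queue gate later waiters
				strongestMode = q
			}
		}
	}
	return remaining
}

func main() {
	shared, excl := mode{strength: 1}, mode{strength: 2}
	// One shared holder; queued: shared, exclusive, shared.
	remaining := recomputeSketch([]mode{shared}, []mode{shared, excl, shared})
	fmt.Println(len(remaining)) // 2: the first shared waiter is released, the rest still conflict
}
```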
pkg/kv/kvserver/concurrency/lock/locking.go
line 132 at r4 (raw file):
// Less returns true if the receiver conflicts with fewer requests than the Mode
// supplied.
func (m Mode) Less(o Mode) bool {
nit: I feel like we usually use the term "weaker" or "stronger" instead of "less" or "more", so consider naming this Weaker.
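As a rough illustration of the vocabulary being suggested here (a toy sketch: the ordering mirrors lock.Strength, but the real lock.Mode comparison also accounts for timestamps on strengths that carry one):

```go
package main

import "fmt"

// Toy illustration of "weaker"/"stronger": a mode that conflicts with fewer
// requests is weaker. The ordering below mirrors lock.Strength
// (None < Shared < Update < Exclusive < Intent).
type strength int

const (
	none strength = iota
	shared
	update
	exclusive
	intent
)

// weaker reports whether s conflicts with fewer requests than o.
func (s strength) weaker(o strength) bool { return s < o }

func main() {
	fmt.Println(shared.weaker(exclusive)) // true: shared blocks fewer requests
	fmt.Println(intent.weaker(shared))    // false
}
```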
pkg/kv/kvserver/concurrency/testdata/concurrency_manager/shared_locks
line 245 at r4 (raw file):
holder: txn: 00000006-0000-0000-0000-000000000000 epoch: 0, iso: Serializable, info: unrepl [(str: Shared seq: 0)]
new-request name=req8 txn=txn7 ts=10,1
# txn7 is read-committed, so it can push the timestamp of intents as if it was high priority.
TFTR, RFAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 3359 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nice catch!
One way to address the TODO was to delete the function entirely 😂
pkg/kv/kvserver/concurrency/lock_table.go
line 3254 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit that's not for this PR: the way you're computing beforeStr and then afterStr below, without pre-computing whether it is expected to change, feels less fragile to me than what we're doing with timestamps. For timestamps, we precompute advancedTs and then assume that the state transition we plan to make if this flag is true will be reflected in the writeTS() method. That feels backwards. The following pattern (like you have with the strength) would be easier to understand and more obviously correct:
beforeTs := tl.writeTS()
...
if tl.isHeldUnreplicated() {
	tl.unreplicatedInfo.ts.Forward(up.Txn.WriteTimestamp)
}
...
afterTs := tl.writeTS()
advancedTs := beforeTs.Less(afterTs)
I can follow this up with another PR -- I'm assuming you said "not for this PR" because we want to reduce the amount of code we're backporting to 23.2, which makes sense.
pkg/kv/kvserver/concurrency/lock_table.go
line 3408 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we make lock.Conflicts return false if strongestMode.Empty()? Is it panicking right now?
Yeah, it was panicking before. Made the switch.
pkg/kv/kvserver/concurrency/lock_table.go
line 3424 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Pointing out for future simplification that there is a lot of overlap between this function and maybeReleaseCompatibleLockingRequests.
Added a TODO.
pkg/kv/kvserver/concurrency/lock_table.go
line 3432 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I found this comment and the logic below to be unintuitive. Is it trying to say that we only update strongestMode if qlr is still in the queue (i.e. we didn't hit the removeLockingRequest path above)? If so, it would be easier to understand if that's how it was structured. You could either look at curr or maintain a removed bool to make the determination.
Good call out. Switched to using a removed boolean as that's a bit more readable than looking at curr and comparing it with e.Prev().
pkg/kv/kvserver/concurrency/lock/locking.go
line 132 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: I feel like we usually use the term "weaker" or "stronger" instead of "less" or "more", so consider naming this Weaker.
Done.
Reviewed 4 of 4 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/lock_table.go
line 2963 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We decided to remove this and replace it with a call to recomputeWaitQueues.
Don't we want to remove the tl.getLockMode().Strength == lock.Exclusive || ...?
pkg/kv/kvserver/concurrency/lock_table.go
line 3254 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I can follow this up with another PR -- I'm assuming you said "not for this PR" because we want to reduce the amount of code we're backporting to 23.2, which makes sense.
Yeah, we don't want to backport refactors that aren't directly fixing issues where we can help it.
pkg/kv/kvserver/concurrency/lock/locking.go
line 83 at r5 (raw file):
func Conflicts(m1, m2 Mode, sv *settings.Values) bool {
	if m1.Empty() || m2.Empty() {
		return false // no conflict with empty lock modes
Can we add this to a unit test?
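For illustration, a minimal sketch of what such a unit test could look like, assuming the Conflicts signature quoted above and that the zero Mode value is empty. The MakeModeShared constructor, test name, and package layout here are illustrative assumptions, not necessarily the test that was added.

```go
package lock_test

import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/lock"
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
	"github.com/stretchr/testify/require"
)

// TestConflictsEmptyMode (sketch) exercises the early return for empty lock
// modes: an empty mode never conflicts, regardless of what it is compared to.
func TestConflictsEmptyMode(t *testing.T) {
	st := cluster.MakeTestingClusterSettings()
	empty := lock.Mode{}              // assumed: the zero value is the empty mode
	nonEmpty := lock.MakeModeShared() // illustrative: any non-empty mode works here

	require.False(t, lock.Conflicts(empty, empty, &st.SV))
	require.False(t, lock.Conflicts(empty, nonEmpty, &st.SV))
	require.False(t, lock.Conflicts(nonEmpty, empty, &st.SV))
}
```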
When a lock is updated on behalf of a non-finalized transaction, the lock table simply forgets any replicated lock information it was previously tracking. Now, if a lock is held with both replicated and unreplicated durability, and the replicated lock's strength is stronger than the unreplicated lock's strength, forgetting the replicated lock means there may be requests waiting on a key that no longer conflict with what's being tracked in memory[1]. In such cases, requests that no longer conflict with the in-memory tracking should no longer wait on this key -- they should be allowed to proceed with their scan. It's not guaranteed that they no longer conflict with the lock -- they might rediscover the replicated lock again and start waiting. However, if they no longer conflict with the lock, blocking them indefinitely can lead to undetected deadlocks in rare cases[2].

[1] Concretely, consider the following construction:
- key a: [txn 1, (repl intent ts@10), (unrepl shared)]
- wait-queue readers: [none ts@12], [none ts@15]
- wait-queue locking requests: [shared], [shared], [exclusive], [shared]

In this case, updating the intent's timestamp to ts@14 (and forgetting it) should allow both the non-locking readers and the first two shared locking requests to become compatible with what's being tracked in memory for this key.

[2] This can happen if the lock update corresponds to a successful PUSH_TIMESTAMP. The pusher will be blocked in the lock table waiter waiting for a `waitingState` update that never comes. Now, if there was a dependency cycle between the pusher and pushee (had the push timestamp been blocking), we would find ourselves in an undetected deadlock situation.

Fixes cockroachdb#112608

Release note: None
Thanks!
bors r=nvanbenschoten
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 2963 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Don't we want to remove the tl.getLockMode().Strength == lock.Exclusive || ...?
Yeah, we did -- it's implied if we're in this block.
pkg/kv/kvserver/concurrency/lock/locking.go
line 83 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we add this to a unit test?
Added.
Build succeeded.
blathers backport 23.2
112078: roachtest: clean up command-line flags r=RaduBerinde a=RaduBerinde

**This PR is only for the last commit. The rest are #111811**

---

#### roachtest: clean up command-line flags

The code around command-line flags is pretty messy: they're defined in many places; the name and description of a flag are far away from the variable; the variable names look like local variables and in many cases it's not obvious we're accessing a global.

This commit moves all flags to a separate subpackage, `roachtestflags`, making all uses of global flags obvious. We also add a bit of infrastructure to allow defining all information about a flag right next to where the variable is declared. We also provide a `Changed()` function that determines if a flag value was changed (without the caller having to use the Command or even the flag name).

There should be no functional changes (just some cosmetic improvements to the flag usage texts).

Epic: none
Release note: None

113245: kvserver: add metric and log when raft.Storage returns an error r=erikgrinaker a=sumeerbhola

The raft.storage.error metric is incremented on an error, and the error is logged every 30s (across all replicas).

This was motivated by a test cluster that slowed to a crawl because of deliberate data loss, but was hard to diagnose. The metric could be used for alerting, since we don't expect to see transient errors.

Informs #113053
Epic: none
Release note: None

113335: kvpb: delete ErrorDetail message r=nvanbenschoten a=nvanbenschoten

This was unused, so delete it. The message has been unused since 0c12f6c.

Epic: None
Release note: None

113636: concurrency: recompute wait queues when locking requests drop out r=nvanbenschoten a=arulajmani

First commit from #112732

----

A locking request must actively wait in a lock's wait queues if:
- it conflicts with any of the lock holders.
- or it conflicts with a lower sequence numbered request already in the lock's wait queue.

As a result, if a locking request exits a lock's wait queue without actually acquiring the lock, it may allow other locking requests to proceed. This patch recomputes wait queues whenever a locking request exits a lock's wait queues to detect such scenarios and unblock requests which were actively waiting previously but no longer need to.

Fixes #111144
Release note: None

Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
When a lock is updated on behalf of a non-finalized transaction, the lock table simply forgets any replicated lock information it was previously tracking. Now, if a lock is held with both replicated and unreplicated durability, and the replicated lock's strength is stronger than the unreplicated lock's strength, forgetting the replicated lock means there may be requests waiting on a key that no longer conflict with what's being tracked in memory[*]. In such cases, requests that no longer conflict with the in-memory tracking should no longer wait on this key -- they should be allowed to proceed with their scan. It's not guaranteed that they no longer conflict with the lock -- they might rediscover the replicated lock again and start waiting. However, if they no longer conflict with the lock, blocking them indefinitely can lead to deadlocks.
[*] Concretely, consider the following construction:
- key a: [txn 1, (repl intent ts@10), (unrepl shared)]
- wait-queue readers: [none ts@12], [none ts@15]
- wait-queue locking requests: [shared], [shared], [exclusive], [shared]

In this case, updating the intent's timestamp to ts@14 (and forgetting it) should allow both the non-locking readers and the first two shared locking requests to become compatible with what's being tracked in memory for this key.
Fixes #112608
Release note: None