concurrency: elide signaling new state channel on some no-op updates #104537

arulajmani · 2023-06-07T17:04:31Z

Previously, various places in the lock table code
(e.g. informActiveWaiters) would update a request's waiting state, and in a few cases, unconditionally signal the new state channel. This would happen even if the waiting state for the active waiter had not changed. This was fine, as the contract for signaling the new state channel makes no guarantees that there has indeed been a state change -- just that there could have been one.

This patch tightens the cases where the channel is signaled/waiting state is updated. We elide updates to the waiting state and signaling the channel in cases where the waiter does not need to take any meaningful action. This change will allow us to blindly update waiting states in the future, by calling informActiveWaiters, without needing to decide whether there is a state change or not at the caller.

As a result of this patch, some observability related fields like the number of waiting readers/writers may be stale. Note that these were could always be stale, this patch just increases the cases where they may be stale.

Release note: None

cockroach-teamcity · 2023-06-07T17:04:43Z

This change is

arulajmani

Copying over questions from #104261 (review):

what kind of bugs can occur from under-eliding? What kind of bugs can occur from over-eliding?

Under eliding -- this shouldn't cause issues, because the lock table waiter can handle the channel being signalled even though the state didn't change.

is the logic in canElideWaitingStateUpdate prone to rot as new fields are added to waitingState?

Added something to prevent this. It's a bit mechanical, but it's better than nothing; let me know if you have ideas to improve it (and if it's worth doing so).

does the logic in canElideWaitingStateUpdate need to consider every field? Even observability-related fields like queuedWriters?

Like we spoke about yesterday, I removed the observability related fields here. Even before my patch, we weren't diligent in keeping these fields up-to-date in various places. For example, tryActiveWait doesn't change the waitState for other requests if it decides to wait on a lock. Taking a step back, I'm not sure the total number of queued readers/writers is useful to a request waiting in wait queues -- locking requests only care about the number of locking requests in front of them; non-locking requests don't really care about any other requests in the wait queues.

An alternative here is to blindly update the wait state, but only signal the channel if a non-observability related field changed. I decided not to do this, mostly because of the question about the value of these fields in the first place.

can we push this into updateWaitingStateLocked so that we don't add complexity to existing methods?

I know we left off yesterday saying we won't do this, but I think the hesitation I had initially goes away by prefixing the method with a maybe.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

nvanbenschoten

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)

pkg/kv/kvserver/concurrency/lock_table.go line 557 at r1 (raw file):

}

// maybeUpdateWaitingStateLocked updates the request's waiting state if the

Want to give some explanation for why we elide, either here or in the comment on canElideWaitingStateUpdate.

pkg/kv/kvserver/concurrency/lock_table.go line 592 at r1 (raw file):

}

// canElideWaitingStateUpdate returns true if the updating the guard's waiting

"if the updating"

pkg/kv/kvserver/concurrency/lock_table.go line 1335 at r1 (raw file):

		// If the state has indeed changed, it must perform a different action -- so
		// we pass notify = true here to nudge it to do so.
		g.maybeUpdateWaitingStateLocked(state, true /*notify*/)

nit: /* notify */

pkg/kv/kvserver/concurrency/testdata/lock_table/wait_self line 115 at r1 (raw file):

   distinguished req: 3

# Stays in waitSelf state if scans again.

This case is interesting. We ScanAndEnqueue and then end up without a notification on NewStateChan. Should we be clearing g.mu.state in the guard != nil branch of ScanAndEnqueue so that we always end up with a notification? Without one, we may stall in lockTableWaiterImpl.WaitOn.

arulajmani

TFTR! Addressed comments, RFAL.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/lock_table.go line 557 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Want to give some explanation for why we elide, either here or in the comment on canElideWaitingStateUpdate.

Done.

pkg/kv/kvserver/concurrency/testdata/lock_table/wait_self line 115 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This case is interesting. We ScanAndEnqueue and then end up without a notification on NewStateChan. Should we be clearing g.mu.state in the guard != nil branch of ScanAndEnqueue so that we always end up with a notification? Without one, we may stall in lockTableWaiterImpl.WaitOn.

Done. Like we talked about offline, this shouldn't be a possible scenario in normal operation (outside of unit tests). I really like your suggestion regardless, because it removes the cognitive burden of having to convince ourselves of this.

nvanbenschoten

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

pkg/kv/kvserver/concurrency/lock_table.go line 2662 at r2 (raw file):

		g.mu.startWait = false
		g.mu.mustFindNextLockAfter = false
		g.mu.state = waitingState{}

nit: move one line up to keep the fields in the same order as in the struct definition. Doing so makes it slightly easier for a reader to determine which fields are reset.

Previously, various places in the lock table code (e.g. `informActiveWaiters`) would update a request's waiting state, and in a few cases, unconditionally signal the new state channel. This would happen even if the waiting state for the active waiter had not changed. This was fine, as the contract for signaling the new state channel makes no guarantees that there has indeed been a state change -- just that there could have been one. This patch tightens the cases where the channel is signaled/waiting state is updated. We elide updates to the waiting state and signaling the channel in cases where the waiter does not need to take any meaningful action. This change will allow us to blindly update waiting states in the future, by calling `informActiveWaiters`, without needing to decide whether there is a state change or not at the caller. As a result of this patch, some observability related fields like the number of waiting readers/writers may be stale. Note that these were could always be stale, this patch just increases the cases where they may be stale. Epic: none Release note: None

arulajmani

TFTR!

bors r=nvanbenschoten

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/lock_table.go line 2662 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: move one line up to keep the fields in the same order as in the struct definition. Doing so makes it slightly easier for a reader to determine which fields are reset.

Done.

craig · 2023-06-09T17:53:28Z

This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried.

Additional information:

{"message":"Changes must be made through a pull request.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

arulajmani · 2023-06-09T18:16:09Z

bors r=nvanbenschoten single on

craig · 2023-06-09T19:28:10Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

arulajmani · 2023-06-09T19:33:24Z

bors r-

craig · 2023-06-09T19:33:27Z

Canceled.

arulajmani · 2023-06-09T19:33:31Z

bors r+ single on

craig · 2023-06-09T21:18:36Z

Build succeeded:

Bazel Essential CI (Cockroach)

104620: concurrency: use lock modes to find conflicts during lock table scans r=nvanbenschoten a=arulajmani Previous attempt over at: #104261 First commit from: #104537 Second commit from: #104600 This patch majorly refactors the lock table scan codepath, all in the name of shared locks. At its core is a desire to use lock modes to perform conflict resolution between an incoming request and locks held on one particular key. In doing so, we rip out tryActiveWait. At a high level (for a particular key), a request's journey looks like the following: - It first checks if the transaction already holds a lock at a equal or higher lock strength (read: It's good enough for its use). If this is the case, it can proceed without any bookkeeping. - It then checks if any finalized transactions hold locks on the key. Such locks do not conflict, but need to be resolved before the transaction can evaluate. They're thus accumulated for later. - The request then enqueues itself in the appropriate wait queue. - It then determines if it needs to actively wait at this lock because of a conflict. If that's the case, the lock table scan short circuits. - Otherwise, the request lays a claim (if it can) before proceeding with its scan. Closes #102210 Release note: None 105482: sqltelemetry: add missing schema telemetry r=postamar a=rafiss CREATE [ SCHEMA | INDEX | FUNCTION | TYPE ] and ALTER FUNCTION did not have any telemetry, but they should. Epic: None Release note: None 105579: sql: disallow cross-database type references in CTAS r=chengxiong-ruan a=chengxiong-ruan Fixes: #105393 Release note (bug fix): reviously, cross-database type references could sneaked in through `CREATE TABLE...AS` statements if the source table is from another database and any of its columns is of a user defined type. This introduced bug where the source table can be dropped and type could not be found for the CTAS table. This commit disallow such CTAS as a fix. 105581: optbuilder: reset annotations when building CREATE FUNCTION r=rafiss a=rafiss In 22dabb0 we started overriding the annotations for each statement in the UDF body. We should reset them to the original values, so we don't accidentally leave the old reference. Epic: None Release note: None 105596: storage: make `storage.max_sync_duration` public and `TenantReadOnly` r=erikgrinaker a=erikgrinaker Users have asked why this setting is not public, this patch makes it so. Furthermore, these settings were `TenantWritable`. We do not want these to be writable by tenants, where they can potentially cause problems on SQL nodes, considering e.g. SQL disk spilling uses Pebble. Epic: none Release note: None Co-authored-by: Arul Ajmani <[email protected]> Co-authored-by: Rafi Shamim <[email protected]> Co-authored-by: Chengxiong Ruan <[email protected]> Co-authored-by: Erik Grinaker <[email protected]>

arulajmani requested a review from nvanbenschoten June 7, 2023 17:04

arulajmani requested a review from a team as a code owner June 7, 2023 17:04

arulajmani commented Jun 7, 2023

View reviewed changes

arulajmani force-pushed the elide-waiting-state-updates branch from 4350f0a to aa45085 Compare June 7, 2023 17:58

nvanbenschoten reviewed Jun 7, 2023

View reviewed changes

arulajmani force-pushed the elide-waiting-state-updates branch from aa45085 to e7b1b20 Compare June 8, 2023 15:56

arulajmani commented Jun 8, 2023

View reviewed changes

arulajmani requested a review from nvanbenschoten June 8, 2023 15:56

arulajmani mentioned this pull request Jun 8, 2023

concurrency: use lock modes to find conflicts during lock table scans #104620

Merged

nvanbenschoten approved these changes Jun 9, 2023

View reviewed changes

arulajmani force-pushed the elide-waiting-state-updates branch from e7b1b20 to ff94ccd Compare June 9, 2023 16:27

arulajmani commented Jun 9, 2023

View reviewed changes

craig bot merged commit b3f95aa into cockroachdb:master Jun 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

concurrency: elide signaling new state channel on some no-op updates #104537

concurrency: elide signaling new state channel on some no-op updates #104537

arulajmani commented Jun 7, 2023

cockroach-teamcity commented Jun 7, 2023

arulajmani left a comment

nvanbenschoten left a comment

arulajmani left a comment

nvanbenschoten left a comment

arulajmani left a comment

craig bot commented Jun 9, 2023

arulajmani commented Jun 9, 2023

craig bot commented Jun 9, 2023

arulajmani commented Jun 9, 2023

craig bot commented Jun 9, 2023

arulajmani commented Jun 9, 2023

craig bot commented Jun 9, 2023

concurrency: elide signaling new state channel on some no-op updates #104537

concurrency: elide signaling new state channel on some no-op updates #104537

Conversation

arulajmani commented Jun 7, 2023

cockroach-teamcity commented Jun 7, 2023

arulajmani left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

arulajmani left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

arulajmani left a comment

Choose a reason for hiding this comment

craig bot commented Jun 9, 2023

arulajmani commented Jun 9, 2023

craig bot commented Jun 9, 2023

arulajmani commented Jun 9, 2023

craig bot commented Jun 9, 2023

arulajmani commented Jun 9, 2023

craig bot commented Jun 9, 2023