Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrency: elide signaling new state channel on some no-op updates #104537

Merged
merged 1 commit into from
Jun 9, 2023

Conversation

arulajmani
Copy link
Collaborator

Previously, various places in the lock table code
(e.g. informActiveWaiters) would update a request's waiting state, and in a few cases, unconditionally signal the new state channel. This would happen even if the waiting state for the active waiter had not changed. This was fine, as the contract for signaling the new state channel makes no guarantees that there has indeed been a state change -- just that there could have been one.

This patch tightens the cases where the channel is signaled/waiting state is updated. We elide updates to the waiting state and signaling the channel in cases where the waiter does not need to take any meaningful action. This change will allow us to blindly update waiting states in the future, by calling informActiveWaiters, without needing to decide whether there is a state change or not at the caller.

As a result of this patch, some observability related fields like the number of waiting readers/writers may be stale. Note that these were could always be stale, this patch just increases the cases where they may be stale.

Release note: None

@arulajmani arulajmani requested a review from nvanbenschoten June 7, 2023 17:04
@arulajmani arulajmani requested a review from a team as a code owner June 7, 2023 17:04
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying over questions from #104261 (review):

what kind of bugs can occur from under-eliding? What kind of bugs can occur from over-eliding?

Under eliding -- this shouldn't cause issues, because the lock table waiter can handle the channel being signalled even though the state didn't change.

is the logic in canElideWaitingStateUpdate prone to rot as new fields are added to waitingState?

Added something to prevent this. It's a bit mechanical, but it's better than nothing; let me know if you have ideas to improve it (and if it's worth doing so).

does the logic in canElideWaitingStateUpdate need to consider every field? Even observability-related fields like queuedWriters?

Like we spoke about yesterday, I removed the observability related fields here. Even before my patch, we weren't diligent in keeping these fields up-to-date in various places. For example, tryActiveWait doesn't change the waitState for other requests if it decides to wait on a lock. Taking a step back, I'm not sure the total number of queued readers/writers is useful to a request waiting in wait queues -- locking requests only care about the number of locking requests in front of them; non-locking requests don't really care about any other requests in the wait queues.

An alternative here is to blindly update the wait state, but only signal the channel if a non-observability related field changed. I decided not to do this, mostly because of the question about the value of these fields in the first place.

can we push this into updateWaitingStateLocked so that we don't add complexity to existing methods?

I know we left off yesterday saying we won't do this, but I think the hesitation I had initially goes away by prefixing the method with a maybe.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@arulajmani arulajmani force-pushed the elide-waiting-state-updates branch from 4350f0a to aa45085 Compare June 7, 2023 17:58
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/lock_table.go line 557 at r1 (raw file):

}

// maybeUpdateWaitingStateLocked updates the request's waiting state if the

Want to give some explanation for why we elide, either here or in the comment on canElideWaitingStateUpdate.


pkg/kv/kvserver/concurrency/lock_table.go line 592 at r1 (raw file):

}

// canElideWaitingStateUpdate returns true if the updating the guard's waiting

"if the updating"


pkg/kv/kvserver/concurrency/lock_table.go line 1335 at r1 (raw file):

		// If the state has indeed changed, it must perform a different action -- so
		// we pass notify = true here to nudge it to do so.
		g.maybeUpdateWaitingStateLocked(state, true /*notify*/)

nit: /* notify */


pkg/kv/kvserver/concurrency/testdata/lock_table/wait_self line 115 at r1 (raw file):

   distinguished req: 3

# Stays in waitSelf state if scans again.

This case is interesting. We ScanAndEnqueue and then end up without a notification on NewStateChan. Should we be clearing g.mu.state in the guard != nil branch of ScanAndEnqueue so that we always end up with a notification? Without one, we may stall in lockTableWaiterImpl.WaitOn.

@arulajmani arulajmani force-pushed the elide-waiting-state-updates branch from aa45085 to e7b1b20 Compare June 8, 2023 15:56
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! Addressed comments, RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go line 557 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Want to give some explanation for why we elide, either here or in the comment on canElideWaitingStateUpdate.

Done.


pkg/kv/kvserver/concurrency/testdata/lock_table/wait_self line 115 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This case is interesting. We ScanAndEnqueue and then end up without a notification on NewStateChan. Should we be clearing g.mu.state in the guard != nil branch of ScanAndEnqueue so that we always end up with a notification? Without one, we may stall in lockTableWaiterImpl.WaitOn.

Done. Like we talked about offline, this shouldn't be a possible scenario in normal operation (outside of unit tests). I really like your suggestion regardless, because it removes the cognitive burden of having to convince ourselves of this.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/lock_table.go line 2662 at r2 (raw file):

		g.mu.startWait = false
		g.mu.mustFindNextLockAfter = false
		g.mu.state = waitingState{}

nit: move one line up to keep the fields in the same order as in the struct definition. Doing so makes it slightly easier for a reader to determine which fields are reset.

Previously, various places in the lock table code
(e.g. `informActiveWaiters`) would update a request's waiting state,
and in a few cases, unconditionally signal the new state channel.
This would happen even if the waiting state for the active waiter
had not changed. This was fine, as the contract for signaling the new
state channel makes no guarantees that there has indeed been a state
change -- just that there could have been one.

This patch tightens the cases where the channel is signaled/waiting
state is updated. We elide updates to the waiting state and signaling
the channel in cases where the waiter does not need to take any
meaningful action. This change will allow us to blindly update waiting
states in the future, by calling `informActiveWaiters`, without needing
to decide whether there is a state change or not at the caller.

As a result of this patch, some observability related fields like the
number of waiting readers/writers may be stale. Note that these were
could always be stale, this patch just increases the cases where they
may be stale.

Epic: none
Release note: None
@arulajmani arulajmani force-pushed the elide-waiting-state-updates branch from e7b1b20 to ff94ccd Compare June 9, 2023 16:27
Copy link
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go line 2662 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: move one line up to keep the fields in the same order as in the struct definition. Doing so makes it slightly easier for a reader to determine which fields are reset.

Done.

@craig
Copy link
Contributor

craig bot commented Jun 9, 2023

This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried.

Additional information:

{"message":"Changes must be made through a pull request.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

@arulajmani
Copy link
Collaborator Author

bors r=nvanbenschoten single on

@craig
Copy link
Contributor

craig bot commented Jun 9, 2023

Build failed (retrying...):

@arulajmani
Copy link
Collaborator Author

bors r-

@craig
Copy link
Contributor

craig bot commented Jun 9, 2023

Canceled.

@arulajmani
Copy link
Collaborator Author

bors r+ single on

@craig
Copy link
Contributor

craig bot commented Jun 9, 2023

Build succeeded:

@craig craig bot merged commit b3f95aa into cockroachdb:master Jun 9, 2023
craig bot pushed a commit that referenced this pull request Jun 27, 2023
104620: concurrency: use lock modes to find conflicts during lock table scans r=nvanbenschoten a=arulajmani

Previous attempt over at: #104261

First commit from: #104537
Second commit from: #104600

This patch majorly refactors the lock table scan codepath, all in the
name of shared locks. At its core is a desire to use lock modes to
perform conflict resolution between an incoming request and locks held
on one particular key. In doing so, we rip out tryActiveWait.

At a high level (for a particular key), a request's journey looks like
the following:

- It first checks if the transaction already holds a lock at a equal or
higher lock strength (read: It's good enough for its use). If this is
the case, it can proceed without any bookkeeping.
- It then checks if any finalized transactions hold locks on the key.
Such locks do not conflict, but need to be resolved before the
transaction can evaluate. They're thus accumulated for later.
- The request then enqueues itself in the appropriate wait queue.
- It then determines if it needs to actively wait at this lock because
of a conflict. If that's the case, the lock table scan short circuits.
- Otherwise, the request lays a claim (if it can) before proceeding with
its scan.

Closes #102210

Release note: None

105482: sqltelemetry: add missing schema telemetry r=postamar a=rafiss

CREATE [ SCHEMA | INDEX | FUNCTION | TYPE ] and ALTER FUNCTION did not
have any telemetry, but they should.

Epic: None
Release note: None

105579: sql: disallow cross-database type references in CTAS r=chengxiong-ruan a=chengxiong-ruan

Fixes: #105393

Release note (bug fix): reviously, cross-database type references could sneaked in through `CREATE TABLE...AS` statements if the source table is from another database and any of its columns is of a user defined type. This introduced bug where the source table can be dropped and type could not be found for the CTAS table. This commit disallow such CTAS as a fix.

105581: optbuilder: reset annotations when building CREATE FUNCTION r=rafiss a=rafiss

In 22dabb0 we started overriding the annotations for each statement in the UDF body. We should reset them to the original values, so we don't accidentally leave the old reference.

Epic: None
Release note: None

105596: storage: make `storage.max_sync_duration` public and `TenantReadOnly` r=erikgrinaker a=erikgrinaker

Users have asked why this setting is not public, this patch makes it so.

Furthermore, these settings were `TenantWritable`. We do not want these to be writable by tenants, where they can potentially cause problems on SQL nodes, considering e.g. SQL disk spilling uses Pebble.

Epic: none
Release note: None

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants