concurrency: elide signaling new state channel on some no-op updates #104537
Conversation
Copying over questions from #104261 (review):
what kind of bugs can occur from under-eliding? What kind of bugs can occur from over-eliding?
Under-eliding -- this shouldn't cause issues, because the lock table waiter can handle the channel being signalled even though the state didn't change.
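To make the point concrete, here is a minimal, self-contained sketch of why under-eliding is safe. The names (`waitingState`, `guard`, `waitOn`, `notify`) are simplified stand-ins for the `lockTableGuard`/`lockTableWaiter` machinery, not the actual CockroachDB implementation: the waiter re-reads the state on every channel signal, so a signal without a real change is just a harmless spurious wakeup.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-in for the lock table's per-request waiting state.
type waitingState struct{ kind string }

// guard models a lockTableGuard: state is protected by mu, and newStateC
// is signalled (non-blockingly) whenever the state may have changed.
type guard struct {
	mu        sync.Mutex
	state     waitingState
	newStateC chan struct{}
}

func (g *guard) notify() {
	select {
	case g.newStateC <- struct{}{}: // non-blocking; signals coalesce
	default:
	}
}

// waitOn loops until the state says we are done waiting. Re-observing an
// unchanged state simply continues the loop -- a signal with no
// underlying change ("under-eliding" a no-op) is tolerated.
func (g *guard) waitOn() waitingState {
	for range g.newStateC {
		g.mu.Lock()
		cur := g.state
		g.mu.Unlock()
		if cur.kind == "doneWaiting" {
			return cur
		}
		// Unchanged (or still-waiting) state: keep waiting.
	}
	return waitingState{}
}

func main() {
	g := &guard{state: waitingState{kind: "waitFor"}, newStateC: make(chan struct{}, 1)}
	done := make(chan string)
	go func() { done <- g.waitOn().kind }()
	g.notify() // signal with no state change: harmless
	g.mu.Lock()
	g.state = waitingState{kind: "doneWaiting"}
	g.mu.Unlock()
	g.notify()
	fmt.Println(<-done) // doneWaiting
}
```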
is the logic in canElideWaitingStateUpdate prone to rot as new fields are added to waitingState?
Added something to prevent this. It's a bit mechanical, but it's better than nothing; let me know if you have ideas to improve it (and if it's worth doing so).
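One mechanical way to guard against this kind of rot (a sketch of the idea, not the exact mechanism used in the patch; the field names here are illustrative) is to assert on the field count of the struct, so that adding a field to `waitingState` without revisiting the elision logic fails loudly:

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical mirror of waitingState; field names are illustrative,
// not the exact CockroachDB definition.
type waitingState struct {
	kind          string
	txn           string
	key           string
	queuedWriters int // observability only; deliberately ignored below
}

// canElideWaitingStateUpdate compares only the fields a waiter acts on.
// The reflect-based field count is the rot guard: if a field is added to
// waitingState without updating this function, the panic fires.
func canElideWaitingStateUpdate(prev, next waitingState) bool {
	// When changing waitingState, update the comparison AND this constant.
	const expectedFields = 4
	if n := reflect.TypeOf(waitingState{}).NumField(); n != expectedFields {
		panic(fmt.Sprintf("waitingState has %d fields, expected %d; "+
			"update canElideWaitingStateUpdate", n, expectedFields))
	}
	// Observability-only fields (queuedWriters) are excluded on purpose.
	return prev.kind == next.kind && prev.txn == next.txn && prev.key == next.key
}

func main() {
	a := waitingState{kind: "waitFor", txn: "t1", key: "k", queuedWriters: 1}
	b := a
	b.queuedWriters = 2 // observability-only change
	fmt.Println(canElideWaitingStateUpdate(a, b)) // true
	b.kind = "waitElsewhere"
	fmt.Println(canElideWaitingStateUpdate(a, b)) // false
}
```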
does the logic in canElideWaitingStateUpdate need to consider every field? Even observability-related fields like queuedWriters?
Like we spoke about yesterday, I removed the observability-related fields here. Even before my patch, we weren't diligent in keeping these fields up-to-date in various places. For example, `tryActiveWait` doesn't change the `waitState` for other requests if it decides to wait on a lock. Taking a step back, I'm not sure the total number of queued readers/writers is useful to a request waiting in wait queues -- locking requests only care about the number of locking requests in front of them; non-locking requests don't really care about any other requests in the wait queues.
An alternative here is to blindly update the wait state, but only signal the channel if a non-observability related field changed. I decided not to do this, mostly because of the question about the value of these fields in the first place.
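For reference, that rejected alternative would look roughly like the following sketch (simplified, hypothetical names): always store the new state, so observability counters stay fresh, but only signal the channel when a field the waiter acts on changed.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative waitingState: kind drives waiter behavior; queuedWriters
// is observability only.
type waitingState struct {
	kind          string
	queuedWriters int
}

type guard struct {
	mu        sync.Mutex
	state     waitingState
	newStateC chan struct{}
}

// meaningfullyChanged ignores observability-only fields.
func meaningfullyChanged(prev, next waitingState) bool {
	return prev.kind != next.kind
}

// updateWaitingState blindly stores the new state (counters never go
// stale) but elides the signal for observability-only changes.
func (g *guard) updateWaitingState(next waitingState) {
	g.mu.Lock()
	changed := meaningfullyChanged(g.state, next)
	g.state = next
	g.mu.Unlock()
	if changed {
		select {
		case g.newStateC <- struct{}{}:
		default:
		}
	}
}

func main() {
	g := &guard{state: waitingState{kind: "waitFor", queuedWriters: 1},
		newStateC: make(chan struct{}, 1)}
	g.updateWaitingState(waitingState{kind: "waitFor", queuedWriters: 2})
	fmt.Println(len(g.newStateC)) // 0: counter-only change, no signal
	g.updateWaitingState(waitingState{kind: "doneWaiting"})
	fmt.Println(len(g.newStateC)) // 1: meaningful change is signalled
}
```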
can we push this into updateWaitingStateLocked so that we don't add complexity to existing methods?
I know we left off yesterday saying we won't do this, but I think the hesitation I had initially goes away by prefixing the method with a `maybe`.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
Force-pushed from 4350f0a to aa45085.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/lock_table.go
line 557 at r1 (raw file):
```go
}

// maybeUpdateWaitingStateLocked updates the request's waiting state if the
```

Want to give some explanation for why we elide, either here or in the comment on `canElideWaitingStateUpdate`.
pkg/kv/kvserver/concurrency/lock_table.go
line 592 at r1 (raw file):
```go
}

// canElideWaitingStateUpdate returns true if the updating the guard's waiting
```
"if the updating"
pkg/kv/kvserver/concurrency/lock_table.go
line 1335 at r1 (raw file):
```go
// If the state has indeed changed, it must perform a different action -- so
// we pass notify = true here to nudge it to do so.
g.maybeUpdateWaitingStateLocked(state, true /*notify*/)
```

nit: `/* notify */`
pkg/kv/kvserver/concurrency/testdata/lock_table/wait_self
line 115 at r1 (raw file):
```
distinguished req: 3

# Stays in waitSelf state if scans again.
```

This case is interesting. We `ScanAndEnqueue` and then end up without a notification on `NewStateChan`. Should we be clearing `g.mu.state` in the `guard != nil` branch of `ScanAndEnqueue` so that we always end up with a notification? Without one, we may stall in `lockTableWaiterImpl.WaitOn`.
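The suggestion can be illustrated with a small self-contained sketch (simplified, hypothetical names -- not the real `ScanAndEnqueue` signature): once updates are elided for unchanged states, a re-scan that reproduces the same state would produce no signal, so resetting `g.mu.state` first guarantees the scan's resulting state always counts as new.

```go
package main

import (
	"fmt"
	"sync"
)

type waitingState struct{ kind string }

type guard struct {
	mu        sync.Mutex
	state     waitingState
	newStateC chan struct{}
}

// maybeUpdateWaitingStateLocked elides both the update and the signal
// when the state is unchanged (the behavior this PR introduces).
// It assumes g.mu is held.
func (g *guard) maybeUpdateWaitingStateLocked(next waitingState) {
	if g.state == next {
		return // no-op update elided
	}
	g.state = next
	select {
	case g.newStateC <- struct{}{}:
	default:
	}
}

// scanAndEnqueue sketches the guard != nil branch: resetting g.state
// first means the scan's resulting state is always treated as new, so
// the waiter is always notified and cannot stall in WaitOn.
func (g *guard) scanAndEnqueue(resetFirst bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if resetFirst {
		g.state = waitingState{} // the fix suggested above
	}
	g.maybeUpdateWaitingStateLocked(waitingState{kind: "waitSelf"})
}

// signalled reports (without blocking) whether a notification is pending.
func signalled(g *guard) bool {
	select {
	case <-g.newStateC:
		return true
	default:
		return false
	}
}

func main() {
	g := &guard{state: waitingState{kind: "waitSelf"}, newStateC: make(chan struct{}, 1)}
	g.scanAndEnqueue(false)
	fmt.Println(signalled(g)) // false: identical state, notification elided
	g.scanAndEnqueue(true)
	fmt.Println(signalled(g)) // true: reset forces a fresh notification
}
```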
Force-pushed from aa45085 to e7b1b20.
TFTR! Addressed comments, RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 557 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Want to give some explanation for why we elide, either here or in the comment on `canElideWaitingStateUpdate`.
Done.
pkg/kv/kvserver/concurrency/testdata/lock_table/wait_self
line 115 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This case is interesting. We `ScanAndEnqueue` and then end up without a notification on `NewStateChan`. Should we be clearing `g.mu.state` in the `guard != nil` branch of `ScanAndEnqueue` so that we always end up with a notification? Without one, we may stall in `lockTableWaiterImpl.WaitOn`.
Done. Like we talked about offline, this shouldn't be a possible scenario in normal operation (outside of unit tests). I really like your suggestion regardless, because it removes the cognitive burden of having to convince ourselves of this.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/lock_table.go
line 2662 at r2 (raw file):
```go
g.mu.startWait = false
g.mu.mustFindNextLockAfter = false
g.mu.state = waitingState{}
```
nit: move one line up to keep the fields in the same order as in the struct definition. Doing so makes it slightly easier for a reader to determine which fields are reset.
Previously, various places in the lock table code (e.g. `informActiveWaiters`) would update a request's waiting state and, in a few cases, unconditionally signal the new state channel. This would happen even if the waiting state for the active waiter had not changed. This was fine, as the contract for signaling the new state channel makes no guarantees that there has indeed been a state change -- just that there could have been one.

This patch tightens the cases where the channel is signaled and the waiting state is updated. We elide updates to the waiting state and signaling of the channel in cases where the waiter does not need to take any meaningful action. This change will allow us to blindly update waiting states in the future, by calling `informActiveWaiters`, without needing to decide at the caller whether there is a state change or not.

As a result of this patch, some observability-related fields, like the number of waiting readers/writers, may be stale. Note that these could always be stale; this patch just increases the cases where they may be stale.

Epic: none

Release note: None
Force-pushed from e7b1b20 to ff94ccd.
TFTR!
bors r=nvanbenschoten
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go
line 2662 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: move one line up to keep the fields in the same order as in the struct definition. Doing so makes it slightly easier for a reader to determine which fields are reset.
Done.
This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried. Additional information: {"message":"Changes must be made through a pull request.","documentation_url":"https://docs.github.com/articles/about-protected-branches"}

bors r=nvanbenschoten single on

Build failed (retrying...)

bors r-

Canceled.

bors r+ single on

Build succeeded:
104620: concurrency: use lock modes to find conflicts during lock table scans r=nvanbenschoten a=arulajmani

Previous attempt over at: #104261
First commit from: #104537
Second commit from: #104600

This patch majorly refactors the lock table scan codepath, all in the name of shared locks. At its core is a desire to use lock modes to perform conflict resolution between an incoming request and locks held on one particular key. In doing so, we rip out `tryActiveWait`. At a high level (for a particular key), a request's journey looks like the following:

- It first checks if the transaction already holds a lock at an equal or higher lock strength (read: it's good enough for its use). If this is the case, it can proceed without any bookkeeping.
- It then checks if any finalized transactions hold locks on the key. Such locks do not conflict, but need to be resolved before the transaction can evaluate. They're thus accumulated for later.
- The request then enqueues itself in the appropriate wait queue.
- It then determines if it needs to actively wait at this lock because of a conflict. If that's the case, the lock table scan short-circuits.
- Otherwise, the request lays a claim (if it can) before proceeding with its scan.

Closes #102210

Release note: None

105482: sqltelemetry: add missing schema telemetry r=postamar a=rafiss

CREATE [ SCHEMA | INDEX | FUNCTION | TYPE ] and ALTER FUNCTION did not have any telemetry, but they should.

Epic: None
Release note: None

105579: sql: disallow cross-database type references in CTAS r=chengxiong-ruan a=chengxiong-ruan

Fixes: #105393

Release note (bug fix): Previously, cross-database type references could sneak in through `CREATE TABLE ... AS` statements if the source table was from another database and any of its columns was of a user-defined type. This introduced a bug where the source table could be dropped and the type could not be found for the CTAS table. This commit disallows such CTAS statements as a fix.
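The per-key journey described in #104620 above can be sketched as a small decision function. This is a heavily simplified illustration with hypothetical names (`lockInfo`, `scanKey`, `outcome`), not the real lock table API:

```go
package main

import "fmt"

// lockInfo is an illustrative summary of one key's lock state.
type lockInfo struct {
	holderTxn      string
	holderStrength int
	finalized      bool
}

type request struct {
	txn      string
	strength int
}

type outcome string

const (
	proceed outcome = "proceed"
	wait    outcome = "wait"
)

// scanKey mirrors the bullet list above: skip our own sufficient locks,
// accumulate finalized holders for later resolution, enqueue, then either
// actively wait on a conflict or lay a claim and continue scanning.
func scanKey(req request, l lockInfo, toResolve *[]lockInfo, queue *[]request) outcome {
	if l.holderTxn == req.txn && l.holderStrength >= req.strength {
		return proceed // already hold a lock of equal or higher strength
	}
	if l.finalized {
		*toResolve = append(*toResolve, l) // resolve later; no conflict now
		return proceed
	}
	*queue = append(*queue, req) // enqueue in the wait queue
	if l.holderTxn != "" && l.holderStrength > 0 {
		return wait // conflicting lock: the scan short-circuits
	}
	return proceed // lay a claim and keep scanning
}

func main() {
	var toResolve []lockInfo
	var queue []request
	req := request{txn: "t1", strength: 2}
	fmt.Println(scanKey(req, lockInfo{holderTxn: "t1", holderStrength: 3}, &toResolve, &queue)) // proceed
	fmt.Println(scanKey(req, lockInfo{holderTxn: "t2", holderStrength: 3, finalized: true}, &toResolve, &queue)) // proceed
	fmt.Println(scanKey(req, lockInfo{holderTxn: "t2", holderStrength: 3}, &toResolve, &queue)) // wait
}
```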
105581: optbuilder: reset annotations when building CREATE FUNCTION r=rafiss a=rafiss

In 22dabb0 we started overriding the annotations for each statement in the UDF body. We should reset them to the original values so we don't accidentally leave the old reference.

Epic: None
Release note: None

105596: storage: make `storage.max_sync_duration` public and `TenantReadOnly` r=erikgrinaker a=erikgrinaker

Users have asked why this setting is not public; this patch makes it so. Furthermore, these settings were `TenantWritable`. We do not want these to be writable by tenants, where they can potentially cause problems on SQL nodes, considering e.g. that SQL disk spilling uses Pebble.

Epic: none
Release note: None

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>