kv: implement Update lock strength for unreplicated locks #49684

Open
nvanbenschoten opened this issue May 29, 2020 · 10 comments
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv KV Team

Comments

nvanbenschoten commented May 29, 2020

The lockTable currently only supports the Exclusive lock strength, but it was designed to support a range of lock strengths. Specifically, we'd eventually like to add support for Shared and Upgrade locks as well.

// +-----------+-----------+-----------+-----------+-----------+
// |           | None      | Shared    | Upgrade   | Exclusive |
// +-----------+-----------+-----------+-----------+-----------+
// | None      |           |           |           |    X^†    |
// +-----------+-----------+-----------+-----------+-----------+
// | Shared    |           |           |     X     |     X     |
// +-----------+-----------+-----------+-----------+-----------+
// | Upgrade   |           |     X     |     X     |     X     |
// +-----------+-----------+-----------+-----------+-----------+
// | Exclusive |    X^†    |     X     |     X     |     X     |
// +-----------+-----------+-----------+-----------+-----------+
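
To make the matrix above concrete, here is a minimal Go sketch (not the actual lockTable code) of the compatibility relation it encodes. The Strength constants and the conflicts function are hypothetical names used only for this example, and the X^† cells, which in practice are qualified by MVCC timestamps, are simplified to unconditional conflicts.

package main

import "fmt"

type Strength int

const (
	None Strength = iota
	Shared
	Upgrade
	Exclusive
)

// conflicts reports whether a request at strength req conflicts with a lock
// held at strength held, following the matrix above. The None/Exclusive
// cells (marked X^†) are simplified here to an unconditional conflict.
func conflicts(held, req Strength) bool {
	switch {
	case held == None || req == None:
		return held == Exclusive || req == Exclusive
	case held == Shared && req == Shared:
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(conflicts(Upgrade, None))    // false: non-locking reads are not blocked
	fmt.Println(conflicts(Upgrade, Upgrade)) // true: only one Upgrade lock holder at a time
	fmt.Println(conflicts(Shared, Shared))   // false: Shared locks are mutually compatible
}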

We currently have no need for Shared locks, but Upgrade locks would be immediately useful for unreplicated locks acquired during the initial row-fetch of an UPDATE statement (implicit SFU) or by a SELECT ... FOR UPDATE statement (explicit SFU).

// Upgrade (U) locks are a hybrid of Shared and Exclusive locks which are
// used to prevent a common form of deadlock. When a transaction intends to
// modify existing KVs, it is often the case that it reads the KVs first and
// then attempts to modify them. Under pessimistic concurrency control, this
// would correspond to first acquiring a Shared lock on the keys and then
// converting the lock to an Exclusive lock when modifying the keys. If two
// transactions were to acquire the Shared lock initially and then attempt
// to update the keys concurrently, both transactions would get stuck
// waiting for the other to release its Shared lock and a deadlock would
// occur. To resolve the deadlock, one of the two transactions would need to
// be aborted.
//
// To avoid this potential deadlock problem, an Upgrade lock can be used in
// place of a Shared lock. Upgrade locks are not compatible with any other
// form of locking. As with Shared locks, the lock holder of an Upgrade lock
// on a key is only allowed to read from the key while the lock is held.
// This resolves the deadlock scenario presented above because only one of
// the transactions would have been able to acquire an Upgrade lock at a
// time while reading the initial state of the KVs. This means that the
// Shared-to-Exclusive lock upgrade would never need to wait on another
// transaction to release its locks.
//
// Under pure pessimistic concurrency control, an Upgrade lock is equivalent
// to an Exclusive lock. However, unlike with Exclusive locks, reads under
// optimistic concurrency control do not conflict with Upgrade locks. This
// is because a transaction can only hold an Upgrade lock on keys that it
// has not yet modified. This improves concurrency between read and write
// transactions compared to if the writing transaction had immediately
// acquired an Exclusive lock.
//
// The trade-off here is twofold. First, if the Upgrade lock holder does
// convert its lock on a key to an Exclusive lock after an optimistic read
// has observed the state of the key, the transaction that performed the
// optimistic read may be unable to perform a successful read refresh if it
// attempts to refresh to a timestamp at or past the timestamp of the lock
// conversion. Second, the optimistic reads permitted while the Upgrade lock
// is held will bump the timestamp cache. This may result in the Upgrade
// lock holder being forced to increase its write timestamp when converting
// to an Exclusive lock, which in turn may force it to restart if its read
// refresh fails.
Upgrade = 2;

The major benefit of this form of lock is that it does not conflict with non-locking reads. This avoids blocking these reads, at the expense of undermining some of the protection against transaction retries that is provided by SFU. This is because these non-locking reads are allowed to slip underneath Upgrade locks and bump the timestamp cache, which may force the UPDATE to push its timestamp when it returns to write an intent. This is probably ok, and will certainly be once we also support Shared locks and SELECT ... FOR SHARE so that users can have full control over row-level locking.
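
As a toy illustration of this trade-off (not CockroachDB code; the map-based timestamp cache and integer timestamps are stand-ins), the sketch below shows a non-locking read slipping under an Upgrade lock, bumping the timestamp cache, and forcing the locking UPDATE to push its write timestamp when it returns to write its intent.

package main

import "fmt"

func main() {
	tsCache := map[string]int{} // key -> highest timestamp at which the key was read

	// An UPDATE acquires an Upgrade lock on key "k" at ts=10 and reads the row.
	writeTS := 10

	// A non-locking read at ts=15 is not blocked by the Upgrade lock, but it
	// bumps the timestamp cache entry for "k".
	if readTS := 15; readTS > tsCache["k"] {
		tsCache["k"] = readTS
	}

	// When the UPDATE returns to write its intent, it must write above the
	// timestamp cache entry, so its write timestamp is pushed and the
	// transaction may need to refresh or retry.
	if tsCache["k"] >= writeTS {
		writeTS = tsCache["k"] + 1
	}
	fmt.Println("pushed write timestamp:", writeTS) // 16
}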

This should improve throughput on YCSB-B (95% reads, 5% updates). It's unclear how it will affect YCSB-A (50% reads, 50% updates).

Jira issue: CRDB-4209

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. labels May 29, 2020
@nvanbenschoten nvanbenschoten self-assigned this May 29, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 8, 2020
Fixes cockroachdb#49658.
Informs cockroachdb#9521.
Informs cockroachdb#49973.
Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to
drop write-uncontended locks when upgraded from the Unreplicated to
Replicated durability in much the same way we drop Replicated locks when
first acquired. This is possible because a Replicated lock is also
stored as an MVCC intent, so it does not need to also be stored in the
lockTable if writers are not queuing on it. This is beneficial because
it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile
latency regression observed in cockroachdb#49658. Since we aren't currently great
at avoiding excessive contention on limited scans when locks are in the
lockTable, it's better to keep locks out of the lockTable when
possible.

If any of the readers do truly contend with this lock even after their
limit has been applied, they will notice during their MVCC scan and
re-enter the queue (possibly recreating the lock through
AddDiscoveredLock). Still, in practice this seems to work well in
avoiding most of the artificial concurrency discussed in cockroachdb#49973. It's a
bit of a hack and I am very interested in fixing this fully in the
future (through an approach like cockroachdb#33373 or by incrementally consulting
the lockTable in a `lockAwareIterator`), but for now, I don't see a
downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one
of the demos we like to run.

Release note (performance improvement): limited SELECT statements
now do a better job avoiding unnecessary contention with UPDATE and
SELECT FOR UPDATE statements.
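
For readers skimming the timeline, the following illustrative-only Go sketch captures the heuristic this commit message describes; the lockState type, durability constants, and onAcquire method are hypothetical and do not mirror the real lockTable API.

package main

import "fmt"

type durability int

const (
	unreplicated durability = iota
	replicated
)

type lockState struct {
	dur           durability
	queuedWriters int
}

// onAcquire records a (re-)acquisition of the lock at the given durability and
// reports whether the lock can be dropped from the in-memory lockTable: a
// Replicated lock is also stored as an MVCC intent, so if no writers are
// queued on it, keeping it in the lockTable is redundant. A reader or writer
// that truly conflicts will rediscover the intent during its MVCC scan and can
// recreate the lock (AddDiscoveredLock).
func (l *lockState) onAcquire(dur durability) (dropFromLockTable bool) {
	l.dur = dur
	return dur == replicated && l.queuedWriters == 0
}

func main() {
	l := &lockState{dur: unreplicated}
	fmt.Println(l.onAcquire(replicated)) // true: uncontended lock leaves the lockTable
}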
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 10, 2020
Fixes cockroachdb#49658.
Informs cockroachdb#9521.
Informs cockroachdb#49973.
Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to
drop write-uncontended locks when upgraded from the Unreplicated to
Replicated durability in much the same way we drop Replicated locks when
first acquired. This is possible because a Replicated lock is also
stored as an MVCC intent, so it does not need to also be stored in the
lockTable if writers are not queuing on it. This is beneficial because
it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile
latency regression observed in cockroachdb#49658. Since we aren't currently great
at avoiding excessive contention on limited scans when locks are in the
lockTable, it's better to keep locks out of the lockTable when
possible.

If any of the readers do truly contend with this lock even after their
limit has been applied, they will notice during their MVCC scan and
re-enter the queue (possibly recreating the lock through
AddDiscoveredLock). Still, in practice this seems to work well in
avoiding most of the artificial concurrency discussed in cockroachdb#49973. It's a
bit of a hack and I am very interested in fixing this fully in the
future (through an approach like cockroachdb#33373 or by incrementally consulting
the lockTable in a `lockAwareIterator`), but for now, I don't see a
downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one
of the demos we like to run.

Release note (performance improvement): limited SELECT statements
now do a better job avoiding unnecessary contention with UPDATE and
SELECT FOR UPDATE statements.
craig bot pushed a commit that referenced this issue Jun 10, 2020
49891: physicalplan: preevaluate subqueries on LocalExprs and always set LocalExprs r=yuzefovich a=yuzefovich

**physicalplan: preevaluate subqueries on LocalExprs**

When the plan is local, we do not serialize expressions. Previously, in
such a case we would also not preevaluate the subqueries in the
expressions, which made `EXPLAIN (VEC)` return an unexpected plan (there
would be a `tree.Subquery` in the expression, which we don't support in
the vectorized engine, so we would wrap the plan). Now we preevaluate the
subqueries before storing them in the processor spec. AFAICT this affects
only this explain variant and nothing else.

Release note: None

**colexec: improve expression parsing**

This commit introduces `colexec.ExprHelper` that helps with expression
processing. Previously, we were allocating a new `execinfra.ExprHelper`
and were calling `Init` on it in order to get the typed expression from
the possibly serialized representation of each expression. Now, this new
expression helper is reused between all expressions in the flow on
a single node.

There is one caveat, however: we need to make sure that we force
deserialization of the expressions during the `SupportsVectorized` check if
the flow is scheduled to be run on a remote node (different from the one
that is performing the check). This is necessary to make sure that the
remote nodes will be able to deserialize the expressions without
encountering errors (if we don't force the serialization during the
check, we will use `LocalExpr` - if available - and might not catch
things that we don't support).

Release note: None

**physicalplan: always store LocalExpr**

Previously, we would set either `LocalExpr` (unserialized expression,
only when we have the full plan on a single node) or `Expr` (serialized
expression, when we have a distributed plan, as well as in some tests).
However, we could be setting both and making a best effort to reuse the
unserialized `LocalExpr` on the gateway even if the plan is distributed.
This commit adds such behavior.

Fixes: #49810.

Release note: None

49966: roachtest: adjust tpchvec and tpcdsvec r=yuzefovich a=yuzefovich

**roachtest: add new tpchvec config**

This commit adds a new `tpchvec/perf_no_stats` config that is the same
as `tpchvec/perf` except for the fact that stats creation is disabled.
The plans without stats are likely to be different, so it gives us an
easy way to get more test coverage. One caveat here is that query
9 without stats takes insanely long to run, so some new plumbing has
been added to skip that query.

Additionally, `tpcdsvec` has been adjusted. The test runs all queries
with and without stats present, with the `on` and `off` vectorize options.
However, when stats are not present, the `on` config is effectively reduced
to `off` because of the `vectorize_row_count_threshold` heuristic. This
commit disables that heuristic.

Release note: None

**roachtest: switch the config order in tpchvec/perf**

Let's see whether it makes a difference to the occasional failures of
`tpchvec/perf`, which are very hard to explain.

This commit also changes the workload command for the `perf` config to run
only against node 1, thus eliminating one possible source of
"randomness" for the failures.

Addresses: #49955.

Release note: None

49980: kv/concurrency: drop uncontended replicated lock on unreplicated upgrade r=nvanbenschoten a=nvanbenschoten

Fixes #49658.
Informs #9521.
Informs #49973.
Related to #49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to drop write-uncontended locks when upgraded from the Unreplicated to Replicated durability in much the same way we drop Replicated locks when first acquired. This is possible because a Replicated lock is also stored as an MVCC intent, so it does not need to also be stored in the lockTable if writers are not queuing on it. This is beneficial because it serves as a mitigation for #49973 and avoids the 99th percentile latency regression observed in #49658. Since we aren't currently great at avoiding excessive contention on limited scans when locks are in the lockTable, it's better to keep locks out of the lockTable when possible.

If any of the readers do truly contend with this lock even after their limit has been applied, they will notice during their MVCC scan and re-enter the queue (possibly recreating the lock through AddDiscoveredLock). Still, in practice this seems to work well in avoiding most of the artificial concurrency discussed in #49973. It's a bit of a hack and I am very interested in fixing this fully in the future (through an approach like #33373 or by incrementally consulting the lockTable in a `lockAwareIterator`), but for now, I don't see a downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one of the demos we like to run: #49658.

Release note (performance improvement): limited SELECT statements now do a better job avoiding unnecessary contention with UPDATE and SELECT FOR UPDATE statements.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 11, 2020
Fixes cockroachdb#49658.
Informs cockroachdb#9521.
Informs cockroachdb#49973.
Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to
drop write-uncontended locks when upgraded from the Unreplicated to
Replicated durability in much the same way we drop Replicated locks when
first acquired. This is possible because a Replicated lock is also
stored as an MVCC intent, so it does not need to also be stored in the
lockTable if writers are not queuing on it. This is beneficial because
it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile
latency regression observed in cockroachdb#49658. Since we aren't currently great
at avoiding excessive contention on limited scans when locks are in the
lockTable, it's better to keep locks out of the lockTable when
possible.

If any of the readers do truly contend with this lock even after their
limit has been applied, they will notice during their MVCC scan and
re-enter the queue (possibly recreating the lock through
AddDiscoveredLock). Still, in practice this seems to work well in
avoiding most of the artificial concurrency discussed in cockroachdb#49973. It's a
bit of a hack and I am very interested in fixing this fully in the
future (through an approach like cockroachdb#33373 or by incrementally consulting
the lockTable in a `lockAwareIterator`), but for now, I don't see a
downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one
of the demos we like to run.

Release note (performance improvement): limited SELECT statements
now do a better job avoiding unnecessary contention with UPDATE and
SELECT FOR UPDATE statements.

z0mb1ek commented Sep 15, 2020

Hi. We need this for Keycloak to work, and possibly for Liquibase as well.

ajwerner (Contributor) commented:

@nvanbenschoten do you have thoughts on the likelihood of this landing in 21.1? And as an add-on, what are the chances of getting ranged locks?

nvanbenschoten (Member, Author) commented:

I'd defer that to @sumeerbhola. This change doesn't seem particularly involved, though we wouldn't want to make the change until some of the larger lock table changes land in this release. Ranged locks are a much larger ask, but are being considered in the design of #55664.

sumeerbhola (Collaborator) commented:

@ajwerner Replicated range locks have performance implications, since one can't use storage engine bloom filters when looking for conflicting range locks in the lock table. I was speculating on #55664 that we could make the leaseholder maintain an in-memory summary to prevent a performance drop, though a new leaseholder would need to fall back to those storage reads until it initializes its in-memory state by reading the range locks from storage (which is not ideal).
Can you make do with unreplicated range locks?

@sumeerbhola sumeerbhola self-assigned this Oct 23, 2020
ajwerner (Contributor) commented:

Unreplicated range locks would be great. I’ll take what I can get. Replicated locks would enable us to avoid refreshing reads but unreplicated locks are a lot better than nothing.

sumeerbhola (Collaborator) commented:

Unreplicated range locks would be great.

What strength? Can you open a separate issue?

ajwerner (Contributor) commented:

Unreplicated range locks would be great.

What strength? Can you open a separate issue?

#55896

nvanbenschoten (Member, Author) commented:

Cross-posting a snippet of a thread with @sumeerbhola to ensure we don't miss this when we return to this issue:

Regarding an upgrade locking strength, I do still think there's room for increased concurrency between locking SFU reads and non-locking reads. However, we'll want to be guided by benchmarks here, mainly YCSB. I suspect that in doing so, we will find that we need to couple the upgrade locking strength with a second optimization - a way for implicit, single-row UPDATE statements to perform a server-side refresh during their {Put, EndTxn} batch. The idea here would be that if a transaction is going to write to all of the keys that it read from and ends up writing entirely to a single range, it should be able to implicitly refresh away bumps to its write timestamp due to read-write conflicts (but not write-write conflicts!). We already have some notion of server-side refreshes (see CanForwardReadTimestamp), but we'll need to extend that concept. Without this, I fear that the increased concurrency due to the weaker SFU locking strength would cause transaction retries that would negate any benefit provided by the increased concurrency.

The server-side refresh mechanism I'm referring to here is implemented in canDoServersideRetry.
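
A minimal sketch of the condition being described, under the assumption that it can be reduced to three booleans; the helper name canServersideRefresh is invented for this example, and the real mechanism (around CanForwardReadTimestamp and canDoServersideRetry) is considerably more involved.

package main

import "fmt"

// canServersideRefresh sketches the condition described above: a single-range
// {Put, EndTxn} batch can refresh away a write-timestamp bump caused by
// read-write conflicts if the transaction writes to every key it read, but
// never a bump caused by a write-write conflict.
func canServersideRefresh(writesAllKeysRead, singleRange, pushedByWriteWriteConflict bool) bool {
	return writesAllKeysRead && singleRange && !pushedByWriteWriteConflict
}

func main() {
	// An implicit single-row UPDATE whose timestamp was bumped only by a
	// non-locking read slipping under its Upgrade lock:
	fmt.Println(canServersideRefresh(true, true, false)) // true
	// The same batch bumped by a write-write conflict cannot be refreshed:
	fmt.Println(canServersideRefresh(true, true, true)) // false
}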

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jul 15, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow. There is an
exception which is IMPORT incrementing a sequence. Nevertheless, the retry
behavior there seems sound. The other exception is pausing or canceling jobs.
I think that in that case we prefer to invalidate the work of the transaction
as our intention is to cancel it.

If cockroach implemented UPGRADE locks (cockroachdb#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jul 16, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow.

There is an exception which is IMPORT incrementing a sequence. In that case,
which motivated the initial locking addition, we'll leave the locking.

The other exception is pausing or canceling jobs. I think that in that case
we prefer to invalidate the work of the transaction as our intention is to
cancel it.

If cockroach implemented UPGRADE locks (cockroachdb#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jul 29, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow.

There is an exception which is IMPORT incrementing a sequence. In that case,
which motivated the initial locking addition, we'll leave the locking.

The other exception is pausing or canceling jobs. I think that in that case
we prefer to invalidate the work of the transaction as our intention is to
cancel it.

If cockroach implemented UPGRADE locks (cockroachdb#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
craig bot pushed a commit that referenced this issue Jul 29, 2021
67660: jobs: remove FOR UPDATE clause when updating job r=ajwerner a=ajwerner

In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow. There is an
exception which is IMPORT incrementing a sequence. Nevertheless, the retry
behavior there seems sound. The other exception is pausing or canceling jobs.
I think that in that case we prefer to invalidate the work of the transaction
as our intention is to cancel it.

If cockroach implemented UPGRADE locks (#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.

68241: changefeedccl: Change unreachable panic to unreachable error r=[miretskiy] a=HonoreDB

I missed the context but apparently we're doing this everywhere we can.

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Jul 29, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow.

There is an exception which is IMPORT incrementing a sequence. In that case,
which motivated the initial locking addition, we'll leave the locking.

The other exception is pausing or canceling jobs. I think that in that case
we prefer to invalidate the work of the transaction as our intention is to
cancel it.

If cockroach implemented UPGRADE locks (#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
@arulajmani arulajmani changed the title from "kv: implement Upgrade lock strength for unreplicated locks" to "kv: implement Update lock strength for unreplicated locks" on Jul 28, 2023

michae2 commented Jul 28, 2023

Regarding how this locking strength would map to SELECT FOR UPDATE and SELECT FOR SHARE statements:

It might be interesting to expose this level of locking strength through SQL to applications (though I'm not sure what we'd call it, since PG already uses "FOR UPDATE" to mean exclusive locking). It could be useful when applications are not sure at the time of the locking SELECT whether any of the rows read will be modified later in the transaction. (FOR UPDATE would then mean the application is sure at least some of the rows read will be modified, and FOR SHARE would continue to mean the application is sure none of the rows read will be modified.)

(Or, alternatively, we could always use this level of locking strength for FOR UPDATE. But then what would exclusive locks be for?)

@nvanbenschoten nvanbenschoten removed their assignment Feb 21, 2024
@shralex shralex added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Feb 29, 2024
@arulajmani arulajmani removed the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Mar 1, 2024
arulajmani (Collaborator) commented:

Removing the O-support label here. This was from a time when we thought we'd need to switch SELECT FOR UPDATE to acquire Update/Upgrade locks instead of Exclusive locks so that they don't conflict with non-locking readers. Our thinking has evolved since then, and we now expect non-locking readers not to conflict with Exclusive locks. See #117627 -- I'll apply O-support to that one instead.
