kv: implement Update lock strength for unreplicated locks #49684
49891: physicalplan: preevaluate subqueries on LocalExprs and always set LocalExprs r=yuzefovich a=yuzefovich

**physicalplan: preevaluate subqueries on LocalExprs**

When the plan is local, we do not serialize expressions. Previously, in such a case we would also not preevaluate the subqueries in the expressions, which made `EXPLAIN (VEC)` return an unexpected plan (there would be a `tree.Subquery` in the expression, which we don't support in the vectorized engine, so we would wrap the plan). Now we preevaluate the subqueries before storing them in the processor spec. AFAICT it affects only this explain variant and nothing else.

Release note: None

**colexec: improve expression parsing**

This commit introduces `colexec.ExprHelper`, which helps with expression processing. Previously, we were allocating a new `execinfra.ExprHelper` and calling `Init` on it in order to get the typed expression from the possibly serialized representation of each expression. Now, this new expression helper is reused between all expressions in the flow on a single node. There is one caveat, however: we need to make sure that we force deserialization of the expressions during the `SupportsVectorized` check if the flow is scheduled to be run on a remote node (different from the one that is performing the check). This is necessary to make sure that the remote nodes will be able to deserialize the expressions without encountering errors (if we don't force the serialization during the check, we will use `LocalExpr` - if available - and might not catch things that we don't support).

Release note: None

**physicalplan: always store LocalExpr**

Previously, we would set either `LocalExpr` (unserialized expression, only when we have the full plan on a single node) or `Expr` (serialized expression, when we have a distributed plan, as well as in some tests). However, we could be setting both and making a best effort to reuse the unserialized `LocalExpr` on the gateway even if the plan is distributed. This commit adds such behavior.

Fixes: #49810.

Release note: None

49966: roachtest: adjust tpchvec and tpcdsvec r=yuzefovich a=yuzefovich

**roachtest: add new tpchvec config**

This commit adds a new `tpchvec/perf_no_stats` config that is the same as `tpchvec/perf` except for the fact that stats creation is disabled. The plans without stats are likely to be different, so it gives us an easy way to get more test coverage. One caveat here is that query 9 without stats takes insanely long to run, so some new plumbing has been added to skip that query.

Additionally, `tpcdsvec` has been adjusted. The test runs all queries with and without stats present with `on` and `off` vectorize options. However, when stats are not present, the `on` config will be reduced to `off` because of the `vectorize_row_count_threshold` heuristic. This commit disables that heuristic.

Release note: None

**roachtest: switch the config order in tpchvec/perf**

Let's see whether it makes a difference to the occasional failures of `tpchvec/perf`, which are very hard to explain. This commit also changes the workload command for the `perf` config to run only against node 1, thus eliminating one possible source of "randomness" for the failures.

Addresses: #49955.

Release note: None

49980: kv/concurrency: drop uncontended replicated lock on unreplicated upgrade r=nvanbenschoten a=nvanbenschoten

Fixes #49658. Informs #9521. Informs #49973. Related to #49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to drop write-uncontended locks when upgraded from the Unreplicated to Replicated durability, in much the same way we drop Replicated locks when first acquired. This is possible because a Replicated lock is also stored as an MVCC intent, so it does not need to also be stored in the lockTable if writers are not queuing on it. This is beneficial because it serves as a mitigation for #49973 and avoids the 99th percentile latency regression observed in #49658. Since we aren't currently great at avoiding excessive contention on limited scans when locks are in the lockTable, it's better to keep locks out of the lockTable when possible.

If any of the readers do truly contend with this lock even after their limit has been applied, they will notice during their MVCC scan and re-enter the queue (possibly recreating the lock through AddDiscoveredLock). Still, in practice this seems to work well in avoiding most of the artificial concurrency discussed in #49973. It's a bit of a hack and I am very interested in fixing this fully in the future (through an approach like #33373 or by incrementally consulting the lockTable in a `lockAwareIterator`), but for now, I don't see a downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one of the demos we like to run: #49658.

Release note (performance improvement): limited SELECT statements now do a better job avoiding unnecessary contention with UPDATE and SELECT FOR UPDATE statements.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
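To make the mechanism concrete, here is a minimal Go sketch of the behavior this commit describes. All names here (`lockState`, `queuedWriters`, `acquire`) are illustrative stand-ins, not the real `lockTable` API in pkg/kv/kvserver/concurrency:

```go
package main

import "fmt"

// Durability distinguishes replicated locks (also stored as MVCC
// intents) from unreplicated locks held only in the in-memory table.
type Durability int

const (
	Unreplicated Durability = iota
	Replicated
)

// lockState is a toy stand-in for a single key's entry in the
// lockTable; the field names are illustrative, not the real ones.
type lockState struct {
	durability    Durability
	queuedWriters int // writers currently waiting on this lock
}

type lockTable struct {
	locks map[string]*lockState
}

// acquire records a lock acquisition. The case this PR adds is the
// Unreplicated -> Replicated durability upgrade: if no writers are
// queued, the entry is dropped, since the replicated lock also exists
// as an MVCC intent and conflicting readers will rediscover it during
// their MVCC scan (re-entering the queue, e.g. via AddDiscoveredLock).
func (t *lockTable) acquire(key string, dur Durability) {
	l, ok := t.locks[key]
	if !ok {
		if dur == Replicated {
			return // replicated locks aren't tracked on first acquisition
		}
		t.locks[key] = &lockState{durability: dur}
		return
	}
	if l.durability == Unreplicated && dur == Replicated {
		if l.queuedWriters == 0 {
			delete(t.locks, key) // write-uncontended upgrade: drop it
			return
		}
		l.durability = Replicated // contended: keep tracking it
	}
}

func main() {
	t := &lockTable{locks: map[string]*lockState{}}
	t.acquire("k1", Unreplicated) // implicit SFU row-fetch
	t.acquire("k1", Replicated)   // intent write upgrades durability
	fmt.Println(len(t.locks))     // prints 0: uncontended lock dropped
}
```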
Hi. We need this for Keycloak to work, and maybe for Liquibase as well.
@nvanbenschoten do you have thoughts on the likelihood of this landing in 21.1? And as an add-on, what are the chances of getting ranged locks?
I'd defer that to @sumeerbhola. This change doesn't seem particularly involved, though we wouldn't want to make the change until some of the larger lock table changes land in this release. Ranged locks are a much larger ask, but are being considered in the design of #55664.
@ajwerner Replicated range locks have performance implications, since one can't use storage engine bloom filters when looking for conflicting range locks in the lock table. I was speculating on #55664 that we could make the leaseholder maintain an in-memory summary to prevent a performance drop, though a new leaseholder would need to fall back to storage reads until it initializes its in-memory state by reading the range locks from storage (which is not ideal).
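A rough Go sketch of the in-memory-summary idea above, with the caveat that it is speculative design rather than shipped code; every name here (`rangeLockSummary`, `mightConflict`) is hypothetical:

```go
package main

import "fmt"

// span is a simplified key range [start, end).
type span struct{ start, end string }

func (a span) overlaps(b span) bool {
	return a.start < b.end && b.start < a.end
}

// rangeLockSummary is a hypothetical in-memory summary of replicated
// range locks, maintained by the leaseholder. Until it is initialized
// (e.g. after a lease transfer), conflict checks must fall back to
// reading range locks from storage, where bloom filters don't help
// with range lookups.
type rangeLockSummary struct {
	initialized bool
	locked      []span
}

// mightConflict reports whether req could conflict with a replicated
// range lock. scanStorage stands in for the expensive fallback read.
func (s *rangeLockSummary) mightConflict(req span, scanStorage func(span) bool) bool {
	if !s.initialized {
		// New leaseholder: in-memory state not built yet.
		return scanStorage(req)
	}
	for _, l := range s.locked {
		if l.overlaps(req) {
			return true
		}
	}
	return false
}

func main() {
	s := &rangeLockSummary{}
	slowScan := func(sp span) bool {
		fmt.Println("falling back to storage scan for", sp)
		return false
	}
	// Before initialization: storage fallback.
	fmt.Println(s.mightConflict(span{"a", "c"}, slowScan))
	// After reading the range locks from storage once, serve from memory.
	s.locked = []span{{"b", "d"}}
	s.initialized = true
	fmt.Println(s.mightConflict(span{"a", "c"}, slowScan)) // true
}
```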
Unreplicated range locks would be great. I’ll take what I can get. Replicated locks would enable us to avoid refreshing reads, but unreplicated locks are a lot better than nothing.
What strength? Can you open a separate issue?
Cross-posting a snippet of a thread with @sumeerbhola to ensure we don't miss this when we return to this issue:
The server-side refresh mechanism I'm referring to here is implemented in …
67660: jobs: remove FOR UPDATE clause when updating job r=ajwerner a=ajwerner

In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That means that clients trying to inspect jobs and the job adoption loops will both try to scan the table and encounter these locks. For the most part, we don't really update the job from the leaves of a distsql flow. There is an exception, which is IMPORT incrementing a sequence. Nevertheless, the retry behavior there seems sound. The other exception is pausing or canceling jobs. I think that in that case we prefer to invalidate the work of the transaction, as our intention is to cancel it. If cockroach implemented UPGRADE locks (#49684), then this FOR UPDATE would not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks during the duration of their checkpointing transactions, which could result in long wait times when trying to run SHOW JOBS.

68241: changefeedccl: Change unreachable panic to unreachable error r=[miretskiy] a=HonoreDB

I missed the context, but apparently we're doing this everywhere we can.

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
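A hedged sketch of the before/after shape of this change (the query text and the `updateJob` helper are illustrative only; the real jobs registry constructs its queries internally):

```go
package main

import (
	"database/sql"
	"fmt"
)

// Illustrative queries only. Before the change, the job-update
// transaction read the row with FOR UPDATE, acquiring an exclusive
// lock that blocked concurrent scans such as SHOW JOBS for the
// duration of the checkpointing transaction.
const (
	readWithLock    = `SELECT status, payload FROM system.jobs WHERE id = $1 FOR UPDATE`
	readWithoutLock = `SELECT status, payload FROM system.jobs WHERE id = $1`
)

// updateJob sketches the checkpointing transaction. With FOR UPDATE
// removed, readers are no longer blocked for the whole
// read-modify-write window; conflicts surface only at the final UPDATE.
func updateJob(tx *sql.Tx, id int64, newStatus string) error {
	var status, payload string
	if err := tx.QueryRow(readWithoutLock, id).Scan(&status, &payload); err != nil {
		return err
	}
	_, err := tx.Exec(`UPDATE system.jobs SET status = $2 WHERE id = $1`, id, newStatus)
	return err
}

func main() {
	fmt.Println("before:", readWithLock)
	fmt.Println("after: ", readWithoutLock)
}
```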
Regarding how this locking strength would map to SQL: it might be interesting to expose this level of locking strength through SQL to applications (though I'm not sure what we'd call it, since PG already uses "FOR UPDATE" to mean exclusive locking). It could be useful when applications are not sure at the time of locking whether they will actually update the row. (Or, alternatively, we could always use this level of locking strength for `SELECT ... FOR UPDATE`.)
Removing the O-support label here. This was from a time when we thought we'd need to switch …
The `lockTable` currently only supports the exclusive lock strength, but was designed to support varying degrees of lock strengths. Specifically, we'd eventually like to add support for `Shared` and `Upgrade` locks as well.

(cockroach/pkg/kv/kvserver/concurrency/lock/locking.proto, lines 40 to 50 at 9e1666b)

We currently have no need for `Shared` locks, but `Upgrade` locks would be immediately useful for unreplicated locks acquired during the initial row-fetch of an `UPDATE` statement (implicit SFU) or by a `SELECT ... FOR UPDATE` statement (explicit SFU).

(cockroach/pkg/kv/kvserver/concurrency/lock/locking.proto, lines 90 to 130 at 9e1666b)
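Since the embedded locking.proto excerpts don't survive in this rendering, the following Go sketch paraphrases the lock modes they define; it is a paraphrase for illustration, not the generated proto code (note the issue title uses "Update" for the strength the body calls "Upgrade"):

```go
package main

import "fmt"

// Durability distinguishes locks that survive leaseholder changes
// (Replicated, stored as MVCC intents) from fast, best-effort locks
// held only in the leaseholder's in-memory lockTable (Unreplicated).
type Durability int

const (
	Replicated Durability = iota
	Unreplicated
)

// Strength mirrors the lock strengths discussed in this issue. Only
// Exclusive is implemented at the time of writing; Shared and Upgrade
// are the proposed additions.
type Strength int

const (
	None Strength = iota // non-locking read
	Shared
	Upgrade
	Exclusive
)

func main() {
	fmt.Println(None, Shared, Upgrade, Exclusive)
}
```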
The major benefit of this form of lock is that it does not conflict with non-locking reads. This avoids blocking these reads, at the expense of undermining some of the protection against transaction retries that is provided by SFU. This is because these non-locking reads are allowed to slip underneath Upgrade locks and bump the timestamp cache, which may force the UPDATE to push its timestamp when it returns to write an intent. This is probably ok, and will certainly be once we also support `Shared` locks and `SELECT ... FOR SHARE` so that users can have full control over row-level locking.

This should improve throughput on YCSB-B (95% reads, 5% updates). It's unclear how it will affect YCSB-A (50% reads, 50% updates).
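A toy Go version of the compatibility rules the paragraph above relies on. The Shared/Upgrade entries follow my reading of the matrix sketched in locking.proto's comments, and the timestamp-dependent cases (e.g. a non-locking read below an Exclusive lock's timestamp) are deliberately ignored here:

```go
package main

import "fmt"

// Strength redeclares the lock strengths so this sketch stands alone.
type Strength int

const (
	None Strength = iota // non-locking read
	Shared
	Upgrade
	Exclusive
)

// conflicts reports whether a request at strength req must wait on a
// lock held at strength held, per the simplified matrix assumed above.
func conflicts(held, req Strength) bool {
	switch req {
	case None:
		// Non-locking reads only conflict with Exclusive locks. This is
		// the property the text relies on: readers slip under Upgrade
		// locks and bump the timestamp cache.
		return held == Exclusive
	case Shared:
		return held >= Upgrade
	case Upgrade, Exclusive:
		return held >= Shared
	}
	return false
}

func main() {
	fmt.Println(conflicts(Upgrade, None))    // false: reads proceed under SFU
	fmt.Println(conflicts(Exclusive, None))  // true: reads block on an intent
	fmt.Println(conflicts(Upgrade, Upgrade)) // true: one Upgrade holder at a time
}
```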
Jira issue: CRDB-4209