kv: Scans with limit scan too much of lockTable #49973
Fixes cockroachdb#49658. Informs cockroachdb#9521. Informs cockroachdb#49973. Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to drop write-uncontended locks when upgraded from the Unreplicated to Replicated durability in much the same way we drop Replicated locks when first acquired. This is possible because a Replicated lock is also stored as an MVCC intent, so it does not need to also be stored in the lockTable if writers are not queuing on it. This is beneficial because it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile latency regression observed in cockroachdb#49658. Since we aren't currently great at avoiding excessive contention on limited scans when locks are in the lockTable, it's better to keep locks out of the lockTable when possible.

If any of the readers do truly contend with this lock even after their limit has been applied, they will notice during their MVCC scan and re-enter the queue (possibly recreating the lock through AddDiscoveredLock). Still, in practice this seems to work well in avoiding most of the artificial contention discussed in cockroachdb#49973. It's a bit of a hack and I am very interested in fixing this fully in the future (through an approach like cockroachdb#33373 or by incrementally consulting the lockTable in a `lockAwareIterator`), but for now, I don't see a downside to making this change. I intend to backport this change to v20.1, as it's causing issues in one of the demos we like to run.

Release note (performance improvement): limited SELECT statements now do a better job avoiding unnecessary contention with UPDATE and SELECT FOR UPDATE statements.
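The core of the change described above can be sketched as a tiny decision function. This is a hypothetical simplification, not CockroachDB's actual `lockTable` code: on an Unreplicated-to-Replicated durability upgrade with no queued writers, the lock is dropped from the in-memory table, because the MVCC intent makes it independently discoverable.

```go
// Hypothetical sketch (invented types, not the real lockTable): decide
// whether a lock should remain tracked in the in-memory lock table
// after an acquisition that upgrades its durability.
package main

import "fmt"

type durability int

const (
	unreplicated durability = iota
	replicated
)

type lockState struct {
	durability    durability
	queuedWriters int // writers actively waiting on this lock
}

// acquire reports whether the lock should remain in the lock table after
// the acquisition. A write-uncontended Unreplicated -> Replicated upgrade
// returns false: the MVCC intent alone is enough to rediscover the lock.
func acquire(l *lockState, newDur durability) (keep bool) {
	upgrade := l.durability == unreplicated && newDur == replicated
	l.durability = newDur
	if upgrade && l.queuedWriters == 0 {
		return false // drop: rely on the MVCC intent instead
	}
	return true
}

func main() {
	uncontended := &lockState{durability: unreplicated, queuedWriters: 0}
	contended := &lockState{durability: unreplicated, queuedWriters: 2}
	fmt.Println(acquire(uncontended, replicated)) // dropped from the table
	fmt.Println(acquire(contended, replicated))   // kept: writers are queued
}
```

If a reader later truly contends with the dropped lock, it rediscovers the intent during its MVCC scan and re-adds it (the `AddDiscoveredLock` path mentioned above).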
This is relevant for the separated lock table prototype that I am working on. So I'd like to clarify the approach and correctness.
Do you have a preference? Correctness:
Does this sound right?
I think I prefer the first option for two reasons:
I had to go back and remind myself of that context. Your point about concurrent releases of replicated locks being harmful because of the adjusted provisional value is interesting. We ideally want to get to a point where we can perform intent resolution without acquiring latches, through the use of some indirection. So in that sense, I think the first option you listed above might make it easier to guarantee this, because we can enforce the invariant at a single atomicity point (during iteration) instead of trying to coordinate between what the iterator returns and what we later see in the lock table when we perform the optimistic conflict check. In other words, option 1 still ensures that we check the lock table for a given key before ever observing the corresponding MVCC state; it just does so dynamically while scanning instead of upfront like we do today. Option 2 inverts the order of these two operations, which I suspect will lead to complexity here. Does that line up with your thinking?
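Option 1 can be illustrated with a toy scan loop (all names here are invented for illustration, not the real `lockTable` or MVCC iterator code): the lock check for each key happens at the same point the key's value would be read, so a limited scan never consults lock state beyond the keys it actually returns.

```go
// Toy model of per-key interleaved lock checking during a limited scan.
package main

import "fmt"

type scanner struct {
	keys  []string          // keys in scan order
	vals  map[string]string // committed values
	locks map[string]bool   // keys with conflicting locks
}

// scan returns values up to limit, stopping at the first conflicting
// lock; the caller would then queue on that lock. The lock table is
// consulted for a key immediately before its MVCC state is observed.
func (s *scanner) scan(limit int) (out []string, conflict string) {
	for _, k := range s.keys {
		if len(out) == limit {
			break
		}
		if s.locks[k] { // lock check precedes the read of k
			return out, k
		}
		out = append(out, s.vals[k])
	}
	return out, ""
}

func main() {
	s := &scanner{
		keys:  []string{"a", "b", "c"},
		vals:  map[string]string{"a": "1", "b": "2", "c": "3"},
		locks: map[string]bool{"c": true},
	}
	vals, conflict := s.scan(2)
	// The limit is reached before the lock on "c" is ever consulted,
	// so no conflict is reported.
	fmt.Println(vals, conflict)
}
```

The key property is the single atomicity point per key: the scan never reads MVCC state for a key whose lock-table entry it has not just checked.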
That makes sense. Option 1 it is.
I wasn't thinking about intent resolution without latches -- I'll keep it in mind.
49891: physicalplan: preevaluate subqueries on LocalExprs and always set LocalExprs r=yuzefovich a=yuzefovich

**physicalplan: preevaluate subqueries on LocalExprs** When the plan is local, we do not serialize expressions. Previously, in such a case we would also not preevaluate the subqueries in the expressions, which made `EXPLAIN (VEC)` return an unexpected plan (there would be a `tree.Subquery` in the expression, which we don't support in the vectorized engine, so we would wrap the plan). Now we preevaluate the subqueries before storing them in the processor spec. AFAICT it affects only this explain variant and nothing else. Release note: None

**colexec: improve expression parsing** This commit introduces `colexec.ExprHelper` that helps with expression processing. Previously, we were allocating a new `execinfra.ExprHelper` and calling `Init` on it in order to get the typed expression from the possibly serialized representation of each expression. Now, this new expression helper is reused between all expressions in the flow on a single node. There is one caveat, however: we need to make sure that we force deserialization of the expressions during the `SupportsVectorized` check if the flow is scheduled to be run on a remote node (different from the one that is performing the check). This is necessary to make sure that the remote nodes will be able to deserialize the expressions without encountering errors (if we don't force the serialization during the check, we will use `LocalExpr` - if available - and might not catch things that we don't support). Release note: None

**physicalplan: always store LocalExpr** Previously, we would set either `LocalExpr` (unserialized expression, only when we have the full plan on a single node) or `Expr` (serialized expression, when we have a distributed plan as well as in some tests). However, we could be setting both and making a best effort to reuse the unserialized `LocalExpr` on the gateway even if the plan is distributed. This commit adds such behavior. Fixes: #49810. Release note: None

49966: roachtest: adjust tpchvec and tpcdsvec r=yuzefovich a=yuzefovich

**roachtest: add new tpchvec config** This commit adds a new `tpchvec/perf_no_stats` config that is the same as `tpchvec/perf` except for the fact that stats creation is disabled. The plans without stats are likely to be different, so it gives us an easy way to get more test coverage. One caveat here is that query 9 without stats takes insanely long to run, so some new plumbing has been added to skip that query. Additionally, `tpcdsvec` has been adjusted. The test runs all queries with and without stats present with `on` and `off` vectorize options. However, when stats are not present, the `on` config will be reduced to `off` because of the `vectorize_row_count_threshold` heuristic. This commit disables that heuristic. Release note: None

**roachtest: switch the config order in tpchvec/perf** Let's see whether it makes a difference to the occasional failures of `tpchvec/perf`, which are very hard to explain. This commit also changes the workload command for the `perf` config to run only against node 1, thus eliminating one possible source of "randomness" for the failures. Addresses: #49955. Release note: None

49980: kv/concurrency: drop uncontended replicated lock on unreplicated upgrade r=nvanbenschoten a=nvanbenschoten

Fixes #49658. Informs #9521. Informs #49973. Related to #49684. This commit tweaks the `lockTable`'s handling of lock acquisition to drop write-uncontended locks when upgraded from the Unreplicated to Replicated durability in much the same way we drop Replicated locks when first acquired. This is possible because a Replicated lock is also stored as an MVCC intent, so it does not need to also be stored in the lockTable if writers are not queuing on it. This is beneficial because it serves as a mitigation for #49973 and avoids the 99th percentile latency regression observed in #49658. Since we aren't currently great at avoiding excessive contention on limited scans when locks are in the lockTable, it's better to keep locks out of the lockTable when possible. If any of the readers do truly contend with this lock even after their limit has been applied, they will notice during their MVCC scan and re-enter the queue (possibly recreating the lock through AddDiscoveredLock). Still, in practice this seems to work well in avoiding most of the artificial contention discussed in #49973. It's a bit of a hack and I am very interested in fixing this fully in the future (through an approach like #33373 or by incrementally consulting the lockTable in a `lockAwareIterator`), but for now, I don't see a downside to making this change. I intend to backport this change to v20.1, as it's causing issues in one of the demos we like to run: #49658. Release note (performance improvement): limited SELECT statements now do a better job avoiding unnecessary contention with UPDATE and SELECT FOR UPDATE statements.

Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
Yes, we'll need to ensure this for the reasons you listed. I think that's how things would naturally be ordered, anyway, so we shouldn't have to do anything special here.
(There are bugs here, since existing tests fail. And I have not added new tests yet -- I would like an opinion on the high-level approach before fixing those.)

This is a cleanup in preparation for the future, and also has some, probably minor, immediate benefits. In the future, the lock table will support multiple intents for the same key if all but one are known to be finalized. So the finalizedTxnCache belongs in the lock table data-structure. Additionally, we will support intent resolution without holding latches, which has some implications for data-structure consistency: request evaluation will not be allowed to add discovered intents to the lock table, since the discovery may be stale. This PR is not changing this discovery behavior, since we need it for now (due to interleaved intents), but it moves us along the path towards the lock table data-structure not relying on external behavior for maintaining its in-memory "cache" of locks. Specifically, removing intents from the lock table when the intent is still present in the engine is not principled. We currently do this in two places:

- for optimizing limited scans: a later PR will fix this properly by checking the lock table after request evaluation, as outlined in cockroachdb#49973.
- using the finalizedTxnCache in the lockTableWaiterImpl: this use is changed in this PR.

The code in the lock table also does removal of intents before resolution, but there is a TODO to fix that in the future. It should be easier to do this with the behavior contained in the lock table. The immediate benefits, which may not have any practical significance, are:

- We no longer resolve unreplicated locks -- they are simply removed.
- A replicated lock is removed from the lock table data-structure only when the requester has finished a scan and is in a position to do resolution. Earlier, one could remove the lock but block on another lock, and not do intent resolution on the first lock. This would cause wasteful evaluation of other requests.

Release note: None
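A minimal sketch of the idea of housing the finalized-transaction cache inside the lock table (names like `tryClear` are invented, not the real API): when a waiter hits a lock whose holder is known to be finalized, the lock is removed and queued for deferred resolution rather than waited on.

```go
// Hypothetical model of a lock table that consults its own
// finalized-transaction cache instead of relying on the waiter to do so.
package main

import "fmt"

type lockTable struct {
	locks     map[string]string // key -> holder txn ID
	finalized map[string]bool   // finalizedTxnCache: txn ID -> finalized
	toResolve []string          // intents queued for later resolution
}

// tryClear removes the lock on key if its holder is finalized, recording
// it for deferred resolution, and reports whether the waiter must still
// wait on the lock.
func (lt *lockTable) tryClear(key string) (mustWait bool) {
	holder, ok := lt.locks[key]
	if !ok {
		return false // no lock at all
	}
	if lt.finalized[holder] {
		delete(lt.locks, key)
		lt.toResolve = append(lt.toResolve, key)
		return false // cleared; resolve later, once scanning is done
	}
	return true // live holder: the waiter must queue
}

func main() {
	lt := &lockTable{
		locks:     map[string]string{"a": "txn1", "b": "txn2"},
		finalized: map[string]bool{"txn1": true},
	}
	fmt.Println(lt.tryClear("a")) // txn1 finalized: lock cleared, no wait
	fmt.Println(lt.tryClear("b")) // txn2 still live: must wait
	fmt.Println(lt.toResolve)
}
```

Deferring the resolution until the requester has finished its scan matches the second "immediate benefit" above: the lock is only removed when the requester is actually in a position to resolve it.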
57947: concurrency: move finalizedTxnCache into lock table r=sumeerbhola a=sumeerbhola

Informs #41720 Release note: None Co-authored-by: sumeerbhola <[email protected]>
This optimistic checking happens after the evaluation. The evaluation will already discover any conflicting intents, so the subsequent checking of the lock table is primarily to find conflicting unreplicated locks. This structure should be easy to extend to optimistic latching. Fixes cockroachdb#49973 Informs cockroachdb#9521 Release note: None
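The flow described above can be modeled in a few lines (hypothetical API, not the real `concurrency` package): evaluate the limited scan first, then run the lock-table check only over the span the scan actually touched, falling back to pessimistic evaluation on a conflict.

```go
// Toy model of optimistic evaluation for limited scans: the lock-table
// check happens after evaluation and only covers the scanned span.
package main

import "fmt"

type result struct {
	rows        []string
	scannedUpTo string // end of the span actually touched by the scan
}

// evaluate performs a limited scan; assumes limit >= 1 and len(keys) >= 1.
func evaluate(keys []string, limit int) result {
	n := limit
	if n > len(keys) {
		n = len(keys)
	}
	return result{rows: keys[:n], scannedUpTo: keys[n-1]}
}

// conflicts reports whether any lock falls within [start, end].
func conflicts(locks []string, start, end string) bool {
	for _, l := range locks {
		if l >= start && l <= end {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"a", "b", "c", "d"}
	locks := []string{"d"} // an unreplicated lock beyond the limit
	res := evaluate(keys, 2)
	// Optimistic check is restricted to the scanned span ["a", "b"],
	// so the lock on "d" does not force a retry.
	fmt.Println(conflicts(locks, keys[0], res.scannedUpTo))
}
```

When the check does find a conflict in the scanned span, the request would be re-evaluated pessimistically, which is what the `real-contention=true` microbenchmark case below exercises.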
58670: concurrency,kvserver: limited scans optimistically check for locks r=sumeerbhola a=sumeerbhola

This optimistic checking happens after the evaluation. The evaluation will already discover any conflicting intents, so the subsequent checking of the lock table is primarily to find conflicting unreplicated locks. This structure should be easy to extend to optimistic latching.

Benchmark numbers from the new kv roachtest that does SFU to introduce false contention:

```
name                                            old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  5.68k ± 0%   9.96k ± 1%   +75.46%  (p=0.000 n=8+9)

name                                            old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  13.1 ± 0%    5.5 ± 0%     -58.02%  (p=0.000 n=8+8)

name                                            old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  18.9 ± 0%    16.8 ± 0%    -11.11%  (p=0.001 n=8+9)

name                                            old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq  22.0 ± 0%    25.6 ± 2%    +16.57%  (p=0.000 n=8+9)
```

Microbenchmark numbers for the OptimisticEval microbenchmark, where the real-contention=true case typically causes optimistic evaluation to retry.

```
name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16  5.76ms ± 5%    0.03ms ± 2%    -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16   5.75ms ± 4%    5.63ms ± 5%    ~        (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16  34.3kB ±24%    9.9kB ± 2%     -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16   33.0kB ±20%    35.7kB ± 9%    ~        (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16  273 ±19%       83 ± 0%        -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16   268 ± 2%       308 ± 2%       +14.90%  (p=0.000 n=8+8)
```

Fixes #49973 Informs #9521

Release note (performance improvement): A limited scan now checks for conflicting locks in an optimistic manner, which means it will not conflict with locks (typically unreplicated locks) that were held in the scan's full spans, but were not in the spans that were scanned until the limit was reached. This behavior can be turned off by changing the value of the cluster setting kv.concurrency.optimistic_eval_limited_scans.enabled to false.

Co-authored-by: sumeerbhola <[email protected]>
This is analogous to #9521, but for the `lockTable`. Now that the `lockTable` is scanned ahead of evaluation, limited scans observe locks over their entire key range instead of just up to the point where they hit their limit. This can artificially increase contention in much the same way it does in #9521. The difference is that in #9521, limited scans will wait excessively long for artificially contending operations to complete. In this issue, limited scans will wait excessively long for the entire transaction that issued the artificially contending operations to complete.

The design of the `lockTable` took this into consideration, though we haven't implemented a solution for this issue yet. Namely, the `lockTable` is built using an immutable btree which supports O(1) snapshots in the same way that the `latchManager` does. The point of this is 1) to be able to scan the `lockTable` incrementally once we introduce a `lockAwareIterator` and 2) to potentially do exactly what we're doing in #33373 and push that prototype over the finish line. #33373 evaluates limited scans optimistically without latching until after the fact, when it has determined the full bounds of the scan. The ability to snapshot the `lockTable` means that we could do the same thing there.

The saving grace for v20.1 is that most intents don't actually end up in the `lockTable`, because replicated locks are only pulled into the `lockTable` when they are discovered in the MVCC keyspace. This means that in most cases, this is a non-issue under low to moderate contention. However, it can become an issue under either high contention or when unreplicated locks are used heavily. A patch I'm about to push will mitigate the latter concern.
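The O(1)-snapshot property mentioned above can be demonstrated with a plain copy-on-write pointer swap (a stand-in for the real immutable btree, with invented names): writers install a fresh tree, so snapshots taken by readers are never mutated.

```go
// Copy-on-write sketch of O(1) snapshots: a snapshot is just a pointer
// capture, and writers replace the tree wholesale instead of mutating it.
package main

import "fmt"

type tree struct{ locks map[string]bool } // stand-in for the immutable btree

type lockTable struct{ current *tree }

// snapshot is O(1): it captures the current immutable tree pointer.
func (lt *lockTable) snapshot() *tree { return lt.current }

// add copies the tree before mutating, leaving existing snapshots intact.
// (A real immutable btree shares unchanged nodes instead of copying all.)
func (lt *lockTable) add(key string) {
	next := &tree{locks: map[string]bool{}}
	for k := range lt.current.locks {
		next.locks[k] = true
	}
	next.locks[key] = true
	lt.current = next
}

func main() {
	lt := &lockTable{current: &tree{locks: map[string]bool{"a": true}}}
	snap := lt.snapshot()
	lt.add("b")
	// The snapshot still sees one lock; the table now holds two.
	fmt.Println(len(snap.locks), len(lt.current.locks))
}
```

This is what would let a `lockAwareIterator` scan a stable view of the lock table incrementally, or let optimistic evaluation (as in #33373) check a consistent snapshot after the fact.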