kv: Scans with limit scan too much of lockTable #49973

Closed
nvanbenschoten opened this issue Jun 8, 2020 · 4 comments · Fixed by #58670
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@nvanbenschoten
Member

This is analogous to #9521, but for the lockTable.

Now that the lockTable is scanned ahead of evaluation, limited scans observe locks over their entire key range instead of just up to the point where they hit their limit. This can artificially increase contention in much the same way it does in #9521. The difference is that in #9521, limited scans will wait excessively long for artificially contending operations to complete. In this issue, limited scans will wait excessively long for the entire transaction that issued the artificially contending operations to complete.

The design of the lockTable took this into consideration, though we haven't implemented a solution for this issue yet. Namely, the lockTable is built using an immutable btree which supports O(1) snapshots in the same way the latchManager does. The point of this is 1) to be able to scan the lockTable incrementally once we introduce a lockAwareIterator and 2) to potentially do exactly what we're doing in #33373 and push that prototype over the finish line. #33373 evaluates limited scans optimistically without latching until after the fact, when it has determined the full bounds of the scan. The ability to snapshot the lockTable means that we could do the same thing there.

The saving grace for v20.1 is that most intents don't actually end up in the lockTable because replicated locks are only pulled into the lockTable when they are discovered in the MVCC keyspace. This means that in most cases, this is a non-issue under low to moderate contention. However, it can become an issue under either high contention or when unreplicated locks are used heavily. A patch I'm about to push will mitigate the latter concern.

@nvanbenschoten nvanbenschoten added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. labels Jun 8, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 8, 2020
Fixes cockroachdb#49658.
Informs cockroachdb#9521.
Informs cockroachdb#49973.
Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to
drop write-uncontended locks when upgraded from the Unreplicated to
Replicated durability in much the same way we drop Replicated locks when
first acquired. This is possible because a Replicated lock is also
stored as an MVCC intent, so it does not need to also be stored in the
lockTable if writers are not queuing on it. This is beneficial because
it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile
latency regression observed in cockroachdb#49658. Since we aren't currently great
at avoiding excessive contention on limited scans when locks are in the
lockTable, it's better to keep locks out of the lockTable when
possible.

If any of the readers do truly contend with this lock even after their
limit has been applied, they will notice during their MVCC scan and
re-enter the queue (possibly recreating the lock through
AddDiscoveredLock). Still, in practice this seems to work well in
avoiding most of the artificial concurrency discussed in cockroachdb#49973. It's a
bit of a hack and I am very interested in fixing this fully in the
future (through an approach like cockroachdb#33373 or by incrementally consulting
the lockTable in a `lockAwareIterator`), but for now, I don't see a
downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one
of the demos we like to run.

Release note (performance improvement): limited SELECT statements
now do a better job avoiding unnecessary contention with UPDATE and
SELECT FOR UPDATE statements.
@sumeerbhola
Collaborator

This is relevant for the separated lock table prototype that I am working on. So I'd like to clarify the approach and correctness.
There are two options:

  • Check for conflicting locks during optimistic request evaluation: To keep the simplifications in mvcc.go regarding not knowing about intents and locks, one could have a special optimistic iterator that the code in mvcc.go is oblivious to, and the iterator would declare an error when encountering a lock conflict. This would cause an early failure so is potentially better than checking after the fact, but it is potentially more complicated.
  • Check for conflicting locks after optimistic request evaluation.

Do you have a preference?
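For concreteness, option 2 (check after optimistic evaluation) might look roughly like this sketch, with invented names: evaluate up to the limit without consulting locks, then check the lock table only over the span actually touched, which may be much narrower than the declared span.

```go
package main

import "fmt"

type span struct{ start, end string }

// hasConflict is a stand-in for consulting a lockTable snapshot over
// the span the scan actually covered (hypothetical helper).
func hasConflict(locked []string, sp span) bool {
	for _, k := range locked {
		if k >= sp.start && k < sp.end {
			return true
		}
	}
	return false
}

// evaluateOptimistically scans up to limit keys without consulting
// locks, then reports whether a retry is needed because a lock falls
// inside the truncated span it touched.
func evaluateOptimistically(keys, locked []string, limit int) (results []string, mustRetry bool) {
	for _, k := range keys {
		results = append(results, k)
		if len(results) == limit {
			break
		}
	}
	// The touched span ends just past the last key actually read.
	touched := span{start: keys[0], end: results[len(results)-1] + "\x00"}
	return results, hasConflict(locked, touched)
}

func main() {
	keys := []string{"a", "b", "c", "d"}
	// A lock on "d" is beyond the truncated span, so no retry needed.
	res, retry := evaluateOptimistically(keys, []string{"d"}, 2)
	fmt.Println(res, retry) // [a b] false
}
```

Option 1 moves the `hasConflict` check into the iterator itself, failing at the first locked key instead of after the fact.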

Correctness: lockState holds mutable information, unlike a latch. So we can't tell after the fact whether we conflicted with a lock, since its state may have changed. But it seems latching saves us.

  • latching will prevent other requests from starting after read request R has placed its latches in AcquireOptimistic. So the interesting ones are:
    • Other requests that have started before R but have not yet acquired/released locks. But we will see these when we do the latch check after optimistic evaluation, so they are not a concern.
    • Other requests that have completed and hold a lock. No other request can start after R starts that will change those locks.
  • Concurrent releases of unreplicated locks aren't actually harmful, since that conflict is gone. But concurrent releases of replicated locks would be harmful, since they can change a provisional value to a committed value. Neither can happen while R holds latches, so we will see previously acquired locks when doing the consistency check after evaluation.

Does this sound right?

@nvanbenschoten
Member Author

Do you have a preference?

I think I prefer the first option for two reasons:

  1. it avoids useless work in the case that we do hit a conflicting lock, allowing us to fail earlier
  2. it lets us get around answering the question of what we return from the iterator in the case that we hit a lock. For option 2, do we pretend the lock + its provisional value aren't there? Do we pretend they are already committed? This is important because mvcc.go may bail early if it detects certain logical invariant violations. For instance, something like a CPut will return a ConditionFailedError in some cases, so we'd need to catch those and potentially swallow them after checking for conflicting locks if we went with option 2.

Does this sound right?

I had to go back and remind myself that AcquireOptimistic does, in fact, place latches that block out other requests; it just doesn't wait on existing latches to start with. Given that, I agree with your reasoning.

Your point about concurrent releases of replicated locks being harmful because of the adjusted provisional value is interesting. We ideally want to get to a point where we can perform intent resolution without acquiring latches, through the use of some indirection in an intentAwareIterator. I don't think that's in conflict with what we're discussing here. But to ensure correctness, we need to ensure that if the iterator returns a pre-resolution value, the lock-table check cannot return the post-resolution lock state. The other three orderings (pre-resolution => old lock-table, post-resolution => old lock-table, post-resolution => new lock-table) are all valid.

So in that sense, I think the first option you listed above might make it easier to guarantee this, because we can enforce the invariant at a single atomicity point (during iteration) instead of trying to coordinate between what the iterator returns and what we later see in the lock table when we perform the optimistic conflict check. In other words, option 1 still ensures that we check the lock-table for a given key before ever observing the corresponding MVCC state, it just does so dynamically while scanning instead of upfront like we do today. Option 2 inverts the order of these two operations, which I suspect will lead to complexity here.
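A minimal sketch of that option-1 shape (invented names, not the eventual `lockAwareIterator` API): the iterator consults the lock table for each key before surfacing its MVCC state, and returns an error the moment it steps onto a locked key, so evaluation fails early rather than having to decide what value to surface.

```go
package main

import (
	"errors"
	"fmt"
)

// errLockConflict is returned the moment the iterator steps onto a
// locked key, so the caller can fall back to the queueing path.
var errLockConflict = errors.New("conflicting lock")

type lockAwareIter struct {
	keys   []string        // MVCC keys in scan order
	locked map[string]bool // stand-in for a lockTable snapshot
	pos    int
}

// Next enforces the ordering invariant from the discussion: the lock
// table is consulted for a key before its MVCC state is ever observed.
func (it *lockAwareIter) Next() (string, error) {
	if it.pos >= len(it.keys) {
		return "", nil // exhausted
	}
	k := it.keys[it.pos]
	if it.locked[k] {
		return "", errLockConflict
	}
	it.pos++
	return k, nil
}

func main() {
	it := &lockAwareIter{
		keys:   []string{"a", "b", "c"},
		locked: map[string]bool{"b": true},
	}
	for {
		k, err := it.Next()
		if err != nil || k == "" {
			fmt.Println("stopped:", err)
			return
		}
		fmt.Println("read", k)
	}
}
```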

Does that line up with your thinking?

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 10, 2020
@sumeerbhola
Collaborator

For instance, something like a CPut will return a ConditionFailedError in some cases, so we'd need to catch those and potentially swallow them after checking for conflicting locks if we went with option 2.

That makes sense. Option 1 it is.

But to ensure correctness, we need to ensure that if the iterator returns the value for a pre-resolution value, the lock-table check cannot return the post-resolution lock state. The other three orderings (pre-resolution => old lock-table, post-resolution => old lock-table, post-resolution => new lock-table) are all valid.
So in that sense, I think the first option you listed above might make it easier to guarantee this, because we can enforce the invariant at a single atomicity point (during iteration) instead of trying to coordinate between what the iterator returns and what we later see in the lock table when we perform the optimistic conflict check.

I wasn't thinking about intent resolution without latches -- I'll keep it in mind. Since we will create the storage.Iterator in the lockTable before any optimistic evaluation, it should be easy to ensure that concurrently resolved intents are noticed when doing the check if we happened to do option 2.
For request evaluation in option 1 we'll need to order the creation of the storage.Iterators such that the one on the lock-table key space is created before the one on the MVCC key space (since we don't want to create snapshots for request evaluation), so that we can't see a provisional value while missing its MVCCMetadata. Seeing the MVCCMetadata without the provisional value is fine: either the MVCCMetadata has a timestamp above this read and we were going to ignore the provisional value (and resolution will keep that invariant), or it is below this read, in which case we immediately exit this optimistic evaluation without trying to look for the missing provisional value.
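The ordering argument can be modeled with a toy (invented names): during evaluation, writes are blocked by latches, so the only concurrent mutation is intent resolution, which removes the MVCCMetadata from the lock-table key space and rewrites the provisional value as committed in the MVCC key space. Each iterator pins the state it sees at creation time.

```go
package main

import "fmt"

type intentState struct {
	hasMeta        bool // MVCCMetadata present in the lock-table key space
	hasProvisional bool // provisional value present in the MVCC key space
}

// resolve models concurrent intent resolution: metadata removed,
// provisional value replaced by a committed value.
func resolve(intentState) intentState {
	return intentState{hasMeta: false, hasProvisional: false}
}

// lockIterFirst pins the lock-table view, lets a resolution race in,
// then pins the MVCC view. The worst case is metadata-without-value,
// which evaluation tolerates per the reasoning above.
func lockIterFirst(s intentState) (sawMeta, sawProvisional bool) {
	sawMeta = s.hasMeta               // 1st: lock-table key space iterator
	s = resolve(s)                    // concurrent intent resolution
	sawProvisional = s.hasProvisional // 2nd: MVCC key space iterator
	return sawMeta, sawProvisional
}

// mvccIterFirst shows the unsafe ordering: a provisional value with no
// MVCCMetadata to explain it.
func mvccIterFirst(s intentState) (sawMeta, sawProvisional bool) {
	sawProvisional = s.hasProvisional // 1st: MVCC key space iterator
	s = resolve(s)                    // concurrent intent resolution
	sawMeta = s.hasMeta               // 2nd: lock-table key space iterator
	return sawMeta, sawProvisional
}

func main() {
	intent := intentState{hasMeta: true, hasProvisional: true}
	fmt.Println(lockIterFirst(intent)) // true false: tolerable
	fmt.Println(mvccIterFirst(intent)) // false true: unsafe
}
```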

craig bot pushed a commit that referenced this issue Jun 10, 2020
49891: physicalplan: preevaluate subqueries on LocalExprs and always set LocalExprs r=yuzefovich a=yuzefovich

**physicalplan: preevaluate subqueries on LocalExprs**

When the plan is local, we do not serialize expressions. Previously, in
such a case we would also not preevaluate the subqueries in the
expressions, which made `EXPLAIN (VEC)` return an unexpected plan (there
would be a `tree.Subquery` in the expression, which we don't support in
the vectorized engine, so we would wrap the plan). Now we will
preevaluate the subqueries before storing them in the processor spec.
AFAICT it affects only this explain variant and nothing else.

Release note: None

**colexec: improve expression parsing**

This commit introduces `colexec.ExprHelper` that helps with expression
processing. Previously, we were allocating a new `execinfra.ExprHelper`
and were calling `Init` on it in order to get the typed expression from
possibly serialized representation of each expression. Now, this new
expression helper is reused between all expressions in the flow on
a single node.

There is one caveat, however: we need to make sure that we force
deserialization of the expressions during `SupportsVectorized` check if
the flow is scheduled to be run on a remote node (different from the one
that is performing the check). This is necessary to make sure that the
remote nodes will be able to deserialize the expressions without
encountering errors (if we don't force the serialization during the
check, we will use `LocalExpr` - if available - and might not catch
things that we don't support).

Release note: None

**physicalplan: always store LocalExpr**

Previously, we would set either `LocalExpr` (unserialized expression,
only when we have the full plan on a single node) or `Expr` (serialized
expression, when we have distributed plan as well as in some tests).
However, we can set both and make a best effort to reuse the
unserialized `LocalExpr` on the gateway even if the plan is distributed.
This commit adds such behavior.

Fixes: #49810.

Release note: None

49966: roachtest: adjust tpchvec and tpcdsvec r=yuzefovich a=yuzefovich

**roachtest: add new tpchvec config**

This commit adds a new `tpchvec/perf_no_stats` config that is the same
as `tpchvec/perf` except for the fact that stats creation is disabled.
The plans without stats are likely to be different, so it gives us an
easy way to get more test coverage. One caveat here is that query
9 without stats takes insanely long to run, so some new plumbing has
been added to skip that query.

Additionally, `tpcdsvec` has been adjusted. The test runs all queries
with and without stats present with `on` and `off` vectorize options.
However, when stats are not present, `on` config will be reduced to
`off` because of `vectorize_row_count_threshold` heuristic. This commit
disables that heuristic.

Release note: None

**roachtest: switch the config order in tpchvec/perf**

Let's see whether it makes difference to occasional failures of
`tpchvec/perf` which are very hard to explain.

This commit also changes the workload command for `perf` config to run
only against node 1, thus, eliminating one possible source of
"randomness" for the failures.

Addresses: #49955.

Release note: None

49980: kv/concurrency: drop uncontended replicated lock on unreplicated upgrade r=nvanbenschoten a=nvanbenschoten

Fixes #49658.
Informs #9521.
Informs #49973.
Related to #49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to drop write-uncontended locks when upgraded from the Unreplicated to Replicated durability in much the same way we drop Replicated locks when first acquired. This is possible because a Replicated lock is also stored as an MVCC intent, so it does not need to also be stored in the lockTable if writers are not queuing on it. This is beneficial because it serves as a mitigation for #49973 and avoids the 99th percentile latency regression observed in #49658. Since we aren't currently great at avoiding excessive contention on limited scans when locks are in the lockTable, it's better to keep locks out of the lockTable when possible.

If any of the readers do truly contend with this lock even after their limit has been applied, they will notice during their MVCC scan and re-enter the queue (possibly recreating the lock through AddDiscoveredLock). Still, in practice this seems to work well in avoiding most of the artificial concurrency discussed in #49973. It's a bit of a hack and I am very interested in fixing this fully in the future (through an approach like #33373 or by incrementally consulting the lockTable in a `lockAwareIterator`), but for now, I don't see a downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one of the demos we like to run: #49658.

Release note (performance improvement): limited SELECT statements now do a better job avoiding unnecessary contention with UPDATE and SELECT FOR UPDATE statements.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@nvanbenschoten
Member Author

For request evaluation in option 1 we'll need to order the creation of the storage.Iterator such that the one on the lock table key space is created before the one on the MVCC key space.

Yes, we'll need to ensure this for the reasons you listed. I think that's how things would naturally be ordered, anyway, so we shouldn't have to do anything special here.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 11, 2020
@sumeerbhola sumeerbhola self-assigned this Dec 3, 2020
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 15, 2020
(There are bugs here, since existing tests fail. And I have
not added new tests yet -- I would like an opinion on the
high-level approach before fixing those.)

This is a cleanup in preparation for the future, and also
has some, probably minor, immediate benefits.

In the future, the lock table will support multiple intents for
the same key if all but one are known to be finalized. So the
finalizedTxnCache belongs in the lock table data-structure.
Additionally, we will support intent resolution without holding
latches, which has some implications on data-structure
consistency: request evaluation will not be allowed to add
discovered intents to the lock table since the discovery may be
stale. This PR is not changing this discovery behavior since we
need it for now (due to interleaved intents), but it moves us
along the path towards the lock table data-structure not
relying on external behavior for maintaining its in-memory
"cache" of locks. Specifically, removing intents from the lock
table when the intent is still present in the engine is not
principled. We currently do this in two places:
- for optimizing limited scans: a later PR will fix this properly
  by checking the lock table after request evaluation, as
  outlined in cockroachdb#49973.
- using the finalizedTxnCache in the lockTableWaiterImpl: this
  use is changed in this PR. The code in the lock table also does
  removal of intents before resolution, but there is a TODO to
  fix that in the future. It should be easier to do this with the
  behavior contained in the lock table.

The immediate benefits, which may not have any practical
significance, are:
- We no longer resolve unreplicated locks -- they are simply
  removed.
- A replicated lock is removed from the lock table data-structure
  only when the requester has finished a scan and is in a
  position to do resolution. Earlier one could remove the lock
  but block on another lock, and not do intent resolution on
  the first lock. This would cause wasteful evaluation of other
  requests.

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 22, 2020
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 4, 2021
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 5, 2021
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 6, 2021
craig bot pushed a commit that referenced this issue Jan 6, 2021
57947: concurrency: move finalizedTxnCache into lock table r=sumeerbhola a=sumeerbhola

This is a cleanup in preparation for the future, and also
has some, possibly minor, immediate benefits.

In the future, the lock table will support multiple intents for
the same key if all but one are known to be finalized. So the
finalizedTxnCache belongs in the lock table data-structure.
Additionally, we will support intent resolution without holding
latches, which has some implications on data-structure
consistency: request evaluation will not be allowed to add
discovered intents to the lock table since the discovery may be
stale. This PR is not changing this discovery behavior since we
need it for now (due to interleaved intents), but it moves us
along the path towards the lock table data-structure not
relying on external behavior for maintaining its in-memory
"cache" of locks. Specifically, removing intents from the lock
table when the intent is still present in the engine is not
principled. We currently do this in two places:
- for optimizing limited scans: a later PR will fix this properly
  by checking the lock table after request evaluation, as
  outlined in #49973.
- using the finalizedTxnCache in the lockTableWaiterImpl: this
  use is changed in this PR. The code in the lock table also does
  removal of intents before resolution, but there is a TODO to
  fix that in the future. It should be easier to do this with the
  behavior contained in the lock table.

The immediate benefits, which may not have any practical
significance, are:
- We no longer resolve unreplicated locks -- they are simply
  removed.
- A replicated lock is removed from the lock table data-structure
  only when the requester has finished a scan and is in a
  position to do resolution. Previously, a request could remove
  the lock, then block on another lock without resolving the
  first one, causing wasteful evaluation of other
  requests.
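
The multiple-intents-per-key idea described above can be sketched as follows; the `lockTable`, `lockState`, and `finalized` types here are hypothetical stand-ins for illustration, not the actual CockroachDB data structures:

```go
package main

import "fmt"

// txnID identifies a transaction in this sketch.
type txnID string

// lockState tracks the intent holders recorded for one key.
type lockState struct {
	key     string
	holders []txnID
}

// lockTable owns the finalized-transaction cache in this sketch,
// mirroring the move of finalizedTxnCache into the lock table.
type lockTable struct {
	finalized map[txnID]bool
	locks     map[string]*lockState
}

// canAddIntent reports whether a new intent by txn may be recorded
// for key: every other holder must be known to be finalized, so that
// at most one live transaction holds an intent on the key.
func (lt *lockTable) canAddIntent(key string, txn txnID) bool {
	ls, ok := lt.locks[key]
	if !ok {
		return true
	}
	for _, h := range ls.holders {
		if h != txn && !lt.finalized[h] {
			return false
		}
	}
	return true
}

func main() {
	lt := &lockTable{
		finalized: map[txnID]bool{"t1": true},
		locks: map[string]*lockState{
			"a": {key: "a", holders: []txnID{"t1"}}, // t1 finalized
			"b": {key: "b", holders: []txnID{"t2"}}, // t2 still running
		},
	}
	fmt.Println(lt.canAddIntent("a", "t3")) // true: only holder is finalized
	fmt.Println(lt.canAddIntent("b", "t3")) // false: t2 is live
}
```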

Informs #41720 

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jan 8, 2021
This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.

Fixes cockroachdb#49973
Informs cockroachdb#9521

Release note: None
craig bot pushed a commit that referenced this issue May 20, 2021
58670: concurrency,kvserver: limited scans optimistically check for locks r=sumeerbhola a=sumeerbhola

This optimistic checking happens after the evaluation. The evaluation
will already discover any conflicting intents, so the subsequent
checking of the lock table is primarily to find conflicting
unreplicated locks.

This structure should be easy to extend to optimistic latching.
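
As a rough illustration of the evaluate-then-check flow described above (the `lockTable`, `span`, and `optimisticScan` names here are hypothetical stand-ins, not the real CockroachDB API): the scan evaluates first, then checks the lock table only over the keys it actually touched, and falls back to a pessimistic retry only on a real conflict.

```go
package main

import "fmt"

// span is a key range [start, end] for a scan.
type span struct{ start, end string }

// lockTable is a toy stand-in: a set of locked keys.
type lockTable struct{ locked map[string]bool }

// conflicts reports whether any lock overlaps the keys actually scanned.
func (lt *lockTable) conflicts(keys []string) bool {
	for _, k := range keys {
		if lt.locked[k] {
			return true
		}
	}
	return false
}

// evaluateScan pretends to scan keys in [sp.start, sp.end] up to limit,
// returning the keys it actually read.
func evaluateScan(sp span, limit int) []string {
	all := []string{"a", "b", "c", "d"}
	var out []string
	for _, k := range all {
		if k >= sp.start && k <= sp.end && len(out) < limit {
			out = append(out, k)
		}
	}
	return out
}

// optimisticScan evaluates first, then checks locks over only the keys
// it scanned; retry=true means a pessimistic re-run is needed.
func optimisticScan(lt *lockTable, sp span, limit int) (keys []string, retry bool) {
	keys = evaluateScan(sp, limit)
	return keys, lt.conflicts(keys)
}

func main() {
	// A lock on "d" lies inside the full span but past the limit, so the
	// optimistic scan succeeds without waiting on it.
	lt := &lockTable{locked: map[string]bool{"d": true}}
	keys, retry := optimisticScan(lt, span{"a", "d"}, 2)
	fmt.Println(keys, retry) // [a b] false
}
```

Under real contention (a lock inside the scanned prefix), `retry` comes back true and the request would re-run pessimistically, which is what the real-contention=true microbenchmark case exercises.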

Benchmark numbers from the new kv roachtest that does SFU to
introduce false contention:
```
name                                           old ops/sec  new ops/sec  delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq   5.68k ± 0%   9.96k ± 1%  +75.46%  (p=0.000 n=8+9)

name                                           old p50      new p50      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    13.1 ± 0%     5.5 ± 0%  -58.02%  (p=0.000 n=8+8)

name                                           old p95      new p95      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    18.9 ± 0%    16.8 ± 0%  -11.11%  (p=0.001 n=8+9)

name                                           old p99      new p99      delta
kv95limited-spans/enc=false/nodes=1/splt=0/seq    22.0 ± 0%    25.6 ± 2%  +16.57%  (p=0.000 n=8+9)
```

Microbenchmark numbers for the OptimisticEval microbenchmark,
where the real-contention=true case typically causes optimistic
evaluation to retry.
```
name                                     old time/op    new time/op    delta
OptimisticEval/real-contention=false-16    5.76ms ± 5%    0.03ms ± 2%  -99.49%  (p=0.000 n=8+10)
OptimisticEval/real-contention=true-16     5.75ms ± 4%    5.63ms ± 5%     ~     (p=0.393 n=10+10)

name                                     old alloc/op   new alloc/op   delta
OptimisticEval/real-contention=false-16    34.3kB ±24%     9.9kB ± 2%  -71.29%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16     33.0kB ±20%    35.7kB ± 9%     ~     (p=0.065 n=8+8)

name                                     old allocs/op  new allocs/op  delta
OptimisticEval/real-contention=false-16       273 ±19%        83 ± 0%  -69.56%  (p=0.000 n=9+10)
OptimisticEval/real-contention=true-16        268 ± 2%       308 ± 2%  +14.90%  (p=0.000 n=8+8)
```

Fixes #49973
Informs #9521

Release note (performance improvement): A limited scan now checks
for conflicting locks in an optimistic manner, which means it will
not conflict with locks (typically unreplicated locks) that were
held in the scan's full spans, but were not in the spans that were
scanned until the limit was reached. This behavior can be turned
off by changing the value of the cluster setting
kv.concurrency.optimistic_eval_limited_scans.enabled to false.


Co-authored-by: sumeerbhola <[email protected]>
@craig craig bot closed this as completed in 31847ac May 20, 2021
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior.