kv: implement Update lock strength for unreplicated locks #49684

Open
nvanbenschoten opened this issue May 29, 2020 · 10 comments
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv KV Team

Comments

nvanbenschoten commented May 29, 2020

The lockTable currently only supports the Exclusive lock strength, but it was designed to support a range of lock strengths. Specifically, we'd eventually like to add support for Shared and Upgrade locks as well.

// +-----------+-----------+-----------+-----------+-----------+
// |           | None      | Shared    | Upgrade   | Exclusive |
// +-----------+-----------+-----------+-----------+-----------+
// | None      |           |           |           |    X^†    |
// +-----------+-----------+-----------+-----------+-----------+
// | Shared    |           |           |     X     |     X     |
// +-----------+-----------+-----------+-----------+-----------+
// | Upgrade   |           |     X     |     X     |     X     |
// +-----------+-----------+-----------+-----------+-----------+
// | Exclusive |    X^†    |     X     |     X     |     X     |
// +-----------+-----------+-----------+-----------+-----------+
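
To make the matrix above concrete, here is a minimal Go sketch (not the actual lockTable code) of the compatibility relation it encodes. The Strength constants and the conflicts function are hypothetical names used only for this example, and the X^† cells, which in practice are qualified by MVCC timestamps, are simplified to unconditional conflicts.

package main

import "fmt"

type Strength int

const (
	None Strength = iota
	Shared
	Upgrade
	Exclusive
)

// conflicts reports whether a request at strength req conflicts with a lock
// held at strength held, following the matrix above. The None/Exclusive
// cells (marked X^†) are simplified here to an unconditional conflict.
func conflicts(held, req Strength) bool {
	switch {
	case held == None || req == None:
		return held == Exclusive || req == Exclusive
	case held == Shared && req == Shared:
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(conflicts(Upgrade, None))    // false: non-locking reads are not blocked
	fmt.Println(conflicts(Upgrade, Upgrade)) // true: only one Upgrade lock holder at a time
	fmt.Println(conflicts(Shared, Shared))   // false: Shared locks are mutually compatible
}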

We currently have no need for Shared locks, but Upgrade locks would be immediately useful for unreplicated locks acquired during the initial row-fetch of an UPDATE statement (implicit SFU) or by a SELECT ... FOR UPDATE statement (explicit SFU).

// Upgrade (U) locks are a hybrid of Shared and Exclusive locks which are
// used to prevent a common form of deadlock. When a transaction intends to
// modify existing KVs, it is often the case that it reads the KVs first and
// then attempts to modify them. Under pessimistic concurrency control, this
// would correspond to first acquiring a Shared lock on the keys and then
// converting the lock to an Exclusive lock when modifying the keys. If two
// transactions were to acquire the Shared lock initially and then attempt
// to update the keys concurrently, both transactions would get stuck
// waiting for the other to release its Shared lock and a deadlock would
// occur. To resolve the deadlock, one of the two transactions would need to
// be aborted.
//
// To avoid this potential deadlock problem, an Upgrade lock can be used in
// place of a Shared lock. Upgrade locks are not compatible with any other
// form of locking. As with Shared locks, the lock holder of an Upgrade lock
// on a key is only allowed to read from the key while the lock is held.
// This resolves the deadlock scenario presented above because only one of
// the transactions would have been able to acquire an Upgrade lock at a
// time while reading the initial state of the KVs. This means that the
// Shared-to-Exclusive lock upgrade would never need to wait on another
// transaction to release its locks.
//
// Under pure pessimistic concurrency control, an Upgrade lock is equivalent
// to an Exclusive lock. However, unlike with Exclusive locks, reads under
// optimistic concurrency control do not conflict with Upgrade locks. This
// is because a transaction can only hold an Upgrade lock on keys that it
// has not yet modified. This improves concurrency between read and write
// transactions compared to if the writing transaction had immediately
// acquired an Exclusive lock.
//
// The trade-off here is twofold. First, if the Upgrade lock holder does
// convert its lock on a key to an Exclusive lock after an optimistic read
// has observed the state of the key, the transaction that performed the
// optimistic read may be unable to perform a successful read refresh if it
// attempts to refresh to a timestamp at or past the timestamp of the lock
// conversion. Second, the optimistic reads permitted while the Upgrade lock
// is held will bump the timestamp cache. This may result in the Upgrade
// lock holder being forced to increase its write timestamp when converting
// to an Exclusive lock, which in turn may force it to restart if its read
// refresh fails.
Upgrade = 2;

The major benefit of this form of lock is that it does not conflict with non-locking reads. This avoids blocking these reads, at the expense of undermining some of the protection against transaction retries that is provided by SFU. This is because these non-locking reads are allowed to slip underneath Upgrade locks and bump the timestamp cache, which may force the UPDATE to push its timestamp when it returns to write an intent. This is probably ok, and will certainly be once we also support Shared locks and SELECT ... FOR SHARE so that users can have full control over row-level locking.
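
As a toy illustration of this trade-off (not CockroachDB code; the map-based timestamp cache and integer timestamps are stand-ins), the sketch below shows a non-locking read slipping under an Upgrade lock, bumping the timestamp cache, and forcing the locking UPDATE to push its write timestamp when it returns to write its intent.

package main

import "fmt"

func main() {
	tsCache := map[string]int{} // key -> highest timestamp at which the key was read

	// An UPDATE acquires an Upgrade lock on key "k" at ts=10 and reads the row.
	writeTS := 10

	// A non-locking read at ts=15 is not blocked by the Upgrade lock, but it
	// bumps the timestamp cache entry for "k".
	if readTS := 15; readTS > tsCache["k"] {
		tsCache["k"] = readTS
	}

	// When the UPDATE returns to write its intent, it must write above the
	// timestamp cache entry, so its write timestamp is pushed and the
	// transaction may need to refresh or retry.
	if tsCache["k"] >= writeTS {
		writeTS = tsCache["k"] + 1
	}
	fmt.Println("pushed write timestamp:", writeTS) // 16
}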

This should improve throughput on YCSB-B (95% reads, 5% updates). It's unclear how it will affect YCSB-A (50% reads, 50% updates).

Jira issue: CRDB-4209

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. labels May 29, 2020
@nvanbenschoten nvanbenschoten self-assigned this May 29, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 8, 2020
Fixes cockroachdb#49658.
Informs cockroachdb#9521.
Informs cockroachdb#49973.
Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to
drop write-uncontended locks when upgraded from the Unreplicated to
Replicated durability in much the same way we drop Replicated locks when
first acquired. This is possible because a Replicated lock is also
stored as an MVCC intent, so it does not need to also be stored in the
lockTable if writers are not queuing on it. This is beneficial because
it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile
latency regression observed in cockroachdb#49658. Since we aren't currently great
at avoiding excessive contention on limited scans when locks are in the
lockTable, it's better to keep locks out of the lockTable when
possible.

If any of the readers do truly contend with this lock even after their
limit has been applied, they will notice during their MVCC scan and
re-enter the queue (possibly recreating the lock through
AddDiscoveredLock). Still, in practice this seems to work well in
avoiding most of the artificial concurrency discussed in cockroachdb#49973. It's a
bit of a hack and I am very interested in fixing this fully in the
future (through an approach like cockroachdb#33373 or by incrementally consulting
the lockTable in a `lockAwareIterator`), but for now, I don't see a
downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one
of the demos we like to run.

Release note (performance improvement): limited SELECT statements
now do a better job avoiding unnecessary contention with UPDATE and
SELECT FOR UPDATE statements.
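
For readers skimming the timeline, the following illustrative-only Go sketch captures the heuristic this commit message describes; the lockState type, durability constants, and onAcquire method are hypothetical and do not mirror the real lockTable API.

package main

import "fmt"

type durability int

const (
	unreplicated durability = iota
	replicated
)

type lockState struct {
	dur           durability
	queuedWriters int
}

// onAcquire records a (re-)acquisition of the lock at the given durability and
// reports whether the lock can be dropped from the in-memory lockTable: a
// Replicated lock is also stored as an MVCC intent, so if no writers are
// queued on it, keeping it in the lockTable is redundant. A reader or writer
// that truly conflicts will rediscover the intent during its MVCC scan and can
// recreate the lock (AddDiscoveredLock).
func (l *lockState) onAcquire(dur durability) (dropFromLockTable bool) {
	l.dur = dur
	return dur == replicated && l.queuedWriters == 0
}

func main() {
	l := &lockState{dur: unreplicated}
	fmt.Println(l.onAcquire(replicated)) // true: uncontended lock leaves the lockTable
}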
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 10, 2020
Fixes cockroachdb#49658.
Informs cockroachdb#9521.
Informs cockroachdb#49973.
Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to
drop write-uncontended locks when upgraded from the Unreplicated to
Replicated durability in much the same way we drop Replicated locks when
first acquired. This is possible because a Replicated lock is also
stored as an MVCC intent, so it does not need to also be stored in the
lockTable if writers are not queuing on it. This is beneficial because
it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile
latency regression observed in cockroachdb#49658. Since we aren't currently great
at avoiding excessive contention on limited scans when locks are in the
lockTable, it's better to keep locks out of the lockTable when
possible.

If any of the readers do truly contend with this lock even after their
limit has been applied, they will notice during their MVCC scan and
re-enter the queue (possibly recreating the lock through
AddDiscoveredLock). Still, in practice this seems to work well in
avoiding most of the artificial concurrency discussed in cockroachdb#49973. It's a
bit of a hack and I am very interested in fixing this fully in the
future (through an approach like cockroachdb#33373 or by incrementally consulting
the lockTable in a `lockAwareIterator`), but for now, I don't see a
downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one
of the demos we like to run.

Release note (performance improvement): limited SELECT statements
now do a better job avoiding unnecessary contention with UPDATE and
SELECT FOR UPDATE statements.
craig bot pushed a commit that referenced this issue Jun 10, 2020
49891: physicalplan: preevaluate subqueries on LocalExprs and always set LocalExprs r=yuzefovich a=yuzefovich

**physicalplan: preevaluate subqueries on LocalExprs**

When the plan is local, we do not serialize expressions. Previously, in
such a case we would also not preevaluate the subqueries in the
expressions, which made `EXPLAIN (VEC)` return an unexpected plan (there
would be a `tree.Subquery` in the expression, which we don't support in
the vectorized engine, so we would wrap the plan). Now we preevaluate the
subqueries before storing them in the processor spec. AFAICT this affects
only this explain variant and nothing else.

Release note: None

**colexec: improve expression parsing**

This commit introduces `colexec.ExprHelper` that helps with expression
processing. Previously, we were allocating a new `execinfra.ExprHelper`
and were calling `Init` on it in order to get the typed expression from
the possibly serialized representation of each expression. Now, this new
expression helper is reused between all expressions in the flow on
a single node.

There is one caveat, however: we need to make sure that we force
deserialization of the expressions during the `SupportsVectorized` check if
the flow is scheduled to be run on a remote node (different from the one
that is performing the check). This is necessary to make sure that the
remote nodes will be able to deserialize the expressions without
encountering errors (if we don't force the serialization during the
check, we will use `LocalExpr` - if available - and might not catch
things that we don't support).

Release note: None

**physicalplan: always store LocalExpr**

Previously, we would set either `LocalExpr` (unserialized expression,
only when we have the full plan on a single node) or `Expr` (serialized
expression, when we have a distributed plan, as well as in some tests).
However, we could be setting both and making a best effort to reuse the
unserialized `LocalExpr` on the gateway even if the plan is distributed.
This commit adds such behavior.

Fixes: #49810.

Release note: None

49966: roachtest: adjust tpchvec and tpcdsvec r=yuzefovich a=yuzefovich

**roachtest: add new tpchvec config**

This commit adds a new `tpchvec/perf_no_stats` config that is the same
as `tpchvec/perf` except for the fact that stats creation is disabled.
The plans without stats are likely to be different, so it gives us an
easy way to get more test coverage. One caveat here is that query
9 without stats takes insanely long to run, so some new plumbing has
been added to skip that query.

Additionally, `tpcdsvec` has been adjusted. The test runs all queries
with and without stats present, with the `on` and `off` vectorize options.
However, when stats are not present, the `on` config is effectively reduced
to `off` because of the `vectorize_row_count_threshold` heuristic. This
commit disables that heuristic.

Release note: None

**roachtest: switch the config order in tpchvec/perf**

Let's see whether it makes a difference to the occasional failures of
`tpchvec/perf`, which are very hard to explain.

This commit also changes the workload command for the `perf` config to run
only against node 1, thus eliminating one possible source of
"randomness" for the failures.

Addresses: #49955.

Release note: None

49980: kv/concurrency: drop uncontended replicated lock on unreplicated upgrade r=nvanbenschoten a=nvanbenschoten

Fixes #49658.
Informs #9521.
Informs #49973.
Related to #49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to drop write-uncontended locks when upgraded from the Unreplicated to Replicated durability in much the same way we drop Replicated locks when first acquired. This is possible because a Replicated lock is also stored as an MVCC intent, so it does not need to also be stored in the lockTable if writers are not queuing on it. This is beneficial because it serves as a mitigation for #49973 and avoids the 99th percentile latency regression observed in #49658. Since we aren't currently great at avoiding excessive contention on limited scans when locks are in the lockTable, it's better to keep locks out of the lockTable when possible.

If any of the readers do truly contend with this lock even after their limit has been applied, they will notice during their MVCC scan and re-enter the queue (possibly recreating the lock through AddDiscoveredLock). Still, in practice this seems to work well in avoiding most of the artificial concurrency discussed in #49973. It's a bit of a hack and I am very interested in fixing this fully in the future (through an approach like #33373 or by incrementally consulting the lockTable in a `lockAwareIterator`), but for now, I don't see a downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one of the demos we like to run: #49658.

Release note (performance improvement): limited SELECT statements now do a better job avoiding unnecessary contention with UPDATE and SELECT FOR UPDATE statements.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 11, 2020
Fixes cockroachdb#49658.
Informs cockroachdb#9521.
Informs cockroachdb#49973.
Related to cockroachdb#49684.

This commit tweaks the `lockTable`'s handling of lock acquisition to
drop write-uncontended locks when upgraded from the Unreplicated to
Replicated durability in much the same way we drop Replicated locks when
first acquired. This is possible because a Replicated lock is also
stored as an MVCC intent, so it does not need to also be stored in the
lockTable if writers are not queuing on it. This is beneficial because
it serves as a mitigation for cockroachdb#49973 and avoids the 99th percentile
latency regression observed in cockroachdb#49658. Since we aren't currently great
at avoiding excessive contention on limited scans when locks are in the
lockTable, it's better to keep locks out of the lockTable when
possible.

If any of the readers do truly contend with this lock even after their
limit has been applied, they will notice during their MVCC scan and
re-enter the queue (possibly recreating the lock through
AddDiscoveredLock). Still, in practice this seems to work well in
avoiding most of the artificial concurrency discussed in cockroachdb#49973. It's a
bit of a hack and I am very interested in fixing this fully in the
future (through an approach like cockroachdb#33373 or by incrementally consulting
the lockTable in a `lockAwareIterator`), but for now, I don't see a
downside to making this change.

I intend to backport this change to v20.1, as it's causing issues in one
of the demos we like to run.

Release note (performance improvement): limited SELECT statements
now do a better job avoiding unnecessary contention with UPDATE and
SELECT FOR UPDATE statements.

z0mb1ek commented Sep 15, 2020

Hi. We need this for Keycloak to work, and possibly for Liquibase as well.

ajwerner (Contributor) commented:

@nvanbenschoten do you have thoughts on the likelihood of this landing in 21.1? And as an add-on, what are the chances of getting ranged locks?

nvanbenschoten (Member, Author) commented:

I'd defer that to @sumeerbhola. This change doesn't seem particularly involved, though we wouldn't want to make the change until some of the larger lock table changes land in this release. Ranged locks are a much larger ask, but are being considered in the design of #55664.

sumeerbhola (Collaborator) commented:

@ajwerner Replicated range locks have performance implications, since one can't use storage engine bloom filters when looking for conflicting range locks in the lock table. I was speculating on #55664 that we could make the leaseholder maintain an in-memory summary to prevent a performance drop, though a new leaseholder would need to fall back to those storage reads until it initializes its in-memory state by reading the range locks from storage (which is not ideal).
Can you make do with unreplicated range locks?

@sumeerbhola sumeerbhola self-assigned this Oct 23, 2020
ajwerner (Contributor) commented:

Unreplicated range locks would be great. I’ll take what I can get. Replicated locks would enable us to avoid refreshing reads but unreplicated locks are a lot better than nothing.

sumeerbhola (Collaborator) commented:

Unreplicated range locks would be great.

What strength? Can you open a separate issue?

ajwerner (Contributor) commented:

Unreplicated range locks would be great.

What strength? Can you open a separate issue?

#55896

nvanbenschoten (Member, Author) commented:

Cross-posting a snippet of a thread with @sumeerbhola to ensure we don't miss this when we return to this issue:

Regarding an upgrade locking strength, I do still think there's room for increased concurrency between locking SFU reads and non-locking reads. However, we'll want to be guided by benchmarks here, mainly YCSB. I suspect that in doing so, we will find that we need to couple the upgrade locking strength with a second optimization - a way for implicit, single-row UPDATE statements to perform a server-side refresh during their {Put, EndTxn} batch. The idea here would be that if a transaction is going to write to all of the keys that it read from and ends up writing entirely to a single range, it should be able to implicitly refresh away bumps to its write timestamp due to read-write conflicts (but not write-write conflicts!). We already have some notion of server-side refreshes (see CanForwardReadTimestamp), but we'll need to extend that concept. Without this, I fear that the increased concurrency due to the weaker SFU locking strength would cause transaction retries that would negate any benefit provided by the increased concurrency.

The server-side refresh mechanism I'm referring to here is implemented in canDoServersideRetry.
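
A minimal sketch of the condition being described, under the assumption that it can be reduced to three booleans; the helper name canServersideRefresh is invented for this example, and the real mechanism (around CanForwardReadTimestamp and canDoServersideRetry) is considerably more involved.

package main

import "fmt"

// canServersideRefresh sketches the condition described above: a single-range
// {Put, EndTxn} batch can refresh away a write-timestamp bump caused by
// read-write conflicts if the transaction writes to every key it read, but
// never a bump caused by a write-write conflict.
func canServersideRefresh(writesAllKeysRead, singleRange, pushedByWriteWriteConflict bool) bool {
	return writesAllKeysRead && singleRange && !pushedByWriteWriteConflict
}

func main() {
	// An implicit single-row UPDATE whose timestamp was bumped only by a
	// non-locking read slipping under its Upgrade lock:
	fmt.Println(canServersideRefresh(true, true, false)) // true
	// The same batch bumped by a write-write conflict cannot be refreshed:
	fmt.Println(canServersideRefresh(true, true, true)) // false
}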

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jul 15, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow. There is an
exception which is IMPORT incrementing a sequence. Nevertheless, the retry
behavior there seems sound. The other exception is pausing or canceling jobs.
I think that in that case we prefer to invalidate the work of the transaction
as our intention is to cancel it.

If cockroach implemented UPGRADE locks (cockroachdb#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jul 16, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow.

There is an exception which is IMPORT incrementing a sequence. In that case,
which motivated the initial locking addition, we'll leave the locking.

The other exception is pausing or canceling jobs. I think that in that case
we prefer to invalidate the work of the transaction as our intention is to
cancel it.

If cockroach implemented UPGRADE locks (cockroachdb#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jul 29, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow.

There is an exception which is IMPORT incrementing a sequence. In that case,
which motivated the initial locking addition, we'll leave the locking.

The other exception is pausing or canceling jobs. I think that in that case
we prefer to invalidate the work of the transaction as our intention is to
cancel it.

If cockroach implemented UPGRADE locks (cockroachdb#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
craig bot pushed a commit that referenced this issue Jul 29, 2021
67660: jobs: remove FOR UPDATE clause when updating job r=ajwerner a=ajwerner

In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow. There is an
exception which is IMPORT incrementing a sequence. Nevertheless, the retry
behavior there seems sound. The other exception is pausing or canceling jobs.
I think that in that case we prefer to invalidate the work of the transaction
as our intention is to cancel it.

If cockroach implemented UPGRADE locks (#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.

68241: changefeedccl: Change unreachable panic to unreachable error r=[miretskiy] a=HonoreDB

I missed the context but apparently we're doing this everywhere we can.

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Jul 29, 2021
In cockroachdb currently, the `FOR UPDATE` lock is an exclusive lock. That
means that both clients trying to inspect jobs and the job adoption loops will
try to scan the table and encounter these locks. For the most part, we
don't really update the job from the leaves of a distsql flow.

There is an exception which is IMPORT incrementing a sequence. In that case,
which motivated the initial locking addition, we'll leave the locking.

The other exception is pausing or canceling jobs. I think that in that case
we prefer to invalidate the work of the transaction as our intention is to
cancel it.

If cockroach implemented UPGRADE locks (#49684), then this FOR UPDATE would
not be a problem.

Release note (performance improvement): Jobs no longer hold exclusive locks
for the duration of their checkpointing transactions, which can result in
long wait times when trying to run SHOW JOBS.
@arulajmani arulajmani changed the title from "kv: implement Upgrade lock strength for unreplicated locks" to "kv: implement Update lock strength for unreplicated locks" on Jul 28, 2023

michae2 commented Jul 28, 2023

Regarding how this locking strength would map to SELECT FOR UPDATE and SELECT FOR SHARE statements:

It might be interesting to expose this level of locking strength through SQL to applications (though I'm not sure what we'd call it, since PG already uses "FOR UPDATE" to mean exclusive locking). It could be useful when applications are not sure at the time of the locking SELECT whether any of the rows read will be modified later in the transaction. (FOR UPDATE would then mean the application is sure at least some of the rows read will be modified, and FOR SHARE would continue to mean the application is sure none of the rows read will be modified.)

(Or, alternatively, we could always use this level of locking strength for FOR UPDATE. But then what would exclusive locks be for?)

@nvanbenschoten nvanbenschoten removed their assignment Feb 21, 2024
@shralex shralex added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Feb 29, 2024
@arulajmani arulajmani removed the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Mar 1, 2024
arulajmani (Collaborator) commented:

Removing the O-support label here. This was from a time when we thought we'd need to switch SELECT FOR UPDATE to acquire Update/Upgrade locks instead of Exclusive locks so that they don't conflict with non-locking readers. Our thinking has evolved since then, and we now expect non-locking readers not to conflict with Exclusive locks. See #117627 -- I'll apply O-support to that one instead.
