
kv: eliminate write-too-old deferral mechanism #102751

Closed
nvanbenschoten opened this issue May 3, 2023 · 2 comments · Fixed by #102809
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@nvanbenschoten
Member

nvanbenschoten commented May 3, 2023

Blind writes can currently defer the handling of WriteTooOld errors on write-write conflicts until after they have written their intents. This serves as a form of pessimistic locking to avoid starvation in the case of contending blind writes to the same key. The mechanism was introduced early in CockroachDB's life, unintentionally removed in #38668, and then revived by #44654 (issue: #44653).

This mechanism has been confusing, complex, and error-prone. For example, it requires mvccPutInternal and its callers to perform all side-effects (e.g. populate WriteBatches) even if a WriteTooOld error has been thrown, in case it is deferred above in evaluateBatch. We've wanted to remove this mechanism since at least 79c711d#diff-c9f3b8fbff25265dd51555fa7099eae566e5904c4e158332afba5551332ca5bcR293, but the time wasn't right.

The mechanism is also problematic for weaker isolation levels. Specifically, it interacts poorly with parallel commits, conflating read-write conflicts with write-write conflicts and permitting the following bug:

1. read on "a" @ 10
2. read on "b" @ 10
3. parallel commit write "a" @ 10, write "b" @ 10; [{Put("a"), EndTxn}, {Put("b")}]
4. write "a" gets timestamp cache pushed to 15, stages because of weak isolation level
5. write "b" gets write-too-old pushed to 14
6. implicit commit @ 15 with write-write conflict. Uh oh!
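The hazard in the scenario above can be sketched as a toy model (names and integer timestamps are illustrative, not CockroachDB's): an implicit commit is only safe if every intent exists at or above the staged commit timestamp, and the deferred WriteTooOld handling leaves the intent on "b" below it.

```go
package main

import "fmt"

// implicitCommitOK models the parallel-commit safety condition: every
// intent must have been written at or above the staged commit timestamp.
func implicitCommitOK(commitTS int, intentTS map[string]int) bool {
	for _, ts := range intentTS {
		if ts < commitTS {
			return false // an intent below the commit timestamp breaks atomicity
		}
	}
	return true
}

func main() {
	intents := map[string]int{
		"a": 15, // pushed by the timestamp cache, rewritten at 15
		"b": 14, // write-too-old pushed to 14, error deferred
	}
	fmt.Println(implicitCommitOK(15, intents)) // false: the "Uh oh!" above
}
```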

Since the deferral mechanism was re-introduced, CockroachDB's KV layer has evolved in two ways which obviate the need for it:

  • Implicit and explicit SELECT FOR UPDATE. Acquiring locks earlier in the lifecycle of an UPDATE/UPSERT statement avoids the kinds of starvation issues that the deferral mechanism was meant to solve. Transaction contention is resolved ahead of the intent write, so the intent write does not typically hit a WriteTooOld error and does not need to write an intent in the presence of a write-write conflict just to avoid starvation.
  • Server-side refreshes. Many transactions are able to ignore a WriteTooOld error through a less error-prone server-side mechanism called server-side refreshes. This mechanism catches errors that could cause a transaction to retry and adjusts the transaction's timestamp in response, under latches.

With these two mechanisms in place, we are ready to eliminate the write-too-old deferral mechanism.

Jira issue: CRDB-27634

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team A-read-committed Related to the introduction of Read Committed labels May 3, 2023
@nvanbenschoten nvanbenschoten self-assigned this May 3, 2023
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 5, 2023
Informs cockroachdb#102751.

This commit eliminates the write-too-old deferral mechanism, where blind-write
BatchRequests that hit a WriteTooOld error would successfully write intents and
then return a Transaction proto with the WriteTooOld flag to the client. The
client would then immediately refresh to remove this flag. However, in the
intermediate period, the written intents would act as locks to ensure that if
the refresh succeeded, the writer would have exclusive access to the previously
written keys and would not hit a WriteTooOld error on its next attempt.

The rationale for the removal of this mechanism is outlined in cockroachdb#102751. At a
high-level, the mechanism is complex, error-prone, and sufficiently unnecessary
today due to unreplicated locks and server-side refreshes. It also interacts
poorly with weak isolation levels, which further motivates its removal.

Cases where the write-too-old deferral mechanism is still hypothetically useful
are difficult to construct, especially from SQL's limited use of KV. They
require the following conditions to all hold:

1. a blind-writing BatchRequest (containing Put or Delete, but not ConditionalPut)
2. a BatchRequest without the CanForwardReadTimestamp flag (needs client-side refresh)
3. a write-write conflict that will not cause a refresh to fail

These requirements are almost always contradictory. A write-write conflict implies a failed refresh if the refresher has already read the conflicting key. So the cases where this mechanism helps are limited to those where the writer has not already read the conflicting key. However, SQL rarely issues blind-write KV requests for keys that it has not already read. The cases where this might come up are fast-path DELETE statements that issue DeleteRequest (not DeleteRangeRequest) and fast-path UPSERT statements that write all columns in a table. If either of these is heavily contended and takes place in a multi-statement transaction that previously read, this mechanism could help. However, I suspect that these scenarios are very uncommon. If customers do see them, they can avoid problems by using SELECT FOR UPDATE earlier in the transaction or by using Read Committed (which effectively resets the CanForwardReadTimestamp flag on each statement boundary).

The commit does not yet remove the Transaction.WriteTooOld field, as this must
remain until compatibility with v23.1 nodes is no longer a concern.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 5, 2023
Informs cockroachdb#102751.

This commit eliminates the MVCC-level portion of the write-too-old deferral
mechanism. It adjusts `mvccPutInternal` to return immediately with a
WriteTooOld error when a write-write version conflict is encountered, without
also writing an intent after the conflicting value. This partial-success,
partial-error state is no longer needed now that KV no longer defers
write-too-old error handling, so the associated complexity can be removed.

Release note: None
@sumeerbhola
Collaborator

This mechanism catches errors that could cause a transaction to retry and adjusts the transaction's timestamp in response, under latches.

So is this what prevents starvation and loss of performance in the non-SFU case? IIUC, this would mean that the MVCC layer will return an error without doing the write, and the server-side refresh will bump the timestamp (while holding the latch and the lock reservation) and retry the write successfully. So just some extra iterator seeks for the second Put attempt. Am I correct?

@nvanbenschoten
Member Author

You are correct that the server-side refresh will avoid starvation in cases where that is possible. However, this PR won't introduce any new iterator seeks, as we were previously performing a server-side refresh (when permitted by the client) after the deferred write-too-old error anyway:

if pErr == nil {
    wto := br.Txn != nil && br.Txn.WriteTooOld
    success = !wto
} else {

So the case where this could make a difference is when a server-side refresh is not possible but a client-side refresh is possible and succeeds. The description on #102808 goes into more detail about this. It discusses the limited cases where the write-too-old deferral mechanism could have hypothetically still been useful for avoiding starvation, even with SFU and server-side refreshes. The scenarios that meet all requirements are involved and I don't know of any real workloads that behave like them.

You can much more easily construct scenarios where the client-side refresh is bound to fail and the write-too-old deferral is detrimental.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 13, 2023
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 13, 2023
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 16, 2023
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 16, 2023
craig bot pushed a commit that referenced this issue May 16, 2023
102808: kv: eliminate write-too-old deferral mechanism r=nvanbenschoten a=nvanbenschoten

KV half of #102751.

Release note: None

103353: sql: skip flaky TestTxnContentionEventsTable r=gtr a=gtr

Informs #102660.

Release note: None

103354: admission: fix comment in admission.go r=bananabrick a=bananabrick

Epic: none
Release note: None

103404: concurrency: fix bug in lockStateInfo r=nvanbenschoten a=arulajmani

All waiting readers are considered to be actively waiting at a lock;
there's no concept of inactive waiting readers in the lock table.
Previously, when converting a lockState into a roachpb.LockStateInfo,
we would erroneously denote any waiting readers as inactive. This
patch fixes this and adds a test for it.

This patch also fixes how reservations are designated as active or
inactive. Previously, reservations would be marked as active waiters.
This isn't true -- reservation holders do not actively wait at a lock
they hold reservations for. They only wait at other locks or proceed
to evaluation. Now, we mark reservations as inactive waiters as well.
The diff in existing tests is a result of this change.

Epic: none

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: gtr <[email protected]>
Co-authored-by: Arjun Nair <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 16, 2023
@craig craig bot closed this as completed in 2e6d3c5 May 17, 2023