-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: tolerate write skew under weak isolation levels #100131
Labels
A-kv-transactions
Relating to MVCC and the transactional model.
A-read-committed
Related to the introduction of Read Committed
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
nvanbenschoten
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-transactions
Relating to MVCC and the transactional model.
T-kv
KV Team
A-read-committed
Related to the introduction of Read Committed
labels
Mar 30, 2023
Raw notes about implementation points:
|
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Apr 18, 2023
Informs cockroachdb#100131. This commit adds unit testing around IsEndTxnExceedingDeadline and IsEndTxnTriggeringRetryError. We'll be changing the latter function in future commits, so it's helpful to have tests in place. While here, the commit cleans up the code slightly. It narrows the signature of the IsEndTxnTriggeringRetryError function to make its inputs more clear. It also makes the logic more readable. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Apr 18, 2023
Informs cockroachdb#100131. This commit disables preemptive transaction refreshes under weak isolation levels. These transactions can tolerate write skew, so they can commit even if their write timestamp has been bumped. Transactions run at weak isolation levels may refresh in response to WriteTooOld errors or ReadWithinUncertaintyInterval errors returned by requests, but they do not need to refresh preemptively ahead of an EndTxn request. Release note: None
craig bot
pushed a commit
that referenced
this issue
Apr 19, 2023
101766: kv: unit test IsEndTxnExceedingDeadline and IsEndTxnTriggeringRetryError r=arulajmani a=nvanbenschoten Informs #100131. This commit adds unit testing around IsEndTxnExceedingDeadline and IsEndTxnTriggeringRetryError. We'll be changing the latter function in future commits, so it's helpful to have tests in place. While here, the commit cleans up the code slightly. It narrows the signature of the IsEndTxnTriggeringRetryError function to make its inputs more clear. It also makes the logic more readable. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Apr 20, 2023
Informs cockroachdb#100131. This commit revives and modernizes two tests that were removed/stripped down when snapshot isolation was previously deleted. It then expands the tests to exercise all isolation levels. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Apr 20, 2023
Informs cockroachdb#100131. This commit revives the isolation level exploration in kv correctness tests. These were removed in 39ba88b. Very little of the code here is new, beyond extending this code for Read Committed. I have confirmed that `TestTxnDBWriteSkewAnomaly` fails if it does not expect Snapshot isolation to permit write skew and I update `IsEndTxnTriggeringRetryError` to ignore skew under Snapshot isolation. Release note: None
craig bot
pushed a commit
that referenced
this issue
Apr 23, 2023
101955: kv: revive TestLostIncrement and TestLostUpdate r=arulajmani a=nvanbenschoten Informs #100131. This commit revives and modernizes two tests that were removed/stripped down when snapshot isolation was previously deleted. It then expands the tests to exercise all isolation levels. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Apr 23, 2023
101970: kv: revive isolation level exploration in kv correctness tests r=arulajmani a=nvanbenschoten Informs #100131. This commit revives the isolation level exploration in kv correctness tests. These were removed in 39ba88b. Very little of the code here is new, beyond extending this code for Read Committed. I have confirmed that `TestTxnDBWriteSkewAnomaly` fails if it does not expect Snapshot isolation to permit write skew and I update `IsEndTxnTriggeringRetryError` to ignore skew under Snapshot isolation. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Apr 24, 2023
101796: kv: don't preemptively refresh under weak isolation levels r=arulajmani a=nvanbenschoten Informs #100131. This commit disables preemptive transaction refreshes under weak isolation levels. These transactions can tolerate write skew, so they can commit even if their write timestamp has been bumped. Transactions run at weak isolation levels may refresh in response to WriteTooOld errors or ReadWithinUncertaintyInterval errors returned by requests, but they do not need to refresh preemptively ahead of an EndTxn request. Release note: None 101979: sql: fix the pretty-printing of CREATE EXTENSION r=otan a=knz Fixes #101978. Release note: None Epic: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>
smg260
pushed a commit
to smg260/cockroach
that referenced
this issue
Apr 24, 2023
Informs cockroachdb#100131. This commit revives and modernizes two tests that were removed/stripped down when snapshot isolation was previously deleted. It then expands the tests to exercise all isolation levels. Release note: None
smg260
pushed a commit
to smg260/cockroach
that referenced
this issue
Apr 24, 2023
Informs cockroachdb#100131. This commit revives the isolation level exploration in kv correctness tests. These were removed in 39ba88b. Very little of the code here is new, beyond extending this code for Read Committed. I have confirmed that `TestTxnDBWriteSkewAnomaly` fails if it does not expect Snapshot isolation to permit write skew and I update `IsEndTxnTriggeringRetryError` to ignore skew under Snapshot isolation. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Apr 26, 2023
…ion levels Informs cockroachdb#100131. This commit revives logic in `Txn.IsSerializablePushAndRefreshNotPossible` that looks at the isolation level of the transaction. This logic was removed in 39ba88b. Release note: None
craig bot
pushed a commit
that referenced
this issue
Apr 30, 2023
102417: kv: add metric for server-side transaction refreshes r=nvanbenschoten a=nvanbenschoten Generalized server-side transaction refreshes were introduced in 972915d. This PR adds a new metric to track the number of server-side refreshes, called `txn.refresh.success_server_side`. The metric is maintained by the `txnSpanRefresher`, making it easy to test and associate with the specific transaction that experienced the server-side refresh. The primary motivation for this change is that it allows us to make `TestTxnCoordSenderRetries` more interesting. We will be using this test to exercise snapshot isolation (#100131), and it is useful to see when transactions avoid retry errors due to server-side refreshes. Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot
pushed a commit
that referenced
this issue
May 1, 2023
102669: kv: support multiple isolation levels in `TestTxnCoordSenderRetries` r=nvanbenschoten a=nvanbenschoten First two commits from #102504. Informs #100131. This PR adds support to test multiple isolation levels in `TestTxnCoordSenderRetries`. This new support will be used in a future commit that introduces write-skew tolerance to Snapshot isolation transactions. For now, the tests still only exercise Serializable transactions because Snapshot transactions which cannot tolerate write skew but also don't preemptively refresh result in an unnecessarily large diff. Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot
pushed a commit
that referenced
this issue
May 1, 2023
101973: kv/sql: re-teach IsSerializablePushAndRefreshNotPossible about isolation levels r=rafiss a=nvanbenschoten Informs #100131. This commit revives logic in `Txn.IsSerializablePushAndRefreshNotPossible` that looks at the isolation level of the transaction. This logic was removed in 39ba88b. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
May 12, 2023
Informs cockroachdb#100131. This commit decouples the concepts of a parallel commit auto-retry from a refresh auto-retry. This is important for weak isolation transactions which can tolerate write skew, where a failed parallel commit does not necessarily imply that the transaction needs to refresh its read spans in order to commit. Parallel commit auto-retries are performed when a parallel committing batch is successful in its execution but fails to qualify for the implicit commit condition. In such cases, the EndTxn request (and just the EndTxn request) is re-issued to move the transaction directly to the COMMITTED state. This only implies the need for a refresh for serializable transactions. For these transactions, a preemptive refresh will be performed by the txnSpanRefresher before the EndTxn is re-issued. Conversely, refresh auto-retries are performed when a batch fails to execute fully and returns an error. In such cases, the txnSpanRefresher will intercept the error, attempt to refresh, and then re-issue the entire original batch. Because we no longer make an attempt to determine which parts of a previously issued batch were successful (this was subtle and error-prone when it existed), we instead re-issue the entire batch and rely on idempotency. To make this change, this commit moves the txnCommitter interceptor above the txnSpanRefresher interceptor in the TxnCoordSender interceptor stack. This means that the txnCommitter no longer needs to guard against re-issued batches, because all retries are handled below it. Instead, the txnSpanRefresher now needs to learn about the STAGING transaction status. The txnSpanRefresher was already taught about the STAGING status on successful batches in ef77322, and is taught about the STAGING status on errors here. While the primary motivation of this change is to permit weak isolation transactions to perform parallel commit auto-retries without needing to refresh, it also has another benefit. Failed parallel commits no longer re-issue their entire batch on serialization failures. Instead, they just re-issue the EndTxn. As a result, this replaces and closes cockroachdb#93099. It also motivates release note below. Release note (performance improvement): Some large, long-running INSERT statements now perform less work during their commit phase and can run faster.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
May 13, 2023
Informs cockroachdb#100131. This commit decouples the concepts of a parallel commit auto-retry from a refresh auto-retry. This is important for weak isolation transactions which can tolerate write skew, where a failed parallel commit does not necessarily imply that the transaction needs to refresh its read spans in order to commit. Parallel commit auto-retries are performed when a parallel committing batch is successful in its execution but fails to qualify for the implicit commit condition. In such cases, the EndTxn request (and just the EndTxn request) is re-issued to move the transaction directly to the COMMITTED state. This only implies the need for a refresh for serializable transactions. For these transactions, a preemptive refresh will be performed by the txnSpanRefresher before the EndTxn is re-issued. Conversely, refresh auto-retries are performed when a batch fails to execute fully and returns an error. In such cases, the txnSpanRefresher will intercept the error, attempt to refresh, and then re-issue the entire original batch. Because we no longer make an attempt to determine which parts of a previously issued batch were successful (this was subtle and error-prone when it existed), we instead re-issue the entire batch and rely on idempotency. To make this change, this commit moves the txnCommitter interceptor above the txnSpanRefresher interceptor in the TxnCoordSender interceptor stack. This means that the txnCommitter no longer needs to guard against re-issued batches, because all retries are handled below it. Instead, the txnSpanRefresher now needs to learn about the STAGING transaction status. The txnSpanRefresher was already taught about the STAGING status on successful batches in ef77322, and is taught about the STAGING status on errors here. While the primary motivation of this change is to permit weak isolation transactions to perform parallel commit auto-retries without needing to refresh, it also has another benefit. Failed parallel commits no longer re-issue their entire batch on serialization failures. Instead, they just re-issue the EndTxn. As a result, this replaces and closes cockroachdb#93099. It also motivates release note below. Release note (performance improvement): Some large, long-running INSERT statements now perform less work during their commit phase and can run faster.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
May 13, 2023
Informs cockroachdb#100131. This commit decouples the concepts of a parallel commit auto-retry from a refresh auto-retry. This is important for weak isolation transactions which can tolerate write skew, where a failed parallel commit does not necessarily imply that the transaction needs to refresh its read spans in order to commit. Parallel commit auto-retries are performed when a parallel committing batch is successful in its execution but fails to qualify for the implicit commit condition. In such cases, the EndTxn request (and just the EndTxn request) is re-issued to move the transaction directly to the COMMITTED state. This only implies the need for a refresh for serializable transactions. For these transactions, a preemptive refresh will be performed by the txnSpanRefresher before the EndTxn is re-issued. Conversely, refresh auto-retries are performed when a batch fails to execute fully and returns an error. In such cases, the txnSpanRefresher will intercept the error, attempt to refresh, and then re-issue the entire original batch. Because we no longer make an attempt to determine which parts of a previously issued batch were successful (this was subtle and error-prone when it existed), we instead re-issue the entire batch and rely on idempotency. To make this change, this commit moves the txnCommitter interceptor above the txnSpanRefresher interceptor in the TxnCoordSender interceptor stack. This means that the txnCommitter no longer needs to guard against re-issued batches, because all retries are handled below it. Instead, the txnSpanRefresher now needs to learn about the STAGING transaction status. The txnSpanRefresher was already taught about the STAGING status on successful batches in ef77322, and is taught about the STAGING status on errors here. While the primary motivation of this change is to permit weak isolation transactions to perform parallel commit auto-retries without needing to refresh, it also has another benefit. Failed parallel commits no longer re-issue their entire batch on serialization failures. Instead, they just re-issue the EndTxn. As a result, this replaces and closes cockroachdb#93099. It also motivates release note below. Release note (performance improvement): Some large, long-running INSERT statements now perform less work during their commit phase and can run faster.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
May 13, 2023
_or "support kv-level snapshot isolation"._ Fixes cockroachdb#100131. This commit adds support for weak transaction isolation levels (Snapshot and Read Committed) to commit even when their read and write timestamps are skewed. Thanks to prior cleanup and refactoring, this change is limited to an update to the transaction commit condition and a clarification of the one-phase commit requirements, along with a collection of updates to tests. Release note: None
craig bot
pushed a commit
that referenced
this issue
May 15, 2023
102793: backupccl,kvserver: log failed ExportRequest trace on client and server r=irfansharif a=adityamaru This change strives to improve observability around backups that fail because of timed out ExportRequests. Currently, there is very little indication of what the request was doing when the client cancelled the context after the pre-determined timeout window. With this change we now log the trace of the ExportRequest that failed. If the ExportRequest was served locally, then the trace will be part of the sender's tracing span. However, if the request was served on a remote node then the sender does not wait for the server side evaluation to notice the context cancellation. To work around this, we also print the trace on the server side if the request encountered a context cancellation and the associating tracing span is not a noop. This change also adds a private cluster setting `bulkio.backup.export_request_verbose_tracing` that if set to true will send all backup export requests with verbose tracing mode. To debug a backup failing with a timed out export request we can now: - SET CLUSTER SETTING bulkio.backup.export_request_verbose_tracing = true; - SET CLUSTER SETTING trace.snapshot.rate = '1m' Once the backup times out we can look at the logs for the server side and client side ExportRequest traces to then understand where we were stuck executing for so long. Fixes: #86047 Release note: None 103241: kv: decouple parallel commit auto-retries from refresh auto-retries r=arulajmani a=nvanbenschoten Informs #100131. This commit decouples the concepts of a parallel commit auto-retry from a refresh auto-retry. This is important for weak isolation transactions which can tolerate write skew, where a failed parallel commit does not necessarily imply that the transaction needs to refresh its read spans in order to commit. Parallel commit auto-retries are performed when a parallel committing batch is successful in its execution but fails to qualify for the implicit commit condition. In such cases, the EndTxn request (and just the EndTxn request) is re-issued to move the transaction directly to the COMMITTED state. This only implies the need for a refresh for serializable transactions. For these transactions, a preemptive refresh will be performed by the txnSpanRefresher before the EndTxn is re-issued. Conversely, refresh auto-retries are performed when a batch fails to execute fully and returns an error. In such cases, the txnSpanRefresher will intercept the error, attempt to refresh, and then re-issue the entire original batch. Because we no longer make an attempt to determine which parts of a previously issued batch were successful (this was subtle and error-prone when it existed), we instead re-issue the entire batch and rely on idempotency. To make this change, this commit moves the txnCommitter interceptor above the txnSpanRefresher interceptor in the TxnCoordSender interceptor stack. This means that the txnCommitter no longer needs to guard against re-issued batches, because all retries are handled below it. Instead, the txnSpanRefresher now needs to learn about the STAGING transaction status. The txnSpanRefresher was already taught about the STAGING status on successful batches in ef77322, and is taught about the STAGING status on errors here. While the primary motivation of this change is to permit weak isolation transactions to perform parallel commit auto-retries without needing to refresh, it also has another benefit. Failed parallel commits no longer re-issue their entire batch on serialization failures. Instead, they just re-issue the EndTxn. As a result, this replaces and closes #93099. It also motivates release note below. Release note (performance improvement): Some large, long-running INSERT statements now perform less work during their commit phase and can run faster. 103278: kvserver: add expiration lease escape hatch r=erikgrinaker a=erikgrinaker This patch adds `COCKROACH_DISABLE_EXPIRATION_LEASES_ONLY`, which can be used to hard-disable expiration-based leases, e.g. in cases where the lease extensions overload the cluster and prevent it from working, and thus prevent operators from disabling the setting. Epic: none Release note: None 103343: logictest: increase max-sql-memory from 192MiB to 256MiB r=yuzefovich a=yuzefovich We have seen some rare flakes where the validation (after all the test queries are done) hits "memory budget exceeded" on the root monitor. To avoid this, let's just bump the root pool a little. Fixes: #103244. Release note: None Co-authored-by: adityamaru <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
May 18, 2023
_or "support kv-level snapshot isolation"._ Fixes cockroachdb#100131. This commit adds support for weak transaction isolation levels (Snapshot and Read Committed) to commit even when their read and write timestamps are skewed. Thanks to prior cleanup and refactoring, this change is limited to an update to the transaction commit condition and a clarification of the one-phase commit requirements, along with a collection of updates to tests. Release note: None
craig bot
pushed a commit
that referenced
this issue
May 18, 2023
103198: roachtest: always unstall on disk-stall failure r=RaduBerinde a=jbowens Previously, if the disk-stalled/* raochtests failed while the disk was stalled, a defer would attempt to unstall the disk before completing the test. This could fail if the context was cancelled. The stalled disk would then prevent collection of artifacts. This change updates the defer'd Unstall call to use a background context that will never be cancelled. Epic: none Informs #102946 Release note: none 103246: kv: tolerate write skew under weak isolation levels r=arulajmani a=nvanbenschoten _or "support kv-level snapshot isolation"._ Fixes #100131. This commit adds support for weak transaction isolation levels (Snapshot and Read Committed) to commit even when their read and write timestamps are skewed. Thanks to prior cleanup and refactoring, this change is limited to an update to the transaction commit condition and a clarification of the one-phase commit requirements, along with a collection of updates to tests. Release note: None 103548: roachtest: add point-tombstone/heterogeneous-value-sizes roachtest r=jbowens a=jbowens Introduce a new roachtest that exercises a current gap in Pebble's point tombstone heuristics. Rows with a heterogeneous size distribution can be problematic, because Pebble's existing heuristics rely on average value sizes in order to encourage disk-space reclaiming compactions. With heterogeneous value sizes, these heuristics can dramatically over-or-under estimate the amount of disk space reclaimed. This new roachtest runs a kv0 workload for a fixed number of rows, all with 1MiB values. It then runs another a kv0 workload for 4x the number of rows, all with 4KiB values. Then it deletes all the 1MiB-valued rows, with a reduced TTL, and expects that a reasonable amount of disk space is reclaimed. Currently, this roachtest is skipped. With a recent nightly build of Cockroach, the test times out, stalled with the approximate disk-bytes size of the `kv` table stagnant at 92 GiB, despite the MVCC logical size of the table totalling just 1.4 GiB. ``` databaseID: 104, tableID: 106, rangeCount: 3003, approxDiskBytes: 92 GiB, liveBytes: 1.4 GiB, totalBytes: 1.4 GiB, livePercentage: 1.0 ``` Examining one store's sstable properties reveals the point tombstones remain uncompacted in levels L3, L4 and L5. ``` L0 L1 L2 L3 L4 L5 L6 TOTAL count 0 0 0 23 122 545 1192 1882 seq num smallest 0 0 0 2973624 2291108 460760 145525 145525 largest 0 0 0 6014495 4886656 4063619 2833606 6014495 size data 0 B 0 B 0 B 62 M 480 M 3.6 G 26 G 31 G blocks 0 0 0 2513 7859 8292 29255 47919 index 0 B 0 B 0 B 123 K 344 K 356 K 1.2 M 2.0 M blocks 0 0 0 23 122 545 1192 1882 top-level 0 B 0 B 0 B 0 B 0 B 0 B 0 B 0 B filter 0 B 0 B 0 B 116 K 104 K 123 K 164 K 508 K raw-key 0 B 0 B 0 B 5.1 M 2.6 M 2.5 M 3.2 M 13 M raw-value 0 B 0 B 0 B 74 M 485 M 3.6 G 26 G 31 G pinned-key 0 B 0 B 0 B 0 B 0 B 0 B 0 B 0 B pinned-value 0 B 0 B 0 B 0 B 0 B 0 B 0 B 0 B records set 0 0 0 33 K 67 K 51 K 91 K 243 K delete 0 0 0 119 K 8.2 K 9.1 K 0 136 K range-delete 0 0 0 0 0 0 0 0 range-key-sets 0 0 0 0 0 0 0 0 range-key-unsets 0 0 0 0 0 0 0 0 range-key-deletes 0 0 0 0 0 0 0 0 merge 0 0 0 6.8 K 6.8 K 14 K 6.7 K 34 K pinned 0 0 0 0 0 0 0 0 ``` <img width="729" alt="Screenshot 2023-05-17 at 4 27 17 PM" src="https://github.com/cockroachdb/cockroach/assets/867352/d8e3188a-75fb-4670-9c61-e5ff8b369894"> <img width="719" alt="Screenshot 2023-05-17 at 4 27 08 PM" src="https://github.com/cockroachdb/cockroach/assets/867352/cd787935-9ea0-4515-a314-2a279243ac0f"> Informs cockroachdb/pebble#2340. Epic: CRDB-25405 Release note: none Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-transactions
Relating to MVCC and the transactional model.
A-read-committed
Related to the introduction of Read Committed
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
With the (re-)introduction of weak isolation levels, transaction client and server logic will need to change to tolerate write skew (
ReadTimestamp != WriteTimestamp
) for transactions run at weak isolation levels.cockroach/pkg/kv/kvserver/concurrency/isolation/levels.proto
Lines 88 to 91 in f9ac216
Jira issue: CRDB-26575
Epic CRDB-26539
The text was updated successfully, but these errors were encountered: