Skip to content

Commit

Permalink
storage: employ transactional idempotency to refresh mixed-success ba…
Browse files Browse the repository at this point in the history
…tches

This change builds on the introduction of idempotency properties introduced into
the MVCC layer in #33001. It exploits this property to remove a previous
restriction that could prevent transactions from refreshing and result in
transaction retries. The restriction was that batches which had experienced some
success while writing could not refresh even if another part of the batch
required a refresh. This was because it was unclear which parts of the batch had
succeeded in performing writes and which parts of the batch had not. Without
this knowledge, it was unsafe to re-issue the batch because that could result
in duplicating writes (e.g. performing an increment twice).

A lot has changed since then. We now have proper sequence numbers on requests,
we no longer send `BeginTransaction` requests (which threw errors on replays), and
we now have an MVCC layer that is idempotent for writes within a transaction.
With this foundation in place, we can now safely re-issue any write within a
transaction that we're unsure about. As long as writes remain well sequenced
(writes to the same key aren't reordered), everything should behave as expected.

The best way to see that the change works is through the test cases in
`TestTxnCoordSenderRetries` that now succeed. Without #33001, those all fail.

This change will lead to the removal of `MixedSuccessError`s in 19.2!

Release note: None
  • Loading branch information
nvanbenschoten committed Feb 25, 2019
1 parent 745c6d1 commit 618b513
Show file tree
Hide file tree
Showing 4 changed files with 64 additions and 54 deletions.
16 changes: 10 additions & 6 deletions pkg/kv/dist_sender_server_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -2542,7 +2542,8 @@ func TestTxnCoordSenderRetries(t *testing.T) {
b.Put("c", "put")
return txn.CommitInBatch(ctx, b)
},
clientRetry: true, // cput with write too old requires restart
txnCoordRetry: false, // non-matching value means we fail txn coord retry
expFailure: "unexpected value", // the failure we get is a condition failed error
},
{
name: "multi-range batch with write too old and successful cput",
Expand All @@ -2558,7 +2559,8 @@ func TestTxnCoordSenderRetries(t *testing.T) {
b.Put("c", "put")
return txn.CommitInBatch(ctx, b)
},
clientRetry: true, // successful cput will still retry because of mixed success
// Expect a transaction coord retry, which should succeed.
txnCoordRetry: true,
},
{
name: "cput within uncertainty interval",
Expand Down Expand Up @@ -2675,8 +2677,9 @@ func TestTxnCoordSenderRetries(t *testing.T) {
b.CPut("c", "cput", "value")
return txn.CommitInBatch(ctx, b)
},
filter: newUncertaintyFilter(roachpb.Key([]byte("c"))),
clientRetry: true, // client-side retry required as this will be an mixed success
filter: newUncertaintyFilter(roachpb.Key([]byte("c"))),
// Expect a transaction coord retry, which should succeed.
txnCoordRetry: true,
},
{
name: "multi-range scan with uncertainty interval error",
Expand All @@ -2692,8 +2695,9 @@ func TestTxnCoordSenderRetries(t *testing.T) {
retryable: func(ctx context.Context, txn *client.Txn) error {
return txn.DelRange(ctx, "a", "d")
},
filter: newUncertaintyFilter(roachpb.Key([]byte("c"))),
clientRetry: true, // can't restart because of mixed success and write batch
filter: newUncertaintyFilter(roachpb.Key([]byte("c"))),
// Expect a transaction coord retry, which should succeed.
txnCoordRetry: true,
},
}

Expand Down
20 changes: 12 additions & 8 deletions pkg/kv/txn_interceptor_span_refresher.go
Original file line number Diff line number Diff line change
Expand Up @@ -175,15 +175,19 @@ func (sr *txnSpanRefresher) maybeRetrySend(
pErr *roachpb.Error,
maxRefreshAttempts int,
) (*roachpb.BatchResponse, *roachpb.Error, hlc.Timestamp) {
// With mixed success, we can't attempt a retry without potentially
// succeeding at the same conditional put or increment request
// twice; return the wrapped error instead. Because the dist sender
// splits up batches to send to multiple ranges in parallel, and
// then combines the results, partial success makes it very
// difficult to determine what can be retried.
// MixedSuccessErrors used to prevent refreshing because it created the
// potential for the writes that already succeeded to succeed again or to
// hit replay errors. This is no longer true because sequenced writes are
// now idempotent within a transaction. Re-issuing writes that have already
// succeeded, even at a higher timestamp, will be handled properly.
// TODO(nvanbenschoten): remove this in 19.2 when we remove MixedSuccessError.
if aPSErr, ok := pErr.GetDetail().(*roachpb.MixedSuccessError); ok {
log.VEventf(ctx, 2, "got partial success; cannot retry %s (pErr=%s)", ba, aPSErr.Wrapped)
return nil, aPSErr.Wrapped, hlc.Timestamp{}
if !sr.st.Version.IsActive(cluster.VersionSequencedReads) {
log.VEventf(ctx, 2, "got partial success; cannot retry %s (pErr=%s)", ba, aPSErr.Wrapped)
return nil, aPSErr.Wrapped, hlc.Timestamp{}
}
// Unwrap the MixedSuccessError.
pErr = aPSErr.Wrapped
}

// Check for an error which can be retried after updating spans.
Expand Down
Loading

0 comments on commit 618b513

Please sign in to comment.