Skip to content

Commit

Permalink
kvcoord: fix flake in TestTransactionUnexpectedlyCommitted
Browse files Browse the repository at this point in the history
The `TestTransactionUnexpectedlyCommitted/recovery_after_transfer_lease`
test, introduced to test cockroachdb#107658, has been flaky (particularly under
deadlock builds) due to a race condition between a retry of a write and
intent resolution. While both orderings in this test result in a correct
`AmbiguousResultError` for the client, when intent resolution wins the
race, the retried write will attempt to push away the current
lockholder; since it is illegal for a committed transaction to perform a
push, this results in a different secondary error attached to the
`AmbiguousResultError`. This change ensures a predefined ordering of
these operations so that the secondary error is consistent across runs
of the test.

Fixes: cockroachdb#110187

Release note: None
  • Loading branch information
AlexTalks committed Oct 6, 2023
1 parent 843596f commit cec602a
Showing 1 changed file with 10 additions and 3 deletions.
13 changes: 10 additions & 3 deletions pkg/kv/kvclient/kvcoord/dist_sender_ambiguous_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1048,7 +1048,7 @@ func TestTransactionUnexpectedlyCommitted(t *testing.T) {
// 7. _->n1: PushTxn(txn2->txn1) -- Discovers txn1 in STAGING and starts
// recovery.
// 8. _->n1: RecoverTxn(txn1) -- Recovery should mark txn1 committed, but
// pauses before returning so that txn1's intents don't get cleaned up.
// intent resolution on txn1 needs to be paused until after txn1 finishes.
if req.ba.IsSingleRecoverTxnRequest() && cp == AfterSending {
close(recoverComplete)
}
Expand All @@ -1075,8 +1075,15 @@ func TestTransactionUnexpectedlyCommitted(t *testing.T) {
// 12. txn1->n1: EndTxn(commit) -- Recovery has already completed, so this
// request fails with "transaction unexpectedly committed".

// <allow (8) recovery to return and txn2 to continue>
if req.ba.IsSingleRecoverTxnRequest() && cp == AfterSending {
// <allow intent resolution after (8) recovery so txn2 can continue>
// If the intent on (b) were resolved and txn2 could grab the lock prior
// to txn1's retry of the Put(b), the retry will cause a PushTxn to txn2.
// Given that the recovery at (8) has already completed, a PushTxn
// request where the pusher is a committed transaction results in an
// "already committed" TransactionStatusError from the txnwait queue.
// While this still results in AmbiguousResultError from the DistSender,
// the reason will be distinct; as such we pause the intent resolution.
if riReq, ok := req.ba.GetArg(kvpb.ResolveIntent); ok && riReq.Header().Key.Equal(keyB) && cp == BeforeSending {
req.pauseUntil(t, txn1Done, cp)
t.Logf("%s - complete, resp={%s}", req.prefix, resp)
}
Expand Down

0 comments on commit cec602a

Please sign in to comment.