Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvcoord: EndTxn elision is vulnerable to race conditions #65587

Closed
erikgrinaker opened this issue May 22, 2021 · 0 comments · Fixed by #65592
Closed

kvcoord: EndTxn elision is vulnerable to race conditions #65587

erikgrinaker opened this issue May 22, 2021 · 0 comments · Fixed by #65592
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@erikgrinaker
Copy link
Contributor

The transaction coordinator elides EndTxn requests for non-locking requests in at least two places:

https://github.com/cockroachdb/cockroach/blob/10c71993d2d339119aa26c4eeefc7025aac2bfe7/pkg/kv/kvclient/kvcoord/txn_coord_sender.go#L478-L480

// Determine whether we can elide the EndTxn entirely. We can do so if the
// transaction is read-only, which we determine based on whether the EndTxn
// request contains any writes.
if len(et.LockSpans) == 0 && len(et.InFlightWrites) == 0 {
return tc.sendLockedWithElidedEndTxn(ctx, ba, et)
}

However, this logic is incorrect and vulnerable to race conditions, because the txnPipeliner which is responsible for keeping track of locks and in-flight writes only updates this state after the write requests have completed and the responses received:

// Send through wrapped lockedSender. Unlocks while sending then re-locks.
br, pErr := tp.wrapped.SendLocked(ctx, ba)
// Update the in-flight write set and the lock footprint with the results of
// the request.
tp.updateLockTracking(ctx, ba, br)

Thus, if an EndTxn request is sent while the write requests are in flight, it can incorrectly elide the EndTxn request despite the transaction having taken out locks and written intents.

Luckily, I don't believe this affects commits, because these are synchronous (i.e. the client will await the result of the previous command before sending the EndTxn(commit=true)). This is even enforced in the txnGatekeeper:

// As a special case, allow for async rollbacks and heartbeats to be sent
// whenever.
if !gs.allowConcurrentRequests {
asyncRequest := ba.IsSingleAbortTxnRequest() || ba.IsSingleHeartbeatTxnRequest()
if !asyncRequest {
if gs.requestInFlight {
return nil, roachpb.NewError(
errors.AssertionFailedf("concurrent txn use detected. ba: %s", ba))
}
gs.requestInFlight = true
defer func() {
gs.requestInFlight = false
}()
}
}

However, this does not apply to EndTxn(commit=false), i.e. rollbacks, which can be sent asynchronously due to e.g. the txnHeartbeater finding an aborted txn or the client disconnecting and cancelling the context. This vulnerability has been confirmed with a test, where finalizeNonLockingTxnLocked() is called even though a Put request is sent before the EndTxn.

At the very least, this affects cleanup of aborted transactions. However, there may well be other effects of this (or related issues) in more complex scenarios that I haven't discovered yet -- for example, I see that the concurrent request check above is disabled for leaf transactions.

As part of fixing #65458 (which has the same root cause) I am implementing an EndTxn barrier interceptor that blocks until pending requests complete. Moving the finalizeNonLockingTxnLocked mechanism into or below this barrier interceptor should avoid the race.

/cc @nvanbenschoten @aliher1911

@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-transactions Relating to MVCC and the transactional model. labels May 22, 2021
@erikgrinaker erikgrinaker self-assigned this May 22, 2021
@craig craig bot closed this as completed in b479177 May 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant