Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: kvcoord: prevent concurrent EndTxn requests #65863

Closed

Conversation

erikgrinaker
Copy link
Contributor

Backport 1/1 commits from #65592.

This should bake on master for a while, in case there are unexpected async requests being sent.

/cc @cockroachdb/release @cockroachdb/kv


TxnCoordSender generally operates synchronously (i.e. the client waits
for the previous response before sending the next request). However, the
txnHeartbeater sends asynchronous EndTxn(commit=false) rollbacks
when it discovers an aborted transaction record. Unfortunately, some code
assumes synchrony, which caused race conditions with txn rollbacks.

In particular, the txnPipeliner attaches lock spans and in-flight
writes to the EndTxn request for e.g. intent cleanup, but it only
records this information when it receives responses. Thus, if an
EndTxn(commit=false) is sent concurrently with a write request, the
lock spans and in-flight writes of that write request will not get
attached to the EndTxn request and the intents will not get cleaned
up.

This patch makes the txnHeartbeater wait for any in-flight requests to
complete before sending asynchronous rollbacks, and collapses incoming
client rollbacks with in-flight async rollbacks.

Resolves #65458.
Resolves #65587.
Resolves #65447.

Release note (bug fix): Fixed a race condition where transaction cleanup
would fail to take into account ongoing writes and clean up their
intents.

@erikgrinaker erikgrinaker self-assigned this May 28, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker erikgrinaker force-pushed the backport21.1-65592 branch 2 times, most recently from 1520e50 to c320917 Compare May 31, 2021 12:17
`TxnCoordSender` generally operates synchronously (i.e. the client waits
for the previous response before sending the next request). However, the
`txnHeartbeater` sends asynchronous `EndTxn(commit=false)` rollbacks
when it discovers an aborted transaction record. Unfortunately, some code
assumes synchrony, which caused race conditions with txn rollbacks.

In particular, the `txnPipeliner` attaches lock spans and in-flight
writes to the `EndTxn` request for e.g. intent cleanup, but it only
records this information when it receives responses. Thus, if an
`EndTxn(commit=false)` is sent concurrently with a write request, the
lock spans and in-flight writes of that write request will not get
attached to the `EndTxn` request and the intents will not get cleaned
up.

This patch makes the `txnHeartbeater` wait for any in-flight requests to
complete before sending asynchronous rollbacks, and collapses incoming
client rollbacks with in-flight async rollbacks.

Release note (bug fix): Fixed a race condition where transaction cleanup
would fail to take into account ongoing writes and clean up their
intents.
@erikgrinaker
Copy link
Contributor Author

Since this isn't causing significant issues with 21.1 clusters, and the change is non-trivial, let's drop the backport. It'll be in 21.2.

@erikgrinaker erikgrinaker deleted the backport21.1-65592 branch September 22, 2021 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants