-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: fix issues around failed 1PC txns #26497
Conversation
r1 is #26479 Review status: 0 of 8 files reviewed at latest revision, all discussions resolved, some commit checks pending. Comments from Reviewable |
How did you determine that this fix was needed? I assume you were seeing cases where intents were being left pending; should we have a roachtest that exhibits this more clearly? Reviewed 7 of 7 files at r2. pkg/kv/txn_coord_sender.go, line 635 at r2 (raw file):
It seems like a lot of plumbing to get the pkg/kv/txn_coord_sender_test.go, line 1975 at r2 (raw file):
Remove these lines? Comments from Reviewable |
Reviewed 3 of 7 files at r2. pkg/internal/client/sender.go, line 111 at r2 (raw file):
Change this back to pkg/kv/txn_coord_sender.go, line 104 at r2 (raw file):
We need to do this even for scans that span ranges. Comments from Reviewable |
Review status: 7 of 8 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful. pkg/kv/txn_coord_sender.go, line 104 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
do we? If the txn is read-only, isn't a consistent timestamp the only thing we need across ranges? Comments from Reviewable |
The recent cockroachdb#25541 changed the way "tracking" (the heartbeat loop and intent collection) is initiated for transactions. It aimed to simplify things and put the burden on the client to decide when a txn needs tracking. This introduced a problem - the client.Txn was not initiating tracking when sending 1PC batches. However, tracking is needed for these transactions too: even though usually they'll succeed and so the TxnCoordSender state can be quickly destroyed, when they fail their intents and heartbeat loop need to be kept around just like for any other txn. This patch backtracks on the previous move to make it the client's responsibility to initiate tracking (it didn't stand the test of time): the client.Txn is no longer in charge of calling tcs.StartTracking(). Instead, the TCS does whatever needs to be done when it sees an EndTransaction. I also took the opportunity to spruce up comments on the TxnCoordSender. Release note: None
As we discussed, badness had not been seen in the wild. I think the test I've added is sufficient. Review status: 2 of 8 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending. pkg/internal/client/sender.go, line 111 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/kv/txn_coord_sender.go, line 635 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/kv/txn_coord_sender_test.go, line 1975 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Comments from Reviewable |
Review status: 2 of 8 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending. Comments from Reviewable |
bors r+ Review status: 2 of 8 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending. Comments from Reviewable |
26497: kv: fix issues around failed 1PC txns r=andreimatei a=andreimatei The recent #25541 changed the way "tracking" (the heartbeat loop and intent collection) is initiated for transactions. It aimed to simplify things and put the burden on the client to decide when a txn needs tracking. This introduced a problem - the client.Txn was not initiating tracking when sending 1PC batches. However, tracking is needed for these transactions too: even though usually they'll succeed and so the TxnCoordSender state can be quickly destroyed, when they fail their intents and heartbeat loop need to be kept around just like for any other txn. This patch backtracks on the previous move to make it the client's responsibility to initiate tracking (it didn't stand the test of time): the client.Txn is no longer in charge of calling tcs.StartTracking(). Instead, the TCS does whatever needs to be done when it sees an EndTransaction. I also took the opportunity to spruce up comments on the TxnCoordSender. Release note: None Co-authored-by: Andrei Matei <[email protected]>
Build succeeded |
I've introduced a bug in cockroachdb#26497 - if there's concurrent writing requests going though a single client.Txn, it can be that some of the intents are not tracked correctly. Before this patch, intents started being tracked when the TxnCoordSender sees a BeginTransaction. Even through the client.Txn attaches an EndTransaction to the first writing requests that it sees, a 2nd request might pass through and get to the TxnCoordSender before that BeginTxn. In fact, it might even get a response before the TxnCoordSender sees the BeginTxn, in which case we wouldn't have recorded its intents. This patch makes the TxnCoordSender track intents from all writing requests. The BeginTxn doesn't matter for tracking intents any more. It continues to matter for starting the heartbeat loop. We could alternatively move to starting the hb loop when seeing the first write, but I didn't do it. Fixes cockroachdb#26538 Release note: None
26606: kv: fix intent tracking r=andreimatei a=andreimatei I've introduced a bug in #26497 - if there's concurrent writing requests going though a single client.Txn, it can be that some of the intents are not tracked correctly. Before this patch, intents started being tracked when the TxnCoordSender sees a BeginTransaction. Even through the client.Txn attaches an EndTransaction to the first writing requests that it sees, a 2nd request might pass through and get to the TxnCoordSender before that BeginTxn. In fact, it might even get a response before the TxnCoordSender sees the BeginTxn, in which case we wouldn't have recorded its intents. This patch makes the TxnCoordSender track intents from all writing requests. The BeginTxn doesn't matter for tracking intents any more. It continues to matter for starting the heartbeat loop. We could alternatively move to starting the hb loop when seeing the first write, but I didn't do it. Fixes #26538 Release note: None Co-authored-by: Andrei Matei <[email protected]>
The recent #25541 changed the way "tracking" (the heartbeat loop and
intent collection) is initiated for transactions. It aimed to simplify
things and put the burden on the client to decide when a txn needs
tracking. This introduced a problem - the client.Txn was not initiating
tracking when sending 1PC batches. However, tracking is needed for these
transactions too: even though usually they'll succeed and so the
TxnCoordSender state can be quickly destroyed, when they fail their
intents and heartbeat loop need to be kept around just like for any
other txn.
This patch backtracks on the previous move to make it the client's
responsibility to initiate tracking (it didn't stand the test of time):
the client.Txn is no longer in charge of calling tcs.StartTracking().
Instead, the TCS does whatever needs to be done when it sees an
EndTransaction.
I also took the opportunity to spruce up comments on the TxnCoordSender.
Release note: None