kv: we should create txn records eagerly on retries #57042
Wrote this on an internal support issue https://github.com/cockroachlabs/support/issues/1019#issuecomment-865923066, thought I'd replicate it here:

In a perfect world, once the … Now consider two cases separately, depending on where the contention is:

- contention on `a`: the first batch basically spends a lot of time in contention handling, so it never actually writes the STAGING txn record. In the meantime, the second batch got through easily. Someone might try to push the intent at …
- contention on `b`: this time the STAGING record shows up right away. Someone might contend on …
Now in a "truly ideal" world, we'd know "exactly" (to the degree that this is possible in the kvcoord without talking to kvserver) when to start the hb loop, no? We want to start the heartbeat loop when we split the batch in DistSender (or when we know for other reasons that this won't be a 1PC txn, etc). The problem is really that the hb loop is owned by TxnCoordSender, and it doesn't know what DistSender is doing. It seems that in the current proposal we are bending over backwards, maybe a bit too much, to accommodate the way things are currently set up. In a sense, if we squint a bit and say that …
It sounds like we are still vulnerable if the range where the txn record needs to be written also has other contended writes, while a different range with writes has no contention. On retry we should also be splitting off the txn record write into its own batch so that it is written without contention; otherwise it would repeat the same cycle.
No: if we start a heartbeat loop, that will create the txn record even if the batch that wants to make it STAGING is held up.
(and if we do something like #57042 (comment) then the only case in which we don't start the hb loop is that in which the RPC actually hits a single range, so intents will never be visible before the txn record)
It needs to be above some of the other interceptors to process …
Have looked into the DistSender/TxnCoordSender coordination here a bit. I think the best approach would be to have …

If this doesn't work out for whatever reason, plan B is to use a …
This is an interesting idea. A slight variation of it would be to keep the responsibility for splitting a batch in the DistSender, but to inform the DistSender of when a BatchRequest is sent for a txn that is not yet heartbeating, which we actually already do with …

This may have an impact on performance; we'd want to check. The most trivial workload that would be impacted by this back-and-forth would be single-row inserts into a table with two indexes: …

I think it would also be worth doing an experiment on how expensive it would be to always assume a transaction could have sent a heartbeat and always set …

Also, if it wasn't cheap enough, we may be able to do something with the timestamp cache to make the …
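To make the coordination question concrete, here is a minimal sketch of the decision being discussed. The names (`BatchInfo`, `needsEagerTxnRecord`) are illustrative, not CockroachDB's actual API; the real DistSender/TxnCoordSender interaction is far more involved.

```go
package main

import "fmt"

// BatchInfo is a hypothetical summary of what the coordinator would need
// to know about a batch before sending it.
type BatchInfo struct {
	Ranges          int  // how many ranges the batch will be split across
	TxnHeartbeating bool // has the heartbeat loop already started?
}

// needsEagerTxnRecord sketches the policy discussed above: if a batch is
// about to be split across multiple ranges for a txn that is not yet
// heartbeating, start the heartbeat loop (which eagerly creates the txn
// record) so intents never become visible before the record exists.
func needsEagerTxnRecord(b BatchInfo) bool {
	return b.Ranges > 1 && !b.TxnHeartbeating
}

func main() {
	// Cross-range batch, no heartbeat loop yet: start one.
	fmt.Println(needsEagerTxnRecord(BatchInfo{Ranges: 2, TxnHeartbeating: false}))
	// Single-range batch: 1PC is still possible, don't pessimize it.
	fmt.Println(needsEagerTxnRecord(BatchInfo{Ranges: 1, TxnHeartbeating: false}))
}
```

This only captures the "when" question; the "who" question (TxnCoordSender owns the loop but DistSender does the splitting) is the part the variations above try to solve.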
Yeah, that's an interesting variation, thanks! I'll try a few different approaches and see what works.
I really like this suggestion, didn't realize that was an option (since we'd presumably have tried it before and chosen to disable heartbeats instead). Will give it a shot.
Initial results are promising, on …
I'm going to verify this and do some more benchmarks (notably with kv0), but it looks like this may be viable.
Doesn't seem to have any significant effect (`new` always starts the heartbeat loop):
@tbg On the issue, you said "starting a HB loop for an actual 1PC txn has some thorny edge cases", I'd like to know more about this.
I think the progression was the other way around, which may be why we never tested this option thoroughly. For the longest time, it wasn't possible for a transaction record to exist for a 1PC transaction. This changed when we introduced unreplicated locks, like the ones we acquire during the initial row scan of an UPDATE. With their addition, we "revealed" the existence of a transaction that could eventually perform a 1-phase commit to others earlier, so other txns would try to abort it. So we then had to start the txn heartbeat loop for these txns.

We realized after that that it was a serious bug for a transaction with a txn record to perform a 1-phase commit. But we also didn't want to pessimize existing 1PC txns, and we assumed the transaction record read would be too expensive. So we did our best to perform this …

Your experiments indicate that this assumption may have been incorrect.
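The invariant described above (a txn with an existing record must not take the 1PC fast path) can be sketched as follows. This is a hypothetical model for illustration, not kvserver's actual evaluation code; `replica` and `canOnePC` are invented names.

```go
package main

import "fmt"

// replica is a toy stand-in for the server-side state that knows which
// transaction records exist on it.
type replica struct {
	txnRecords map[string]bool // txn ID -> record exists
}

// canOnePC reports whether a committing batch may use the one-phase-commit
// fast path. Once a txn record exists (e.g. because the heartbeat loop
// created it), committing via 1PC would bypass that record, so the batch
// must fall back to the regular commit protocol instead.
func (r *replica) canOnePC(txnID string) bool {
	return !r.txnRecords[txnID]
}

func main() {
	r := &replica{txnRecords: map[string]bool{"txn-a": true}}
	fmt.Println(r.canOnePC("txn-a")) // record exists: must not 1PC
	fmt.Println(r.canOnePC("txn-b")) // no record: 1PC fast path allowed
}
```

The cost question in the comment above is exactly the cost of this extra record lookup on every 1PC attempt.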
In https://github.com/cockroachlabs/support/issues/674 we see some contending queries thrashing and constantly retrying. These queries are `delete from system.jobs where id=any(...list of 1000 ids)`. We attempt to run this query as a 1PC. The situation would probably be greatly improved if these queries had a txn record, and thus they wouldn't constantly push/abort each other.

We already have code that splits up the `EndTxn` from the rest of the batch at the `DistSender` level when `epoch > 0`. We should do something similar for the heartbeat interceptor, and start heartbeating on the first write when `epoch > 0`. We need to pay attention to set the `EndTxn.TxnHeartbeating` flag.

We probably also want to cover the `TransactionAbortedError` cases with this policy, so one thing we've discussed is finding a way to start transactions resulting from a `TransactionAbortedError` at a higher epoch, basically using the epoch as a retry counter across retry types.

cc @nvanbenschoten
Epic: CRDB-8282
gz#9005