kv: closed timestamp can starve read-write txns that take longer than 600ms #51294
Comments
While stressing kvnemesis with merges enabled for #50265 (though I can reproduce this failure mode with and without that PR), I ran into timeouts that seem to be caused by roughly what this issue describes (see `cockroach/pkg/kv/kvnemesis/kvnemesis_test.go`, lines 71 to 74 at 053ec8b).
I was using a relatively low closed timestamp duration (500ms) because that was more likely to lead to scenarios like the one that PR fixes. I noticed that txns that scanned a lot of keys and then wrote something would refresh/retry a lot (what this issue talks about). However, the logs also indicate that a ton of time was being spent pushing conflicting transactions, and that the sort of transactions that can end up in this endless retry loop seem to also have long queues of pusher txns waiting on them. I saw over 1000 pending txns in my logs waiting on one such endlessly-retrying txn (the highest epoch I saw for it was ~50). Maybe I'm missing something here, but I don't see anything preventing this sort of transitive starvation (of the waiting pushers). When I don't use a low closed timestamp, I can't reproduce this failure mode. Edit: Here is a log file from one such run in case someone is interested:
This is tangential to the issue, but I would have guessed that readers at lower timestamps get unblocked when the pushee bumps its timestamp. Is that not the case?
It's not happening in that …
@nvanbenschoten what's the story with -^ ? |
If the pushee's timestamp changes due to a push then yes, all txns waiting in the pushee's wait queue are notified and can get unblocked. However, if the pushee gets bumped due to the closed timestamp or the timestamp cache while issuing one of its requests, then no one in the wait queue finds out about it.
Yes, anyone waiting in the wait queue …
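For intuition, here is a minimal, self-contained Go sketch of that behavior. It models pushers as waiters on the pushee's persisted transaction record, so they are only released when that record is rewritten (as by a successful push), not when the pushee's timestamp gets bumped in-flight by the closed timestamp or the timestamp cache. All type and function names below are illustrative stand-ins, not CockroachDB's actual wait-queue code.

```go
package main

import (
	"fmt"
	"sync"
)

// txnRecord stands in for the pushee's persisted transaction record that
// pushers wait on. All names here are hypothetical.
type txnRecord struct {
	mu      sync.Mutex
	cond    *sync.Cond
	writeTS int64 // timestamp visible to waiters
}

func newTxnRecord(ts int64) *txnRecord {
	r := &txnRecord{writeTS: ts}
	r.cond = sync.NewCond(&r.mu)
	return r
}

// pushRecord models a successful push: it rewrites the record and wakes
// everyone waiting on it so they can re-check the timestamp.
func (r *txnRecord) pushRecord(newTS int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if newTS > r.writeTS {
		r.writeTS = newTS
		r.cond.Broadcast()
	}
}

// waitUntilAbove models a pusher that stays blocked until the *record's*
// timestamp exceeds its own read timestamp.
func (r *txnRecord) waitUntilAbove(readTS int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for r.writeTS <= readTS {
		r.cond.Wait()
	}
}

func main() {
	rec := newTxnRecord(10)

	// A closed-timestamp bump only changes the copy of the txn carried by
	// the in-flight request; nothing rewrites the record, so waiters are
	// not released by it.
	inFlightWriteTS := int64(20)
	_ = inFlightWriteTS

	done := make(chan struct{})
	go func() {
		rec.waitUntilAbove(15) // pusher reading at ts=15
		close(done)
	}()

	rec.pushRecord(20) // only an explicit push on the record releases it
	<-done
	fmt.Println("pusher released at record ts =", rec.writeTS)
}
```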
We need to do something here for v20.2 and maybe consider backporting something to v20.1. I suggest the following:
With these in place, the timeline of the offending query would look something like the following and we would avoid the indefinite starvation regardless of the closed timestamp duration:
Option 1 from above is also something we can consider, although we'd need to weigh its trade-offs.
Relates to cockroachdb#51294. This commit adds logic to the txnSpanRefresher to preemptively perform refreshes before issuing requests when doing so is guaranteed to be beneficial. We perform a preemptive refresh if either a) doing so would be free because we have not yet accumulated any refresh spans, or b) the batch contains a committing EndTxn request that we know will be rejected if issued.

The first case is straightforward. If the transaction has yet to perform any reads but has had its write timestamp bumped, refreshing is a trivial no-op. In this case, refreshing eagerly prevents the transaction from performing any future reads at its current read timestamp. Not doing so preemptively guarantees that we will need to perform a real refresh in the future if the transaction ever performs a read. At best, this would be wasted work. At worst, this could result in the future refresh failing. So we might as well refresh preemptively while doing so is free. Note that this first case does NOT obviate the need for server-side refreshes. Notably, a transaction's write timestamp might be bumped in the same batch in which it performs its first read. In such cases, a preemptive refresh would not be needed but a reactive refresh would not be a trivial no-op. These situations are common for one-phase commit transactions.

The second case is more complex. If the batch contains a committing EndTxn request that we know will need a refresh, we don't want to bother issuing it just for it to be rejected. Instead, we preemptively refresh before issuing the EndTxn batch. If we view reads as acquiring a form of optimistic read locks under an optimistic concurrency control scheme (as is discussed in the comment on txnSpanRefresher), then this preemptive refresh immediately before the EndTxn is synonymous with the "validation" phase of a standard OCC transaction model. However, as an optimization compared to standard OCC, the validation phase is only performed when necessary in CockroachDB (i.e. if the transaction's writes have been pushed to higher timestamps). This second case will play into the solution for cockroachdb#51294. Now that we perform these preemptive refreshes when possible, we know that it is always the right choice to split off the EndTxn from the rest of the batch during a txnSpanRefresher auto-retry. Without this change, it was unclear whether the first refresh of an EndTxn batch was caused by earlier requests in the transaction or by other requests in the current batch. Now, it is always caused by other requests in the same batch, so it is always clear that we should split the EndTxn from the rest of the batch immediately after the first refresh.

Release note (performance improvement): validation of optimistic reads is now performed earlier in transactions when doing so can save work. This eliminates certain types of transaction retry errors and avoids wasted RPC traffic.
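To make the two cases concrete, here is a hedged Go sketch of that preemptive-refresh decision. The struct fields and function names are illustrative placeholders rather than the real txnSpanRefresher API; the sketch only encodes the rule "refresh now if it is free (no refresh spans yet) or if the batch commits and would be rejected anyway".

```go
package main

import "fmt"

// txnState is a simplified stand-in for the transaction state tracked by
// the txnSpanRefresher. Field names are hypothetical.
type txnState struct {
	readTS, writeTS int64
	refreshSpans    []string // spans read so far; empty means a refresh is free
	refreshInvalid  bool     // span tracking overflowed; refreshing is impossible
}

type batch struct {
	commitsTxn bool // batch contains a committing EndTxn request
}

// shouldPreemptivelyRefresh mirrors the two cases from the commit message:
// (a) the refresh is free because no reads have been tracked yet, or
// (b) the batch commits and would be rejected server-side if sent as-is.
func shouldPreemptivelyRefresh(txn txnState, ba batch) bool {
	if txn.writeTS <= txn.readTS || txn.refreshInvalid {
		return false // nothing to refresh, or refreshing can't succeed
	}
	refreshFree := len(txn.refreshSpans) == 0
	refreshInevitable := ba.commitsTxn
	return refreshFree || refreshInevitable
}

func main() {
	txn := txnState{readTS: 100, writeTS: 200}
	fmt.Println(shouldPreemptivelyRefresh(txn, batch{}))                 // true: free refresh
	txn.refreshSpans = []string{"a-c"}
	fmt.Println(shouldPreemptivelyRefresh(txn, batch{}))                 // false: not free, not committing
	fmt.Println(shouldPreemptivelyRefresh(txn, batch{commitsTxn: true})) // true: EndTxn would be rejected
}
```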
Fixes cockroachdb#51294. First two commits are from cockroachdb#52884.

This commit updates the txnSpanRefresher to split off EndTxn requests into their own partial batches on auto-retries after successful refreshes, as a means of preventing starvation. This avoids starvation in two ways. First, it helps ensure that we lay down intents if any of the other requests in the batch are writes. Second, it ensures that if any writes are getting pushed due to contention with reads or due to the closed timestamp, they will still succeed and allow the batch to make forward progress. Without this, each retry attempt may get pushed because of writes in the batch and then rejected wholesale when the EndTxn tries to evaluate the pushed batch. When split, the writes will be pushed but succeed, the transaction will be refreshed, and the EndTxn will succeed.

I still need to confirm that this fixes the indefinite stall [here](https://github.com/cockroachlabs/misc_projects_glenn/tree/master/rw_blockage#implicit-query-hangs--explict-query-works), but I suspect that it will.

Release note (bug fix): A change in v20.1 caused a certain class of bulk UPDATE and DELETE statements to hang indefinitely if run in an implicit transaction. We now break up these statements to avoid starvation and prevent them from hanging indefinitely.
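As a rough illustration of the splitting idea (simplified request types, not the real KV batch API), the sketch below peels the EndTxn off into its own sub-batch so that, on an auto-retry after a refresh, the other writes can lay down intents and make progress even if they get pushed again.

```go
package main

import "fmt"

// request is a simplified stand-in for a KV request in a batch.
type request struct {
	name     string
	isEndTxn bool
}

// splitEndTxn returns the batch's requests without the trailing EndTxn,
// plus a separate single-request batch holding just the EndTxn (if any).
func splitEndTxn(reqs []request) (prefix, endTxn []request) {
	for i, r := range reqs {
		if r.isEndTxn {
			return reqs[:i], reqs[i:]
		}
	}
	return reqs, nil
}

func main() {
	ba := []request{
		{name: "Put(a)"},
		{name: "Put(b)"},
		{name: "EndTxn(commit)", isEndTxn: true},
	}
	prefix, et := splitEndTxn(ba)
	// On retry: issue the prefix first (the writes succeed even if pushed),
	// refresh the transaction to its new write timestamp, then issue the
	// EndTxn alone so it no longer gets rejected wholesale.
	fmt.Println("first sub-batch: ", prefix)
	fmt.Println("second sub-batch:", et)
}
```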
52881: sql: disallow non-admin users from dropping admins r=solongordon a=solongordon Fixes #52582. Release note (sql change): Non-admin users with the CREATEROLE option are no longer permitted to drop users with the admin role.
52884: kv: preemptively refresh transaction timestamps r=nvanbenschoten a=nvanbenschoten Relates to #51294.
53074: cli: add aliases for userfile commands and switch ls to list r=miretskiy a=adityamaru Added: `rm` for `userfile delete`, `ls` for `userfile list`. Release note (cli change): Adds alias commands `ls` and `rm` for `userfile list` and `userfile delete`.
53086: sql: add privilege validation for different object types r=rohany a=RichardJCai Add logic in Validate to ensure only certain privileges can be granted on certain descriptors. Example: USAGE is invalid on tables, SELECT is invalid on types. Release note: None
Co-authored-by: Solon Gordon <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: richardjcai <[email protected]>
52885: kv: split EndTxn into sub-batch on auto-retry after successful refresh r=nvanbenschoten a=nvanbenschoten Fixes #51294. Co-authored-by: Nathan VanBenschoten <[email protected]>
Now that we've dropped `kv.closed_timestamp.target_duration` down to 3s, we've seen an increase in read-write mutations that get starved by the closed timestamp. The earlier understanding was that these transactions would only be bumped by the closed timestamp if they took over 3s to commit, and, if so, would then be given 3s to refresh and complete their final batch. The first part is true, but the second part was a misunderstanding.

If a transaction takes more than 3s, it will be bumped up to the current closed timestamp's value. It will then hit a retry error on its EndTxn batch, refresh at the old closed timestamp value, and try to commit again. If this commit is again bumped by the closed timestamp, we can enter a refresh loop that terminates after 5 attempts and kicks the error back to the client. Since we're only ever refreshing up to the current closed timestamp, the transaction actually only has `kv.closed_timestamp.target_duration * kv.closed_timestamp.close_fraction` = 600ms to refresh and commit.
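A quick back-of-the-envelope check of that 600ms figure, assuming the 0.2 default for `kv.closed_timestamp.close_fraction` (the default is an assumption here, not something stated above):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Cluster settings as described in the issue; the close fraction's
	// 0.2 default is an assumption.
	targetDuration := 3 * time.Second // kv.closed_timestamp.target_duration
	closeFraction := 0.2              // kv.closed_timestamp.close_fraction

	// Window a pushed transaction has to refresh and commit before the
	// closed timestamp catches up with it again.
	window := time.Duration(float64(targetDuration) * closeFraction)
	fmt.Println(window) // 600ms
}
```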
The easiest way to reproduce this is to perform a large DELETE statement in an implicit txn. If the DELETE takes long enough, it can get stuck in a starvation loop. This looks something like:
This starvation is clearly undesirable, but it doesn't seem fundamental. There seem to be a few different things we could do here to improve the situation:
So in an optimal world, step 5 would be able to ignore the closed timestamp entirely as long as its intents were previously written in step 2.