kvclient: want cluster setting to refuse transactions that exceed the intent tracking memory budget #66742
Labels
A-kv-client
Relating to the KV client and the KV interface.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
N-followup
Needs followup.
O-postmortem
Originated from a Postmortem action item.
T-kv
KV Team
Comments
andreimatei
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-client
Relating to the KV client and the KV interface.
O-postmortem
Originated from a Postmortem action item.
T-kv
KV Team
labels
Jun 22, 2021
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jun 24, 2021
This patch introduces kv.transaction.reject_over_max_intents_budget. If set, this changes our behavior when a txn exceeds its span budget (kv.transaction.max_intents_bytes): instead of compacting some of its lock spans with precision loss, the request that causes the budget to be exceeded is rejected. The idea is that we've seen transactions that exceed this budget be very expensive to clean up: they have to scan a lot to find their intents, and these cleanups take wide latches. So now one has the option to reject these transactions instead of risking this performance cliff.

Fixes cockroachdb#66742

Release note (general change): A new cluster setting (kv.transaction.reject_over_max_intents_budget) affords control over the behavior when a transaction exceeds its "locks-tracking memory budget" (dictated by kv.transaction.max_intents_bytes). Instead of allowing such a transaction to continue with imprecise tracking of its locks, setting this new option rejects the query that would push its transaction over this budget with an error (error code 53400 - "configuration limit exceeded"). Transactions that don't track their locks precisely are potentially destabilizing for the cluster since cleaning them up can take considerable resources. Transactions that change many rows have the potential to run into this memory budget issue.
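As an illustration of how an application might react to the rejection described in the release note, here is a minimal Go sketch that recognizes error code 53400. The `sqlStateError` interface, the `isBudgetRejection` helper, and the `fakeErr` type are hypothetical names introduced for this example; the sketch assumes the SQL driver in use exposes the SQLSTATE through a `SQLState() string` method on its error type, which should be checked against the driver's documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// configLimitExceeded is the SQLSTATE named in the release note
// ("configuration limit exceeded").
const configLimitExceeded = "53400"

// sqlStateError is an assumed, minimal view of a driver error that can
// report its SQLSTATE. It is not taken from any specific driver.
type sqlStateError interface {
	error
	SQLState() string
}

// isBudgetRejection reports whether err (or any error it wraps) carries
// SQLSTATE 53400.
func isBudgetRejection(err error) bool {
	var se sqlStateError
	return errors.As(err, &se) && se.SQLState() == configLimitExceeded
}

// fakeErr is a stand-in for a driver error, used only to exercise the helper.
type fakeErr struct{ code string }

func (e fakeErr) Error() string    { return "sqlstate " + e.code }
func (e fakeErr) SQLState() string { return e.code }

func main() {
	rejected := fmt.Errorf("exec failed: %w", fakeErr{code: "53400"})
	other := fmt.Errorf("exec failed: %w", fakeErr{code: "40001"})
	fmt.Println("budget rejection:", isBudgetRejection(rejected))
	fmt.Println("budget rejection:", isBudgetRejection(other))
}
```

An application that sees this error would typically split the offending statement into smaller batches rather than retry it unchanged, since the same transaction would hit the same budget again.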
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jun 26, 2021
This patch introduces kv.transaction.reject_over_max_intents_budget. If set, this changes our behavior when a txn exceeds its locks+in-flight write budget (kv.transaction.max_intents_bytes): instead of compacting some of its lock spans with precision loss, the request that causes the budget to be exceeded is rejected. The idea is that we've seen transactions that exceed this budget be very expensive to clean up: they have to scan a lot to find their intents, and these cleanups take wide latches. So now one has the option to reject these transactions instead of risking this performance cliff.

Each request is checked against the budget by the pipeliner before being sent out for evaluation. This check is not precise, since the exact effects of the request on the memory budget are only known at response time because of ResumeSpans, effects of QueryIntents, etc. So, the check is best-effort. If a request slips through and then the response overflows the budget, we keep the locks non-condensed; if a further request in the txn tries to lock more, it'll be rejected. A commit/rollback is always allowed to pass through, since it doesn't lock anything by itself.

Fixes cockroachdb#66742

Release note (general change): A new cluster setting (kv.transaction.reject_over_max_intents_budget) affords control over the behavior when a transaction exceeds its "locks-tracking memory budget" (dictated by kv.transaction.max_intents_bytes). Instead of allowing such a transaction to continue with imprecise tracking of its locks, setting this new option rejects the query that would push its transaction over this budget with an error (error code 53400 - "configuration limit exceeded"). Transactions that don't track their locks precisely are potentially destabilizing for the cluster since cleaning them up can take considerable resources. Transactions that change many rows have the potential to run into this memory budget issue.
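The best-effort check described in this commit message can be modeled roughly as follows. This is a toy sketch, not the actual kvcoord code: `lockTracker`, `checkRequest`, `recordResponse`, and `errBudgetExceeded` are hypothetical names, and span sizes are approximated by key lengths. It only aims to show the three behaviors the message names: a pre-send estimate, no condensing when a response overflows the budget, and commit/rollback always passing.

```go
package main

import (
	"errors"
	"fmt"
)

// errBudgetExceeded stands in for the "configuration limit exceeded" error.
var errBudgetExceeded = errors.New("lock span budget exceeded")

// span is a simplified key span; EndKey is empty for a point key.
type span struct{ Key, EndKey string }

func (s span) size() int64 { return int64(len(s.Key) + len(s.EndKey)) }

// lockTracker is a toy model of one transaction's lock tracking.
type lockTracker struct {
	maxBytes       int64 // models kv.transaction.max_intents_bytes
	rejectOverflow bool  // models kv.transaction.reject_over_max_intents_budget
	usedBytes      int64
	spans          []span
}

// checkRequest runs before a request is sent. It only estimates the effect
// of the request, so it can let through a request whose response later
// turns out to exceed the budget.
func (t *lockTracker) checkRequest(locks []span, isEndTxn bool) error {
	// Commit/rollback and non-locking requests always pass.
	if !t.rejectOverflow || isEndTxn || len(locks) == 0 {
		return nil
	}
	estimate := t.usedBytes
	for _, s := range locks {
		estimate += s.size()
	}
	if estimate > t.maxBytes {
		return errBudgetExceeded
	}
	return nil
}

// recordResponse accounts for the spans actually locked. Even if this
// overflows the budget, the spans are kept as they are (not condensed).
func (t *lockTracker) recordResponse(locked []span) {
	t.spans = append(t.spans, locked...)
	for _, s := range locked {
		t.usedBytes += s.size()
	}
}

func main() {
	tr := &lockTracker{maxBytes: 8, rejectOverflow: true}
	// The estimate (4 bytes) fits, but the response reports more locks.
	fmt.Println("first check:", tr.checkRequest([]span{{Key: "abcd"}}, false))
	tr.recordResponse([]span{{Key: "abcd"}, {Key: "efghij"}})
	fmt.Println("tracked spans:", len(tr.spans), "bytes:", tr.usedBytes)
	// A further locking request is rejected; a commit still passes.
	fmt.Println("second check:", tr.checkRequest([]span{{Key: "k"}}, false))
	fmt.Println("commit check:", tr.checkRequest(nil, true))
}
```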
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jul 2, 2021
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jul 20, 2021
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jul 21, 2021
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jul 21, 2021
craig bot
pushed a commit
that referenced
this issue
Jul 21, 2021
66927: kvcoord: setting to reject txns above lock span limit r=andreimatei a=andreimatei

This patch introduces kv.transaction.reject_over_max_intents_budget. If set, this changes our behavior when a txn exceeds its locks+in-flight write budget (kv.transaction.max_intents_bytes): instead of compacting some of its lock spans with precision loss, the request that causes the budget to be exceeded is rejected. The idea is that we've seen transactions that exceed this budget be very expensive to clean up: they have to scan a lot to find their intents, and these cleanups take wide latches. So now one has the option to reject these transactions instead of risking this performance cliff.

Each request is checked against the budget by the pipeliner before being sent out for evaluation. This check is not precise, since the exact effects of the request on the memory budget are only known at response time because of ResumeSpans, effects of QueryIntents, etc. So, the check is best-effort. If a request slips through and then the response overflows the budget, we keep the locks non-condensed; if a further request in the txn tries to lock more, it'll be rejected. A commit/rollback is always allowed to pass through, since it doesn't lock anything by itself.

Fixes #66742

Release note (general change): A new cluster setting (kv.transaction.reject_over_max_intents_budget) affords control over the behavior when a transaction exceeds its "locks-tracking memory budget" (dictated by kv.transaction.max_intents_bytes). Instead of allowing such a transaction to continue with imprecise tracking of its locks, setting this new option rejects the query that would push its transaction over this budget with an error (error code 53400 - "configuration limit exceeded"). Transactions that don't track their locks precisely are potentially destabilizing for the cluster since cleaning them up can take considerable resources. Transactions that change many rows have the potential to run into this memory budget issue.

67444: rangecache: add a gcassert:noescape r=jordanlewis a=jordanlewis

#66374 made some changes to the rangecache to avoid allocations. https://github.com/jordanlewis/gcassert just learned the `//gcassert:noescape` annotation, so upgrade the library, add the annotation to one of the spots that we don't want to escape, and add the rangecache package to the list of packages checked with gcassert.

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Jordan Lewis <[email protected]>
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jul 23, 2021
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Jul 23, 2021
When a transaction exceeds its memory budget for tracking locks (kv.transaction.max_intents_bytes), it starts collapsing those locks into lock spans. If that happens, cleaning up the locks at commit or rollback time becomes a potentially very expensive operation. Although we try to be a bit smart about the way in which we collapse these locks, we keep seeing signs that the ResolveIntentRanges that end up being sent to clean up these collapsed ranges can be very expensive, both in terms of the work they have to do (scanning wide key spans) and in terms of the wide latches they take.

Many clients would prefer that the cluster protect itself from potentially falling off this performance cliff by simply rejecting transactions at the point when they exceed the memory budget. So, we should give them this option.
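To make the precision loss concrete, here is a toy Go sketch of collapsing point locks into one covering span. It is not how CockroachDB condenses spans (the real logic is more careful about which spans it merges); `condense` and `contains` are hypothetical helpers, and keys are plain strings. The point is only that the collapsed span also covers keys the transaction never wrote, which is why cleanup has to scan and latch a wide key range.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a simplified half-open key span [Key, EndKey).
type span struct{ Key, EndKey string }

// contains reports whether key falls inside the span.
func (s span) contains(key string) bool {
	return key >= s.Key && key < s.EndKey
}

// condense replaces a set of point keys with a single span that covers all
// of them. It returns false if there are no keys to cover.
func condense(keys []string) (span, bool) {
	if len(keys) == 0 {
		return span{}, false
	}
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	// Appending a zero byte makes the end key exclusive of nothing we locked.
	return span{Key: sorted[0], EndKey: sorted[len(sorted)-1] + "\x00"}, true
}

func main() {
	locked := []string{"user/9000", "user/0001"}
	s, _ := condense(locked)
	fmt.Printf("condensed span: [%q, %q)\n", s.Key, s.EndKey)
	// This key was never locked, yet cleanup must now consider it.
	fmt.Println("covers user/5000:", s.contains("user/5000"))
}
```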
There are many improvements we can make to ranged intent resolution (e.g. execute it without latches, make it use the lock table), or to the memory budget (share the budget across a node so that a single transaction can grow much larger absent other concurrent large txns). Even so, it seems that users with good control over their applications might prefer simply rejecting this class of transactions.
Epic: CRDB-8282
gz#9005