kvclient: want cluster setting to refuse transactions that exceed the intent tracking memory budget #66742

Closed
andreimatei opened this issue Jun 22, 2021 · 0 comments · Fixed by #66927
Labels
A-kv-client Relating to the KV client and the KV interface. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) N-followup Needs followup. O-postmortem Originated from a Postmortem action item. T-kv KV Team

Comments

Contributor

andreimatei commented Jun 22, 2021

When a transaction exceeds its memory budget for tracking locks (kv.transaction.max_intents_bytes), it starts collapsing those locks into lock spans. Once that happens, cleaning up the locks at commit or rollback time can become very expensive. Although we try to be a bit smart about how we collapse these locks, we keep seeing signs that the ResolveIntentRange requests sent to clean up these collapsed spans can be very expensive, both in terms of the work they have to do (scanning wide key spans) and in terms of the wide latches they take.
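To make the precision loss concrete, here is a minimal Go sketch of collapsing point locks into a span once a byte budget is exceeded. It is not the kvcoord implementation: the `span` type, the `condense` function, and the "merge everything into one covering span" policy are all assumptions chosen for illustration, and the real code is smarter about what it merges.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical key span [Key, EndKey). An empty EndKey means a
// single point key. This type is illustrative, not CockroachDB's.
type span struct {
	Key, EndKey string
}

// size approximates the tracking cost of a span as the bytes of its keys.
func (s span) size() int { return len(s.Key) + len(s.EndKey) }

// end returns the exclusive end of the span. For a point key that is the
// key followed by a zero byte, i.e. the next possible key.
func (s span) end() string {
	if s.EndKey == "" {
		return s.Key + "\x00"
	}
	return s.EndKey
}

// condense returns spans unchanged while they fit in budget. Otherwise it
// replaces them with one span covering all of them. This is deliberately
// crude: it saves memory, but the result also covers every key between the
// original locks, which is what makes later cleanup scan wide key ranges.
func condense(spans []span, budget int) []span {
	total := 0
	for _, s := range spans {
		total += s.size()
	}
	if total <= budget || len(spans) < 2 {
		return spans
	}
	sorted := append([]span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	merged := span{Key: sorted[0].Key, EndKey: sorted[0].end()}
	for _, s := range sorted[1:] {
		if e := s.end(); e > merged.EndKey {
			merged.EndKey = e
		}
	}
	return []span{merged}
}

func main() {
	locks := []span{{Key: "user/0001"}, {Key: "user/0500"}, {Key: "user/9000"}}
	fmt.Printf("%q\n", condense(locks, 1024)) // under budget: three point locks
	fmt.Printf("%q\n", condense(locks, 16))   // over budget: one wide span
}
```

In the over-budget case, three precise point locks become one span from the first key to just past the last one, so resolving intents has to scan everything in between.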

Many clients would prefer that the cluster protects itself from potentially falling off of this performance cliff by simply rejecting transactions at the point when they exceed the memory budget. So, we should give them this option.

There are many improvements we can make to ranged intent resolution (e.g. execute it without latches, make it use the lock table), or to the memory budget (share the budget across a node so that a single transaction can grow much larger absent other concurrent large txns). Still, it seems that users with good control over their applications might prefer to simply reject this class of transactions.

Epic: CRDB-8282

gz#9005

@andreimatei andreimatei added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-client Relating to the KV client and the KV interface. O-postmortem Originated from a Postmortem action item. T-kv KV Team labels Jun 22, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 24, 2021
This patch introduces kv.transaction.reject_over_max_intents_budget. If
set, this changes our behavior when a txn exceeds its span budget
(kv.transaction.max_intents_bytes): instead of compacting some of its
lock spans with precision loss, the request causing the budget to be
exceeded will be rejected instead.

The idea is that we've seen transactions that exceed this budget be very
expensive to clean up - they have to scan a lot to find their intents,
and these cleanups take wide latches. So now one has the option to
reject these transactions, instead of risking this performance cliff.

Fixes cockroachdb#66742

Release note (general change): A new cluster setting
(kv.transaction.reject_over_max_intents_budget) affords control over the
behavior when a transaction exceeds its "locks-tracking memory budget"
(dictated by kv.transaction.max_intents_bytes). Instead of allowing such
transactions to continue with imprecise tracking of their locks, setting
this new option rejects the query that would push its transaction over
this budget with an error (error code 53400 - "configuration limit
exceeded"). Transactions that don't track their locks precisely are
potentially destabilizing for the cluster since cleaning them up can
take considerable resources. Transactions that change many rows have the
potential to run into this memory budget issue.
@lunevalex lunevalex added the N-followup Needs followup. label Jun 25, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 26, 2021
This patch introduces kv.transaction.reject_over_max_intents_budget. If
set, this changes our behavior when a txn exceeds its locks+in-flight
write budget (kv.transaction.max_intents_bytes): instead of compacting
some of its lock spans with precision loss, the request causing the
budget to be exceeded will be rejected instead.

The idea is that we've seen transactions that exceed this budget be very
expensive to clean up - they have to scan a lot to find their intents,
and these cleanups take wide latches. So now one has the option to
reject these transactions, instead of risking this performance cliff.

Each request is checked against the budget by the pipeliner before being
sent out for evaluation. This check is not precise, since the exact
effects of the request on the memory budget are only known at response
time because of ResumeSpans, effects of QueryIntents, etc. So, the check
is best-effort. If a request slips through and then the response
overflows the budget, we keep the locks non-condensed; if a further
request in the txn tries to lock more, it'll be rejected. A
commit/rollback is always allowed to pass through, since it doesn't lock
anything by itself.

Fixes cockroachdb#66742

Release note (general change): A new cluster setting
(kv.transaction.reject_over_max_intents_budget) affords control over the
behavior when a transaction exceeds its "locks-tracking memory budget"
(dictated by kv.transaction.max_intents_bytes). Instead of allowing such
transactions to continue with imprecise tracking of their locks, setting
this new option rejects the query that would push its transaction over
this budget with an error (error code 53400 - "configuration limit
exceeded"). Transactions that don't track their locks precisely are
potentially destabilizing for the cluster since cleaning them up can
take considerable resources. Transactions that change many rows have the
potential to run into this memory budget issue.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 2, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 20, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 21, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 21, 2021
craig bot pushed a commit that referenced this issue Jul 21, 2021
66927: kvcoord: setting to reject txns above lock span limit r=andreimatei a=andreimatei

67444: rangecache: add a gcassert:noescape r=jordanlewis a=jordanlewis

#66374 made some changes to the rangecache to avoid allocations.

https://github.com/jordanlewis/gcassert just learned the `//gcassert:noescape` annotation, so upgrade the library, add the annotation to one of the spots that we don't want to escape, and add the rangecache package to the list of packages checked with gcassert.

Co-authored-by: Andrei Matei
Co-authored-by: Jordan Lewis
@craig craig bot closed this as completed in 66903db Jul 21, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 23, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jul 23, 2021