-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: we should collapse the refresh spans when exceeding the memory budget #46095
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
andreimatei
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Mar 13, 2020
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Mar 18, 2020
Before this patch, once a transaction exceeds the kv.transaction.max_refresh_spans_bytes limit, it stopped tracking reads and it didn't attempt to refresh any more when pushed. This patch make the span refresher condense the spans when it runs out of memory instead. So we'll get bigger spans and potentially false conflicts, but at least we have a chance at refreshing. In particular, it'll succeed if there's no writes anywhere. The condensing is performed using the condensableSpanSet, like we do in the pipeliner interceptor for the tracking of write intents. Internally, that guy condenses spans in ranges with lots of reads. We've seen people run into kv.transaction.max_refresh_spans_bytes in the past, so this should help many uses cases. But in particular I've written this patch because, without it, I'm scared about the effects of 20.1's reduction in the closed timestamp target duration to 3s from a previous 30s. Every transaction writing something after having run for longer than that will get pushed, so being able to refresh is getting more important. Fixes cockroachdb#46095 Release note: Transsactions reading a lot of data behave better when exceeding the memory limit set by kv.transaction.max_refresh_spans_bytes. Such transactions now attempt to resolve the conflicts they run into instead of being forced to always retry. Increasing kv.transaction.max_refresh_spans_bytes should no longer be necessary for most workloads. Release justification: fix for new "functionality" - the reduction in the closed timestamp target duration.
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Mar 24, 2020
Before this patch, once a transaction exceeds the kv.transaction.max_refresh_spans_bytes limit, it stopped tracking reads and it didn't attempt to refresh any more when pushed. This patch make the span refresher condense the spans when it runs out of memory instead. So we'll get bigger spans and potentially false conflicts, but at least we have a chance at refreshing. In particular, it'll succeed if there's no writes anywhere. The condensing is performed using the condensableSpanSet, like we do in the pipeliner interceptor for the tracking of write intents. Internally, that guy condenses spans in ranges with lots of reads. We've seen people run into kv.transaction.max_refresh_spans_bytes in the past, so this should help many uses cases. But in particular I've written this patch because, without it, I'm scared about the effects of 20.1's reduction in the closed timestamp target duration to 3s from a previous 30s. Every transaction writing something after having run for longer than that will get pushed, so being able to refresh is getting more important. Fixes cockroachdb#46095 Release note (general change): Transactions reading a lot of data behave better when exceeding the memory limit set by kv.transaction.max_refresh_spans_bytes. Such transactions now attempt to resolve the conflicts they run into instead of being forced to always retry. Increasing kv.transaction.max_refresh_spans_bytes should no longer be necessary for most workloads. Release justification: fix for new "functionality" - the reduction in the closed timestamp target duration.
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Mar 31, 2020
Before this patch, once a transaction exceeds the kv.transaction.max_refresh_spans_bytes limit, it stopped tracking reads and it didn't attempt to refresh any more when pushed. This patch make the span refresher condense the spans when it runs out of memory instead. So we'll get bigger spans and potentially false conflicts, but at least we have a chance at refreshing. In particular, it'll succeed if there's no writes anywhere. The condensing is performed using the condensableSpanSet, like we do in the pipeliner interceptor for the tracking of write intents. Internally, that guy condenses spans in ranges with lots of reads. We've seen people run into kv.transaction.max_refresh_spans_bytes in the past, so this should help many uses cases. But in particular I've written this patch because, without it, I'm scared about the effects of 20.1's reduction in the closed timestamp target duration to 3s from a previous 30s. Every transaction writing something after having run for longer than that will get pushed, so being able to refresh is getting more important. Fixes cockroachdb#46095 Release note (general change): Transactions reading a lot of data behave better when exceeding the memory limit set by kv.transaction.max_refresh_spans_bytes. Such transactions now attempt to resolve the conflicts they run into instead of being forced to always retry. Increasing kv.transaction.max_refresh_spans_bytes should no longer be necessary for most workloads. Release justification: fix for new "functionality" - the reduction in the closed timestamp target duration.
craig bot
pushed a commit
that referenced
this issue
Mar 31, 2020
45920: UI Telemetry for Statements r=dhartunian a=nathanstilwell fixes #45506 - [x] Changing sort order (want to see which column and asc vs desc) - [x] Search - [x] Clicking to paginate - [x] Diagnostic bundle activations (capturing high level statement performance information as part of the event is nice, e.g. latency or execution count; we should scrub the actual fingerprint) Adding a tracking function to [`analytics.ts`]() to send analytics payloads Segment.io. I begin by adding tracking calls to interesting events on the [Statements]() page of the Admin UI. The events being tracked are as follows, ### Table Sort This event is fired when the sorting order is changed by clicking a column header on the Statements page. ![statements-sort-order](https://user-images.githubusercontent.com/397448/77473575-0a419100-6dec-11ea-850d-8663af4ccc37.gif) ``` { userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e' event: 'Table Sort', properties: { columnName: 'Txn Type', pagePath: '/statements', sortDirection: 'desc', tableName: 'statements-table' } } ``` ### Search This event is fired when a the statements are filtered using a search term. ![statements-search](https://user-images.githubusercontent.com/397448/77474910-5097ef80-6dee-11ea-828c-43b387354327.gif) ``` { userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e', event: 'Search', properties: { numberOfResults: 17, pagePath: '/statements', searchTerm: 'system' } } ``` ### Paginate This event is fired when the user interacts with pagination on the statements page. ![statements-pagination](https://user-images.githubusercontent.com/397448/77474995-7ae9ad00-6dee-11ea-948f-00fcc500438a.gif) ``` { userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e', event: 'Paginate', properties: { pagePath: '/statements/', selectedPage: 4 } } ``` ### Diagnostics Activation This event it tracked when a user clicks "Activate" on the diagnostics activation modal. ![statements-diagnostics-activation](https://user-images.githubusercontent.com/397448/77475062-95bc2180-6dee-11ea-96c6-d72c91915000.gif) ``` { userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e', event: 'Diagnostics Activation', properties: { fingerprint: 'SELECT blah, blah FROM blah.blah WHERE blah blah blah' pagePath: '/statements/' } } ``` 46190: ui: removed metric function `setDefaultTime` r=dhartunian a=elkmaster Removed setter the default graph time window based on the age of the older node in the cluster because we have default redux scale. That fixed our problem when we using the time dropdown on the metrics page and clicking on "10m" we get result with "6h" duration. Resolves: #46145 Release justification: bug fixes and low-risk updates to new functionality Release note (ui): default timescale on metrics page is always 10m. previously it defaulted to the age of the longest running node 46275: kvcoord: condense read spans when they exceed the memory limit r=andreimatei a=andreimatei Before this patch, once a transaction exceeds the kv.transaction.max_refresh_spans_bytes limit, it stopped tracking reads and it didn't attempt to refresh any more when pushed. This patch make the span refresher condense the spans when it runs out of memory instead. So we'll get bigger spans and potentially false conflicts, but at least we have a chance at refreshing. In particular, it'll succeed if there's no writes anywhere. The condensing is performed using the condensableSpanSet, like we do in the pipeliner interceptor for the tracking of write intents. Internally, that guy condenses spans in ranges with lots of reads. We've seen people run into kv.transaction.max_refresh_spans_bytes in the past, so this should help many uses cases. But in particular I've written this patch because, without it, I'm scared about the effects of 20.1's reduction in the closed timestamp target duration to 3s from a previous 30s. Every transaction writing something after having run for longer than that will get pushed, so being able to refresh is getting more important. Fixes #46095 Release note (general change): Transactions reading a lot of data behave better when exceeding the memory limit set by kv.transaction.max_refresh_spans_bytes. Such transactions now attempt to resolve the conflicts they run into instead of being forced to always retry. Increasing kv.transaction.max_refresh_spans_bytes should no longer be necessary for most workloads. Release justification: fix for new "functionality" - the reduction in the closed timestamp target duration. 46557: ui: Jobs / Statements description tooltip r=dhartunian a=elkmaster Updated job description tool tip to truncate at around ~425 characters Updated tooltip to 500px wide Resolves: #46078 Release justification: bug fixes and low-risk updates to new functionality Release note (ui): tooltips showing statements and jobs are limited in size for very long statements 46588: sql: use user transaction if we have one to prepare queries r=andreimatei a=ajwerner Prepartion of certain queries requires performing reads against the database. If the user has already laid down intents, these reads may become part of a dependency cycle. Prior to this commit, these reads would be on a different transaction and thus this cycle would not be detected by our deadlock detection mechanism. This change opts to use the user's transaction for planning if there is one and thus will properly interact with deadlock detection. Fixes #46447. Release justification: fixes for high-priority or high-severity bugs in existing functionality Release note: None 46700: sql: make aggregate builtins share the same memory account r=yuzefovich a=yuzefovich Release justification: fixes for high-priority or high-severity bugs in existing functionality (we could hit memory limit due to accounting long before the available RAM is actually used up). We recently fixed a couple of leaks in memory accounting by aggregate builtins. It was done in the same way that other similar aggregates were doing - by instantiating a separate memory account for each aggregate struct. However, when a single aggregate, say `anyNotNull`, wants to store a single datum, say `DInt` of size 8 bytes, when growing its own memory account will actually make a reservation of `mon.DefaultPoolAllocation = 10240` bytes although we will only be using 8 bytes. This can result in "memory-starvation" for OLAP-y queries because we're likely to hit `max-sql-memory` limit long before we're getting close to it because of such "overestimation" in the accounting. This commit fixes this problem by making all aggregates that aggregate a single datum (these are all aggregate builtins that perform memory accounting except for `arrayAgg` which works with multiple datums) to share the same memory account (when non-nil) which is plumbed via `tree.EvalContext` (this is unfortunate but, as always, seems like necessary evil). That account is instantiated by `rowexec.aggregator` and `rowexec.windower` processors. Also it is acceptable from the concurrency's point of view because the processors run in a single goroutine, so we will never have concurrent calls to grow/shrink this shared memory account. If for some reason the field for shared memory account is nil in the eval context, then we will fallback to old behavior of instantiating a memory account for each aggregate builtin struct. A helper struct was created (that is now embedded by all aggregate builtins in question) that unifies the memory accounting. Fixes: #46664. Release note: None (a follow-up to a recent PR). 46780: ui: Default sort by Execution Count column for Statements r=dhartunian a=koorosh Resolves: #46427 Before, default sorting was set to Latency column in Statements page that was unintuitive. Now it is sorted by Execution count column. Release note (admin ui change): Change default sorting column on Statements page to Execution Count Release justification: bug fixes and low-risk updates to new functionality <img width="1421" alt="Screenshot 2020-03-31 at 14 56 52" src="https://user-images.githubusercontent.com/3106437/78023771-edfba200-735f-11ea-93e6-db77c2582a00.png"> 46784: roachprod: Update Slack DM user lookup r=bobvawter a=bobvawter This change switches to finding users by their email addresses. It also logs any DM-lookup failures instead of silently ignoring them. X-Ref: https://github.com/cockroachdb/dev-inf/issues/65 Release note: None Co-authored-by: Nathan Stilwell <[email protected]> Co-authored-by: Vlad Los <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Andrii Vorobiov <[email protected]> Co-authored-by: Bob Vawter <[email protected]>
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Mar 31, 2020
Before this patch, once a transaction exceeds the kv.transaction.max_refresh_spans_bytes limit, it stopped tracking reads and it didn't attempt to refresh any more when pushed. This patch make the span refresher condense the spans when it runs out of memory instead. So we'll get bigger spans and potentially false conflicts, but at least we have a chance at refreshing. In particular, it'll succeed if there's no writes anywhere. The condensing is performed using the condensableSpanSet, like we do in the pipeliner interceptor for the tracking of write intents. Internally, that guy condenses spans in ranges with lots of reads. We've seen people run into kv.transaction.max_refresh_spans_bytes in the past, so this should help many uses cases. But in particular I've written this patch because, without it, I'm scared about the effects of 20.1's reduction in the closed timestamp target duration to 3s from a previous 30s. Every transaction writing something after having run for longer than that will get pushed, so being able to refresh is getting more important. Fixes cockroachdb#46095 Release note (general change): Transactions reading a lot of data behave better when exceeding the memory limit set by kv.transaction.max_refresh_spans_bytes. Such transactions now attempt to resolve the conflicts they run into instead of being forced to always retry. Increasing kv.transaction.max_refresh_spans_bytes should no longer be necessary for most workloads. Release justification: fix for new "functionality" - the reduction in the closed timestamp target duration.
craig bot
pushed a commit
that referenced
this issue
Mar 31, 2020
46803: release-20.1: kvcoord: condense read spans when they exceed the memory limit r=andreimatei a=andreimatei Backport 7/7 commits from #46275. /cc @cockroachdb/release --- Before this patch, once a transaction exceeds the kv.transaction.max_refresh_spans_bytes limit, it stopped tracking reads and it didn't attempt to refresh any more when pushed. This patch make the span refresher condense the spans when it runs out of memory instead. So we'll get bigger spans and potentially false conflicts, but at least we have a chance at refreshing. In particular, it'll succeed if there's no writes anywhere. The condensing is performed using the condensableSpanSet, like we do in the pipeliner interceptor for the tracking of write intents. Internally, that guy condenses spans in ranges with lots of reads. We've seen people run into kv.transaction.max_refresh_spans_bytes in the past, so this should help many uses cases. But in particular I've written this patch because, without it, I'm scared about the effects of 20.1's reduction in the closed timestamp target duration to 3s from a previous 30s. Every transaction writing something after having run for longer than that will get pushed, so being able to refresh is getting more important. Fixes #46095 Release note (general change): Transactions reading a lot of data behave better when exceeding the memory limit set by kv.transaction.max_refresh_spans_bytes. Such transactions now attempt to resolve the conflicts they run into instead of being forced to always retry. Increasing kv.transaction.max_refresh_spans_bytes should no longer be necessary for most workloads. Release justification: fix for new "functionality" - the reduction in the closed timestamp target duration. Co-authored-by: Andrei Matei <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
The memory budget is
kv.transaction.max_refresh_spans_bytes
. Once a transaction exceeds it, it can't refresh any more. Combined with a short closed timestamp duration, this means that such transactions are likely to have to refresh.Instead, we should do what we do for write footprint tracking, and start collapsing adjacent spans when we're running out of budget. This can introduce false conflicts, but it'd gives a chance to succeed at refreshing (hopefully, a good one in common cases).
cc @ajwerner @nvanbenschoten
The text was updated successfully, but these errors were encountered: