
kv: we should collapse the refresh spans when exceeding the memory budget #46095

Closed
andreimatei opened this issue Mar 13, 2020 · 0 comments · Fixed by #46275
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@andreimatei
Contributor

The memory budget is kv.transaction.max_refresh_spans_bytes. Once a transaction exceeds it, it stops tracking its reads and can't refresh any more. Combined with a short closed timestamp duration, this means that such transactions are likely to get pushed and need a refresh — which they can't perform, so they're forced to retry.
Instead, we should do what we do for write footprint tracking, and start collapsing adjacent spans when we're running out of budget. This can introduce false conflicts, but it gives us a chance to succeed at refreshing (hopefully, a good one in common cases).
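
To make the idea concrete, here is a minimal, self-contained sketch of budget-driven span condensing. It is not CockroachDB's implementation — `span`, `condense`, and the merge-the-first-pair policy are all illustrative (the real code picks merges more carefully) — but it shows the tradeoff: a merged span covers keys that were never read, which is where the false conflicts come from.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

type span struct{ key, endKey []byte }

func (s span) size() int { return len(s.key) + len(s.endKey) }

// condense sorts the spans and merges neighbors until the set fits in
// budget bytes. It assumes non-overlapping spans; each merged span covers
// everything between the two originals it replaces.
func condense(spans []span, budget int) []span {
	sort.Slice(spans, func(i, j int) bool {
		return bytes.Compare(spans[i].key, spans[j].key) < 0
	})
	total := 0
	for _, s := range spans {
		total += s.size()
	}
	for len(spans) > 1 && total > budget {
		merged := span{key: spans[0].key, endKey: spans[1].endKey}
		total -= spans[0].size() + spans[1].size() - merged.size()
		spans = append([]span{merged}, spans[2:]...)
	}
	return spans
}

func main() {
	for _, s := range condense([]span{
		{[]byte("a"), []byte("b")},
		{[]byte("c"), []byte("d")},
		{[]byte("x"), []byte("z")},
	}, 4) {
		fmt.Printf("[%s, %s)\n", s.key, s.endKey) // [a, d) and [x, z)
	}
}
```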

cc @ajwerner @nvanbenschoten

@andreimatei andreimatei added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 13, 2020
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 18, 2020
Before this patch, once a transaction exceeded the
kv.transaction.max_refresh_spans_bytes limit, it stopped tracking reads
and no longer attempted to refresh when pushed.
This patch makes the span refresher condense the spans when it runs out
of memory instead. We get bigger spans and potentially false
conflicts, but at least we have a chance at refreshing. In particular,
it will succeed if there are no writes anywhere.

The condensing is performed using the condensableSpanSet, as we already
do in the pipeliner interceptor for tracking write intents. Internally,
it condenses the spans in the ranges with the most reads.
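
A rough sketch of that per-range flavor, under invented names (`rangeOf` stands in for a real range lookup, and the actual condensableSpanSet is more involved): group the tracked spans by the range they start in, and collapse the most crowded group into one covering span.

```go
package main

import (
	"bytes"
	"fmt"
)

type span struct{ key, endKey []byte }

// rangeOf is a hypothetical stand-in for looking up which range a key
// belongs to; here, keys below "m" are in range 1 and the rest in range 2.
func rangeOf(key []byte) int {
	if bytes.Compare(key, []byte("m")) < 0 {
		return 1
	}
	return 2
}

// condenseBusiestRange collapses all spans in the range holding the most
// spans into a single covering span, trading precision for memory.
func condenseBusiestRange(spans []span) []span {
	counts := map[int]int{}
	for _, s := range spans {
		counts[rangeOf(s.key)]++
	}
	busiest, best := 0, 0
	for r, n := range counts {
		if n > best {
			busiest, best = r, n
		}
	}
	var out []span
	var lo, hi []byte
	for _, s := range spans {
		if rangeOf(s.key) != busiest {
			out = append(out, s)
			continue
		}
		if lo == nil || bytes.Compare(s.key, lo) < 0 {
			lo = s.key
		}
		if hi == nil || bytes.Compare(s.endKey, hi) > 0 {
			hi = s.endKey
		}
	}
	return append(out, span{lo, hi})
}

func main() {
	for _, s := range condenseBusiestRange([]span{
		{[]byte("a"), []byte("b")}, {[]byte("c"), []byte("d")},
		{[]byte("e"), []byte("f")}, {[]byte("q"), []byte("r")},
	}) {
		fmt.Printf("[%s, %s)\n", s.key, s.endKey) // [q, r) then [a, f)
	}
}
```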

We've seen people run into kv.transaction.max_refresh_spans_bytes in the
past, so this should help many use cases. But in particular I've
written this patch because, without it, I'm scared about the effects of
20.1's reduction of the closed timestamp target duration from the
previous 30s to 3s. Every transaction that writes something after having
run for longer than that will get pushed, so being able to refresh is
becoming more important.

Fixes cockroachdb#46095

Release note: Transactions reading a lot of data behave better when
exceeding the memory limit set by
kv.transaction.max_refresh_spans_bytes. Such transactions now attempt to
resolve the conflicts they run into instead of being forced to always
retry. Increasing kv.transaction.max_refresh_spans_bytes should
no longer be necessary for most workloads.

Release justification: fix for new "functionality" - the reduction in
the closed timestamp target duration.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 24, 2020
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 31, 2020
craig bot pushed a commit that referenced this issue Mar 31, 2020
45920: UI Telemetry for Statements r=dhartunian a=nathanstilwell

fixes #45506

- [x] Changing sort order (want to see which column and asc vs desc)
- [x] Search
- [x] Clicking to paginate
- [x] Diagnostic bundle activations (capturing high level statement performance information as part of the event is nice, e.g. latency or execution count; we should scrub the actual fingerprint)

Adding a tracking function to `analytics.ts` to send analytics payloads to Segment.io. I begin by adding tracking calls to interesting events on the Statements page of the Admin UI. The events being tracked are as follows:

### Table Sort
This event is fired when the sorting order is changed by clicking a column header on the Statements page. 

![statements-sort-order](https://user-images.githubusercontent.com/397448/77473575-0a419100-6dec-11ea-850d-8663af4ccc37.gif)
```
{
  userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e',
  event: 'Table Sort',
  properties: {
    columnName: 'Txn Type',
    pagePath: '/statements',
    sortDirection: 'desc',
    tableName: 'statements-table'
  }
}
```

### Search
This event is fired when the statements are filtered using a search term.

![statements-search](https://user-images.githubusercontent.com/397448/77474910-5097ef80-6dee-11ea-828c-43b387354327.gif)

```
{
  userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e',
  event: 'Search',
  properties: {
    numberOfResults: 17,
    pagePath: '/statements',
    searchTerm: 'system'
  }
}
```

### Paginate
This event is fired when the user interacts with pagination on the statements page.
![statements-pagination](https://user-images.githubusercontent.com/397448/77474995-7ae9ad00-6dee-11ea-948f-00fcc500438a.gif)

```
{
  userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e',
  event: 'Paginate',
  properties: {
    pagePath: '/statements/',
    selectedPage: 4
  }
}
```

### Diagnostics Activation
This event is tracked when a user clicks "Activate" on the diagnostics activation modal.
![statements-diagnostics-activation](https://user-images.githubusercontent.com/397448/77475062-95bc2180-6dee-11ea-96c6-d72c91915000.gif)

```
{
  userId: 'ac7aafbc-1a79-4a5b-bc60-c1221cf80e1e',
  event: 'Diagnostics Activation',
  properties: {
    fingerprint: 'SELECT blah, blah FROM blah.blah WHERE blah blah blah',
    pagePath: '/statements/'
  }
}
```


46190: ui: removed metric function `setDefaultTime` r=dhartunian a=elkmaster

Removed the setter that defaulted the graph time window to the age of the oldest node in the cluster, since we already have a default scale in the Redux store. This fixes the problem where, after using the time dropdown on the metrics page and clicking "10m", we would get results with a "6h" duration.

Resolves: #46145

Release justification: bug fixes and low-risk updates to new functionality

Release note (ui): The default timescale on the metrics page is now always 10m; previously it defaulted to the age of the longest-running node.

46275: kvcoord: condense read spans when they exceed the memory limit r=andreimatei a=andreimatei

Fixes #46095

46557: ui: Jobs / Statements description tooltip r=dhartunian a=elkmaster

Updated the job description tooltip to truncate at roughly 425 characters.
Updated the tooltip width to 500px.

Resolves: #46078

Release justification: bug fixes and low-risk updates to new functionality

Release note (ui): Tooltips showing statements and jobs are now limited in size for very long statements.

46588: sql: use user transaction if we have one to prepare queries r=andreimatei a=ajwerner

Preparation of certain queries requires performing reads against the database.
If the user has already laid down intents, these reads may become part of a
dependency cycle. Prior to this commit, these reads would run on a different
transaction, and so the cycle would not be detected by our deadlock detection
mechanism.

This change opts to use the user's transaction for planning, if there is one,
and thus properly interacts with deadlock detection.
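
A toy sketch of that choice — the `Txn` type and helper below are stand-ins for illustration, not CockroachDB's kv API:

```go
package main

import "fmt"

// Txn is a toy stand-in for a KV transaction.
type Txn struct{ name string }

// txnForPlanning prefers the user's open transaction for planning reads, so
// any dependency cycle those reads create is visible to deadlock detection.
func txnForPlanning(userTxn *Txn) *Txn {
	if userTxn != nil {
		return userTxn
	}
	return &Txn{name: "fresh planning txn"}
}

func main() {
	fmt.Println(txnForPlanning(&Txn{name: "user txn"}).name) // user txn
	fmt.Println(txnForPlanning(nil).name)                    // fresh planning txn
}
```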

Fixes #46447.

Release justification: fixes for high-priority or high-severity bugs in
existing functionality

Release note: None

46700: sql: make aggregate builtins share the same memory account r=yuzefovich a=yuzefovich

Release justification: fixes for high-priority or high-severity
bugs in existing functionality (we could hit the memory limit due to
accounting long before the available RAM is actually used up).

We recently fixed a couple of leaks in memory accounting by aggregate
builtins. It was done in the same way that other similar aggregates were
already doing it: by instantiating a separate memory account for each
aggregate struct. However, when a single aggregate, say `anyNotNull`,
wants to store a single datum, say a `DInt` of size 8 bytes, growing its
own memory account actually reserves `mon.DefaultPoolAllocation = 10240`
bytes even though only 8 bytes will be used (for example, 1,024 such
aggregates would reserve ~10 MB while holding only ~8 KB of data). This
can result in "memory starvation" for OLAP-y queries: because of this
overestimation in the accounting, we're likely to hit the
`max-sql-memory` limit long before we get close to actually using that
much memory.

This commit fixes the problem by making all aggregates that aggregate a
single datum (that is, all aggregate builtins that perform memory
accounting, except for `arrayAgg`, which works with multiple datums)
share the same memory account (when non-nil), which is plumbed via
`tree.EvalContext` (this is unfortunate but, as always, seems like a
necessary evil). That account is instantiated by the `rowexec.aggregator`
and `rowexec.windower` processors. This is also acceptable from a
concurrency point of view because the processors run in a single
goroutine, so we will never have concurrent calls to grow or shrink the
shared memory account.

If for some reason the field for the shared memory account is nil in the
eval context, we fall back to the old behavior of instantiating a memory
account for each aggregate builtin struct. A helper struct (now embedded
by all the aggregate builtins in question) was created to unify the
memory accounting.
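
A minimal sketch of the shared-account idea, under invented names (this is not the real `mon` API): many aggregates grow a single account, so the chunked reservation from the parent pool is paid once rather than once per aggregate.

```go
package main

import "fmt"

// sharedAccount is an invented stand-in for a memory account that reserves
// bytes from a parent pool in large chunks.
type sharedAccount struct {
	used     int64 // bytes the aggregates actually use
	reserved int64 // bytes reserved from the parent pool
	chunk    int64 // reservation granularity, cf. mon.DefaultPoolAllocation
}

// grow records n more bytes of use, reserving whole chunks as needed.
func (a *sharedAccount) grow(n int64) {
	a.used += n
	for a.reserved < a.used {
		a.reserved += a.chunk
	}
}

// shrink records that n bytes are no longer in use.
func (a *sharedAccount) shrink(n int64) { a.used -= n }

func main() {
	acc := &sharedAccount{chunk: 10240}
	// 1000 single-datum aggregates, 8 bytes each, sharing one account:
	for i := 0; i < 1000; i++ {
		acc.grow(8)
	}
	// Prints used=8000 reserved=10240. With a separate account per
	// aggregate, 1000 * 10240 bytes (~10 MB) would have been reserved.
	fmt.Printf("used=%d reserved=%d\n", acc.used, acc.reserved)
}
```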

Fixes: #46664.

Release note: None (a follow-up to a recent PR).

46780: ui: Default sort by Execution Count column for Statements r=dhartunian a=koorosh

Resolves: #46427

Before, the default sort on the Statements page was the Latency column,
which was unintuitive.
Now the page is sorted by the Execution Count column.

Release note (admin ui change): Change default sorting column on Statements
page to Execution Count

Release justification: bug fixes and low-risk updates to new functionality

<img width="1421" alt="Screenshot 2020-03-31 at 14 56 52" src="https://user-images.githubusercontent.com/3106437/78023771-edfba200-735f-11ea-93e6-db77c2582a00.png">


46784: roachprod: Update Slack DM user lookup r=bobvawter a=bobvawter

This change switches to finding users by their email addresses.  It also logs
any DM-lookup failures instead of silently ignoring them.

X-Ref: https://github.com/cockroachdb/dev-inf/issues/65

Release note: None

Co-authored-by: Nathan Stilwell <[email protected]>
Co-authored-by: Vlad Los <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrii Vorobiov <[email protected]>
Co-authored-by: Bob Vawter <[email protected]>
@craig craig bot closed this as completed in d230743 Mar 31, 2020
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 31, 2020
craig bot pushed a commit that referenced this issue Mar 31, 2020
46803: release-20.1: kvcoord: condense read spans when they exceed the memory limit r=andreimatei a=andreimatei

Backport 7/7 commits from #46275.

/cc @cockroachdb/release

Fixes #46095


Co-authored-by: Andrei Matei <[email protected]>