Reduce stats allocations in optimizer #80186

Closed
rharding6373 opened this issue Apr 19, 2022 · 0 comments · Fixed by #86460
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team

rharding6373 commented Apr 19, 2022

Since the optimizer uses the table stat AvgSize to cost scans and index joins, it now stores all the table stats in cases where it previously did not. We could reduce the allocations by either discarding these stats instead of storing them, or storing fewer stats when AvgSize is needed.

See #78592 (comment)

Jira issue: CRDB-15817

@rharding6373 rharding6373 added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Apr 19, 2022
@rharding6373 rharding6373 self-assigned this Apr 19, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Apr 19, 2022
@jlinder jlinder added sync-me and removed sync-me labels May 20, 2022
@mgartner mgartner assigned mgartner and unassigned rharding6373 Jun 28, 2022
mgartner added a commit to mgartner/cockroach that referenced this issue Aug 19, 2022
Prior to this commit, a column's average size in bytes was included in
column statistics. To fetch this average size, the coster requested an
individual column statistic for each scanned column. For scans and joins
involving many columns, this caused many allocations of column
statistics and column sets.

Because we only use a column's average size when costing scans and
lookup joins, there was no need to include it in column statistics.
Average size doesn't propagate up an expression tree like other
statistics do.

This commit removes average size from column statistics and instead
builds a map in `props.Statistics` that maps column IDs to average size.
This significantly reduces allocations in some cases.

The only downside to this change is that we no longer set a column's
average size to zero if it has all NULL values, according to statistics.
I believe this is a pretty rare edge case that is unlikely to
significantly affect query plans, so I think the trade-off is worth it.

Fixes cockroachdb#80186

Release justification: This is a minor change that improves optimizer
performance.

Release note: None
craig bot pushed a commit that referenced this issue Aug 22, 2022
86460: opt: reduce statistics allocations for avg size r=mgartner a=mgartner


86528: storage: add default-off setting for MVCC range tombstones r=msbutler,nicktrav a=erikgrinaker

This patch adds the default-off cluster setting
`storage.mvcc.range_tombstones.enabled` to control whether or not to
write MVCC range tombstones. The setting is internal and system-only.
The read path is always active; the setting only determines whether KV clients
should write them.

A helper function `CanUseMVCCRangeTombstones()` has also been added.
Callers have not yet been updated to respect this.

Note that any in-flight jobs may not pick up this change, so these need
to be waited out before being certain that the setting has taken effect.

If disabled after being enabled, this will prevent new range tombstones
from being written, but already written tombstones will remain until
GCed. The note above on jobs also applies in this case.

Release justification: bug fixes and low-risk updates to new functionality

Release note: None

86572: ui: update styles on sessions details page r=maryliag a=maryliag

The Session Details page was updated to use the
same style of summary cards as the other details
pages (e.g. statement, transaction, job).

Fixes #85257

Before
<img width="1236" alt="Screen Shot 2022-08-22 at 12 26 43 PM" src="https://user-images.githubusercontent.com/1017486/185971455-3dc7b57f-07bc-45df-94e3-f0bd7b3e541a.png">


After
<img width="1250" alt="Screen Shot 2022-08-22 at 12 26 19 PM" src="https://user-images.githubusercontent.com/1017486/185971475-1bd563f6-a596-4321-9f90-d6c68470dbb9.png">


Release justification: low risk change
Release note (ui change): New styles of summary cards
on Session Details page to align with other details pages.

86579: sql/builtins: update `Info` for `pg_get_viewdef` r=ZhouXing19 a=ZhouXing19

The pg_builtin function `pg_get_viewdef` was changed from a no-op to a working
function a long time ago, but its info field is still `notUsableInfo`, which
caused the docs to omit it by mistake. This PR updates the info so that the
function is recorded in `functions.md`.

Release justification: bug fix, update a builtin's visibility in docs.
Release note: none

86586: README: make sure roachprod/roachtest docs use dev, not make r=rail a=rickystewart

Release justification: Non-production code changes
Release note: None

Co-authored-by: Marcus Gartner
Co-authored-by: Erik Grinaker
Co-authored-by: Marylia Gutierrez
Co-authored-by: Jane Xing
Co-authored-by: Ricky Stewart
@craig craig bot closed this as completed in 38cc71e Aug 22, 2022
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023