Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: collect statistics over system tables #80123

Closed
irfansharif opened this issue Apr 18, 2022 · 12 comments · Fixed by #80887
Closed

sql/stats: collect statistics over system tables #80123

irfansharif opened this issue Apr 18, 2022 · 12 comments · Fixed by #80887
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Apr 18, 2022

Is your feature request related to a problem? Please describe.

We're using system tables for increasingly sophisticated usages, mostly around driving subsystems in CRDB. system.span_configurations is one example and is used to drive the span configs infrastructure. We currently don't auto-collect statistics over system tables, and when statistics are available (if triggered manually through ANALYZE system.some_table for e.g.) we ignore system table statistics. See:

if catalog.IsSystemDescriptor(table) {
// Don't try to get statistics for system tables (most importantly,
// for table_statistics itself).
return false
}

Describe the solution you'd like

For us to collect stats over system tables (we could either have tables opt-in or opt-out depending; we have some uncertainty around whether it made sense to consult stats over system.table_statistics).

Describe alternatives you've considered

Not doing anything, which has severe performance implications for the span configs infrastructure. In #78215 we attributed the lack of statistics to the reason why the optimizer chooses a full table scan for an internal query, as opposed to bounded segmented index scans. For a large number of schemas, this is going to be slow.

+cc @rytaft, @ajwerner.

Jira issue: CRDB-15838

@irfansharif irfansharif added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 18, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Apr 18, 2022
@mgartner
Copy link
Collaborator

There's two parts to this:

  1. Unblock @irfansharif by loading system table stats into the stats cache.
  2. Auto-collect table stats for system tables.

msirek pushed a commit to msirek/cockroach that referenced this issue Apr 29, 2022
Informs cockroachdb#80123

Previously, statistics could be collected on system tables, but their
use in the optimizer for planning queries was disabled.

This was inadequate because without stats, full table scan might be
chosen by the optimizer for queries involving system tables in cases
where index access is actually cheaper. Also, it is confusing that one
could ANALYZE a system table, but the collected statistics would not be
used when querying the table, as shown in the EXPLAIN output.

To address this, this patch enables stats usage on all system tables
except system.table_statistics and system.lease, as doing so may cause
hangs. A new tenant-writable cluster setting is added,
`sql.stats.system_tables.enabled`, which when true allows statistics on
system tables to be used by the optimizer for costing and planning
queries. The default value of this setting is `true`. Collection of
statistics on `system.table_statistics` and `system.lease` has been
disabled because it would be confusing to allow stats to be collected
that would never be used by the optimizer.

Release note: none
@msirek
Copy link
Contributor

msirek commented Apr 29, 2022

@irfansharif In this issue you mention system.table_statistics. The PR I opened enables stats collection on any system table except for system.table_statistics and system.lease. I know stats on system.lease is problematic, but just assumed stats on system.table_statistics might be as well. Do you think you'll need to collect stats on system.table_statistics? If so, I might take a closer look.

@irfansharif
Copy link
Contributor Author

I don't need statistics over system.table_statistics, no; was just reporting that I observed the same hangs you did when I tried a naive form of #80761. Thanks for the PR! Feel free to close this issue out after it lands.

@mgartner
Copy link
Collaborator

Is there a reason we should automatically collect stats on system tables? Or is manual collection for system tables sufficient?

@irfansharif
Copy link
Contributor Author

Some of these system tables are driven by automatic processes/jobs. system.span_configurations for example is being driven by something the reconciler you'll see when running SHOW AUTOMATIC JOBS. I think we want automatic statistics; that table's being maintained with every schema change. In #78215 (comment) we found that without automatic collection, we'll fall back to full table scans for a critical part of the the subsystem. Does #80761 also introduce auto stats collection for these tables?

msirek pushed a commit to msirek/cockroach that referenced this issue Apr 29, 2022
Informs cockroachdb#80123

Previously, statistics could be collected on system tables, but their
use in the optimizer for planning queries was disabled.

This was inadequate because without stats, full table scan might be
chosen by the optimizer for queries involving system tables in cases
where index access is actually cheaper. Also, it is confusing that one
could ANALYZE a system table, but the collected statistics would not be
used when querying the table, as shown in the EXPLAIN output.

To address this, this patch enables stats usage on all system tables
except system.table_statistics and system.lease, as doing so may cause
hangs. A new tenant-writable cluster setting is added,
`sql.stats.system_tables.enabled`, which when true allows statistics on
system tables to be used by the optimizer for costing and planning
queries. The default value of this setting is `true`. Collection of
statistics on `system.table_statistics` and `system.lease` has been
disabled because it would be confusing to allow stats to be collected
that would never be used by the optimizer.

Release note: none
@mgartner
Copy link
Collaborator

Does #80761 also introduce auto stats collection for these tables?

It does not, but that's something we can do in a follow-up PR. Unless @msirek or @rytaft have a reason why we shouldn't enable auto-stats for system tables.

@msirek
Copy link
Contributor

msirek commented Apr 29, 2022

Unless @msirek or @rytaft have a reason why we shouldn't enable auto-stats for system tables.

I don't know of any specific reason not to enable it, but I can imagine if some system tables are small and constantly getting updates, it might trigger a lot of stats collections, and I'm not sure if the extra reads from system tables might cause contention with writes. Also, with stats on system tables changing a lot, what would the stability of query plans for internal queries be like? With no stats, nothing is changing, but with fluctuating stats, plans will change, and if they change for the worse that may be difficult to debug. Regressions in system queries may also affect system stability so could have a greater impact than regressions in user queries. Though maybe our internal queries aren't too complex. I haven't really looked at all of them.

@irfansharif
Copy link
Contributor Author

but I can imagine if some system tables are small and constantly getting updates, it might trigger a lot of stats collections, and I'm not sure if the extra reads from system tables might cause contention with writes.

Is this concern any different from small + write-heavy user tables? Or is it just that our existing system tables so far haven't functioned with auto stats, so we should tread carefully. That's fair -- at the very least I'd like system.span_configurations to be opted into the stats collection. Perhaps we should ask teams responsible for the other tables to opt-in/out.

@msirek
Copy link
Contributor

msirek commented Apr 29, 2022

Or is it just that our existing system tables so far haven't functioned with auto stats, so we should tread carefully.

Yes, mainly just the fear of adding more moving parts to the system. I don't know if anyone else would be worried about this. Having autostats on just a subset of system tables sounds like a good option, but maybe my fears are overblown. Would need to take a closer look.

@mgartner
Copy link
Collaborator

I don't think the reads on system tables will contend with writes because we use AS OF SYSTEM TIME when scanning tables for generating automatic stats:

func (r *Refresher) refreshStats(ctx context.Context, tableID descpb.ID, asOf time.Duration) error {
// Create statistics for all default column sets on the given table.
_ /* rows */, err := r.ex.Exec(
ctx,
"create-stats",
nil, /* txn */
fmt.Sprintf(
"CREATE STATISTICS %s FROM [%d] WITH OPTIONS THROTTLING %g AS OF SYSTEM TIME '-%s'",
jobspb.AutoStatsName,
tableID,
AutomaticStatisticsMaxIdleTime.Get(&r.st.SV),
asOf.String(),
),
)
return err
}

I'm hesitant about introducing this too. We could put it behind a setting that defaults to off. If we got confidence in it before the 22.2 release we could switch it to on by default.

@msirek
Copy link
Contributor

msirek commented Apr 29, 2022

If we got confidence in it before the 22.2 release we could switch it to on by default.

Or we could default it to on at first to get more testing exposure, then switch it off before 22.2 if we see any abnormalities or suspicious slowdowns in tests.

craig bot pushed a commit that referenced this issue May 2, 2022
80761: stats: enable optimizer stats use on system tables for query planning r=msirek a=msirek

Informs #80123

Previously, statistics could be collected on system tables, but their
use in the optimizer for planning queries was disabled.

This was inadequate because without stats, full table scan might be
chosen by the optimizer for queries involving system tables in cases
where index access is actually cheaper. Also, it is confusing that one
could ANALYZE a system table, but the collected statistics would not be
used when querying the table, as shown in the EXPLAIN output.

To address this, this patch enables stats usage on all system tables
except system.table_statistics and system.lease, as doing so may cause
hangs. A new tenant-writable cluster setting is added,
`sql.stats.system_tables.enabled`, which when true allows statistics on
system tables to be used by the optimizer for costing and planning
queries. The default value of this setting is `true`. Collection of
statistics on `system.table_statistics` and `system.lease` has been
disabled because it would be confusing to allow stats to be collected
that would never be used by the optimizer.

Release note: none

Co-authored-by: Mark Sirek <[email protected]>
@rytaft
Copy link
Collaborator

rytaft commented May 2, 2022

Sorry for joining the discussion late...

Or we could default it to on at first to get more testing exposure, then switch it off before 22.2 if we see any abnormalities or suspicious slowdowns in tests.

Yes, I'd be in favor of enabling this now, and we can disable it later if need be.

msirek pushed a commit to msirek/cockroach that referenced this issue May 2, 2022
Fixes cockroachdb#80123

Previously, mutations to system tables did not trigger automatic
collection of statistics to influence optimizer costs and plan
selection.

This was inadequate because system tables are being used in
increasingly sophisticated ways in queries, most notably around driving
subsystems in CRDB, requiring avoidance of full table scans. Manual
collection of stats on system tables is not sufficient to meet
requirements as system tables are driven by automatic processes/jobs.

To address this, this patch enables auto stats collection on system
tables by default, which can be disabled by setting new cluster setting
`sql.stats.system_tables_autostats.enabled` to false. Auto stats are
always disabled on `system.lease` and `system.table_statistics`, no
matter to value of the cluster setting, because they could potentially
cause hangs if enabled.

Release note: none
msirek pushed a commit to msirek/cockroach that referenced this issue May 3, 2022
Fixes cockroachdb#80123

Previously, mutations to system tables did not trigger automatic
collection of statistics to influence optimizer costs and plan
selection.

This was inadequate because system tables are being used in
increasingly sophisticated ways in queries, most notably around driving
subsystems in CRDB, requiring avoidance of full table scans. Manual
collection of stats on system tables is not sufficient to meet
requirements as system tables are driven by automatic processes/jobs.

To address this, this patch enables auto stats collection on system
tables by default, which can be disabled by setting new cluster setting
`sql.stats.system_tables_autostats.enabled` to false. Auto stats are
always disabled on `system.lease` and `system.table_statistics`, no
matter the value of the cluster setting, because they could potentially
cause hangs if enabled.

Release note: none
msirek pushed a commit to msirek/cockroach that referenced this issue May 4, 2022
Fixes cockroachdb#80123

Previously, mutations to system tables did not trigger automatic
collection of statistics to influence optimizer costs and plan
selection.

This was inadequate because system tables are being used in
increasingly sophisticated ways in queries, most notably around driving
subsystems in CRDB, requiring avoidance of full table scans. Manual
collection of stats on system tables is not sufficient to meet
requirements as system tables are driven by automatic processes/jobs.

To address this, this patch enables auto stats collection on system
tables by default, which can be disabled by setting new cluster setting
`sql.stats.system_tables_autostats.enabled` to false. Auto stats are
always disabled on `system.lease`, `system.table_statistics`,
`system.jobs` and `system.scheduled_jobs`, no matter the value of the
cluster setting. Autostats on the first two tables could potentially
cause hangs, and autostats on the last two tables could potentially
impact system performance.

Release note: none
msirek pushed a commit to msirek/cockroach that referenced this issue May 4, 2022
Fixes cockroachdb#80123

Previously, mutations to system tables did not trigger automatic
collection of statistics to influence optimizer costs and plan
selection.

This was inadequate because system tables are being used in
increasingly sophisticated ways in queries, most notably around driving
subsystems in CRDB, requiring avoidance of full table scans. Manual
collection of stats on system tables is not sufficient to meet
requirements as system tables are driven by automatic processes/jobs.

To address this, this patch enables auto stats collection on system
tables by default, which can be disabled by setting new cluster setting
`sql.stats.system_tables_autostats.enabled` to false. Auto stats are
always disabled on `system.lease`, `system.table_statistics`,
`system.jobs` and `system.scheduled_jobs`, no matter the value of the
cluster setting. Autostats on the first two tables could potentially
cause hangs, and autostats on the last two tables could potentially
impact system performance.

Release note: none
craig bot pushed a commit that referenced this issue May 5, 2022
80887: stats: enable auto stats collection on system tables r=msirek a=msirek

Fixes #80123

Previously, mutations to system tables did not trigger automatic
collection of statistics to influence optimizer costs and plan
selection.

This was inadequate because system tables are being used in
increasingly sophisticated ways in queries, most notably around driving
subsystems in CRDB, requiring avoidance of full table scans. Manual
collection of stats on system tables is not sufficient to meet
requirements as system tables are driven by automatic processes/jobs.

To address this, this patch enables auto stats collection on system
tables by default, which can be disabled by setting new cluster setting
`sql.stats.system_tables_autostats.enabled` to false. Auto stats are
always disabled on `system.lease`, `system.table_statistics`, 
`system.jobs` and `system.scheduled_jobs`, no matter the value of the 
cluster setting. Autostats on the first two tables could potentially 
cause hangs, and autostats on the last two tables could potentially   
impact system performance.

Release note: none


Co-authored-by: Mark Sirek <[email protected]>
@craig craig bot closed this as completed in #80887 May 5, 2022
@craig craig bot closed this as completed in e811eb6 May 5, 2022
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants