Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: table setting to turn auto stats collection on/off #78110

Merged
merged 1 commit into from
May 3, 2022

Conversation

msirek
Copy link
Contributor

@msirek msirek commented Mar 18, 2022

Fixes #40989

Previously, there was no way to enable or disable automatic statistics
collection at the table level. It could only be turned on or off via the
sql.stats.automatic_collection.enabled cluster setting.

This was inadequate because statistics collection can be expensive for
large tables, and it would be desirable to defer collection until after
data is finished loading, or in off hours. Also, small tables which are
frequently updated may trigger statistics collection leading to
unnecessary overhead and/or unpredictable query plan changes.

To address this, this patch adds support for setting of the following
cluster settings at the table level:

sql_stats_automatic_collection_enabled
sql_stats_automatic_collection_min_stale_rows
sql_stats_automatic_collection_fraction_stale_rows

which correspond to the similarly-named cluster settings:

sql.stats.automatic_collection.enabled
sql.stats.automatic_collection.min_stale_rows
sql.stats.automatic_collection.fraction_stale_rows

for example:

ALTER TABLE t1 SET (sql_stats_automatic_collection_enabled = true);
ALTER TABLE t1
      SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1,
           sql_stats_automatic_collection_min_stale_rows = 2000);

The table-level setting takes precedence over the cluster setting.

Release justification: Low risk fix for missing fine-grained control
over automatic statistics collection.

Release note (sql change): Automatic statistics collection can now be
enabled or disabled for individual tables, taking precedence over the
cluster setting, for example:

ALTER TABLE t1 SET (sql_stats_automatic_collection_enabled = true);
ALTER TABLE t1 SET (sql_stats_automatic_collection_enabled = false);
ALTER TABLE t1 RESET (sql_stats_automatic_collection_enabled);

RESET removes the setting value entirely, in which case the
similarly-name cluster setting,
sql.stats.automatic_collection.enabled, is in effect for the table.

Cluster settings sql.stats.automatic_collection.fraction_stale_rows
and sql.stats.automatic_collection.min_stale_rows now also have table
setting counterparts:

sql_stats_automatic_collection_fraction_stale_rows
sql_stats_automatic_collection_min_stale_rows

The table settings may be set at table creation time, or later via ALTER
TABLE ... SET, independent of whether auto stats is enabled:

ALTER TABLE t1
      SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1,
           sql_stats_automatic_collection_min_stale_rows = 2000);
CREATE TABLE t1 (a INT, b INT)
       WITH (sql_stats_automatic_collection_enabled = true,
             sql_stats_automatic_collection_min_stale_rows = 1000000,
             sql_stats_automatic_collection_fraction_stale_rows= 0.05
             );

The current table settings (storage parameters) are shown in the WITH
clause output of SHOW CREATE TABLE.
Note, any row mutations which have occurred a minute or two before
disabling auto stats collection via ALTER TABLE ... SET may trigger
stats collection, though DML submitted after the setting change will
not.

@msirek msirek requested review from rytaft, rharding6373 and a team March 18, 2022 23:20
@msirek msirek requested a review from a team as a code owner March 18, 2022 23:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msirek msirek force-pushed the 9059 branch 2 times, most recently from 53e335a to 0943279 Compare March 19, 2022 05:58
@msirek msirek requested a review from vy-ton March 20, 2022 19:24
Copy link
Contributor

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the parts of this change which directly apply to SQL Schema.

Do you intend to backport this to 22.1? I ask because this commit changes the descriptor protobufs. cc @ajwerner

Reviewed 4 of 10 files at r1, 3 of 6 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @rharding6373, @rytaft, and @vy-ton)


pkg/sql/catalog/catpb/catalog.proto, line 240 at r2 (raw file):

// TableLevelSettings represents cluster or session settings specified at the
// table level. Each SettingValue is nullable so queries of the descriptor in
// JSON form only list values which have been set.

Assuming that we can't get by with primitive values here, I'd much prefer that you define a nested message for each setting in this message, rather than define some generic Int or String box message at the top level


pkg/sql/catalog/catpb/catalog.proto, line 241 at r2 (raw file):

// table level. Each SettingValue is nullable so queries of the descriptor in
// JSON form only list values which have been set.
message TableLevelSettings {

Technically, RowLevelTTL are also table-level settings. Perhaps it would be preferable to rename this message type (and its corresponding field in the table descriptor) to TableAutomaticStatsCollectionSettings or something?


pkg/sql/catalog/descpb/structured.go, line 281 at r2 (raw file):

// true, auto stats are on for this table. If ok is false, there is no setting
// for this table.
func (desc *TableDescriptor) AutoStatsCollectionEnabled() (enabled bool, ok bool) {

Do we really need two boolean return values here?


pkg/sql/catalog/descpb/structured.go, line 317 at r2 (raw file):

	}
	return desc.TableLevelSettings.SqlStatsAutomaticCollectionFractionStaleRows.Value, true
}

Can you define these methods in the catalog.TableDescriptor interface please?


pkg/sql/catalog/tabledesc/structured.go, line 2573 at r2 (raw file):

	}
	if desc.TableLevelSettings != nil {
		settings := desc.TableLevelSettings

nit: if settings := desc.TableLevelSettings; settings != nil { would be more idiomatic


pkg/sql/catalog/tabledesc/structured.go, line 2578 at r2 (raw file):

			value := settings.SqlStatsAutomaticCollectionEnabled.Value
			appendStorageParam(`"sql.stats.automatic_collection.enabled"`,
				boolAsString(value))

fmt.Sprintf("%v", value) aught to do the trick here


pkg/sql/catalog/tabledesc/validate.go, line 1510 at r2 (raw file):

// validateTableLevelSettings validates that any new table-level settings hold
// a valid value.
func (desc *wrapper) validateTableLevelSettings() error {

This isn't obvious when you look a lot of this validation code, but the preferred pattern is to call vea.Report on errors as much as possible instead of returning on the first error, to report as many errors as possible.


pkg/sql/catalog/tabledesc/validate.go, line 1511 at r2 (raw file):

// a valid value.
func (desc *wrapper) validateTableLevelSettings() error {
	if desc.TableLevelSettings != nil {

In situations like these consider returning early to avoid large blocks and lots of nesting. Alternatively, hoist this nil-check to the call site.


pkg/sql/catalog/tabledesc/validate.go, line 1535 at r2 (raw file):

		}
		if desc.TableLevelSettings.SqlStatsAutomaticCollectionFractionStaleRows != nil {
			setting := "sql.stats.automatic_collection.fraction_stale_rows"

nit: consider defining these strings as global consts in the stats package.


pkg/sql/catalog/tabledesc/validate_test.go, line 134 at r2 (raw file):

			"NextConstraintID":              {status: iSolemnlySwearThisFieldIsValidated},
			"DeclarativeSchemaChangerState": {status: iSolemnlySwearThisFieldIsValidated},
			"TableLevelSettings":            {status: iSolemnlySwearThisFieldIsValidated},

Can you also throw in a couple of unit test cases that exercise at least some of this validation logic? Let that not distract from the fact that I'm grateful that you did add validation logic, it's something that's easy to overlook.

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing stats to be enabled at the table-level is going to be useful. Thanks for this! Nice work so far.

Reviewed 2 of 10 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @postamar, @rytaft, and @vy-ton)


pkg/sql/catalog/catpb/catalog.proto, line 218 at r2 (raw file):

}

// Create separate messages for setting values so we may access the

Maybe I'm just not following this...why does this mean that we can't use primitive values to represent these settings? E.g., it looks like view_query (also accessed in the SQL in automatic_stats.go) is an optional string.


pkg/sql/catalog/descpb/structured.go, line 281 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Do we really need two boolean return values here?

+1, I think one boolean is fine with true for enabled and false for disabled/no setting.


pkg/sql/catalog/descpb/structured.go, line 288 at r2 (raw file):

		return false, false
	}
	return desc.TableLevelSettings.SqlStatsAutomaticCollectionEnabled.Value, true

Should we also consider whether auto stats is enabled overall?


pkg/sql/catalog/descpb/structured.go, line 296 at r2 (raw file):

// been set at the table level.
func (desc *TableDescriptor) AutoStatsMinStaleRows() (minStaleRows int64, ok bool) {
	if desc.TableLevelSettings == nil {

Could dedupe some code with AutoStatsCollectionEnabled


pkg/sql/catalog/descpb/structured.go, line 310 at r2 (raw file):

// not been set at the table level.
func (desc *TableDescriptor) AutoStatsFractionStaleRows() (fractionStaleRows float64, ok bool) {
	if desc.TableLevelSettings == nil {

Could dedupe here, too


pkg/sql/catalog/descpb/structured.proto, line 1201 at r2 (raw file):

    (gogoproto.customname) = "NextConstraintID", (gogoproto.casttype) = "ConstraintID"];

  // Next ID: 51

Keep comment and increment id.


pkg/sql/catalog/tabledesc/structured.go, line 2588 at r2 (raw file):

			value := settings.SqlStatsAutomaticCollectionFractionStaleRows.Value
			appendStorageParam(`"sql.stats.automatic_collection.fraction_stale_rows"`,
				fmt.Sprintf("%g", value)) //strconv.FormatFloat(value, 10))

nit: comment seems unnecessary


pkg/sql/paramparse/paramobserver.go, line 709 at r2 (raw file):

func settingValuePointer(settingName string, tableSettings *catpb.TableLevelSettings) interface{} {
	if idx, ok := tableSettingsDict[settingName]; ok {

After taking Marius's suggestion of adding the setting names as const strings, you could get rid of the tableSettingsDict, which will make this function easier to read.

Copy link
Contributor

@vy-ton vy-ton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing myself as an assigned reviewer. If there are specific product questions, please @me.

@msirek msirek requested a review from a team as a code owner March 22, 2022 23:11
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@postamar , @rharding6373 Thanks for the great comments!

@postamar It's possible we'd want to backport it to 22.1.1, but I don't know for sure. Would that mean the protobuf changes would need to be merged to 21.1.0? I believe @rytaft will be talking to the SQL Schema team about this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar, @rharding6373, and @rytaft)


pkg/sql/catalog/catpb/catalog.proto, line 218 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Maybe I'm just not following this...why does this mean that we can't use primitive values to represent these settings? E.g., it looks like view_query (also accessed in the SQL in automatic_stats.go) is an optional string.

When I tested this previously, using the nullable primitive directly, for some reason the boolean conditional to check the value wasn't evaluating properly. I just tried it again, and it's working, so I'm not sure what is different this time. I agree is preferable to use the primitive directly, so I've made this change.


pkg/sql/catalog/catpb/catalog.proto, line 240 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Assuming that we can't get by with primitive values here, I'd much prefer that you define a nested message for each setting in this message, rather than define some generic Int or String box message at the top level

Thanks. I was able to get this working with primitive values.


pkg/sql/catalog/catpb/catalog.proto, line 241 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Technically, RowLevelTTL are also table-level settings. Perhaps it would be preferable to rename this message type (and its corresponding field in the table descriptor) to TableAutomaticStatsCollectionSettings or something?

Good point. What I'm trying to do here is group together all cluster settings that are also applicable as table-level settings, which we could add to in the future. I've renamed this to ClusterSettingsForTable.


pkg/sql/catalog/descpb/structured.go, line 281 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

+1, I think one boolean is fine with true for enabled and false for disabled/no setting.

We have to distinguish between disabled and no setting because disabled overrides an enabled setting at the cluster level (for the given table), whereas no setting does not. We pass the table-level setting out, then the caller can see if a value is set, and use that to override the cluster setting.

Code quote:

func (desc *TableDescriptor) AutoStatsCollectionEnabled() (enabled bool, ok bool) {

pkg/sql/catalog/descpb/structured.go, line 288 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Should we also consider whether auto stats is enabled overall?

The intent here is to just return the table-level setting then let the caller decide how to use that value (in this case, use the table-level setting, if available, and only if not set defer to the cluster-level setting). The current design of auto stats tries to continue or break out of the main loop if the cluster setting value has changed since the mutations were recorded:
https://gist.github.com/msirek/c2cfae90384fa4a398e8ad727dda5e59
Combining the cluster setting and the table-level setting here would write the result into the Refresher.autoStatsEnabled map when the mutations are recorded and then if we didn't check the cluster setting again in this loop there would be some lag time in the cluster setting taking effect.


pkg/sql/catalog/descpb/structured.go, line 296 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Could dedupe some code with AutoStatsCollectionEnabled

Done.

Code quote:

AutoStatsMinStaleRows

pkg/sql/catalog/descpb/structured.go, line 310 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Could dedupe here, too

Done


pkg/sql/catalog/descpb/structured.go, line 317 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Can you define these methods in the catalog.TableDescriptor interface please?

Done.

Code quote:

AutoStatsFractionStaleRows()

pkg/sql/catalog/descpb/structured.proto, line 1201 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

Keep comment and increment id.

Done.

Code quote:

// Next ID: 51

pkg/sql/catalog/tabledesc/structured.go, line 2573 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

nit: if settings := desc.TableLevelSettings; settings != nil { would be more idiomatic

Done.

Code quote:

appendStorageParam(`exclude_data_from_backup`, `true`)

pkg/sql/catalog/tabledesc/structured.go, line 2578 at r2 (raw file):

fmt.Sprintf("%v", value)

Thanks! Modified as suggested.


pkg/sql/catalog/tabledesc/structured.go, line 2588 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

nit: comment seems unnecessary

Removed it, thanks.


pkg/sql/catalog/tabledesc/validate.go, line 1510 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

This isn't obvious when you look a lot of this validation code, but the preferred pattern is to call vea.Report on errors as much as possible instead of returning on the first error, to report as many errors as possible.

Thanks, modified it to pass vea.Report as a parameter and report on all errors.


pkg/sql/catalog/tabledesc/validate.go, line 1511 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

In situations like these consider returning early to avoid large blocks and lots of nesting. Alternatively, hoist this nil-check to the call site.

Yes, this looks better. I've modified it, and also decomposed the validations into functions to de-dup some code.


pkg/sql/catalog/tabledesc/validate.go, line 1535 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

nit: consider defining these strings as global consts in the stats package.

Done. This caused import cycles if defined in the stats, sql or opt packages. I finally settled on defining them in the cluster package.


pkg/sql/paramparse/paramobserver.go, line 709 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

After taking Marius's suggestion of adding the setting names as const strings, you could get rid of the tableSettingsDict, which will make this function easier to read.

Yeah, tableSettingsDict was only useful for an older version of the code. Removed it.


pkg/sql/catalog/tabledesc/validate_test.go, line 134 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Can you also throw in a couple of unit test cases that exercise at least some of this validation logic? Let that not distract from the fact that I'm grateful that you did add validation logic, it's something that's easy to overlook.

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! It sounds like people are generally on board with merging this in 22.1.0. Seems best not to wait until 22.1.1 because of the descriptor changes etc. One option is to split the descriptor/protobuf changes out into a separate commit and backport only those changes for 22.1.0 and do the others later, but after reviewing the PR I think I'd rather just keep this all together. The remaining changes are pretty minimal.

Reviewed 2 of 10 files at r1, 3 of 14 files at r3, 11 of 11 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @postamar, and @rharding6373)


-- commits, line 58 at r4:
nit: some of these lines have tabs and some don't


-- commits, line 71 at r4:
Looks like this line is copy pasta. Can you just remove it?


pkg/sql/catalog/tabledesc/validate.go, line 1548 at r4 (raw file):

		desc.verifyProperTableForStatsSetting(errReportFn, settingName)
		if err :=
			validateFunc(*value); err != nil {

nit: this can all go on one line


pkg/sql/catalog/tabledesc/validate.go, line 1562 at r4 (raw file):

	if value != nil {
		desc.verifyProperTableForStatsSetting(errReportFn, settingName)
		if err :=

ditto


pkg/sql/stats/automatic_stats.go, line 213 at r4 (raw file):

	// autoStatsEnabled is true if auto stats collection is explicitly enabled for
	// the given table, which may override the cluster setting.

nit: mention it could also be explicitly disabled


pkg/sql/stats/automatic_stats.go, line 233 at r4 (raw file):

	autoStatsEnabled           *bool
	autoStatsFractionStaleRows *float64
	autoStatsMinStaleRows      *int64

What's the benefit of including these settings in this struct? I think it would be better to check them only when needed.


pkg/sql/stats/automatic_stats.go, line 258 at r4 (raw file):

		autoStatsEnabled:           make(map[descpb.ID]bool),
		autoStatsFractionStaleRows: make(map[descpb.ID]float64),
		autoStatsMinStaleRows:      make(map[descpb.ID]int64),

I'm not really convinced we should be storing these values at all -- we should probably just be grabbing the value from the descriptor when it's needed


pkg/sql/stats/automatic_stats.go, line 391 at r4 (raw file):

		getTablesWithAutoStatsExplicitlyEnabledQuery := fmt.Sprintf(
			`
SELECT

Do you have some tests that exercise this query?


pkg/sql/stats/automatic_stats.go, line 438 at r4 (raw file):

			// call to this method).
			log.Errorf(ctx, "failed to get tables for automatic stats: %v", err)
		}

nit: this feels like a lot of code duplication with what's below. Not sure if it's possible to remove, but maybe there's an opportunity to avoid it?


pkg/sql/stats/automatic_stats.go, line 529 at r4 (raw file):

	if fractionStaleRows, ok := table.TableDesc().AutoStatsFractionStaleRows(); ok {
		fractionStaleRowsForTable = &fractionStaleRows
	}

NotifyMutation is a hot path. We don't need these settings here, so I'd rather not get them until later


pkg/sql/catalog/tabledesc/validate_test.go, line 2207 at r4 (raw file):

		// Views.
		{ // 10
			err: ``,

you shouldn't need to set this -- empty string is the default in go


pkg/sql/catalog/tabledesc/validate_test.go, line 2300 at r4 (raw file):

		},
		{ // 13
			err:      ``,

ditto

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @postamar, @rharding6373, and @rytaft)


-- commits, line 58 at r4:

Previously, rytaft (Rebecca Taft) wrote…

nit: some of these lines have tabs and some don't

Removed the tabs.


-- commits, line 71 at r4:

Previously, rytaft (Rebecca Taft) wrote…

Looks like this line is copy pasta. Can you just remove it?

Done.


pkg/sql/catalog/tabledesc/validate.go, line 1548 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: this can all go on one line

Fixed

Code quote:

desc.verifyProperTableForStatsSetting(errReportFn, settingName)

pkg/sql/catalog/tabledesc/validate.go, line 1562 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

ditto

Fixed


pkg/sql/stats/automatic_stats.go, line 213 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: mention it could also be explicitly disabled

Added.

Code quote:

autoStatsEnabled is true if auto stats collection is explicitly enabled

pkg/sql/stats/automatic_stats.go, line 233 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

What's the benefit of including these settings in this struct? I think it would be better to check them only when needed.

The benefit is that if we don't save this information in mutation, maybeRefreshStats would have to look up the table descriptor to get the table-level settings. This information is already available in NotifyMutation.


pkg/sql/stats/automatic_stats.go, line 258 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I'm not really convinced we should be storing these values at all -- we should probably just be grabbing the value from the descriptor when it's needed

What do you think the overhead would be of looking up the table descriptor when each mutation is processed in maybeRefreshStats? That's the only thing I was worried about, extra overhead. The trade-off here is that we're using more memory to create hash maps. So far I'm having trouble locating a function that can look up a TableDescriptor using a TableID. Please let me know if there is one. If the TableDescriptor can't be looked up in maybeRefreshStats, the only way I know of to get this information is within NotifyMutation, and then save it away somewhere for later processing.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @postamar, @rharding6373, and @rytaft)


pkg/sql/stats/automatic_stats.go, line 258 at r4 (raw file):

Previously, msirek (Mark Sirek) wrote…

What do you think the overhead would be of looking up the table descriptor when each mutation is processed in maybeRefreshStats? That's the only thing I was worried about, extra overhead. The trade-off here is that we're using more memory to create hash maps. So far I'm having trouble locating a function that can look up a TableDescriptor using a TableID. Please let me know if there is one. If the TableDescriptor can't be looked up in maybeRefreshStats, the only way I know of to get this information is within NotifyMutation, and then save it away somewhere for later processing.

Actually we wouldn't have to look up the TableDescriptor on every mutation, just when maybeRefreshStats gets called. I'm still wondering if this could cause noticeable overhead.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @postamar, @rharding6373, and @rytaft)


-- commits, line 58 at r4:

Previously, msirek (Mark Sirek) wrote…

Removed the tabs.

Doesn't look like you pushed the latest update


pkg/sql/stats/automatic_stats.go, line 258 at r4 (raw file):

Previously, msirek (Mark Sirek) wrote…

Actually we wouldn't have to look up the TableDescriptor on every mutation, just when maybeRefreshStats gets called. I'm still wondering if this could cause noticeable overhead.

Right -- NotifyMutation is called much more frequently than maybeRefreshStats. I think maybeRefreshStats is called at most once per minute per table. But that's a good point that the table descriptor isn't readily available... maybe the schema team knows a good way to get it?

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @postamar, @rharding6373, and @rytaft)


-- commits, line 58 at r4:

Previously, rytaft (Rebecca Taft) wrote…

Doesn't look like you pushed the latest update

Yeah, I'm in the middle of changes, but wanted to get my comment posted about the table descriptor.


pkg/sql/stats/automatic_stats.go, line 258 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Right -- NotifyMutation is called much more frequently than maybeRefreshStats. I think maybeRefreshStats is called at most once per minute per table. But that's a good point that the table descriptor isn't readily available... maybe the schema team knows a good way to get it?

Thanks, the schema team told me how to get it. Adding the code now.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 10 files at r1, 3 of 14 files at r3, 11 of 11 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @postamar, @rharding6373, and @rytaft)


-- commits, line 74 at r4:
It's very useful to know how to view settings for a table, but I don't think a crdb_internal function belongs in a release note - they are generally not meant for users. A statement like SHOW SETTING FOR TABLE t would certainly be nice. Do you mind creating an issue to add a statement? cc @vy-ton for visibility.


pkg/settings/cluster/cluster_settings.go, line 84 at r4 (raw file):

	// AutoStatsClusterSettingName is the name of the automatic stats collection
	// enabled cluster setting.
	AutoStatsClusterSettingName = "sql.stats.automatic_collection.enabled"

nit: cluster.AutoStatsClusterSettingName is redundant. Is AutoStatsEnabledSettingName a better name?


pkg/sql/catalog/catpb/catalog.proto, line 231 at r4 (raw file):

}

nit: remove blank line


pkg/sql/catalog/descpb/structured.go, line 281 at r2 (raw file):

Previously, msirek (Mark Sirek) wrote…

We have to distinguish between disabled and no setting because disabled overrides an enabled setting at the cluster level (for the given table), whereas no setting does not. We pass the table-level setting out, then the caller can see if a value is set, and use that to override the cluster setting.

You could return an enum representing the three states. tree.Nullability is a good example:

// Nullability represents either NULL, NOT NULL or an unspecified value (silent
// NULL).
type Nullability int
// The values for NullType.
const (
NotNull Nullability = iota
Null
SilentNull
)


pkg/sql/catalog/tabledesc/structured.go, line 2606 at r4 (raw file):

// NoClusterSettingsForTable implements the TableDescriptor interface.
func (desc *wrapper) NoClusterSettingsForTable() bool {

nit: the name of this is confusing because a "cluster setting" and "table setting" are two different concepts. Maybe NoClusterSettingOverrides is better?


pkg/sql/paramparse/paramobserver.go, line 592 at r4 (raw file):

	}
	*setting = boolVal
	return nil

This generalization adds a lot of complexity that's hard to follow. If the double pointers and such can't be simplified, I'd prefer declaring the onSet and onReset functions directly as anonymous functions, even if there is some repetition:

cluster.AutoStatsClusterSettingName: {
  onSet: func (ctx context.Context, po *TableStorageParamObserver, semaCtx *tree.SemaCtx,
    evalCtx *tree.EvalCtx, key string, datum tree.Datum) error {
      boolVal, err := boolFromDatum(evalCtx, key, datum)
      if err != nil {
        return err
      }
      if po.tableDesc.ClusterSettingsForTable == nil {
        po.tableDesc.ClusterSettingsForTable = &catpb.ClusterSettingsForTable{}
      }
    po.tableDesc.ClustSettingsForTable.SqlStatsAutomaticCollectionEnabled = &boolVal
    return nil
  },
},
// ...

pkg/sql/stats/automatic_stats.go, line 321 at r4 (raw file):

								// remaining mutations in the autoStatsEnabledForTable map.
								if len(autoStatsEnabledForTable) == 0 {
									break

nit: reduce nested blocks by moving this check into its own block before the lookup in autoStatsEnabledForTable, like:

if len(autoStatsEnabledForTable) == 0 && !AutomaticStatisticsClusterMode.Get(&r.st.SV) {
    break
}
if enabled, ok := autoStatsEnabledForTable[tableID]; ok {
    // ...
}

pkg/sql/stats/automatic_stats.go, line 437 at r4 (raw file):

			// entry is idempotent (i.e. we didn't mess up anything for the next
			// call to this method).
			log.Errorf(ctx, "failed to get tables for automatic stats: %v", err)

nit: place the if err != nil { } before the successful code path, with an early return. That will eliminate the need for nesting the successful path in if err == nil { }.


pkg/sql/stats/automatic_stats.go, line 613 at r4 (raw file):

	}
	statsMinStaleRows := AutomaticStatisticsMinStaleRows.Get(&r.st.SV)
	if minStaleRowsForTable != nil {

There's quite a few places with a similar check, e.g., "use a table-level setting if set, use the cluster setting if not". It'd be nice to encapsulate this in an API that encapsulates this repetitive logic.


pkg/sql/catalog/tabledesc/validate_test.go, line 2403 at r4 (raw file):

			}
			// Validate other descriptors for specific test cases.
			if test.testName == "Stats setting in view" {

Why do you need this and test.otherErr? Can you instead follow the pattern in the other view test cases and make desc the view and put the foo table in otherDescs?


pkg/sql/stats/automatic_stats_test.go, line 88 at r4 (raw file):

		ctx, s.Stopper(), descA.GetID(), 0 /* rowsAffected */, time.Microsecond, /* asOf */
		false,                                    /* enabledForTable */
		nil /* fractionStaleRowsForTable */, nil, /* minStaleRowsForTable */

nit: put each argument on its own line here an in the same cases below

Copy link
Contributor

@vy-ton vy-ton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @msirek, @postamar, @rharding6373, and @rytaft)


-- commits, line 74 at r4:

Previously, mgartner (Marcus Gartner) wrote…

It's very useful to know how to view settings for a table, but I don't think a crdb_internal function belongs in a release note - they are generally not meant for users. A statement like SHOW SETTING FOR TABLE t would certainly be nice. Do you mind creating an issue to add a statement? cc @vy-ton for visibility.

+1 to filing a future issue. You can see configured storage parameters with SHOW CREATE TABLE and pg_class.

I had a related observation while testing row-level TTL

@mgartner
Copy link
Collaborator


-- commits, line 74 at r4:

but I don't think a crdb_internal function belongs in a release note - they are generally not meant for users

To be more clear: crdb_internal functions are not mean for users, release notes are meant for users.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Sounds good.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @postamar, @rharding6373, @rytaft, and @vy-ton)


-- commits, line 74 at r4:

Previously, mgartner (Marcus Gartner) wrote…

but I don't think a crdb_internal function belongs in a release note - they are generally not meant for users

To be more clear: crdb_internal functions are not mean for users, release notes are meant for users.

OK, I've created issue #78395 and removed the query from the release note.


pkg/settings/cluster/cluster_settings.go, line 84 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: cluster.AutoStatsClusterSettingName is redundant. Is AutoStatsEnabledSettingName a better name?

Changed as suggested.


pkg/sql/catalog/catpb/catalog.proto, line 231 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: remove blank line

Done, and also added SessionSettingsForTable, referenced in message TableDescriptor in case we want to set things like optimizer_use_histograms or optimizer_use_multicol_stats at the table level in the future.

Code quote:

ClusterSettingsForTable

pkg/sql/catalog/descpb/structured.go, line 281 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

You could return an enum representing the three states. tree.Nullability is a good example:

// Nullability represents either NULL, NOT NULL or an unspecified value (silent
// NULL).
type Nullability int
// The values for NullType.
const (
NotNull Nullability = iota
Null
SilentNull
)

Thanks. Added cluster.BoolSetting
https://gist.github.com/msirek/bb9963e322d8037ae461b1a6cf0e4482

Code quote:

AutoStatsCollectionEnabled

pkg/sql/catalog/tabledesc/structured.go, line 2606 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: the name of this is confusing because a "cluster setting" and "table setting" are two different concepts. Maybe NoClusterSettingOverrides is better?

Changed to NoClusterSettingOverrides


pkg/sql/paramparse/paramobserver.go, line 592 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This generalization adds a lot of complexity that's hard to follow. If the double pointers and such can't be simplified, I'd prefer declaring the onSet and onReset functions directly as anonymous functions, even if there is some repetition:

cluster.AutoStatsClusterSettingName: {
  onSet: func (ctx context.Context, po *TableStorageParamObserver, semaCtx *tree.SemaCtx,
    evalCtx *tree.EvalCtx, key string, datum tree.Datum) error {
      boolVal, err := boolFromDatum(evalCtx, key, datum)
      if err != nil {
        return err
      }
      if po.tableDesc.ClusterSettingsForTable == nil {
        po.tableDesc.ClusterSettingsForTable = &catpb.ClusterSettingsForTable{}
      }
    po.tableDesc.ClustSettingsForTable.SqlStatsAutomaticCollectionEnabled = &boolVal
    return nil
  },
},
// ...

I agree, the splitting of logic between the two functions was not ideal. I cleaned it up by putting the set/reset logic directly in new functions setBoolValue, setIntValue, setFloatValue and resetValue. I think having generalized functions will make adding new settings easier and less error prone.


pkg/sql/stats/automatic_stats.go, line 321 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: reduce nested blocks by moving this check into its own block before the lookup in autoStatsEnabledForTable, like:

if len(autoStatsEnabledForTable) == 0 && !AutomaticStatisticsClusterMode.Get(&r.st.SV) {
    break
}
if enabled, ok := autoStatsEnabledForTable[tableID]; ok {
    // ...
}

This has been refactored to remove the autoStatsEnabledForTable map.


pkg/sql/stats/automatic_stats.go, line 391 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Do you have some tests that exercise this query?

I had some similar queries in logic tests, but not this exact query. I've added function TestEnsureAllTablesQueries to test it.


pkg/sql/stats/automatic_stats.go, line 437 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: place the if err != nil { } before the successful code path, with an early return. That will eliminate the need for nesting the successful path in if err == nil { }.

This pre-existing code has two places it's checking for errors: on creation of the query iterator and in the row reading loop. The comment indicates it's letting the row reading loop proceed as far as it can error-free, so if we did an early return that would change the current behavior.


pkg/sql/stats/automatic_stats.go, line 438 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: this feels like a lot of code duplication with what's below. Not sure if it's possible to remove, but maybe there's an opportunity to avoid it?

Defined the SQL as a template with a pluggable predicate.


pkg/sql/stats/automatic_stats.go, line 529 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

NotifyMutation is a hot path. We don't need these settings here, so I'd rather not get them until later

Removed.


pkg/sql/stats/automatic_stats.go, line 613 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

There's quite a few places with a similar check, e.g., "use a table-level setting if set, use the cluster setting if not". It'd be nice to encapsulate this in an API that encapsulates this repetitive logic.

Great point. Created new Refresher functions to encapsulate these.


pkg/sql/catalog/tabledesc/validate_test.go, line 2207 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

you shouldn't need to set this -- empty string is the default in go

OK, but the test wasn't running properly without it. The value of i was not properly getting set in the main test loop and some of the TableDescriptor fields that had initializations were not seeing those values.


pkg/sql/catalog/tabledesc/validate_test.go, line 2300 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

ditto

Same response as above.


pkg/sql/catalog/tabledesc/validate_test.go, line 2403 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Why do you need this and test.otherErr? Can you instead follow the pattern in the other view test cases and make desc the view and put the foo table in otherDescs?

This was built simliar to cases 10 and 11 which put the view in otherDescs. The main desc does not call ValidateSelf, but does cross reference checks. I was trying to put together a complete set of TableDescriptors, and call ValidateSelf on the view only (because some of these test cases don't pass the ValidateSelf check. In any case, I've gone ahead and moved the TableDescriptor for just the view into TestValidateTableDesc and removed the specialized code in TestValidateCrossTableReferences.

Code quote:

Validate other descriptors for specific test cases.

pkg/sql/stats/automatic_stats_test.go, line 88 at r4 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: put each argument on its own line here an in the same cases below

All of these new parameters are removed in the latest refactor.

@msirek msirek force-pushed the 9059 branch 2 times, most recently from 6124a6f to b8bf1f9 Compare March 24, 2022 12:47
@otan otan removed their request for review April 18, 2022 21:41
@rharding6373 rharding6373 removed their request for review April 25, 2022 18:19
@blathers-crl blathers-crl bot requested a review from otan April 25, 2022 19:41
@@ -225,10 +225,10 @@ message RowLevelTTL {
// only list values which have been set. Protobuf type double is float64 in Go.
message AutoStatsSettings {
option (gogoproto.equal) = true;
// sql.stats.automatic_collection.enabled
// sql_stats_automatic_collection_enabled
optional bool enabled = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comments are usually Enabled represents ...

@msirek msirek force-pushed the 9059 branch 2 times, most recently from 69b2cb9 to 7da9360 Compare April 26, 2022 03:31
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! I thought there was a remaining performance issue, but it turns out the build is just running some additional maybe_stress and maybe_stressrace tests, which account for the extra time taken. I plan on merging this tomorrow (April 26). Please let me know if anyone has any more review comments coming.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @otan, @postamar, @rharding6373, and @rytaft)


pkg/sql/catalog/catpb/catalog.proto line 229 at r28 (raw file):

Previously, otan (Oliver Tan) wrote…

nit: comments are usually Enabled represents ...

Updated comments to start with the field name.

Code quote:

message AutoStatsSettings {

pkg/sql/logictest/testdata/logic_test/alter_table line 2468 at r27 (raw file):

Previously, ajwerner wrote…

I don't know why it decodes int64 to a string, but not the float64

Json numbers are float64s, so they can safely round-trip by using the number type in json. Int64 might not fit in a float64 in a lossless way, so it gets encoded as a string always.

Thanks for the explanation.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r28, 3 of 3 files at r29, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @msirek, @otan, @postamar, @rharding6373, and @rytaft)


pkg/sql/stats/automatic_stats.go line 402 at r26 (raw file):
Still seems worthwhile to make this change to avoid missing the setting overrides during retries:

remove this line, and copy only the values of r.settingOverrides above that are contained in r.mutationCounts

I'm also fine with doing that in a follow-on PR if you just want to get this merged.

@msirek msirek force-pushed the 9059 branch 2 times, most recently from eb5b63f to 09127aa Compare April 28, 2022 01:56
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @otan, @postamar, @rharding6373, and @rytaft)


pkg/sql/stats/automatic_stats.go line 402 at r26 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Still seems worthwhile to make this change to avoid missing the setting overrides during retries:

remove this line, and copy only the values of r.settingOverrides above that are contained in r.mutationCounts

I'm also fine with doing that in a follow-on PR if you just want to get this merged.

OK, I removed the line:

r.settingOverrides = make(map[descpb.ID]catpb.AutoStatsSettings)

... and re-added the code to build a local settingOverrides map.
I realized that settingOverrides may get stale with long-running stats collection jobs, so have added table header lookups in the case where processing a given table mutation count starts more than 1 minute (DefaultRefreshInterval) after mutationCounts processing has started.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @otan, @postamar, @rharding6373, and @rytaft)


pkg/sql/stats/automatic_stats.go line 402 at r26 (raw file):

Previously, msirek (Mark Sirek) wrote…

OK, I removed the line:

r.settingOverrides = make(map[descpb.ID]catpb.AutoStatsSettings)

... and re-added the code to build a local settingOverrides map.
I realized that settingOverrides may get stale with long-running stats collection jobs, so have added table header lookups in the case where processing a given table mutation count starts more than 1 minute (DefaultRefreshInterval) after mutationCounts processing has started.

@rytaft When you get the chance, please let me know if you have any comments on the new code to look up table headers if the recorded settings are more than a minute old. Thanks.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r30, 2 of 2 files at r31, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @msirek, @otan, @postamar, @rharding6373, and @rytaft)


pkg/sql/stats/automatic_stats.go line 402 at r26 (raw file):

Previously, msirek (Mark Sirek) wrote…

@rytaft When you get the chance, please let me know if you have any comments on the new code to look up table headers if the recorded settings are more than a minute old. Thanks.

Sorry for the delay!


pkg/sql/stats/automatic_stats.go line 377 at r30 (raw file):

				mutationCounts := r.mutationCounts

				settingOverrides := make(map[descpb.ID]catpb.AutoStatsSettings)

nit: you could wait to allocate this map until it's needed below


pkg/sql/stats/automatic_stats.go line 429 at r30 (raw file):

									useDescriptor = false
								}
							}

This logic seems convoluted. I'd suggest removing the useDescriptor boolean and doing something like this:

now := timeutil.Now()
elapsed := now.Sub(start)
// If a long-running stats collection caused a delay in
// processing the current table longer than the refresh
// interval, look up the table descriptor to ensure we don't
// have stale table settings.
if elapsed > DefaultRefreshInterval {
  desc = r.getTableDescriptor(ctx, tableID)
  if desc != nil {
    if !r.autoStatsEnabled(desc) {
      continue
    }
    settingOverrides[tableID] = desc.GetAutoStatsSettings()  
  }
}

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @otan, @postamar, @rharding6373, and @rytaft)


pkg/sql/stats/automatic_stats.go line 402 at r26 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Sorry for the delay!

No worries. Thanks for all the great comments!


pkg/sql/stats/automatic_stats.go line 377 at r30 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: you could wait to allocate this map until it's needed below

Done

Code quote:

settingOverrides := make(map[descpb.ID]catpb.AutoStatsSettings)

pkg/sql/stats/automatic_stats.go line 429 at r30 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This logic seems convoluted. I'd suggest removing the useDescriptor boolean and doing something like this:

now := timeutil.Now()
elapsed := now.Sub(start)
// If a long-running stats collection caused a delay in
// processing the current table longer than the refresh
// interval, look up the table descriptor to ensure we don't
// have stale table settings.
if elapsed > DefaultRefreshInterval {
  desc = r.getTableDescriptor(ctx, tableID)
  if desc != nil {
    if !r.autoStatsEnabled(desc) {
      continue
    }
    settingOverrides[tableID] = desc.GetAutoStatsSettings()  
  }
}

OK, the point about useDescriptor was to avoid recalculating the elapsed time if we have already exceeded the time limit and to avoid calling autoStatsEnabledForTableID twice. I have removed useDescriptor as suggested. I've moved the declaration of desc to within the for loop so I could avoid a duplicate call to autoStatsEnabledForTableID without adding another variable.

@msirek msirek force-pushed the 9059 branch 2 times, most recently from 38b4fce to 5708783 Compare May 3, 2022 02:13
Fixes cockroachdb#40989

Previously, there was no way to enable or disable automatic statistics
collection at the table level. It could only be turned on or off via the
`sql.stats.automatic_collection.enabled` cluster setting.

This was inadequate because statistics collection can be expensive for
large tables, and it would be desirable to defer collection until after
data is finished loading, or in off hours. Also, small tables which are
frequently updated may trigger statistics collection leading to
unnecessary overhead and/or unpredictable query plan changes.

To address this, this patch adds support for setting of the following
cluster settings at the table level:
```
sql_stats_automatic_collection_enabled
sql_stats_automatic_collection_min_stale_rows
sql_stats_automatic_collection_fraction_stale_rows
```
which correspond to the similarly-named cluster settings:
```
sql.stats.automatic_collection.enabled
sql.stats.automatic_collection.min_stale_rows
sql.stats.automatic_collection.fraction_stale_rows
```
for example:
```
ALTER TABLE t1 SET (sql_stats_automatic_collection_enabled = true);
ALTER TABLE t1
      SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1,
           sql_stats_automatic_collection_min_stale_rows = 2000);
```
The table-level setting takes precedence over the cluster setting.

Release justification: Low risk fix for missing fine-grained control
over automatic statistics collection.

Release note (sql change): Automatic statistics collection can now be
enabled or disabled for individual tables, taking precedence over the
cluster setting, for example:
```
ALTER TABLE t1 SET (sql_stats_automatic_collection_enabled = true);
ALTER TABLE t1 SET (sql_stats_automatic_collection_enabled = false);
ALTER TABLE t1 RESET (sql_stats_automatic_collection_enabled);
```
RESET removes the setting value entirely, in which case the
similarly-name cluster setting,
`sql.stats.automatic_collection.enabled`, is in effect for the table.

Cluster settings `sql.stats.automatic_collection.fraction_stale_rows`
and `sql.stats.automatic_collection.min_stale_rows` now also have table
setting counterparts:
```
sql_stats_automatic_collection_fraction_stale_rows
sql_stats_automatic_collection_min_stale_rows
```
The table settings may be set at table creation time, or later via ALTER
TABLE ... SET, independent of whether auto stats is enabled:
```
ALTER TABLE t1
      SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1,
           sql_stats_automatic_collection_min_stale_rows = 2000);
CREATE TABLE t1 (a INT, b INT)
       WITH (sql_stats_automatic_collection_enabled = true,
             sql_stats_automatic_collection_min_stale_rows = 1000000,
             sql_stats_automatic_collection_fraction_stale_rows= 0.05
             );
```
The current table settings (storage parameters) are shown in the `WITH`
clause output of `SHOW CREATE TABLE`.
Note, any row mutations which have occurred a minute or two before
disabling auto stats collection via `ALTER TABLE ... SET` may trigger
stats collection, though DML submitted after the setting change will
not.
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: ship it!

Reviewed 7 of 7 files at r32, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @otan, @postamar, @rharding6373, and @rytaft)


pkg/sql/stats/automatic_stats.go line 429 at r30 (raw file):

Previously, msirek (Mark Sirek) wrote…

OK, the point about useDescriptor was to avoid recalculating the elapsed time if we have already exceeded the time limit and to avoid calling autoStatsEnabledForTableID twice. I have removed useDescriptor as suggested. I've moved the declaration of desc to within the for loop so I could avoid a duplicate call to autoStatsEnabledForTableID without adding another variable.

Thanks, I think this is easier to understand now.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!
bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @ajwerner, @mgartner, @otan, @postamar, @rharding6373, and @rytaft)

@craig
Copy link
Contributor

craig bot commented May 3, 2022

Build succeeded:

@craig craig bot merged commit dd41d39 into cockroachdb:master May 3, 2022
@blathers-crl
Copy link

blathers-crl bot commented May 3, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from f9a102d to blathers/backport-release-22.1-78110: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: control automatic statistics on a per-table basis
9 participants