Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: include partial stats in results of statsCache.GetTableStats #95145

Merged
merged 1 commit into from
Jan 20, 2023

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Jan 12, 2023

We were not including partial stats in the list of table statistics returned by statsCache.GetTableStats. This was fine for the optimizer, which currently cannot use partial stats directly, but it was a problem for backup.

We'd like to use partial stats directly in the optimizer eventually, so this commit goes ahead and adds them to the results of GetTableStats. The optimizer then must filter them out. To streamline this we add some helper functions.

Finally, in an act of overzealous refactoring, this commit also changes MergedStatistics and ForecastTableStatistics to accept partial statistics and full statistics mixed together in the same input list. This simplifies the code that calls these functions.

Fixes: #95056
Part of: #93983

Epic: CRDB-19449

Release note: None

@michae2 michae2 requested review from stevendanna, mgartner, rytaft and a team January 12, 2023 07:04
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks pretty reasonable to me. I like the various predicate methods you added.

Should we perhaps filter partial stats from backupRestore with a TODO until we fix the insertion of partial stats? Unless I'm mistaken (very possible), I think we probably do the wrong thing with the FullHistogramID on the partial stat.

@michae2
Copy link
Collaborator Author

michae2 commented Jan 13, 2023

Should we perhaps filter partial stats from backupRestore with a TODO until we fix the insertion of partial stats? Unless I'm mistaken (very possible), I think we probably do the wrong thing with the FullHistogramID on the partial stat.

Ah, thanks for reminding me, I forgot to file an issue!

I think we should go ahead and backup and restore them anyway. It won't break anything serious if we do the wrong thing with the FullHistogramID, it just means we won't be able to create those merged stats after restore.

@michae2
Copy link
Collaborator Author

michae2 commented Jan 13, 2023

Should we perhaps filter partial stats from backupRestore with a TODO until we fix the insertion of partial stats? Unless I'm mistaken (very possible), I think we probably do the wrong thing with the FullHistogramID on the partial stat.

Ah, thanks for reminding me, I forgot to file an issue!

Updated #94101 about this.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: nice cleanup!

Reviewed 10 of 14 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @michae2)


pkg/sql/distsql_plan_stats.go line 230 at r2 (raw file):

		// someone could create a statistic named __forecast__ or __merged__.
		// Update system.table_statistics to add an enum to indicate which
		// type of statistic it is.

nit: update comment now that you've removed the name references


pkg/sql/stats/forecast.go line 67 at r2 (raw file):

// return the same forecasts.
func ForecastTableStatistics(ctx context.Context, observed []*TableStatistic) []*TableStatistic {
	// Early sanity check. We'll check this again in forecastColumnStatistics.

Is this not needed anymore?

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


pkg/sql/distsql_plan_stats.go line 230 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: update comment now that you've removed the name references

I ended up removing it altogether in favor of a comment on #50655.


pkg/sql/stats/forecast.go line 67 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Is this not needed anymore?

Yeah, it was only really needed to protect the latest := observed[0].CreatedAt which has become part of the first loop over observed. Now if len(observed) is zero then latest.IsZero() will be true after the first loop, and we will return nil. If len(observed) is less than minObservationsForForecast we will catch it at the beginning of forecastColumnStatistics.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

We were not including partial stats in the list of table statistics
returned by `statsCache.GetTableStats`. This was fine for the optimizer,
which currently cannot use partial stats directly, but it was a problem
for backup.

We'd like to use partial stats directly in the optimizer eventually, so
this commit goes ahead and adds them to the results of `GetTableStats`.
The optimizer then must filter them out. To streamline this we add some
helper functions.

Finally, in an act of overzealous refactoring, this commit also changes
`MergedStatistics` and `ForecastTableStatistics` to accept partial
statistics and full statistics mixed together in the same input list.
This simplifies the code that calls these functions.

Fixes: cockroachdb#95056
Part of: cockroachdb#93983

Epic: CRDB-19449

Release note: None
@michae2 michae2 requested a review from a team as a code owner January 19, 2023 21:47
@michae2
Copy link
Collaborator Author

michae2 commented Jan 20, 2023

bors r=rytaft

@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Build failed (retrying...):

@craig craig bot merged commit 1b79102 into cockroachdb:master Jan 20, 2023
@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Build succeeded:

@michae2 michae2 deleted the 95056 branch January 24, 2023 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql,backupccl: partial stats are not included in backup
4 participants