opt: add setting to always use histograms to calculate stats #98194

Merged on Mar 8, 2023 (1 commit)


@rytaft rytaft (Collaborator) commented Mar 8, 2023

Informs #64570

Release note (sql change): Added a new session setting, optimizer_always_use_histograms, which ensures that the optimizer always uses histograms when available to calculate the statistics of every plan that it explores. Enabling this setting can prevent the optimizer from choosing a suboptimal index when statistics for a table are stale.
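
For reference, a minimal sketch of toggling the setting for the current session. The setting name comes from the release note above; `SHOW`, `SET`, and `RESET` are the standard session-variable statements, and the default of `false` reflects the `GlobalDefault: globalFalse` line reviewed later in this thread.

```sql
-- Check the current value (false by default as of this PR).
SHOW optimizer_always_use_histograms;

-- Enable the setting for the current session only.
SET optimizer_always_use_histograms = true;

-- Return to the default value.
RESET optimizer_always_use_histograms;
```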

@rytaft rytaft requested review from michae2 and DrewKimball March 8, 2023 02:04
@rytaft rytaft requested a review from a team as a code owner March 8, 2023 02:04

@DrewKimball DrewKimball (Collaborator) left a comment

:lgtm:

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2)

@DrewKimball DrewKimball (Collaborator) commented

Looks like it just needs some of the session var tests regenerated.

@michae2 michae2 (Collaborator) left a comment

:lgtm: Nice!

Reviewed 7 of 7 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@michae2 michae2 (Collaborator) commented Mar 8, 2023

I think this is great, but noting that I think I found a case where this does not fix the problem:

CREATE TABLE abc (a INT, b INT, c STRING, UNIQUE INDEX (a), INDEX (b)) WITH (sql_stats_automatic_collection_enabled = false);
INSERT INTO abc SELECT unordered_unique_rowid(), i, REPEAT('c', 2048) FROM generate_series(0, 32767) s(i);
ANALYZE abc;

INSERT INTO abc VALUES (4000000000000000000, 40000, '');
EXPLAIN (OPT, VERBOSE) SELECT * FROM abc WHERE a = 4000000000000000000 AND b = 40000;

This picks the index on b even with optimizer_always_use_histograms set to true.
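
One way to reproduce the comparison is to explain the same query with the setting off and then on. This is a sketch that assumes the `abc` table and data from the statements above; `SHOW STATISTICS FOR TABLE` is included only to confirm that the collected statistics predate the final insert.

```sql
-- Inspect the statistics gathered by ANALYZE, which ran before the last INSERT.
SHOW STATISTICS FOR TABLE abc;

-- Plan with the setting off.
SET optimizer_always_use_histograms = false;
EXPLAIN (OPT, VERBOSE) SELECT * FROM abc WHERE a = 4000000000000000000 AND b = 40000;

-- Plan with the setting on; per the report above, the index on b is still chosen.
SET optimizer_always_use_histograms = true;
EXPLAIN (OPT, VERBOSE) SELECT * FROM abc WHERE a = 4000000000000000000 AND b = 40000;
```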

@rytaft rytaft (Collaborator, Author) left a comment

TFTRs!

> I think I found a case where this does not fix the problem

Yea... this definitely isn't going to fix everything. But seems like it will still fix a lot of things. I'm going to go ahead and merge this and let's keep thinking about ways to chip away at these edge cases... Thanks for finding this one!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@craig craig bot (Contributor) commented Mar 8, 2023

Build succeeded:

@craig craig bot merged commit 329a232 into cockroachdb:master Mar 8, 2023
@blathers-crl blathers-crl bot commented Mar 8, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from ef1604f to blathers/backport-release-22.1-98194: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from ef1604f to blathers/backport-release-22.2-98194: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@mgartner mgartner (Collaborator) left a comment

💯

Reviewed 7 of 7 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)

@michae2 michae2 (Collaborator) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)


pkg/sql/vars.go line 2533 at r2 (raw file):

			return formatBoolAsPostgresSetting(evalCtx.SessionData().OptimizerAlwaysUseHistograms), nil
		},
		GlobalDefault: globalFalse,

@rytaft, would it make sense to enable this by default in v23.1?

@rytaft rytaft (Collaborator, Author) commented Mar 8, 2023

pkg/sql/vars.go line 2533 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

@rytaft, would it make sense to enable this by default in v23.1?

I was thinking of adding another change which checks whether there are multiple indexes for the table, and always uses the histogram if so, but maybe it would be better to just enable it. I don't think the performance overhead in case of a single index is significant enough that this would be noticeable.

@rytaft rytaft (Collaborator, Author) commented Mar 8, 2023

pkg/sql/vars.go line 2533 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I was thinking of adding another change which checks whether there are multiple indexes for the table, and always uses the histogram if so, but maybe it would be better to just enable it. I don't think the performance overhead in case of a single index is significant enough that this would be noticeable.

#98248
