Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: ensure statistics builder correctly estimates selectivity of b @> '[]' and b @> '{}' #56870

Closed
rytaft opened this issue Nov 18, 2020 · 2 comments · Fixed by #59326
Closed
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@rytaft
Copy link
Collaborator

rytaft commented Nov 18, 2020

Prior to #56732, queries like SELECT * from d where b @> '[]'; and SELECT * from d where b @> '{}'; would cause a full scan of the primary index, even if there was an inverted index on column b.

After #56732, the optimizer now plans an inverted index scan for these queries followed by an inverted filter. This is a very bad plan, however, because b @> '[]' and b @> '{}' cause a full scan of the inverted index. The statistics builder should be taught that this is the case so that the optimizer will never choose an inverted index scan with these predicates.

@rytaft rytaft added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-optimizer SQL logical planning and optimizations. labels Nov 18, 2020
@rytaft rytaft self-assigned this Nov 18, 2020
@mgartner
Copy link
Collaborator

If there are many rows where b is a number, string, boolean, or null, for the b @> '[] filter it could be advantageous to scan the inverted index for all arrays; something like /b: [/Arr - /Arr] [/[] - /[]]. It's probably not a super common case though. And I'm not sure we could do the same for b @> '{}'.

@rytaft
Copy link
Collaborator Author

rytaft commented Nov 19, 2020

Good point! Hopefully our histograms will be able to give us a good idea about how selective these filters actually are for a given dataset. In some cases it might make sense to use the index.

@mgartner mgartner assigned mgartner and unassigned rytaft Jan 22, 2021
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 22, 2021
Fixes cockroachdb#56870

Release note (performance improvement): The optimizer now uses collected
histograms statistics to better estimate the cost of JSON and ARRAY
inverted index scans, which may lead to more efficient query plans.
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 23, 2021
Fixes cockroachdb#56870

Release note (performance improvement): The optimizer now uses collected
histograms statistics to better estimate the cost of JSON and ARRAY
inverted index scans, which may lead to more efficient query plans.
craig bot pushed a commit that referenced this issue Jan 26, 2021
59326: opt: use histograms for inverted JSON/ARRAY scan statistics r=rytaft a=mgartner

#### memo: simplify statisticsBuilder.constrainScan

A `ScanExpr`, which includes an `InvertedConstraint`, is always passed
to `statisticsBuilder.constrainScan`, therefore there is no need to pass
the `InvertedConstraint` separately.

Release note: None

#### memo: rename multi-column-inverted-geo test file to inverted-geo-multi-column

Release note: None

#### opt: use histograms for inverted JSON/ARRAY scan statistics

Fixes #56870

Release note (performance improvement): The optimizer now uses collected
histograms statistics to better estimate the cost of JSON and ARRAY
inverted index scans, which may lead to more efficient query plans.


Co-authored-by: Marcus Gartner <[email protected]>
@craig craig bot closed this as completed in a0f98f2 Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants