Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: use histograms for inverted JSON/ARRAY scan statistics #59326

Merged
merged 3 commits into from
Jan 26, 2021

Conversation

mgartner
Copy link
Collaborator

memo: simplify statisticsBuilder.constrainScan

A ScanExpr, which includes an InvertedConstraint, is always passed
to statisticsBuilder.constrainScan, therefore there is no need to pass
the InvertedConstraint separately.

Release note: None

memo: rename multi-column-inverted-geo test file to inverted-geo-multi-column

Release note: None

opt: use histograms for inverted JSON/ARRAY scan statistics

Fixes #56870

Release note (performance improvement): The optimizer now uses collected
histograms statistics to better estimate the cost of JSON and ARRAY
inverted index scans, which may lead to more efficient query plans.

@mgartner mgartner requested a review from a team as a code owner January 22, 2021 23:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner
Copy link
Collaborator Author


pkg/sql/opt/xform/testdata/rules/select, line 1755 at r3 (raw file):

# TODO(rytaft): ensure that the stats prevent us from using the inverted index.
opt
SELECT * FROM b WHERE j @> '{}'

I removed these tests, because I think they're not necessary anymore. They seem more stats-related than xform-related to me. I can add them back in, but if we want to prevent a full inverted index scan for the test I'll have to inject some histogram statistics.

I haven't bothered with trying to make the statistics builder smarter about the j @> '{}' expression without histograms because it seems like an uncommon case. I'm assuming we don't care too much about cases where histograms don't exist, but correct me if that's wrong.

A `ScanExpr`, which includes an `InvertedConstraint`, is always passed
to `statisticsBuilder.constrainScan`, therefore there is no need to pass
the `InvertedConstraint` separately.

Release note: None
@mgartner mgartner force-pushed the full-scan-inverted-fix branch from 11a6a69 to b245f8d Compare January 23, 2021 00:50
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Awesome that this didn't require a lot of changes.

Reviewed 9 of 9 files at r4, 1 of 1 files at r5, 8 of 8 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)


pkg/sql/opt/memo/testdata/stats/inverted-json, line 17 at r6 (raw file):

#   - 10 empty arrays
#   - 990 arrays encoded into 1110 index entries
#   - 10 empty object

[nit] objects


pkg/sql/opt/memo/testdata/stats/inverted-json, line 26 at r6 (raw file):

    "created_at": "2018-01-01 1:00:00.00000+00:00",
    "row_count": 2000,
    "distinct_count": 3,

should this be ~10?


pkg/sql/opt/memo/testdata/stats/inverted-json, line 263 at r6 (raw file):

    "created_at": "2018-01-01 1:00:00.00000+00:00",
    "row_count": 2000,
    "distinct_count": 3,

should this be 5?


pkg/sql/opt/memo/testdata/stats/inverted-json, line 328 at r6 (raw file):

           │         ├── ["7\x00\x019", "7\x00\x019"]
           │         └── ["7\x00\xff", "8")
           ├── stats: [rows=2e-07, distinct(1)=1.99999931e-07, null(1)=0, distinct(4)=2e-07, null(4)=0]

Why is the number of rows so tiny? Shouldn't this be 100?


pkg/sql/opt/memo/testdata/stats/inverted-json, line 359 at r6 (raw file):

           │         ├── ["7\x00\x018", "7\x00\x018"]
           │         └── ["7\x00\x03", "7\x00\x03"]
           ├── stats: [rows=2e-07, distinct(1)=1.99999931e-07, null(1)=0, distinct(4)=2e-07, null(4)=0]

ditto


pkg/sql/opt/xform/testdata/rules/select, line 1755 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I removed these tests, because I think they're not necessary anymore. They seem more stats-related than xform-related to me. I can add them back in, but if we want to prevent a full inverted index scan for the test I'll have to inject some histogram statistics.

I haven't bothered with trying to make the statistics builder smarter about the j @> '{}' expression without histograms because it seems like an uncommon case. I'm assuming we don't care too much about cases where histograms don't exist, but correct me if that's wrong.

Sounds good

Fixes cockroachdb#56870

Release note (performance improvement): The optimizer now uses collected
histograms statistics to better estimate the cost of JSON and ARRAY
inverted index scans, which may lead to more efficient query plans.
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/memo/testdata/stats/inverted-json, line 17 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] objects

Done.


pkg/sql/opt/memo/testdata/stats/inverted-json, line 26 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

should this be ~10?

Good catch. There's no exact number of distinct values based on the histograms, but there some minimum. I've set it to 10.


pkg/sql/opt/memo/testdata/stats/inverted-json, line 263 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

should this be 5?

Done.


pkg/sql/opt/memo/testdata/stats/inverted-json, line 328 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Why is the number of rows so tiny? Shouldn't this be 100?

Good catch. The histogram buckets were no in order, which caused problems. Fixed.


pkg/sql/opt/memo/testdata/stats/inverted-json, line 359 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

ditto

Done.

@mgartner mgartner force-pushed the full-scan-inverted-fix branch from b245f8d to a0f98f2 Compare January 25, 2021 20:21
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)

@mgartner
Copy link
Collaborator Author

TFTR!

bors r=rytaft

@craig
Copy link
Contributor

craig bot commented Jan 26, 2021

Build succeeded:

@craig craig bot merged commit d86781c into cockroachdb:master Jan 26, 2021
@mgartner mgartner deleted the full-scan-inverted-fix branch January 26, 2021 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: ensure statistics builder correctly estimates selectivity of b @> '[]' and b @> '{}'
3 participants