-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: use histograms for inverted JSON/ARRAY scan statistics #59326
Conversation
pkg/sql/opt/xform/testdata/rules/select, line 1755 at r3 (raw file):
I removed these tests, because I think they're not necessary anymore. They seem more stats-related than xform-related to me. I can add them back in, but if we want to prevent a full inverted index scan for the test I'll have to inject some histogram statistics. I haven't bothered with trying to make the statistics builder smarter about the |
A `ScanExpr`, which includes an `InvertedConstraint`, is always passed to `statisticsBuilder.constrainScan`, therefore there is no need to pass the `InvertedConstraint` separately. Release note: None
…i-column Release note: None
11a6a69
to
b245f8d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Awesome that this didn't require a lot of changes.
Reviewed 9 of 9 files at r4, 1 of 1 files at r5, 8 of 8 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @RaduBerinde)
pkg/sql/opt/memo/testdata/stats/inverted-json, line 17 at r6 (raw file):
# - 10 empty arrays # - 990 arrays encoded into 1110 index entries # - 10 empty object
[nit] objects
pkg/sql/opt/memo/testdata/stats/inverted-json, line 26 at r6 (raw file):
"created_at": "2018-01-01 1:00:00.00000+00:00", "row_count": 2000, "distinct_count": 3,
should this be ~10?
pkg/sql/opt/memo/testdata/stats/inverted-json, line 263 at r6 (raw file):
"created_at": "2018-01-01 1:00:00.00000+00:00", "row_count": 2000, "distinct_count": 3,
should this be 5?
pkg/sql/opt/memo/testdata/stats/inverted-json, line 328 at r6 (raw file):
│ ├── ["7\x00\x019", "7\x00\x019"] │ └── ["7\x00\xff", "8") ├── stats: [rows=2e-07, distinct(1)=1.99999931e-07, null(1)=0, distinct(4)=2e-07, null(4)=0]
Why is the number of rows so tiny? Shouldn't this be 100?
pkg/sql/opt/memo/testdata/stats/inverted-json, line 359 at r6 (raw file):
│ ├── ["7\x00\x018", "7\x00\x018"] │ └── ["7\x00\x03", "7\x00\x03"] ├── stats: [rows=2e-07, distinct(1)=1.99999931e-07, null(1)=0, distinct(4)=2e-07, null(4)=0]
ditto
pkg/sql/opt/xform/testdata/rules/select, line 1755 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I removed these tests, because I think they're not necessary anymore. They seem more stats-related than xform-related to me. I can add them back in, but if we want to prevent a full inverted index scan for the test I'll have to inject some histogram statistics.
I haven't bothered with trying to make the statistics builder smarter about the
j @> '{}'
expression without histograms because it seems like an uncommon case. I'm assuming we don't care too much about cases where histograms don't exist, but correct me if that's wrong.
Sounds good
Fixes cockroachdb#56870 Release note (performance improvement): The optimizer now uses collected histograms statistics to better estimate the cost of JSON and ARRAY inverted index scans, which may lead to more efficient query plans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/testdata/stats/inverted-json, line 17 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] objects
Done.
pkg/sql/opt/memo/testdata/stats/inverted-json, line 26 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
should this be ~10?
Good catch. There's no exact number of distinct values based on the histograms, but there some minimum. I've set it to 10.
pkg/sql/opt/memo/testdata/stats/inverted-json, line 263 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
should this be 5?
Done.
pkg/sql/opt/memo/testdata/stats/inverted-json, line 328 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Why is the number of rows so tiny? Shouldn't this be 100?
Good catch. The histogram buckets were no in order, which caused problems. Fixed.
pkg/sql/opt/memo/testdata/stats/inverted-json, line 359 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto
Done.
b245f8d
to
a0f98f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
TFTR! bors r=rytaft |
Build succeeded: |
memo: simplify statisticsBuilder.constrainScan
A
ScanExpr
, which includes anInvertedConstraint
, is always passedto
statisticsBuilder.constrainScan
, therefore there is no need to passthe
InvertedConstraint
separately.Release note: None
memo: rename multi-column-inverted-geo test file to inverted-geo-multi-column
Release note: None
opt: use histograms for inverted JSON/ARRAY scan statistics
Fixes #56870
Release note (performance improvement): The optimizer now uses collected
histograms statistics to better estimate the cost of JSON and ARRAY
inverted index scans, which may lead to more efficient query plans.