-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: support BYTES for histogram range calculations #68740
opt: support BYTES for histogram range calculations #68740
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking at this!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):
Quoted 5 lines of code…
switch typ.Family() { case types.StringFamily, types.BytesFamily, types.UuidFamily, types.INetFamily: return true } return false
To my untrained eye it looks like this list is missing a few more non-numeric type families, such as CollatedStringFamily
and JsonFamily
. Plus, I'm wondering if ArrayFamily
and TupleFamily
belong here as well? Hmm. I am new to this code, but it almost seems like getRangeNonNumeric
should be the default range-calculation function, and getRange
should only be called for special cases, since presumably every type's key encoding is bytewise ordered / comparable and at least somewhat bytewise "dense".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
switch typ.Family() { case types.StringFamily, types.BytesFamily, types.UuidFamily, types.INetFamily: return true } return false
To my untrained eye it looks like this list is missing a few more non-numeric type families, such as
CollatedStringFamily
andJsonFamily
. Plus, I'm wondering ifArrayFamily
andTupleFamily
belong here as well? Hmm. I am new to this code, but it almost seems likegetRangeNonNumeric
should be the default range-calculation function, andgetRange
should only be called for special cases, since presumably every type's key encoding is bytewise ordered / comparable and at least somewhat bytewise "dense".
I like the idea of swapping the default, and checking instead for the types that we explicitly support in getRange
. Can these other non-numeric types be easily added in this PR? Otherwise, maybe open an issue for the other types that we're missing?
On a somewhat related note, I think we can also add Interval
, Enum
, and Oid
as numeric types (in a separate PR...).
pkg/sql/opt/props/histogram_test.go, line 710 at r1 (raw file):
}) t.Run("bytes", func(t *testing.T) {
is all of this basically directly copied from above? if so, can you just have one copy that's used for both string and bytes?
e2ae5b2
to
28adb74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in getting back to these comments. PTAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)
pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I like the idea of swapping the default, and checking instead for the types that we explicitly support in
getRange
. Can these other non-numeric types be easily added in this PR? Otherwise, maybe open an issue for the other types that we're missing?On a somewhat related note, I think we can also add
Interval
,Enum
, andOid
as numeric types (in a separate PR...).
I've combined the three functions getRange
, getRangeNonNumeric
and isNonNumeric
range into a single switch statement. I think this is more clear and avoids having to keep three functions in-sync.
pkg/sql/opt/props/histogram_test.go, line 710 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
is all of this basically directly copied from above? if so, can you just have one copy that's used for both string and bytes?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #70320 to track calculating ranges for additional types.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)
28adb74
to
73f3a67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rytaft)
pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I've combined the three functions
getRange
,getRangeNonNumeric
andisNonNumeric
range into a single switch statement. I think this is more clear and avoids having to keep three functions in-sync.
OK, thanks for the refactor and the filed issue.
pkg/sql/opt/props/histogram_test.go, line 647 at r4 (raw file):
}) t.Run("string/bytes", func(t *testing.T) {
nit: "/" is the subtest separator for go test
, so this name creates a TestFilterBucket/string/bytes
sub-sub-test instead of a TestFilterBucket/<x>
sub-test. Maybe use "string,bytes"
or some other symbol instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, 4 of 4 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/props/histogram.go, line 716 at r5 (raw file):
// func getRangesBeforeAndAfter( bucketLowerBound, bucketUpperBound, spanLowerBound, spanUpperBound tree.Datum, swap bool,
nit: I think we can get rid of the bucket v. span distinction to further clarify things. i.e. bucket -> before, span -> after
pkg/sql/opt/props/histogram.go, line 820 at r5 (raw file):
// For non-numeric types, convert the datums to encoded keys to // approximate the range. We utilize an array to reduce repetitive code // number of repetitive calls.
nit: "...to reduce repetitive code number of repetitive calls." is repetitive! 😉
73f3a67
to
a041913
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/props/histogram.go, line 716 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: I think we can get rid of the bucket v. span distinction to further clarify things. i.e. bucket -> before, span -> after
Done.
pkg/sql/opt/props/histogram.go, line 820 at r5 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: "...to reduce repetitive code number of repetitive calls." is repetitive! 😉
🙃
pkg/sql/opt/props/histogram_test.go, line 647 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
nit: "/" is the subtest separator for
go test
, so this name creates aTestFilterBucket/string/bytes
sub-sub-test instead of aTestFilterBucket/<x>
sub-test. Maybe use"string,bytes"
or some other symbol instead?
Good catch. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)
pkg/sql/opt/props/histogram.go, line 666 at r6 (raw file):
} // getRangesBeforeAndAfter returns the size of the ranges before and after the
nit: update comment to remove references to bucket/span
pkg/sql/opt/props/histogram.go, line 736 at r6 (raw file):
// TODO(rytaft): handle more types here. // Note: the calculations below assume that bucketLowerBound is inclusive and // Span.PreferInclusive() has been called on the span.
nit: update comment (just say all bounds are inclusive)
Fixes cockroachdb#68346 Release note (performance improvement): The accuracy of histogram calculations for BYTES types has been improved. As a result, the optimizer should generate more efficient query plans in some cases.
Histogram range calculation has been refactored into a single switch statement. This prohibits inconsistencies between the three previous functions `getRange`, `getRangeNonNumeric` and `isNonNumeric`. It also highlights the preference for calculating numeric ranges when possible. Release note: None
a041913
to
03c45a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)
pkg/sql/opt/props/histogram.go, line 666 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: update comment to remove references to bucket/span
Done.
pkg/sql/opt/props/histogram.go, line 736 at r6 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: update comment (just say all bounds are inclusive)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
TFTRs! bors r+ |
Build succeeded: |
Fixes #68346
Release note (performance improvement): The accuracy of histogram
calculations for BYTES types has been improved. As a result, the
optimizer should generate more efficient query plans in some cases.