Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: support BYTES for histogram range calculations #68740

Merged
merged 2 commits into from
Sep 24, 2021

Conversation

mgartner
Copy link
Collaborator

Fixes #68346

Release note (performance improvement): The accuracy of histogram
calculations for BYTES types has been improved. As a result, the
optimizer should generate more efficient query plans in some cases.

@mgartner mgartner requested review from rytaft, michae2 and a team August 11, 2021 23:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking at this!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):

Quoted 5 lines of code…
	switch typ.Family() {
	case types.StringFamily, types.BytesFamily, types.UuidFamily, types.INetFamily:
		return true
	}
	return false

To my untrained eye it looks like this list is missing a few more non-numeric type families, such as CollatedStringFamily and JsonFamily. Plus, I'm wondering if ArrayFamily and TupleFamily belong here as well? Hmm. I am new to this code, but it almost seems like getRangeNonNumeric should be the default range-calculation function, and getRange should only be called for special cases, since presumably every type's key encoding is bytewise ordered / comparable and at least somewhat bytewise "dense".

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)


pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…
	switch typ.Family() {
	case types.StringFamily, types.BytesFamily, types.UuidFamily, types.INetFamily:
		return true
	}
	return false

To my untrained eye it looks like this list is missing a few more non-numeric type families, such as CollatedStringFamily and JsonFamily. Plus, I'm wondering if ArrayFamily and TupleFamily belong here as well? Hmm. I am new to this code, but it almost seems like getRangeNonNumeric should be the default range-calculation function, and getRange should only be called for special cases, since presumably every type's key encoding is bytewise ordered / comparable and at least somewhat bytewise "dense".

I like the idea of swapping the default, and checking instead for the types that we explicitly support in getRange. Can these other non-numeric types be easily added in this PR? Otherwise, maybe open an issue for the other types that we're missing?

On a somewhat related note, I think we can also add Interval, Enum, and Oid as numeric types (in a separate PR...).


pkg/sql/opt/props/histogram_test.go, line 710 at r1 (raw file):

	})

	t.Run("bytes", func(t *testing.T) {

is all of this basically directly copied from above? if so, can you just have one copy that's used for both string and bytes?

@mgartner mgartner force-pushed the fix-bytes-histogram-filtering branch from e2ae5b2 to 28adb74 Compare September 16, 2021 16:31
@mgartner mgartner requested a review from a team as a code owner September 16, 2021 16:31
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting back to these comments. PTAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)


pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I like the idea of swapping the default, and checking instead for the types that we explicitly support in getRange. Can these other non-numeric types be easily added in this PR? Otherwise, maybe open an issue for the other types that we're missing?

On a somewhat related note, I think we can also add Interval, Enum, and Oid as numeric types (in a separate PR...).

I've combined the three functions getRange, getRangeNonNumeric and isNonNumeric range into a single switch statement. I think this is more clear and avoids having to keep three functions in-sync.


pkg/sql/opt/props/histogram_test.go, line 710 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

is all of this basically directly copied from above? if so, can you just have one copy that's used for both string and bytes?

Done.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #70320 to track calculating ranges for additional types.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rytaft)

@mgartner mgartner force-pushed the fix-bytes-histogram-filtering branch from 28adb74 to 73f3a67 Compare September 16, 2021 20:16
Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 3 files at r1, 1 of 1 files at r2, 4 of 4 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @rytaft)


pkg/sql/opt/props/histogram.go, line 864 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I've combined the three functions getRange, getRangeNonNumeric and isNonNumeric range into a single switch statement. I think this is more clear and avoids having to keep three functions in-sync.

OK, thanks for the refactor and the filed issue.


pkg/sql/opt/props/histogram_test.go, line 647 at r4 (raw file):

	})

	t.Run("string/bytes", func(t *testing.T) {

nit: "/" is the subtest separator for go test, so this name creates a TestFilterBucket/string/bytes sub-sub-test instead of a TestFilterBucket/<x> sub-test. Maybe use "string,bytes" or some other symbol instead?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2, 4 of 4 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/props/histogram.go, line 716 at r5 (raw file):

//
func getRangesBeforeAndAfter(
	bucketLowerBound, bucketUpperBound, spanLowerBound, spanUpperBound tree.Datum, swap bool,

nit: I think we can get rid of the bucket v. span distinction to further clarify things. i.e. bucket -> before, span -> after


pkg/sql/opt/props/histogram.go, line 820 at r5 (raw file):

		// For non-numeric types, convert the datums to encoded keys to
		// approximate the range. We utilize an array to reduce repetitive code
		// number of repetitive calls.

nit: "...to reduce repetitive code number of repetitive calls." is repetitive! 😉

@mgartner mgartner force-pushed the fix-bytes-histogram-filtering branch from 73f3a67 to a041913 Compare September 23, 2021 16:34
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/props/histogram.go, line 716 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: I think we can get rid of the bucket v. span distinction to further clarify things. i.e. bucket -> before, span -> after

Done.


pkg/sql/opt/props/histogram.go, line 820 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: "...to reduce repetitive code number of repetitive calls." is repetitive! 😉

🙃


pkg/sql/opt/props/histogram_test.go, line 647 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

nit: "/" is the subtest separator for go test, so this name creates a TestFilterBucket/string/bytes sub-sub-test instead of a TestFilterBucket/<x> sub-test. Maybe use "string,bytes" or some other symbol instead?

Good catch. Fixed.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)


pkg/sql/opt/props/histogram.go, line 666 at r6 (raw file):

}

// getRangesBeforeAndAfter returns the size of the ranges before and after the

nit: update comment to remove references to bucket/span


pkg/sql/opt/props/histogram.go, line 736 at r6 (raw file):

	// TODO(rytaft): handle more types here.
	// Note: the calculations below assume that bucketLowerBound is inclusive and
	// Span.PreferInclusive() has been called on the span.

nit: update comment (just say all bounds are inclusive)

Fixes cockroachdb#68346

Release note (performance improvement): The accuracy of histogram
calculations for BYTES types has been improved. As a result, the
optimizer should generate more efficient query plans in some cases.
Histogram range calculation has been refactored into a single switch
statement. This prohibits inconsistencies between the three previous
functions `getRange`, `getRangeNonNumeric` and `isNonNumeric`. It also
highlights the preference for calculating numeric ranges when possible.

Release note: None
@mgartner mgartner force-pushed the fix-bytes-histogram-filtering branch from a041913 to 03c45a0 Compare September 23, 2021 20:16
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @rytaft)


pkg/sql/opt/props/histogram.go, line 666 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: update comment to remove references to bucket/span

Done.


pkg/sql/opt/props/histogram.go, line 736 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: update comment (just say all bounds are inclusive)

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

@mgartner
Copy link
Collaborator Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 24, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: row estimates for bytes columns inaccurate
4 participants