sql: make NULL ordering more thoroughly consistent #27885

knz · 2018-07-24T14:57:33Z

Fixes #12022.

1st commit is API change + small improvement to EncDatum comparison
last commit is bug fix + extra tests

cockroach-teamcity · 2018-07-24T14:57:43Z

This change is

RaduBerinde

The proposed APIs look good to me.

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/sql/sem/tree/compare.go, line 33 at r2 (raw file):

//   themselves produce a NULL result.
//
//   Function: scalarCompare()

[nit] Feels odd that this is not exported (even if there are no direct users right now). I'd hate for some future code to have to construct a ComparisonExpr just to do a comparison.

pkg/sql/sem/tree/compare.go, line 126 at r2 (raw file):

// doCompare is the main function.
func doCompare(ctx *EvalContext, orderedNULLs bool, a, b Datum) int {

[nit] It would be cleaner for the callers if these were two separate functions. The "sql" variant could return cmp int, ok bool with the bool indicating if the result is known (not-NULL), instead of the -2 special value.

If not, add /* orderedNULLs */ to all callsites which pass true or false.

knz

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/sql/sem/tree/compare.go, line 33 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] Feels odd that this is not exported (even if there are no direct users right now). I'd hate for some future code to have to construct a ComparisonExpr just to do a comparison.

Exported.

pkg/sql/sem/tree/compare.go, line 126 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] It would be cleaner for the callers if these were two separate functions. The "sql" variant could return cmp int, ok bool with the bool indicating if the result is known (not-NULL), instead of the -2 special value.

If not, add /* orderedNULLs */ to all callsites which pass true or false.

This is an intermediate step. A later patch replaces this argument entirely. Anyway I changed to add the comment in this commit so that the intermediate step becomes clearer.

knz · 2018-07-25T16:17:34Z

I completed the PR and it actually fixes #12022. RFAL.

Any suggestion as to which benchmark(s) to use?

nvanbenschoten

I agree, I like these new APIs a lot! Really nice job @knz.

Reviewed 1 of 32 files at r1, 38 of 43 files at r2, 4 of 4 files at r3, 2 of 2 files at r4, 2 of 2 files at r5, 8 of 8 files at r6, 1 of 1 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):

	for _, orderInfo := range ordering {
		col := orderInfo.ColIdx
		cmp, err := l[col].Distinct(&lTypes[col], d, e, &r[orderInfo.ColIdx])

nit wherever you call Distinct. Consider changing the name of the variable you're assigning it to from cmp to something else (dist?.) so that it's obvious when reading that it's a bool and not the result of TotalOrderCompare.

pkg/sql/distsqlrun/row_container.go, line 189 at r3 (raw file):

func (mc *memRowContainer) MaybeReplaceMax(ctx context.Context, row sqlbase.EncDatumRow) error {
	max := mc.At(0)
	cmp, err := row.LessThanDatums(mc.types, &mc.datumAlloc, mc.ordering, mc.evalCtx, max)

Same kind of nit as before. The semantics of all of these comparison related methods are hard enough to keep in one's head. Let's help future readers by using different variable names for each one. For instance, here and everywhere else you call LessThanDatums we should swap cmp for less.

pkg/sql/sem/tree/compare.go, line 165 at r6 (raw file):

			return 1
		}
		return -2

If we're going to start returning more than the standard range of values (-1, 0, and 1) then we should enumerate the possible return values somewhere. This is especially true if we return these outside this file (which we do). Maybe create a new type internalCmpRes int enum.

pkg/sql/sem/tree/eval.go, line 4015 at r3 (raw file):

		return RegIMatch, left, right, false, true
	case IsDistinctFrom:
		// Distinct(left, right) is implemented as !IsNotDistinctFrom(left, right)

Was this intentional?

Prior to this patch, the interface method `(Datum).Compare()` had the dual role to *order* datums (which requires a total relation) and to *compare* datums in the SQL logical sense where NULLs have a special behavior. Contrary to intuition, the two relations are not derivable from each other and need separate code. The implementation of the `Compare()` method is currently tailored towards ordering (although it also contains a bug in that role) and, meanwhile, does not suitably perform SQL logical comparisons. So there two groups of issues at hand: - every caller of `Compare()` which is interested in a SQL logical comparison other than "distinctness" is mistaken+erroneous. - every caller of `Compare()` which is interested in ordering is currently subject to a bug/limitation of `Compare()` in some edge cases involving tuples of arrays containing NULL elements. The fix to the 2nd issue, which will come in a later patch, will cause the ordering function to become *completely* inadequate to perform SQL logical comparisons. This, in turn, will really require two separate interfaces for the two comparison functions. In advance of that patch, this first commit creates a new API for datum comparisons. It is defined as follows (`tree/compare.go`): ```go // This implements the comparators for three separate relations. // // - a total ordering for the purpose of sorting and searching // values in indexes. In that relation, NULL sorts at the // same location as itself and before other values. // // Functions: TotalOrderLess(), TotalOrderCompare(). // // - the logical SQL scalar partial ordering, where non-NULL // values can be compared to each others but NULL comparisons // themselves produce a NULL result. // // Function: scalarCompare() // // - the IS [NOT] DISTINCT relation, in which every value can // be compared to every other, NULLs are distinct from every // non-NULL value but not distinct from each other. // // Function: Distinct(). // // Due to the way the SQL language semantics are constructed, it is // the case Distinct() returns true if and only if // TotalOrderCompare() returns nonzero. However, one should be // careful when using this methods to properly convey *intent* to the // reader of the code: // // - the functions related to the total order for sorting should only // be used in contexts that are about sorting values. // // - Distinct() and scalarCompare() should be used everywhere else. // // Besides, separating Distinct() from TotalOrderCompare() enables // later performance optimizations of the former by specializing the // code. This is currently done for e.g. EncDatums. ``` This commit only introduces the new API without change in behavior. Release note: None

Release note: None

Prior to this commit, comparisons between composite datums (tuples/arrays) and other things failed to properly account for NULLs "inside" the composite. For example, the comparison `(1,(0,0)) > (1,(0,NULL))` was reporting "false" whereas the result should really be NULL (unknown). This patch fixes it by extending the interface of `(Datum).internalCompare()` with a flag to indicate what to do when NULLs are encountered. The public APIs related to sorting set this flag to "order and compare the NULLs like any other value" and the scalar comparison says "bail out with 'unknown' upon encountering NULLs". Release note (bug fix): Several bugs in the handling of NULL values inside tuples or arrays have been fixed.

Release note: None

knz

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit wherever you call Distinct. Consider changing the name of the variable you're assigning it to from cmp to something else (dist?.) so that it's obvious when reading that it's a bool and not the result of TotalOrderCompare.

Done.

pkg/sql/distsqlrun/row_container.go, line 189 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same kind of nit as before. The semantics of all of these comparison related methods are hard enough to keep in one's head. Let's help future readers by using different variable names for each one. For instance, here and everywhere else you call LessThanDatums we should swap cmp for less.

Done.

pkg/sql/sem/tree/compare.go, line 165 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we're going to start returning more than the standard range of values (-1, 0, and 1) then we should enumerate the possible return values somewhere. This is especially true if we return these outside this file (which we do). Maybe create a new type internalCmpRes int enum.

Done.

pkg/sql/sem/tree/eval.go, line 4015 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Was this intentional?

Oops, not, too agressive perl. Reverted.

nvanbenschoten

Reviewed 16 of 17 files at r8, 3 of 3 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):

Previously, knz (kena) wrote…

Done.

Thanks, this is a lot easier to skim now.

pkg/sql/sem/tree/compare.go, line 126 at r13 (raw file):

	}

	cmp := int(res)

You could still technically do MakeDBool(res == 0) and all the others. internalCmpResult can be compared against constants. Your call whether to perform this cast of not.

@nvanbenschoten

Requested by @nvanbenschoten. Release note: None

knz

So how do you want me to measure the perf impact on this?

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/sql/sem/tree/compare.go, line 126 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You could still technically do MakeDBool(res == 0) and all the others. internalCmpResult can be compared against constants. Your call whether to perform this cast of not.

Done.

nvanbenschoten · 2018-07-26T14:41:29Z

So how do you want me to measure the perf impact on this?

I think @jordanlewis would have a better idea than I do about the current state of our SQL expression evaluation benchmarking.

andy-kimball

I would run a simple micro-benchmark comparing the performance of calling the old Compare method for 2 integer datums with the new static method for 2 integer datums (both run in a tight loop). If there's no difference, then you can quit there.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)

knz · 2018-07-26T14:57:42Z

As per the discussion in #12022 I will change the PR to be closer to pg semantics. This violates the SQL standard but enables perf gains down the line (+ more pg compatibility).

knz · 2020-08-04T10:42:47Z

Unfortunately this PR has gone stale, even though the issue is still current.
@asubiotto @yuzefovich I'll let you decide if you want to salvage anything from this code.

knz requested review from a team July 24, 2018 14:57

knz requested a review from a team as a code owner July 24, 2018 14:57

knz requested review from a team July 24, 2018 14:57

knz added the do-not-merge bors won't merge a PR with this label. label Jul 24, 2018

knz force-pushed the 20180724-null-cmp branch 2 times, most recently from 7b5c451 to 2c43174 Compare July 24, 2018 17:31

knz requested a review from a team July 24, 2018 17:31

knz force-pushed the 20180724-null-cmp branch 5 times, most recently from 04af07c to f26a317 Compare July 25, 2018 08:30

knz mentioned this pull request Jul 25, 2018

sql: tuples with NULLs don't compare sanely #12022

Closed

RaduBerinde reviewed Jul 25, 2018

View reviewed changes

knz force-pushed the 20180724-null-cmp branch 3 times, most recently from 31cfa27 to 3499b75 Compare July 25, 2018 15:29

knz commented Jul 25, 2018

View reviewed changes

knz force-pushed the 20180724-null-cmp branch from 3499b75 to 65cb6a0 Compare July 25, 2018 16:16

knz changed the title ~~[DNM] sql: clarify/fix the meaning of "Datum comparisons"~~ sql: make NULL ordering more thoroughly consistent Jul 25, 2018

knz removed the do-not-merge bors won't merge a PR with this label. label Jul 25, 2018

knz requested a review from nvanbenschoten July 25, 2018 16:34

knz force-pushed the 20180724-null-cmp branch from 39bb801 to 498b04f Compare July 25, 2018 17:33

nvanbenschoten requested changes Jul 26, 2018

View reviewed changes

knz added 5 commits July 26, 2018 14:57

sql/sem/tree: move internalCompare impls to compare.go

df96b6c

Release note: None

sql/sem/tree: un-export containsNull()

d35fbf7

Release note: None

sql/sem/tree: factor common code in internalCompare()

4952ef6

Release note: None

knz force-pushed the 20180724-null-cmp branch from 498b04f to 83a4131 Compare July 26, 2018 13:12

knz commented Jul 26, 2018

View reviewed changes

nvanbenschoten approved these changes Jul 26, 2018

View reviewed changes

sql/sem/tree: use a typedef for the result of internalCompare

92243d4

Requested by @nvanbenschoten. Release note: None

knz force-pushed the 20180724-null-cmp branch from 83a4131 to 92243d4 Compare July 26, 2018 14:14

knz commented Jul 26, 2018

View reviewed changes

andy-kimball reviewed Jul 26, 2018

View reviewed changes

knz mentioned this pull request Aug 21, 2018

sql: equality, IN incorrect when dealing with "deep" tuples #22717

Closed

tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019

knz closed this Aug 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: make NULL ordering more thoroughly consistent #27885

sql: make NULL ordering more thoroughly consistent #27885

knz commented Jul 24, 2018 •

edited

Loading

cockroach-teamcity commented Jul 24, 2018

RaduBerinde left a comment

knz left a comment

knz commented Jul 25, 2018

nvanbenschoten left a comment

knz left a comment

nvanbenschoten left a comment

knz left a comment

nvanbenschoten commented Jul 26, 2018

andy-kimball left a comment

knz commented Jul 26, 2018

knz commented Aug 4, 2020

sql: make NULL ordering more thoroughly consistent #27885

sql: make NULL ordering more thoroughly consistent #27885

Conversation

knz commented Jul 24, 2018 • edited Loading

cockroach-teamcity commented Jul 24, 2018

RaduBerinde left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

knz commented Jul 25, 2018

nvanbenschoten left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

nvanbenschoten commented Jul 26, 2018

andy-kimball left a comment

Choose a reason for hiding this comment

knz commented Jul 26, 2018

knz commented Aug 4, 2020

knz commented Jul 24, 2018 •

edited

Loading