Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: make NULL ordering more thoroughly consistent #27885

Closed
wants to merge 6 commits into from

Conversation

knz
Copy link
Contributor

@knz knz commented Jul 24, 2018

Fixes #12022.

  • 1st commit is API change + small improvement to EncDatum comparison
  • last commit is bug fix + extra tests

@knz knz requested review from a team July 24, 2018 14:57
@knz knz requested a review from a team as a code owner July 24, 2018 14:57
@knz knz requested review from a team July 24, 2018 14:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz added the do-not-merge bors won't merge a PR with this label. label Jul 24, 2018
@knz knz force-pushed the 20180724-null-cmp branch 2 times, most recently from 7b5c451 to 2c43174 Compare July 24, 2018 17:31
@knz knz requested a review from a team July 24, 2018 17:31
@knz knz force-pushed the 20180724-null-cmp branch 5 times, most recently from 04af07c to f26a317 Compare July 25, 2018 08:30
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed APIs look good to me.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sem/tree/compare.go, line 33 at r2 (raw file):

//   themselves produce a NULL result.
//
//   Function: scalarCompare()

[nit] Feels odd that this is not exported (even if there are no direct users right now). I'd hate for some future code to have to construct a ComparisonExpr just to do a comparison.


pkg/sql/sem/tree/compare.go, line 126 at r2 (raw file):

// doCompare is the main function.
func doCompare(ctx *EvalContext, orderedNULLs bool, a, b Datum) int {

[nit] It would be cleaner for the callers if these were two separate functions. The "sql" variant could return cmp int, ok bool with the bool indicating if the result is known (not-NULL), instead of the -2 special value.

If not, add /* orderedNULLs */ to all callsites which pass true or false.

@knz knz force-pushed the 20180724-null-cmp branch 3 times, most recently from 31cfa27 to 3499b75 Compare July 25, 2018 15:29
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sem/tree/compare.go, line 33 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] Feels odd that this is not exported (even if there are no direct users right now). I'd hate for some future code to have to construct a ComparisonExpr just to do a comparison.

Exported.


pkg/sql/sem/tree/compare.go, line 126 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] It would be cleaner for the callers if these were two separate functions. The "sql" variant could return cmp int, ok bool with the bool indicating if the result is known (not-NULL), instead of the -2 special value.

If not, add /* orderedNULLs */ to all callsites which pass true or false.

This is an intermediate step. A later patch replaces this argument entirely. Anyway I changed to add the comment in this commit so that the intermediate step becomes clearer.

@knz knz force-pushed the 20180724-null-cmp branch from 3499b75 to 65cb6a0 Compare July 25, 2018 16:16
@knz knz changed the title [DNM] sql: clarify/fix the meaning of "Datum comparisons" sql: make NULL ordering more thoroughly consistent Jul 25, 2018
@knz knz removed the do-not-merge bors won't merge a PR with this label. label Jul 25, 2018
@knz
Copy link
Contributor Author

knz commented Jul 25, 2018

I completed the PR and it actually fixes #12022. RFAL.

Any suggestion as to which benchmark(s) to use?

@knz knz requested a review from nvanbenschoten July 25, 2018 16:34
@knz knz force-pushed the 20180724-null-cmp branch from 39bb801 to 498b04f Compare July 25, 2018 17:33
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I like these new APIs a lot! Really nice job @knz.

Reviewed 1 of 32 files at r1, 38 of 43 files at r2, 4 of 4 files at r3, 2 of 2 files at r4, 2 of 2 files at r5, 8 of 8 files at r6, 1 of 1 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):

	for _, orderInfo := range ordering {
		col := orderInfo.ColIdx
		cmp, err := l[col].Distinct(&lTypes[col], d, e, &r[orderInfo.ColIdx])

nit wherever you call Distinct. Consider changing the name of the variable you're assigning it to from cmp to something else (dist?.) so that it's obvious when reading that it's a bool and not the result of TotalOrderCompare.


pkg/sql/distsqlrun/row_container.go, line 189 at r3 (raw file):

func (mc *memRowContainer) MaybeReplaceMax(ctx context.Context, row sqlbase.EncDatumRow) error {
	max := mc.At(0)
	cmp, err := row.LessThanDatums(mc.types, &mc.datumAlloc, mc.ordering, mc.evalCtx, max)

Same kind of nit as before. The semantics of all of these comparison related methods are hard enough to keep in one's head. Let's help future readers by using different variable names for each one. For instance, here and everywhere else you call LessThanDatums we should swap cmp for less.


pkg/sql/sem/tree/compare.go, line 165 at r6 (raw file):

			return 1
		}
		return -2

If we're going to start returning more than the standard range of values (-1, 0, and 1) then we should enumerate the possible return values somewhere. This is especially true if we return these outside this file (which we do). Maybe create a new type internalCmpRes int enum.


pkg/sql/sem/tree/eval.go, line 4015 at r3 (raw file):

		return RegIMatch, left, right, false, true
	case IsDistinctFrom:
		// Distinct(left, right) is implemented as !IsNotDistinctFrom(left, right)

Was this intentional?

knz added 5 commits July 26, 2018 14:57
Prior to this patch, the interface method `(Datum).Compare()` had the
dual role to *order* datums (which requires a total relation) and
to *compare* datums in the SQL logical sense where NULLs have a
special behavior.

Contrary to intuition, the two relations are not derivable from each
other and need separate code. The implementation of the `Compare()`
method is currently tailored towards ordering (although it also
contains a bug in that role) and, meanwhile, does not suitably perform
SQL logical comparisons.

So there two groups of issues at hand:

- every caller of `Compare()` which is interested in a SQL logical
  comparison other than "distinctness" is mistaken+erroneous.

- every caller of `Compare()` which is interested in ordering is
  currently subject to a bug/limitation of `Compare()` in some edge
  cases involving tuples of arrays containing NULL elements.

The fix to the 2nd issue, which will come in a later patch, will cause
the ordering function to become *completely* inadequate to perform SQL
logical comparisons. This, in turn, will really require two separate
interfaces for the two comparison functions.

In advance of that patch, this first commit creates a new API for
datum comparisons. It is defined as follows (`tree/compare.go`):

```go
// This implements the comparators for three separate relations.
//
// - a total ordering for the purpose of sorting and searching
//   values in indexes. In that relation, NULL sorts at the
//   same location as itself and before other values.
//
//   Functions: TotalOrderLess(), TotalOrderCompare().
//
// - the logical SQL scalar partial ordering, where non-NULL
//   values can be compared to each others but NULL comparisons
//   themselves produce a NULL result.
//
//   Function: scalarCompare()
//
// - the IS [NOT] DISTINCT relation, in which every value can
//   be compared to every other, NULLs are distinct from every
//   non-NULL value but not distinct from each other.
//
//   Function: Distinct().
//
// Due to the way the SQL language semantics are constructed, it is
// the case Distinct() returns true if and only if
// TotalOrderCompare() returns nonzero.  However, one should be
// careful when using this methods to properly convey *intent* to the
// reader of the code:
//
// - the functions related to the total order for sorting should only
//   be used in contexts that are about sorting values.
//
// - Distinct() and scalarCompare() should be used everywhere else.
//
// Besides, separating Distinct() from TotalOrderCompare() enables
// later performance optimizations of the former by specializing the
// code. This is currently done for e.g. EncDatums.
```

This commit only introduces the new API without change in behavior.

Release note: None
Prior to this commit, comparisons between composite
datums (tuples/arrays) and other things failed to properly account for
NULLs "inside" the composite.

For example, the comparison `(1,(0,0)) > (1,(0,NULL))` was reporting
"false" whereas the result should really be NULL (unknown).

This patch fixes it by extending the interface of
`(Datum).internalCompare()` with a flag to indicate what to do when
NULLs are encountered. The public APIs related to sorting set this
flag to "order and compare the NULLs like any other value" and the
scalar comparison says "bail out with 'unknown' upon encountering
NULLs".

Release note (bug fix): Several bugs in the handling of NULL values
inside tuples or arrays have been fixed.
@knz knz force-pushed the 20180724-null-cmp branch from 498b04f to 83a4131 Compare July 26, 2018 13:12
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit wherever you call Distinct. Consider changing the name of the variable you're assigning it to from cmp to something else (dist?.) so that it's obvious when reading that it's a bool and not the result of TotalOrderCompare.

Done.


pkg/sql/distsqlrun/row_container.go, line 189 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same kind of nit as before. The semantics of all of these comparison related methods are hard enough to keep in one's head. Let's help future readers by using different variable names for each one. For instance, here and everywhere else you call LessThanDatums we should swap cmp for less.

Done.


pkg/sql/sem/tree/compare.go, line 165 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we're going to start returning more than the standard range of values (-1, 0, and 1) then we should enumerate the possible return values somewhere. This is especially true if we return these outside this file (which we do). Maybe create a new type internalCmpRes int enum.

Done.


pkg/sql/sem/tree/eval.go, line 4015 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Was this intentional?

Oops, not, too agressive perl. Reverted.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 16 of 17 files at r8, 3 of 3 files at r13.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):

Previously, knz (kena) wrote…

Done.

Thanks, this is a lot easier to skim now.


pkg/sql/sem/tree/compare.go, line 126 at r13 (raw file):

	}

	cmp := int(res)

You could still technically do MakeDBool(res == 0) and all the others. internalCmpResult can be compared against constants. Your call whether to perform this cast of not.

@knz knz force-pushed the 20180724-null-cmp branch from 83a4131 to 92243d4 Compare July 26, 2018 14:14
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how do you want me to measure the perf impact on this?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/sem/tree/compare.go, line 126 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You could still technically do MakeDBool(res == 0) and all the others. internalCmpResult can be compared against constants. Your call whether to perform this cast of not.

Done.

@nvanbenschoten
Copy link
Member

So how do you want me to measure the perf impact on this?

I think @jordanlewis would have a better idea than I do about the current state of our SQL expression evaluation benchmarking.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would run a simple micro-benchmark comparing the performance of calling the old Compare method for 2 integer datums with the new static method for 2 integer datums (both run in a tight loop). If there's no difference, then you can quit there.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@knz
Copy link
Contributor Author

knz commented Jul 26, 2018

As per the discussion in #12022 I will change the PR to be closer to pg semantics. This violates the SQL standard but enables perf gains down the line (+ more pg compatibility).

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
@knz
Copy link
Contributor Author

knz commented Aug 4, 2020

Unfortunately this PR has gone stale, even though the issue is still current.
@asubiotto @yuzefovich I'll let you decide if you want to salvage anything from this code.

@knz knz closed this Aug 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: tuples with NULLs don't compare sanely
6 participants