-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: make NULL ordering more thoroughly consistent #27885
Conversation
7b5c451
to
2c43174
Compare
04af07c
to
f26a317
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed APIs look good to me.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/sem/tree/compare.go, line 33 at r2 (raw file):
// themselves produce a NULL result. // // Function: scalarCompare()
[nit] Feels odd that this is not exported (even if there are no direct users right now). I'd hate for some future code to have to construct a ComparisonExpr
just to do a comparison.
pkg/sql/sem/tree/compare.go, line 126 at r2 (raw file):
// doCompare is the main function. func doCompare(ctx *EvalContext, orderedNULLs bool, a, b Datum) int {
[nit] It would be cleaner for the callers if these were two separate functions. The "sql" variant could return cmp int, ok bool
with the bool indicating if the result is known (not-NULL), instead of the -2
special value.
If not, add /* orderedNULLs */
to all callsites which pass true
or false
.
31cfa27
to
3499b75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/sem/tree/compare.go, line 33 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] Feels odd that this is not exported (even if there are no direct users right now). I'd hate for some future code to have to construct a
ComparisonExpr
just to do a comparison.
Exported.
pkg/sql/sem/tree/compare.go, line 126 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] It would be cleaner for the callers if these were two separate functions. The "sql" variant could return
cmp int, ok bool
with the bool indicating if the result is known (not-NULL), instead of the-2
special value.If not, add
/* orderedNULLs */
to all callsites which passtrue
orfalse
.
This is an intermediate step. A later patch replaces this argument entirely. Anyway I changed to add the comment in this commit so that the intermediate step becomes clearer.
I completed the PR and it actually fixes #12022. RFAL. Any suggestion as to which benchmark(s) to use? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I like these new APIs a lot! Really nice job @knz.
Reviewed 1 of 32 files at r1, 38 of 43 files at r2, 4 of 4 files at r3, 2 of 2 files at r4, 2 of 2 files at r5, 8 of 8 files at r6, 1 of 1 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):
for _, orderInfo := range ordering { col := orderInfo.ColIdx cmp, err := l[col].Distinct(&lTypes[col], d, e, &r[orderInfo.ColIdx])
nit wherever you call Distinct
. Consider changing the name of the variable you're assigning it to from cmp
to something else (dist
?.) so that it's obvious when reading that it's a bool and not the result of TotalOrderCompare
.
pkg/sql/distsqlrun/row_container.go, line 189 at r3 (raw file):
func (mc *memRowContainer) MaybeReplaceMax(ctx context.Context, row sqlbase.EncDatumRow) error { max := mc.At(0) cmp, err := row.LessThanDatums(mc.types, &mc.datumAlloc, mc.ordering, mc.evalCtx, max)
Same kind of nit as before. The semantics of all of these comparison related methods are hard enough to keep in one's head. Let's help future readers by using different variable names for each one. For instance, here and everywhere else you call LessThanDatums
we should swap cmp
for less
.
pkg/sql/sem/tree/compare.go, line 165 at r6 (raw file):
return 1 } return -2
If we're going to start returning more than the standard range of values (-1, 0, and 1
) then we should enumerate the possible return values somewhere. This is especially true if we return these outside this file (which we do). Maybe create a new type internalCmpRes int
enum.
pkg/sql/sem/tree/eval.go, line 4015 at r3 (raw file):
return RegIMatch, left, right, false, true case IsDistinctFrom: // Distinct(left, right) is implemented as !IsNotDistinctFrom(left, right)
Was this intentional?
Prior to this patch, the interface method `(Datum).Compare()` had the dual role to *order* datums (which requires a total relation) and to *compare* datums in the SQL logical sense where NULLs have a special behavior. Contrary to intuition, the two relations are not derivable from each other and need separate code. The implementation of the `Compare()` method is currently tailored towards ordering (although it also contains a bug in that role) and, meanwhile, does not suitably perform SQL logical comparisons. So there two groups of issues at hand: - every caller of `Compare()` which is interested in a SQL logical comparison other than "distinctness" is mistaken+erroneous. - every caller of `Compare()` which is interested in ordering is currently subject to a bug/limitation of `Compare()` in some edge cases involving tuples of arrays containing NULL elements. The fix to the 2nd issue, which will come in a later patch, will cause the ordering function to become *completely* inadequate to perform SQL logical comparisons. This, in turn, will really require two separate interfaces for the two comparison functions. In advance of that patch, this first commit creates a new API for datum comparisons. It is defined as follows (`tree/compare.go`): ```go // This implements the comparators for three separate relations. // // - a total ordering for the purpose of sorting and searching // values in indexes. In that relation, NULL sorts at the // same location as itself and before other values. // // Functions: TotalOrderLess(), TotalOrderCompare(). // // - the logical SQL scalar partial ordering, where non-NULL // values can be compared to each others but NULL comparisons // themselves produce a NULL result. // // Function: scalarCompare() // // - the IS [NOT] DISTINCT relation, in which every value can // be compared to every other, NULLs are distinct from every // non-NULL value but not distinct from each other. // // Function: Distinct(). // // Due to the way the SQL language semantics are constructed, it is // the case Distinct() returns true if and only if // TotalOrderCompare() returns nonzero. However, one should be // careful when using this methods to properly convey *intent* to the // reader of the code: // // - the functions related to the total order for sorting should only // be used in contexts that are about sorting values. // // - Distinct() and scalarCompare() should be used everywhere else. // // Besides, separating Distinct() from TotalOrderCompare() enables // later performance optimizations of the former by specializing the // code. This is currently done for e.g. EncDatums. ``` This commit only introduces the new API without change in behavior. Release note: None
Release note: None
Release note: None
Prior to this commit, comparisons between composite datums (tuples/arrays) and other things failed to properly account for NULLs "inside" the composite. For example, the comparison `(1,(0,0)) > (1,(0,NULL))` was reporting "false" whereas the result should really be NULL (unknown). This patch fixes it by extending the interface of `(Datum).internalCompare()` with a flag to indicate what to do when NULLs are encountered. The public APIs related to sorting set this flag to "order and compare the NULLs like any other value" and the scalar comparison says "bail out with 'unknown' upon encountering NULLs". Release note (bug fix): Several bugs in the handling of NULL values inside tuples or arrays have been fixed.
Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit wherever you call
Distinct
. Consider changing the name of the variable you're assigning it to fromcmp
to something else (dist
?.) so that it's obvious when reading that it's a bool and not the result ofTotalOrderCompare
.
Done.
pkg/sql/distsqlrun/row_container.go, line 189 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same kind of nit as before. The semantics of all of these comparison related methods are hard enough to keep in one's head. Let's help future readers by using different variable names for each one. For instance, here and everywhere else you call
LessThanDatums
we should swapcmp
forless
.
Done.
pkg/sql/sem/tree/compare.go, line 165 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're going to start returning more than the standard range of values (
-1, 0, and 1
) then we should enumerate the possible return values somewhere. This is especially true if we return these outside this file (which we do). Maybe create a newtype internalCmpRes int
enum.
Done.
pkg/sql/sem/tree/eval.go, line 4015 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Was this intentional?
Oops, not, too agressive perl. Reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 16 of 17 files at r8, 3 of 3 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/distsqlrun/disk_row_container_test.go, line 50 at r3 (raw file):
Previously, knz (kena) wrote…
Done.
Thanks, this is a lot easier to skim now.
pkg/sql/sem/tree/compare.go, line 126 at r13 (raw file):
} cmp := int(res)
You could still technically do MakeDBool(res == 0)
and all the others. internalCmpResult can be compared against constants. Your call whether to perform this cast of not.
Requested by @nvanbenschoten. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how do you want me to measure the perf impact on this?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/sem/tree/compare.go, line 126 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You could still technically do
MakeDBool(res == 0)
and all the others. internalCmpResult can be compared against constants. Your call whether to perform this cast of not.
Done.
I think @jordanlewis would have a better idea than I do about the current state of our SQL expression evaluation benchmarking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would run a simple micro-benchmark comparing the performance of calling the old Compare
method for 2 integer datums with the new static method for 2 integer datums (both run in a tight loop). If there's no difference, then you can quit there.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
As per the discussion in #12022 I will change the PR to be closer to pg semantics. This violates the SQL standard but enables perf gains down the line (+ more pg compatibility). |
Unfortunately this PR has gone stale, even though the issue is still current. |
Fixes #12022.