sql: tuples with NULLs don't compare sanely #12022

knz · 2016-12-05T18:20:43Z

Found by @eisenstatdavid in #10475 (comment)

The logic for not defining the previous element of (e.g.) -inf to be null is that, in a filter expression like col < -inf, the null values of col don't satisfy the filter and thus should be excluded. They don't satisfy it because null < -inf is null when we're using the user-accessible builtin and not .Compare (technically the result is unknown, but we conflate unknown and boolean null).

For arrays, I think Next(a) = a + [null] is good enough for the limited purpose of scanning an index. If we unthinkingly rewrite y > [1] to y >= [1, null], then despite the fact that [1, 2] > [1] is true, [1, 2] > [1, null] should be unknown (guessing this is what PostgreSQL does from its handling of tuples; IIRC, we don't do this because the array builtin calls .Compare, but we should).

For tuples also, we don't have a good answer. (2, null) sorts in between (1, max) and (2, min), so we should include (2, null) when there's a > (1, max) filter. This means that defining Next((1, max)) = (2, min) is bad, because then it's not safe to rewrite, e.g., y > (1, max) to y >= (2, min): the semantics for (2, null) change from true to either false (.Compare) or null (as the builtin should return, but doesn't yet). Defining Next((1, max)) = (2, null) has the problem described above.

Jira issue: CRDB-6132

nvanbenschoten · 2016-12-06T19:34:40Z

I'm not sure that we ever want to return NULL values for a query that has a predicate on those values. The logic for this is that the comparison would be unknown, like you pointed out, so the null value would never pass the filter. This is how our current ordering works (with and without indexes, just like PG), and I believe it means that this issue is not needed.

root@:26257> select a, b from t order by b;
+---+-------+
| a |   b   |
+---+-------+
| 3 | NULL  |
| 1 | false |
| 2 | true  |
+---+-------+
(3 rows)
root@:26257> select a, b from t where b > false order by b;
+---+------+
| a |  b   |
+---+------+
| 2 | true |
+---+------+
(1 row)
root@:26257> select a, b from t where b <= false order by b;
+---+-------+
| a |   b   |
+---+-------+
| 1 | false |
+---+-------+
(1 row)

Following this logic, the new issue would be making sure that (3, null) is not returned from a query with a predicate of y > (1, max).

eisenstatdavid · 2016-12-06T19:52:31Z

Tuple comparison is lexicographic, not pointwise, hence PostgreSQL does return true sometimes when comparing tuples with null:

eisen=# SELECT (1, 'inf'::float) < (3, null);
 ?column?
----------
 t
(1 row)
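
The lexicographic rule can be sketched in Python. This is a minimal model, assuming the PostgreSQL-documented behavior for row comparison (scan left to right, stop at the first unequal or null pair); `row_less_than` is a hypothetical helper name, and `None` stands in for SQL NULL/unknown.

```python
from typing import Optional, Sequence


def row_less_than(left: Sequence, right: Sequence) -> Optional[bool]:
    """Lexicographic `left < right` under three-valued logic.

    Returns True, False, or None (None models SQL NULL / unknown).
    """
    for l, r in zip(left, right):
        if l is None or r is None:
            # The first undecided pair involves NULL: the result is unknown.
            return None
        if l != r:
            # The first unequal pair decides the whole comparison.
            return l < r
    # Every pair was equal, so left is not strictly less than right.
    return False


INF = float("inf")
print(row_less_than((1, INF), (3, None)))  # True: decided by 1 < 3
print(row_less_than((2, INF), (2, None)))  # None: first components tie
```

The second call corresponds to a comparison where the first components are equal, so the NULL in the second position is reached and the result is unknown.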

eisenstatdavid · 2016-12-06T20:15:45Z

That's the reason we can't rewrite y > (1, max) to y >= (2, null).

On Tue, Dec 6, 2016 at 2:58 PM, Nathan VanBenschoten wrote:

Does that change anything with respect to the initial question?

nathan=# SELECT (2, 'inf'::float) < (2, null);
 ?column?
----------

(1 row)

nvanbenschoten · 2016-12-06T20:15:58Z

It sounds like we'll need to take this lexicographical comparison into account when performing the normalization from y > (1, max) to y >= some tuple. I don't think that means we'll need to change the semantics of Next, but instead maybe decouple these two pieces of logic. Interestingly, normalizing y > (1, max) to y >= (1, null) would work because of what I mentioned earlier.

eisenstatdavid · 2016-12-06T20:20:00Z

> Interestingly, normalizing y > (1, max) to y >= (1, null) would work because of what I mentioned earlier.

Only if we don't use an index, which puts null first. We could go deeper and rewrite to y' >= (2,), dropping the last component out.

nvanbenschoten · 2016-12-06T21:28:15Z

I might be missing something, but it doesn't look like we ever use indexes for inequalities between tuples. @RaduBerinde do you know if this is true, and if so, why we don't currently use them?

root@:26257> EXPLAIN SELECT a, b FROM t WHERE (a, b) = (2, true);
+-------+------+-------------------------+
| Level | Type |       Description       |
+-------+------+-------------------------+
|     0 | scan | tt@tt_a_b_idx /2/1-/2/2 |
+-------+------+-------------------------+
(1 row)
root@:26257> EXPLAIN SELECT a, b FROM t WHERE (a, b) > (2, true);
+-------+------+-----------------+
| Level | Type |   Description   |
+-------+------+-----------------+
|     0 | scan | tt@tt_a_b_idx - |
+-------+------+-----------------+
(1 row)

Regardless, dropping out the last component during index selection like you brought up seems like the correct approach. This is what I was hoping to achieve with the hacky normalization from y > (1, max) to y >= (1, null).

We'll also need to take note that index selection currently adds in an implicit IS NOT NULL constraint for isolated end constraints, resulting in the behavior I described above. If/when we support the use of indexes for inequalities between tuples, similar behavior will be expected.
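
The idea of dropping the last component when choosing where an index scan starts can be illustrated with a toy model. This is only a sketch: the sort key below (NULL before every non-NULL value) and the helpers `index_key` and `scan_from` are assumptions made for illustration and do not reflect CockroachDB's actual key encoding or span-generation code.

```python
from bisect import bisect_left

MIN_INT64 = -9223372036854775808


def index_key(row):
    """Toy sort key that orders NULL (None) before every non-NULL value."""
    return tuple((0, 0) if v is None else (1, v) for v in row)


# A toy two-column index, ordered with NULLs first within each prefix.
rows = [(1, None), (1, 1), (1, 2), (2, None), (2, 1), (2, 2)]
index = sorted(rows, key=index_key)
keys = [index_key(r) for r in index]


def scan_from(start_key):
    """Return every index entry whose key is >= start_key."""
    return index[bisect_left(keys, start_key):]


# Starting at the one-column prefix (2,): all entries beginning with 2 follow.
print(scan_from(index_key((2,))))            # [(2, None), (2, 1), (2, 2)]
# Starting at the full key (2, min): the (2, NULL) entry sorts before it.
print(scan_from(index_key((2, MIN_INT64))))  # [(2, 1), (2, 2)]
```

In this model a shorter key sorts before any longer key that extends it, which is why the prefix start picks up the NULL entry.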

RaduBerinde · 2016-12-06T21:43:52Z

There are various known gaps in the span generation code (we have some issues filed #6390, #6346).

eisenstatdavid · 2017-05-02T16:34:27Z

Concrete test case:

CREATE DATABASE d;
SET DATABASE = d;

CREATE TABLE t (
  c1 INT,
  c2 INT,
  UNIQUE INDEX i (c1, c2)
);

INSERT INTO t VALUES
  (NULL, NULL), (NULL, 1), (NULL, 2),
  (1, NULL), (1, 1), (1, 2),
  (2, NULL), (2, 1), (2, 2);

SELECT * FROM t WHERE (c1, c2) > (1, 9223372036854775807);

returns

c1	c2
2	1
2	2

but 2 NULL should be included.
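
A small Python model of the test case, assuming lexicographic three-valued row comparison as PostgreSQL documents it. `row_compare` is a hypothetical helper and `None` models NULL. It evaluates the original predicate and the rewritten form y >= (2, min) over the same nine rows. The rewritten form happens to yield the two rows reported above, but this sketch does not establish that the rewrite is the actual cause in the span-generation code.

```python
from itertools import product
from typing import Optional, Sequence

MAX_INT64 = 9223372036854775807
MIN_INT64 = -9223372036854775808


def row_compare(left: Sequence, right: Sequence, strict: bool) -> Optional[bool]:
    """`left > right` (strict) or `left >= right`, lexicographic, three-valued."""
    for l, r in zip(left, right):
        if l is None or r is None:
            return None  # unknown: the first undecided pair involves NULL
        if l != r:
            return l > r  # the first unequal pair decides
    return not strict  # all pairs equal: true only for >=


# The nine rows inserted by the test case.
rows = list(product([None, 1, 2], repeat=2))

# A WHERE clause keeps a row only when the predicate is exactly true.
original = [r for r in rows if row_compare(r, (1, MAX_INT64), strict=True) is True]
rewritten = [r for r in rows if row_compare(r, (2, MIN_INT64), strict=False) is True]

print(original)   # [(2, None), (2, 1), (2, 2)]
print(rewritten)  # [(2, 1), (2, 2)]
```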

eisenstatdavid · 2017-05-02T19:34:48Z

The PR referenced above implements Nathan's "hacky normalization" strategy, documenting at length in the contract for Next/Prev why it works for now.

knz · 2017-11-01T18:07:08Z

@awoods187 I am bumping this issue because it's actually a correctness issue that's likely to bite us earlier than later. I am not sure how to best approach it though wrt a solution. This might need input from multiple people (Nathan, Peter, Ben, Radu come to mind).

RaduBerinde · 2018-08-15T20:46:21Z

Hm, that looks like a bug. IS NULL and similar should only return true or false.

jordanlewis · 2019-05-14T15:29:30Z

@justinj's issue no longer reproduces, nor does David Eisenstat's original reproduction.

@knz, if you still have context, could you please provide a minimal reproduction of the problem here? I will edit the main body to include that reproduction. If there is no longer a reproduction, let's close this issue and re-open with something more concrete.

knz · 2019-05-14T17:02:46Z

This PR was meant to fix it: #27885

I was not able to complete the PR back then because of limitations in the type system. I think the work can be resumed now that the code has been greatly simplified by Andy.

At any rate the examples/tests in the PR give you an idea of what's problematic. Note that the examples/tests in the PR also happen to be currently incorrect -- they are "reasonable" (I wrote them by extending the original problem scenarios in a way that was consistent and symmetric) but Radu and I later discovered that postgres actually diverges from what's reasonable. I even posted on the pg mailing list to ask "wtf" and the answer was "historical reasons".

To summarize:

- current crdb behavior both inconsistent and different from pg
- PR #27885 (sql: make NULL ordering more thoroughly consistent) makes behavior consistent and regular
- pg behavior actually different from #27885 (albeit less inconsistent than current crdb behavior).

I'm not paged in on this today but we can discuss when I'm done with this week's other concurrent activities.

jordanlewis · 2019-09-19T13:31:54Z

Having reviewed this issue again, I'm going to rename it to be specific to tuple comparisons, which seem to still be somewhat messed up in the presence of nulls. @knz, if you disagree with this or have further context I'm missing, I'll politely request that you open a separate issue that's scoped just to the thing that's broken here besides tuple comparison (which I still can't figure out).

Postgres:

jordan=# select (1, null) is null, (1, null) is not null, (1, null) is distinct from null, (1, null) is not distinct from null;
 ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
 f        | f        | t        | f
(1 row)

Cockroach:

[email protected]:54289/d> select (1, null) is null, (1, null) is not null, (1, null) is distinct from null, (1, null) is not distinct from null;
  ?column? | ?column? | ?column? | ?column?
+----------+----------+----------+----------+
   false   |   true   |   true   |  false
(1 row)

jordanlewis · 2020-04-02T19:57:06Z

@yuzefovich this is related to recent work you were doing I think. Can you merge into the other one if appropriate?

yuzefovich · 2020-04-10T02:43:01Z

I handed over #46675 to the optimizer team. Regarding merging this issue with that one, I'm worried that this issue is a superset of #46675, so I don't want to close this.


48299: sql: fix tuple IS NULL logic r=mgartner a=mgartner

Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM NULL`. However, these transformations are not equivalent when `x` is a tuple.

If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to true, but `x IS NOT DISTINCT FROM NULL` should evaluate to false. If one element of `x` is `NULL` and one is not null, then `x IS NOT NULL` should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate to true. Therefore, they are not equivalent.

Below is a table of the correct semantics for tuple expressions.

| Tuple        | IS NOT DISTINCT FROM NULL | IS NULL   | IS DISTINCT FROM NULL | IS NOT NULL |
| ------------ | ------------------------- | --------- | --------------------- | ----------- |
| (1, 1)       | false                     | false     | true                  | true        |
| (1, NULL)    | false                     | **false** | true                  | **false**   |
| (NULL, NULL) | false                     | true      | true                  | false       |

Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of `IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not inverses given the tuple `(1, NULL)`.

This commit introduces new tree expressions for `IS NULL` and `IS NOT NULL`. These operators have evaluation logic that is different from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively. While an expression such as `x IS NOT DISTINCT FROM NULL` is parsed as a `tree.ComparisonExpr` with a `tree.IsNotDistinctFrom` operator, execbuilder will output the simpler `tree.IsNullExpr` when the two expressions are equivalent, that is, when `x` is not a tuple.

This commit also introduces new optimizer expression types, `IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added for folding these expressions into boolean values when possible.

Fixes #46675
Informs #46908
Informs #12022

Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT NULL` operators with tuples, correctly differentiating them from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.

Co-authored-by: Marcus Gartner
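
The four tuple predicates in the table can be modeled in a few lines of Python. This is a sketch of the semantics only, not of the `IsTupleNull` / `IsTupleNotNull` implementation. The function names are hypothetical, `None` stands for NULL, a Python tuple stands for a SQL tuple, and elements are assumed to be scalars.

```python
from typing import Optional, Tuple

Row = Optional[Tuple]  # None models a NULL value in place of a tuple


def is_null(x: Row) -> bool:
    """x IS NULL: true for NULL itself or a tuple whose elements are all NULL."""
    return x is None or all(v is None for v in x)


def is_not_null(x: Row) -> bool:
    """x IS NOT NULL: true only for a tuple whose elements are all non-NULL."""
    return x is not None and all(v is not None for v in x)


def is_distinct_from_null(x: Row) -> bool:
    """x IS DISTINCT FROM NULL: any tuple value is distinct from NULL."""
    return x is not None


def is_not_distinct_from_null(x: Row) -> bool:
    """x IS NOT DISTINCT FROM NULL: always the inverse of the above."""
    return x is None


for row in [(1, 1), (1, None), (None, None)]:
    print(row,
          is_not_distinct_from_null(row), is_null(row),
          is_distinct_from_null(row), is_not_null(row))
```

For `(1, None)` both `is_null` and `is_not_null` return `False`, which is the case where the two operators are not inverses of each other.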

jordanlewis · 2022-06-11T20:24:29Z

None of the examples in this issue repro anymore, so I'm going to close this.

knz added the C-bug and A-sql-semantics labels Dec 5, 2016

knz mentioned this issue Dec 5, 2016

sql: fix end key selection with tuples #10475

Merged

petermattis added this to the 1.0 milestone Feb 23, 2017

cuongdo assigned nvanbenschoten Apr 5, 2017

cuongdo assigned justinj and unassigned nvanbenschoten Apr 20, 2017

eisenstatdavid assigned eisenstatdavid and unassigned justinj May 1, 2017

eisenstatdavid mentioned this issue May 2, 2017

sql: construct proper index spans in the presence of tuples with NULL #15612

Merged

eisenstatdavid modified the milestones: 1.1, 1.0, Later May 3, 2017

eisenstatdavid assigned knz and unassigned eisenstatdavid Jul 27, 2017

knz removed their assignment Nov 1, 2017

knz modified the milestones: Later, 1.2 Nov 1, 2017

knz removed their assignment Aug 27, 2019

knz added the A-sql-pgcompat Semantic compatibility with PostgreSQL label Aug 27, 2019

jordanlewis changed the title ~~sql: the ordering of NULL is internally inconsistent~~ sql: tuples with NULLs don't compare sanely Sep 19, 2019

jordanlewis assigned yuzefovich Apr 2, 2020

yuzefovich mentioned this issue Apr 8, 2020

sql: wrong implementation of NULL predicate for row value expressions #46675

Closed

yuzefovich removed their assignment Apr 10, 2020

mgartner mentioned this issue May 11, 2020

sql: fix tuple IS NULL logic #48299

Merged

jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021

jordanlewis closed this as completed Jun 11, 2022
