sql: wrong implementation of NULL predicate for row value expressions #46675

lukaseder · 2020-03-27T15:11:55Z

Describe the problem

It appears that the NULL predicate is not implemented correctly (or not at all?) for row value expressions.

To Reproduce

Run this query:

select a, b, 
  (a, b) is null, 
  (a, b) is not null,
  not ((a, b) is null),
  not ((a, b) is not null)
from (
  values (1, 1), (1, null), (null, 1), (null, null)
) as t (a, b)

And get the following, wrong result:

a|b|?column?|?column?|?column?|?column?|
-|-|--------|--------|--------|--------|
1|1|false   |true    |true    |false   |
1| |false   |true    |true    |false   |
 |1|false   |true    |true    |false   |
 | |false   |true    |true    |false   |

Expected behavior

It should either work as in the SQL standard (ISO/IEC 9075-2:2016(E)
8.8 <null predicate>) / PostgreSQL:

a|b|?column?|?column?|?column?|?column?|
-|-|--------|--------|--------|--------|
1|1|false   |true    |true    |false   |
1| |false   |false   |true    |true    |
 |1|false   |false   |true    |true    |
 | |true    |false   |false   |true    |

Or raise an error

Environment:

CockroachDB version: CockroachDB CCL v19.2.5 (x86_64-unknown-linux-gnu, built 2020/03/16 18:27:12, go1.12.12)
Server OS: See above (using docker)
Client app: JDBC

The text was updated successfully, but these errors were encountered:

jordanlewis · 2020-03-27T15:47:52Z

Thanks @lukaseder for the report - definitely a bug here.

yuzefovich · 2020-04-02T01:40:33Z

As it turns out, this seems not that easy to fix. The root of the problem, AFAIU, is that IS NULL behaves differently from IS NOT DISTINCT FROM NULL when the argument is a tuple postgres docs:

Note: If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null, while IS NOT
NULL is true when the row expression itself is non-null and all the row's
fields are non-null.

However, in our comparison resolution we transform IS into IS NOT DISTINCT FROM and IS NOT into IS DISTINCT FROM which is incorrect for this particular case.

lukaseder · 2020-04-02T08:11:24Z

Yes, it behaves differently. As per ISO/IEC 9075-2:2016(E)
8.8 <null predicate>

yuzefovich · 2020-04-08T15:36:08Z

By the way, we have an old issue (#12022) with the last known reproduction being exactly this bug. I have a feeling that that issue describes a bigger set of bugs, so I'm not closing that one as duplicate, but it'd be great to see whether that issue is, in fact, a superset of this issue.

Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM NULL`. However, these transformations are not equivalent when `x` is a tuple. If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to true, but `x IS DISTINCT FROM NULL` should evaluate to false. If one element of `x` is `NULL` and one is not null, then `x IS NOT NULL` should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate to true. Therefore, they are not equivalent. Below is a table of the correct semantics for tuple expressions. | Tuple | IS NOT DISTINCT FROM NULL | IS NULL | IS DISTINCT FROM NULL | IS NOT NULL | | ------------ | ------------------------- | --------- | --------------------- | ----------- | | (1, 1) | false | false | true | true | | (1, NULL) | false | **false** | true | **false** | | (NULL, NULL) | false | true | true | false | Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of `IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not inverses given the tuple `(1, NULL)`. This commit introduces new tree expressions for `IS NULL` and `IS NOT NULL`. These operators have evaluation logic that is different from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively. This commit also introduces new optimizer expression types, `IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added for folding these expressions into boolean values when possible. Fixes cockroachdb#46675 Informs cockroachdb#46908 Informs cockroachdb#12022 Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT NULL` operators with tuples, correctly differentiating them from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.

Prior to this commit, when hoisting Any expressions like `<left> = ANY (SELECT <right> ...)`, we constructed `(IsNot <left|right> Null)` expressions which are equivalent to `<left|right> IS DISTINCT FROM NULL`. As discovered in cockroachdb#46675, these expressions have different behavior than `<left> IS NOT NULL` when `<left>` is a tuple. As a result, the hoisting transformations could construct invalid plans that cause incorrect results. This commit fixes the issue by using `IsTupleNotNull` expressions when `<left>` and `<right> are tupleq. Fixes cockroachdb#98691 Release note (bug fix): A bug has been fixes that caused incorrect results of ANY comparisons of tuples. For example, an expression like `(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true` instead of the correct result of `NULL` when `x` and `y` were `NULL`, or `a` and `b` were `NULL`. This could only occur if the subquery was correlated, i.e., it references columns from the outer part of the query. This bug was present since the cost-based optimizer was introduced in version 2.1.

98517: multitenant: add AdminMerge capability r=knz a=ecwall Fixes #95138 AdminMerge is currently only called by the system tenant even though it is named similarly to other Admin* functions so it does not need its own capability for now. This changes its required capability from noCapCheckNeeded to onlySystemTenant to prevent secondary tenants from calling it. Release note: None 98580: rpc: bump threshold for latency jump reporting r=erikgrinaker a=tbg For months I've seen this misfire in nearly every single log line I've looked at, and I've had to grep it out in many L2 incidents. Maybe it works better when we suppress it for latencies <=50ms. Touches #96262. Fixes #98066. Epic: none Release note: None 98688: colexecbase: fix a recently introduced bug with identity cast r=yuzefovich a=yuzefovich This commit fixes a recently introduced bug that can occur when we're randomizing `coldata.BatchSize()` (which we do in tests). In particular, we capped a global singleton at one batch size value, but later we can change it to a higher value, which could lead to index out of bounds. This is now fixed by always using the max batch size. Fixes: #98660. Release note: None 98700: opt: fix hoist of ANY comparison with tuples r=mgartner a=mgartner #### opt: fix hoist of ANY comparison with tuples Prior to this commit, when hoisting Any expressions like `<left> = ANY (SELECT <right> ...)`, we constructed `(IsNot <left|right> Null)` expressions which are equivalent to `<left|right> IS DISTINCT FROM NULL`. As discovered in #46675, these expressions have different behavior than `<left> IS NOT NULL` when `<left>` is a tuple. As a result, the hoisting transformations could construct invalid plans that cause incorrect results. This commit fixes the issue by using `IsTupleNotNull` expressions when `<left>` and `<right> are tupleq. Fixes #98691 Release note (bug fix): A bug has been fixes that caused incorrect results of ANY comparisons of tuples. For example, an expression like `(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true` instead of the correct result of `NULL` when `x` and `y` were `NULL`, or `a` and `b` were `NULL`. This could only occur if the subquery was correlated, i.e., it references columns from the outer part of the query. This bug was present since the cost-based optimizer was introduced in version 2.1. Co-authored-by: Evan Wall <[email protected]> Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Marcus Gartner <[email protected]>

Prior to this commit, when hoisting Any expressions like `<left> = ANY (SELECT <right> ...)`, we constructed `(IsNot <left|right> Null)` expressions which are equivalent to `<left|right> IS DISTINCT FROM NULL`. As discovered in #46675, these expressions have different behavior than `<left> IS NOT NULL` when `<left>` is a tuple. As a result, the hoisting transformations could construct invalid plans that cause incorrect results. This commit fixes the issue by using `IsTupleNotNull` expressions when `<left>` and `<right> are tupleq. Fixes #98691 Release note (bug fix): A bug has been fixes that caused incorrect results of ANY comparisons of tuples. For example, an expression like `(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true` instead of the correct result of `NULL` when `x` and `y` were `NULL`, or `a` and `b` were `NULL`. This could only occur if the subquery was correlated, i.e., it references columns from the outer part of the query. This bug was present since the cost-based optimizer was introduced in version 2.1.

Prior to this commit, when hoisting Any expressions like `<left> = ANY (SELECT <right> ...)`, we constructed `(IsNot <left|right> Null)` expressions which are equivalent to `<left|right> IS DISTINCT FROM NULL`. As discovered in cockroachdb#46675, these expressions have different behavior than `<left> IS NOT NULL` when `<left>` is a tuple. As a result, the hoisting transformations could construct invalid plans that cause incorrect results. This commit fixes the issue by using `IsTupleNotNull` expressions when `<left>` and `<right> are tupleq. Fixes cockroachdb#98691 Release note (bug fix): A bug has been fixes that caused incorrect results of ANY comparisons of tuples. For example, an expression like `(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true` instead of the correct result of `NULL` when `x` and `y` were `NULL`, or `a` and `b` were `NULL`. This could only occur if the subquery was correlated, i.e., it references columns from the outer part of the query. This bug was present since the cost-based optimizer was introduced in version 2.1. Release note (<category, see below>): <what> <show> <why>

Prior to this commit, when hoisting Any expressions like `<left> = ANY (SELECT <right> ...)`, we constructed `(IsNot <left|right> Null)` expressions which are equivalent to `<left|right> IS DISTINCT FROM NULL`. As discovered in #46675, these expressions have different behavior than `<left> IS NOT NULL` when `<left>` is a tuple. As a result, the hoisting transformations could construct invalid plans that cause incorrect results. This commit fixes the issue by using `IsTupleNotNull` expressions when `<left>` and `<right> are tupleq. Fixes #98691 Release note (bug fix): A bug has been fixes that caused incorrect results of ANY comparisons of tuples. For example, an expression like `(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true` instead of the correct result of `NULL` when `x` and `y` were `NULL`, or `a` and `b` were `NULL`. This could only occur if the subquery was correlated, i.e., it references columns from the outer part of the query. This bug was present since the cost-based optimizer was introduced in version 2.1.

lukaseder mentioned this issue Mar 27, 2020

Work around CockroachDB's wrong implementation of <row> IS NULL predicate jOOQ/jOOQ#9999

Closed

jordanlewis changed the title ~~Wrong implementation of NULL predicate for row value expressions~~ sql: wrong implementation of NULL predicate for row value expressions Mar 30, 2020

jordanlewis added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. labels Mar 30, 2020

yuzefovich self-assigned this Apr 1, 2020

yuzefovich mentioned this issue Apr 2, 2020

sql: fix IS NULL on tuples #46908

Closed

yuzefovich assigned mgartner and unassigned yuzefovich Apr 8, 2020

yuzefovich mentioned this issue Apr 10, 2020

sql: tuples with NULLs don't compare sanely #12022

Closed

mgartner mentioned this issue May 11, 2020

sql: fix tuple IS NULL logic #48299

Merged

craig bot closed this as completed in 08107a4 May 15, 2020

mgartner mentioned this issue Mar 15, 2023

opt: incorrect hoisting of correlated ANY expression comparing tuples #98691

Closed

mgartner mentioned this issue Mar 15, 2023

opt: fix hoist of ANY comparison with tuples #98700

Merged

mgartner mentioned this issue Mar 16, 2023

release-22.2: opt: fix hoist of ANY comparison with tuples #98769

Merged

mgartner mentioned this issue Mar 21, 2023

release-21.2: opt: fix hoist of ANY comparison with tuples #99156

Closed

mgartner mentioned this issue Mar 21, 2023

release-22.1: opt: fix hoist of ANY comparison with tuples #99161

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: wrong implementation of NULL predicate for row value expressions #46675

sql: wrong implementation of NULL predicate for row value expressions #46675

lukaseder commented Mar 27, 2020

jordanlewis commented Mar 27, 2020

yuzefovich commented Apr 2, 2020

lukaseder commented Apr 2, 2020

yuzefovich commented Apr 8, 2020

sql: wrong implementation of NULL predicate for row value expressions #46675

sql: wrong implementation of NULL predicate for row value expressions #46675

Comments

lukaseder commented Mar 27, 2020

jordanlewis commented Mar 27, 2020

yuzefovich commented Apr 2, 2020

lukaseder commented Apr 2, 2020

yuzefovich commented Apr 8, 2020