Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: wrong implementation of NULL predicate for row value expressions #46675

Closed
lukaseder opened this issue Mar 27, 2020 · 4 comments · Fixed by #48299
Closed

sql: wrong implementation of NULL predicate for row value expressions #46675

lukaseder opened this issue Mar 27, 2020 · 4 comments · Fixed by #48299
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases.

Comments

@lukaseder
Copy link

Describe the problem

It appears that the NULL predicate is not implemented correctly (or not at all?) for row value expressions.

To Reproduce

Run this query:

select a, b, 
  (a, b) is null, 
  (a, b) is not null,
  not ((a, b) is null),
  not ((a, b) is not null)
from (
  values (1, 1), (1, null), (null, 1), (null, null)
) as t (a, b)

And get the following, wrong result:

a|b|?column?|?column?|?column?|?column?|
-|-|--------|--------|--------|--------|
1|1|false   |true    |true    |false   |
1| |false   |true    |true    |false   |
 |1|false   |true    |true    |false   |
 | |false   |true    |true    |false   |

Expected behavior

It should either work as in the SQL standard (ISO/IEC 9075-2:2016(E)
8.8 <null predicate>) / PostgreSQL:

a|b|?column?|?column?|?column?|?column?|
-|-|--------|--------|--------|--------|
1|1|false   |true    |true    |false   |
1| |false   |false   |true    |true    |
 |1|false   |false   |true    |true    |
 | |true    |false   |false   |true    |

Or raise an error

Environment:

  • CockroachDB version: CockroachDB CCL v19.2.5 (x86_64-unknown-linux-gnu, built 2020/03/16 18:27:12, go1.12.12)
  • Server OS: See above (using docker)
  • Client app: JDBC
@jordanlewis
Copy link
Member

Thanks @lukaseder for the report - definitely a bug here.

@jordanlewis jordanlewis changed the title Wrong implementation of NULL predicate for row value expressions sql: wrong implementation of NULL predicate for row value expressions Mar 30, 2020
@jordanlewis jordanlewis added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. labels Mar 30, 2020
@yuzefovich yuzefovich self-assigned this Apr 1, 2020
@yuzefovich
Copy link
Member

As it turns out, this seems not that easy to fix. The root of the problem, AFAIU, is that IS NULL behaves differently from IS NOT DISTINCT FROM NULL when the argument is a tuple postgres docs:

Note: If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null, while IS NOT
NULL is true when the row expression itself is non-null and all the row's
fields are non-null.

However, in our comparison resolution we transform IS into IS NOT DISTINCT FROM and IS NOT into IS DISTINCT FROM which is incorrect for this particular case.

@lukaseder
Copy link
Author

Yes, it behaves differently. As per ISO/IEC 9075-2:2016(E)
8.8 <null predicate>

image

@yuzefovich yuzefovich assigned mgartner and unassigned yuzefovich Apr 8, 2020
@yuzefovich
Copy link
Member

By the way, we have an old issue (#12022) with the last known reproduction being exactly this bug. I have a feeling that that issue describes a bigger set of bugs, so I'm not closing that one as duplicate, but it'd be great to see whether that issue is, in fact, a superset of this issue.

mgartner added a commit to mgartner/cockroach that referenced this issue May 11, 2020
Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT
FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM
NULL`. However, these transformations are not equivalent when `x` is a
tuple.

If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to
true, but `x IS DISTINCT FROM NULL` should evaluate to false. If one
element of `x` is `NULL` and one is not null, then `x IS NOT NULL`
should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate
to true. Therefore, they are not equivalent.

Below is a table of the correct semantics for tuple expressions.

| Tuple        | IS NOT DISTINCT FROM NULL | IS NULL   | IS DISTINCT FROM NULL | IS NOT NULL |
| ------------ | ------------------------- | --------- | --------------------- | ----------- |
| (1, 1)       | false                     | false     | true                  | true        |
| (1, NULL)    | false                     | **false** | true                  | **false**   |
| (NULL, NULL) | false                     | true      | true                  | false       |

Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of
`IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not
inverses given the tuple `(1, NULL)`.

This commit introduces new tree expressions for `IS NULL` and `IS NOT
NULL`. These operators have evaluation logic that is different from `IS
NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.

This commit also introduces new optimizer expression types,
`IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added
for folding these expressions into boolean values when possible.

Fixes   cockroachdb#46675
Informs cockroachdb#46908
Informs cockroachdb#12022

Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT
NULL` operators with tuples, correctly differentiating them from `IS NOT
DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.
@craig craig bot closed this as completed in 08107a4 May 15, 2020
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 15, 2023
Prior to this commit, when hoisting Any expressions like
`<left> = ANY (SELECT <right> ...)`, we constructed
`(IsNot <left|right> Null)` expressions which are equivalent to
`<left|right> IS DISTINCT FROM NULL`. As discovered in cockroachdb#46675, these
expressions have different behavior than `<left> IS NOT NULL` when
`<left>` is a tuple. As a result, the hoisting transformations could
construct invalid plans that cause incorrect results. This commit fixes
the issue by using `IsTupleNotNull` expressions when `<left>` and
`<right> are tupleq.

Fixes cockroachdb#98691

Release note (bug fix): A bug has been fixes that caused incorrect
results of ANY comparisons of tuples. For example, an expression like
`(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true`
instead of the correct result of `NULL` when `x` and `y` were `NULL`, or
`a` and `b` were `NULL`. This could only occur if the subquery was
correlated, i.e., it references columns from the outer part of the
query. This bug was present since the cost-based optimizer was
introduced in version 2.1.
craig bot pushed a commit that referenced this issue Mar 15, 2023
98517: multitenant: add AdminMerge capability r=knz a=ecwall

Fixes #95138

AdminMerge is currently only called by the system tenant even though it is
named similarly to other Admin* functions so it does not need its own
capability for now.

This changes its required capability from noCapCheckNeeded to onlySystemTenant
to prevent secondary tenants from calling it.

Release note: None

98580: rpc: bump threshold for latency jump reporting r=erikgrinaker a=tbg

For months I've seen this misfire in nearly every single log line I've
looked at, and I've had to grep it out in many L2 incidents.
Maybe it works better when we suppress it for latencies <=50ms.

Touches #96262.
Fixes #98066.

Epic: none
Release note: None


98688: colexecbase: fix a recently introduced bug with identity cast r=yuzefovich a=yuzefovich

This commit fixes a recently introduced bug that can occur when we're randomizing `coldata.BatchSize()` (which we do in tests). In particular, we capped a global singleton at one batch size value, but later we can change it to a higher value, which could lead to index out of bounds. This is now fixed by always using the max batch size.

Fixes: #98660.

Release note: None

98700: opt: fix hoist of ANY comparison with tuples r=mgartner a=mgartner

#### opt: fix hoist of ANY comparison with tuples

Prior to this commit, when hoisting Any expressions like
`<left> = ANY (SELECT <right> ...)`, we constructed
`(IsNot <left|right> Null)` expressions which are equivalent to
`<left|right> IS DISTINCT FROM NULL`. As discovered in #46675, these
expressions have different behavior than `<left> IS NOT NULL` when
`<left>` is a tuple. As a result, the hoisting transformations could
construct invalid plans that cause incorrect results. This commit fixes
the issue by using `IsTupleNotNull` expressions when `<left>` and
`<right> are tupleq.

Fixes #98691

Release note (bug fix): A bug has been fixes that caused incorrect
results of ANY comparisons of tuples. For example, an expression like
`(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true`
instead of the correct result of `NULL` when `x` and `y` were `NULL`, or
`a` and `b` were `NULL`. This could only occur if the subquery was
correlated, i.e., it references columns from the outer part of the
query. This bug was present since the cost-based optimizer was
introduced in version 2.1.


Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
mgartner added a commit that referenced this issue Mar 17, 2023
Prior to this commit, when hoisting Any expressions like
`<left> = ANY (SELECT <right> ...)`, we constructed
`(IsNot <left|right> Null)` expressions which are equivalent to
`<left|right> IS DISTINCT FROM NULL`. As discovered in #46675, these
expressions have different behavior than `<left> IS NOT NULL` when
`<left>` is a tuple. As a result, the hoisting transformations could
construct invalid plans that cause incorrect results. This commit fixes
the issue by using `IsTupleNotNull` expressions when `<left>` and
`<right> are tupleq.

Fixes #98691

Release note (bug fix): A bug has been fixes that caused incorrect
results of ANY comparisons of tuples. For example, an expression like
`(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true`
instead of the correct result of `NULL` when `x` and `y` were `NULL`, or
`a` and `b` were `NULL`. This could only occur if the subquery was
correlated, i.e., it references columns from the outer part of the
query. This bug was present since the cost-based optimizer was
introduced in version 2.1.
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 21, 2023
Prior to this commit, when hoisting Any expressions like
`<left> = ANY (SELECT <right> ...)`, we constructed
`(IsNot <left|right> Null)` expressions which are equivalent to
`<left|right> IS DISTINCT FROM NULL`. As discovered in cockroachdb#46675, these
expressions have different behavior than `<left> IS NOT NULL` when
`<left>` is a tuple. As a result, the hoisting transformations could
construct invalid plans that cause incorrect results. This commit fixes
the issue by using `IsTupleNotNull` expressions when `<left>` and
`<right> are tupleq.

Fixes cockroachdb#98691

Release note (bug fix): A bug has been fixes that caused incorrect
results of ANY comparisons of tuples. For example, an expression like
`(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true`
instead of the correct result of `NULL` when `x` and `y` were `NULL`, or
`a` and `b` were `NULL`. This could only occur if the subquery was
correlated, i.e., it references columns from the outer part of the
query. This bug was present since the cost-based optimizer was
introduced in version 2.1.

Release note (<category, see below>): <what> <show> <why>
mgartner added a commit that referenced this issue Mar 22, 2023
Prior to this commit, when hoisting Any expressions like
`<left> = ANY (SELECT <right> ...)`, we constructed
`(IsNot <left|right> Null)` expressions which are equivalent to
`<left|right> IS DISTINCT FROM NULL`. As discovered in #46675, these
expressions have different behavior than `<left> IS NOT NULL` when
`<left>` is a tuple. As a result, the hoisting transformations could
construct invalid plans that cause incorrect results. This commit fixes
the issue by using `IsTupleNotNull` expressions when `<left>` and
`<right> are tupleq.

Fixes #98691

Release note (bug fix): A bug has been fixes that caused incorrect
results of ANY comparisons of tuples. For example, an expression like
`(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true`
instead of the correct result of `NULL` when `x` and `y` were `NULL`, or
`a` and `b` were `NULL`. This could only occur if the subquery was
correlated, i.e., it references columns from the outer part of the
query. This bug was present since the cost-based optimizer was
introduced in version 2.1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants