Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix IS NULL on tuples #46908

Closed
wants to merge 1 commit into from

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Apr 2, 2020

As it turns out, IS NULL behaves differently from
IS NOT DISTINCT FROM NULL when the argument is a tuple
(https://www.postgresql.org/docs/8.3/functions-comparison.html):

Note: If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null, while IS
NOT NULL is true when the row expression itself is non-null and all the
row's fields are non-null.

However, in our comparison resolution we transform IS into IS NOT DISTINCT FROM
and IS NOT into IS DISTINCT FROM which is incorrect
for tuples. This is now fixed by introducing separate IsNullOp which
reuses all overloads from IsNotDistinctFromOp while introducing its
own overload implementation only for tuples.

Here are the distinctions (note that IS DISTINCT FROM always
equals !(IS NOT DISTINCT FROM), however, IS NOT NULL is not
always the same as !(IS NULL)):

Tuple IS NOT DISTINCT FROM IS NULL IS DISTINCT FROM IS NOT NULL
(1, 1) false false true true
(1, NULL) false false true false
(NULL, NULL) false true true false

Fixes: #46675.

Release note (bug fix): CockroachDB previously was incorrectly
evaluating IS NULL and IS NOT NULL on tuples which has been fixed.

As it turns out, `IS NULL` behaves differently from `IS NOT DISTINCT FROM
NULL` when the argument is a tuple
(https://www.postgresql.org/docs/8.3/functions-comparison.html):
```
Note: If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null, while IS
NOT NULL is true when the row expression itself is non-null and all the
row's fields are non-null.
```
However, in our comparison resolution we transform `IS` into `IS NOT
DISTINCT FROM` and `IS NOT` into `IS DISTINCT FROM` which is incorrect
for tuples. This is now fixed by introducing separate `IsNullOp` which
reuses all overloads from `IsNotDistinctFromOp` while introducing its
own overload implementation only for tuples.

Release note (bug fix): CockroachDB previously was incorrectly
evaluating `IS NULL` and `IS NOT NULL` on tuples which has been fixed.
@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Apr 2, 2020
@yuzefovich yuzefovich requested review from otan and RaduBerinde April 2, 2020 01:42
@yuzefovich yuzefovich requested a review from a team as a code owner April 2, 2020 01:42
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member Author

This is WIP since I need some help with figuring out whether this approach makes sense and what to do in the optimizer land.

@RaduBerinde
Copy link
Member

... wow

Can you add an example in the description so it's clear what the distinction is? I think (1, NULL) IS NOT NULL = false while (1, NULL) IS DISTINCT FROM NULL = true.
Another thing here is that x IS NOT NULL is different than NOT (x IS NULL). In this case both IS NULL and IS NOT NULL return false.

Is there a distinction only for tuples? If yes, I would introduce tuple-specific variants in the optimizer, it will be much easier to reason about them.

@yuzefovich
Copy link
Member Author

Updated the PR description to include the differences. Based on the docs, I believe that this distinction exists only for tuples (if I understand correctly, our "tuples" = PSQL's "row-valued expressions").

Could you give more guidance on how to solve it in the optimizer? Or maybe this issue could be a good small project for Marcus?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I think this can get a little tricky, perhaps the best thing to do is for us to take over the PR and try to figure out the optimizer side.

CC @mgartner if this is something you are interested in tackling, let me know and we can discuss directly. I think it would be a good opportunity to expose yourself to the sem/tree and parser stuff, as well as get a taste of some of the nightmarish aspects of NULL semantics in SQL :)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan, @RaduBerinde, and @yuzefovich)


pkg/sql/opt/operator.go, line 135 at r1 (raw file):

	RegIMatchOp:      tree.RegIMatch,
	NotRegIMatchOp:   tree.NotRegIMatch,
	IsNullOp:         tree.IsNull,

I like that we specialized these to only apply to the "is null" case (as everything else can be covered by the existing IsOp/IsNotOp. But it should no longer be a binary operator with a right-hand side.


pkg/sql/opt/ops/scalar.opt, line 414 at r1 (raw file):


[Scalar, Bool, Comparison]
define IsNull {

Even though there is overlap between the two operators in SQL, I don't think we should do the same in the optimizer - this can easily become very confusing. We should try to add only specialized operators that work on a tuple. IsTupleNull IsTupleNotNull. We can make the optbuilder only create these when it knows the left-hand-side is a tuple.


pkg/sql/opt/ops/scalar.opt, line 426 at r1 (raw file):


[Scalar, Bool, Comparison]
define Is {

[nit] please add some comments to Is and IsNot. In particular mention that IsNot is always the inverse of Is, and that they map to IS NOT DISTINCT FROM and IS DISTINCT FROM SQL operators (which is the same with the IS / IS NOT operator for non-tuple types).

@yuzefovich
Copy link
Member Author

Ok, I'll leave this PR up for you guys to take over and will assign the issue to Marcus.

@yuzefovich
Copy link
Member Author

I'll close the PR but will keep the branch since it contains possibly useful test cases.

@yuzefovich yuzefovich closed this Apr 14, 2020
mgartner added a commit to mgartner/cockroach that referenced this pull request May 11, 2020
Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT
FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM
NULL`. However, these transformations are not equivalent when `x` is a
tuple.

If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to
true, but `x IS DISTINCT FROM NULL` should evaluate to false. If one
element of `x` is `NULL` and one is not null, then `x IS NOT NULL`
should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate
to true. Therefore, they are not equivalent.

Below is a table of the correct semantics for tuple expressions.

| Tuple        | IS NOT DISTINCT FROM NULL | IS NULL   | IS DISTINCT FROM NULL | IS NOT NULL |
| ------------ | ------------------------- | --------- | --------------------- | ----------- |
| (1, 1)       | false                     | false     | true                  | true        |
| (1, NULL)    | false                     | **false** | true                  | **false**   |
| (NULL, NULL) | false                     | true      | true                  | false       |

Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of
`IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not
inverses given the tuple `(1, NULL)`.

This commit introduces new tree expressions for `IS NULL` and `IS NOT
NULL`. These operators have evaluation logic that is different from `IS
NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.

This commit also introduces new optimizer expression types,
`IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added
for folding these expressions into boolean values when possible.

Fixes   cockroachdb#46675
Informs cockroachdb#46908
Informs cockroachdb#12022

Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT
NULL` operators with tuples, correctly differentiating them from `IS NOT
DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.
craig bot pushed a commit that referenced this pull request May 15, 2020
48299: sql: fix tuple IS NULL logic r=mgartner a=mgartner

Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT
FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM
NULL`. However, these transformations are not equivalent when `x` is a
tuple.

If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to
true, but `x IS DISTINCT FROM NULL` should evaluate to false. If one
element of `x` is `NULL` and one is not null, then `x IS NOT NULL`
should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate
to true. Therefore, they are not equivalent.

Below is a table of the correct semantics for tuple expressions.

| Tuple        | IS NOT DISTINCT FROM NULL | IS NULL   | IS DISTINCT FROM NULL | IS NOT NULL |
| ------------ | ------------------------- | --------- | --------------------- | ----------- |
| (1, 1)       | false                     | false     | true                  | true        |
| (1, NULL)    | false                     | **false** | true                  | **false**   |
| (NULL, NULL) | false                     | true      | true                  | false       |

Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of
`IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not
inverses given the tuple `(1, NULL)`.

This commit introduces new tree expressions for `IS NULL` and `IS NOT
NULL`. These operators have evaluation logic that is different from `IS
NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively. While
an expression such as `x IS NOT DISTINCT FROM NULL` is parsed as a
`tree.ComparisonExpr` with a `tree.IsNotDisinctFrom` operator,
execbuiler will output the simpler `tree.IsNullExpr` when the two
expressions are equivalent - when x is not a tuple.

This commit also introduces new optimizer expression types,
`IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added
for folding these expressions into boolean values when possible.

Fixes   #46675
Informs #46908
Informs #12022

Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT
NULL` operators with tuples, correctly differentiating them from `IS NOT
DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.


Co-authored-by: Marcus Gartner <[email protected]>
@yuzefovich yuzefovich deleted the fix-tuple-is-null branch July 29, 2020 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge bors won't merge a PR with this label.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: wrong implementation of NULL predicate for row value expressions
3 participants