-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: fix IS NULL on tuples #46908
sql: fix IS NULL on tuples #46908
Conversation
As it turns out, `IS NULL` behaves differently from `IS NOT DISTINCT FROM NULL` when the argument is a tuple (https://www.postgresql.org/docs/8.3/functions-comparison.html): ``` Note: If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. ``` However, in our comparison resolution we transform `IS` into `IS NOT DISTINCT FROM` and `IS NOT` into `IS DISTINCT FROM` which is incorrect for tuples. This is now fixed by introducing separate `IsNullOp` which reuses all overloads from `IsNotDistinctFromOp` while introducing its own overload implementation only for tuples. Release note (bug fix): CockroachDB previously was incorrectly evaluating `IS NULL` and `IS NOT NULL` on tuples which has been fixed.
This is WIP since I need some help with figuring out whether this approach makes sense and what to do in the optimizer land. |
... wow Can you add an example in the description so it's clear what the distinction is? I think Is there a distinction only for tuples? If yes, I would introduce tuple-specific variants in the optimizer, it will be much easier to reason about them. |
Updated the PR description to include the differences. Based on the docs, I believe that this distinction exists only for tuples (if I understand correctly, our "tuples" = PSQL's "row-valued expressions"). Could you give more guidance on how to solve it in the optimizer? Or maybe this issue could be a good small project for Marcus? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I think this can get a little tricky, perhaps the best thing to do is for us to take over the PR and try to figure out the optimizer side.
CC @mgartner if this is something you are interested in tackling, let me know and we can discuss directly. I think it would be a good opportunity to expose yourself to the sem/tree
and parser
stuff, as well as get a taste of some of the nightmarish aspects of NULL semantics in SQL :)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan, @RaduBerinde, and @yuzefovich)
pkg/sql/opt/operator.go, line 135 at r1 (raw file):
RegIMatchOp: tree.RegIMatch, NotRegIMatchOp: tree.NotRegIMatch, IsNullOp: tree.IsNull,
I like that we specialized these to only apply to the "is null" case (as everything else can be covered by the existing IsOp/IsNotOp
. But it should no longer be a binary operator with a right-hand side.
pkg/sql/opt/ops/scalar.opt, line 414 at r1 (raw file):
[Scalar, Bool, Comparison] define IsNull {
Even though there is overlap between the two operators in SQL, I don't think we should do the same in the optimizer - this can easily become very confusing. We should try to add only specialized operators that work on a tuple. IsTupleNull
IsTupleNotNull
. We can make the optbuilder only create these when it knows the left-hand-side is a tuple.
pkg/sql/opt/ops/scalar.opt, line 426 at r1 (raw file):
[Scalar, Bool, Comparison] define Is {
[nit] please add some comments to Is and IsNot. In particular mention that IsNot is always the inverse of Is, and that they map to IS NOT DISTINCT FROM
and IS DISTINCT FROM
SQL operators (which is the same with the IS / IS NOT
operator for non-tuple types).
Ok, I'll leave this PR up for you guys to take over and will assign the issue to Marcus. |
I'll close the PR but will keep the branch since it contains possibly useful test cases. |
Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM NULL`. However, these transformations are not equivalent when `x` is a tuple. If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to true, but `x IS DISTINCT FROM NULL` should evaluate to false. If one element of `x` is `NULL` and one is not null, then `x IS NOT NULL` should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate to true. Therefore, they are not equivalent. Below is a table of the correct semantics for tuple expressions. | Tuple | IS NOT DISTINCT FROM NULL | IS NULL | IS DISTINCT FROM NULL | IS NOT NULL | | ------------ | ------------------------- | --------- | --------------------- | ----------- | | (1, 1) | false | false | true | true | | (1, NULL) | false | **false** | true | **false** | | (NULL, NULL) | false | true | true | false | Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of `IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not inverses given the tuple `(1, NULL)`. This commit introduces new tree expressions for `IS NULL` and `IS NOT NULL`. These operators have evaluation logic that is different from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively. This commit also introduces new optimizer expression types, `IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added for folding these expressions into boolean values when possible. Fixes cockroachdb#46675 Informs cockroachdb#46908 Informs cockroachdb#12022 Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT NULL` operators with tuples, correctly differentiating them from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively.
48299: sql: fix tuple IS NULL logic r=mgartner a=mgartner Previously, we treated all cases of `x IS NULL` as `x IS NOT DISTINCT FROM NULL`, and all cases of `x IS NOT NULL` as `x IS DISTINCT FROM NULL`. However, these transformations are not equivalent when `x` is a tuple. If all elements of `x` are `NULL`, then `x IS NULL` should evaluate to true, but `x IS DISTINCT FROM NULL` should evaluate to false. If one element of `x` is `NULL` and one is not null, then `x IS NOT NULL` should evaluate to false, but `x IS DISTINCT FROM NULL` should evaluate to true. Therefore, they are not equivalent. Below is a table of the correct semantics for tuple expressions. | Tuple | IS NOT DISTINCT FROM NULL | IS NULL | IS DISTINCT FROM NULL | IS NOT NULL | | ------------ | ------------------------- | --------- | --------------------- | ----------- | | (1, 1) | false | false | true | true | | (1, NULL) | false | **false** | true | **false** | | (NULL, NULL) | false | true | true | false | Notice that `IS NOT DISTINCT FROM NULL` is always the inverse of `IS DISTINCT FROM NULL`. However, `IS NULL` and `IS NOT NULL` are not inverses given the tuple `(1, NULL)`. This commit introduces new tree expressions for `IS NULL` and `IS NOT NULL`. These operators have evaluation logic that is different from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively. While an expression such as `x IS NOT DISTINCT FROM NULL` is parsed as a `tree.ComparisonExpr` with a `tree.IsNotDisinctFrom` operator, execbuiler will output the simpler `tree.IsNullExpr` when the two expressions are equivalent - when x is not a tuple. This commit also introduces new optimizer expression types, `IsTupleNull` and `IsTupleNotNull`. Normalization rules have been added for folding these expressions into boolean values when possible. Fixes #46675 Informs #46908 Informs #12022 Release note (bug fix): Fixes incorrect logic for `IS NULL` and `IS NOT NULL` operators with tuples, correctly differentiating them from `IS NOT DISTINCT FROM NULL` and `IS DISTINCT FROM NULL`, respectively. Co-authored-by: Marcus Gartner <[email protected]>
As it turns out,
IS NULL
behaves differently fromIS NOT DISTINCT FROM NULL
when the argument is a tuple(https://www.postgresql.org/docs/8.3/functions-comparison.html):
However, in our comparison resolution we transform
IS
intoIS NOT DISTINCT FROM
and
IS NOT
intoIS DISTINCT FROM
which is incorrectfor tuples. This is now fixed by introducing separate
IsNullOp
whichreuses all overloads from
IsNotDistinctFromOp
while introducing itsown overload implementation only for tuples.
Here are the distinctions (note that
IS DISTINCT FROM
alwaysequals
!(IS NOT DISTINCT FROM)
, however,IS NOT NULL
is notalways the same as
!(IS NULL)
):Fixes: #46675.
Release note (bug fix): CockroachDB previously was incorrectly
evaluating
IS NULL
andIS NOT NULL
on tuples which has been fixed.