colexec: IsDistinctFrom optimization is not used in "self-equality" case #63792

jordanlewis · 2021-04-16T17:57:20Z

In the following example, I would expect the vectorized engine to plan an isNullProjOp to execute the optimizer's IS DISTINCT FROM operator. Instead, we plan a defaultCmpProjOp.

[email protected]:26257/defaultdb> create table a (a int);
CREATE TABLE
[email protected]:26257/defaultdb> explain(vec) select a = a from a;
                 info
---------------------------------------
  │
  └ Node 1
    └ *colexec.orProjOp
      ├ *colfetcher.ColBatchScan
      ├ *colexecproj.defaultCmpProjOp
      │ └ *colexecbase.castOpNullAny
      │   └ *colexecbase.constNullOp
      └ *colexecbase.castOpNullAny
        └ *colexecbase.constNullOp
(9 rows)

Time: 1ms total (execution 0ms / network 0ms)

[email protected]:26257/defaultdb> explain(opt) select a = a from a;
                                   info
---------------------------------------------------------------------------
  project
   ├── scan a
   └── projections
        └── (a IS DISTINCT FROM CAST(NULL AS INT8)) OR CAST(NULL AS BOOL)
(4 rows)

Time: 0ms total (execution 0ms / network 0ms)

[email protected]:26257/defaultdb>
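
For context on why the optimizer output above is a valid rewrite of `a = a`, here is a minimal sketch of SQL three-valued logic in Go. The types and helpers (`tri`, `nullInt`, `eq`, `rewritten`) are hypothetical illustrations, not CockroachDB code. It checks that `a = a` and `(a IS DISTINCT FROM NULL) OR NULL` agree for both a non-NULL and a NULL input.

```go
package main

import "fmt"

// tri is a SQL three-valued boolean: false, true, or NULL (unknown).
type tri int

const (
	triFalse tri = iota
	triTrue
	triNull
)

func (t tri) String() string {
	return [...]string{"false", "true", "NULL"}[t]
}

// nullInt is a hypothetical nullable INT8 value (not a CockroachDB type).
type nullInt struct {
	val   int64
	valid bool // false means SQL NULL
}

// eq implements SQL "=": the result is NULL if either operand is NULL.
func eq(a, b nullInt) tri {
	if !a.valid || !b.valid {
		return triNull
	}
	if a.val == b.val {
		return triTrue
	}
	return triFalse
}

// isDistinctFromNull implements "a IS DISTINCT FROM NULL", which never
// returns NULL: it is true exactly when a is not NULL.
func isDistinctFromNull(a nullInt) tri {
	if a.valid {
		return triTrue
	}
	return triFalse
}

// or implements SQL three-valued OR.
func or(a, b tri) tri {
	if a == triTrue || b == triTrue {
		return triTrue
	}
	if a == triNull || b == triNull {
		return triNull
	}
	return triFalse
}

// rewritten evaluates (a IS DISTINCT FROM NULL) OR NULL.
func rewritten(a nullInt) tri {
	return or(isDistinctFromNull(a), triNull)
}

func main() {
	for _, a := range []nullInt{{val: 1, valid: true}, {valid: false}} {
		fmt.Println(eq(a, a), rewritten(a))
	}
}
```

For a non-NULL `a` both expressions are true; for a NULL `a` both are NULL, so the projection is equivalent to `a = a` while needing only a null check on the column.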

The reason for this issue is that the vectorized engine only permits the IS DISTINCT FROM optimization when the right argument is a constant tree.DNull, but the optimizer hands down a Cast(tree.DNull, <type>). However, the simplistic patch below does not correct the problem, because this code path is only used when the right side is a constant tree.Datum, which a cast expression is not.

I'm not sure of the most elegant way to correct this problem, so I'm filing this for discussion. @yuzefovich what do you think? This isn't too high priority, but I'm curious.

git diff pkg/sql/colexec/colbuilder/execplan.go
diff --git a/pkg/sql/colexec/colbuilder/execplan.go b/pkg/sql/colexec/colbuilder/execplan.go
index 8b0e20cbc7..52434e04c7 100644
--- a/pkg/sql/colexec/colbuilder/execplan.go
+++ b/pkg/sql/colexec/colbuilder/execplan.go
@@ -2188,10 +2188,16 @@ func planProjectionExpr(
                                        allocator, typs[leftIdx], input, leftIdx, resultIdx, datumTuple, negate,
                                )
                        case tree.IsDistinctFrom, tree.IsNotDistinctFrom:
-                               if right != tree.DNull {
-                                       // Optimized IsDistinctFrom and IsNotDistinctFrom are
-                                       // supported only with NULL argument, so we fallback to the
-                                       // default comparison operator.
+                               // We support an optimized IsDistinctFrom and IsNotDistinctFrom when
+                               // the argument is NULL or the argument is CAST(NULL, anytype).
+                               if right == tree.DNull {
+                                       // Fall through.
+                               } else if cast, ok := right.(*tree.CastExpr); ok {
+                                       if cast.Expr != tree.DNull {
+                                               break
+                                       }
+                                       // Fall through.
+                               } else {
                                        break
                                }
                                // IS NULL is replaced with IS NOT DISTINCT FROM NULL, so we
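
The condition the patch is trying to express can be sketched in isolation as a single predicate: "the right side is NULL, or a cast whose operand is NULL". The types below (`expr`, `nullConst`, `castExpr`, `colRef`) are hypothetical stand-ins for tree.Expr, tree.DNull, and *tree.CastExpr, and `isNullOrCastNull` is an assumed helper name; none of this is the actual CockroachDB API or the fix that was eventually merged.

```go
package main

import "fmt"

// expr, nullConst, castExpr, and colRef are hypothetical stand-ins for the
// real expression types; they are NOT the CockroachDB tree package.
type expr interface{}

// nullConst plays the role of the constant NULL datum.
type nullConst struct{}

// castExpr plays the role of CAST(inner AS typ).
type castExpr struct {
	inner expr
	typ   string
}

// colRef plays the role of a column reference such as "a".
type colRef struct{ name string }

// isNullOrCastNull reports whether e is a bare NULL or CAST(NULL AS <type>).
// Like the patch above, it looks through only one level of cast.
func isNullOrCastNull(e expr) bool {
	switch t := e.(type) {
	case nullConst:
		return true
	case *castExpr:
		_, ok := t.inner.(nullConst)
		return ok
	default:
		return false
	}
}

func main() {
	fmt.Println(isNullOrCastNull(nullConst{}))
	fmt.Println(isNullOrCastNull(&castExpr{inner: nullConst{}, typ: "INT8"}))
	fmt.Println(isNullOrCastNull(&colRef{name: "a"}))
}
```

Pulling the check into one predicate avoids the empty "fall through" branches in the diff. It would still need to run before the planner decides whether the right side is a constant datum, since that is the step that currently routes the cast expression away from this code path.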


63238: roachtest: update libpq blocklist to ignore TestCopyInBinaryError r=rafiss a=RichardJCai

roachtest: update libpq blocklist to ignore TestCopyInBinaryError

TestCopyInBinary's behaviour was incorrect in the test since we were not receiving an expected error (`pq: only text format supported for COPY`). Furthermore the test would sporadically panic causing the following tests to fail.

Release note: None

Resolves #57855

63244: logictest: compare floating point values approximately on s390x r=ajwerner a=jonathan-albrecht-ibm

### Overview

On s390x in the std math package and some c-deps, floating point calculations can produce results that differ from the values calculated on amd64. This patch adds a function to compare logictest floating point and decimal values within a small relative margin on s390x. The existing behavior on all other platforms remains the same.

On s390x, there are three main reasons that floating point calculations sometimes give different results:

* the go compiler generates the s390x "fused multiply and add" (FMA) instruction where possible,
* the go math package uses s390x optimized versions of some functions,
* some c libs eg. libgeos, libproj also have platform specific floating point calculation differences.

### Proposal

The motivation for this work is so that users building CRDB on s390x do not need to diagnose tests that fail because of platform dependent floating point differences.

This PR proposes one possible approach to dealing with platform dependent floating point differences. Since development, testing and CI are done on amd64 it keeps the current logic for determining float equality exactly the same. On s390x, it determines values of decimal and float column types (R and F) in query tests to be equal if they are within a tolerance. See the new pkg/testutils/floatcmp package for the implementation of the approximate equality logic and changes in logictest.go to see how it is applied to only s390x.

There are probably other approaches I haven't thought of that would also work. I'd like to use this proposal to start a conversation on how all tests in CRDB that currently fail due to expected floating point differences could eventually be made to pass.

Of course platforms other than s390x may also have differences but I haven't looked at any other platforms. The changes should be easily extendable to other platforms if needed.

### Future Work

The changes in this PR allow the following tests to pass on s390x:

* TestLogic/fakedist-disk/builtin_function/extra_float_digits_3
* TestLogic/fakedist-metadata/builtin_function/extra_float_digits_3
* TestLogic/fakedist-vec-off/builtin_function/extra_float_digits_3
* TestLogic/fakedist/builtin_function/extra_float_digits_3
* TestLogic/local-spec-planning/builtin_function/extra_float_digits_3
* TestLogic/local-vec-off/builtin_function/extra_float_digits_3
* TestLogic/local/builtin_function/extra_float_digits_3

There are about 70 more tests that currently fail due to platform floating point differences on s390x, many are tests of geospatial functions. Assuming we can come up with a good approach, I'd like to continue working on fixes to be submitted in future PRs.

Release note: None

63802: colbuilder: optimize IS DISTINCT FROM NULL when null is casted r=yuzefovich a=yuzefovich

We have an optimized operator for `Is{Not}DistinctFrom` operation which we can plan currently only if the right side is a constant NULL. In some cases the optimizer might create a cast expression on the right in order to propagate the type of the null, and previously we would fallback to the default comparison operator in such scenario. This is suboptimal, and this commit fixes the issue by special casing the scenario of casting NULL to some type.

Fixes: #63792.

Release note: None

63903: sql: mark planNodeToRowSource as streaming intelligently r=yuzefovich a=yuzefovich

Previously, out of abundance of caution (and some laziness) we marked all `planNodeToRowSource` processors as of "streaming" nature. This marker influences whether we wrap it with a streaming or buffering columnarizer into the vectorized flow. However, doing so is unnecessary in most cases and kills some of the benefits of the vectorized model. The only special planNode is `hookFnNode` which must be streaming, all others are safe to have buffering around them. This commit implements that idea. This required adding another method to `Processor` interface.

Release note: None

Co-authored-by: richardjcai <[email protected]>
Co-authored-by: Jonathan Albrecht <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>

jordanlewis added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 16, 2021

yuzefovich mentioned this issue Apr 16, 2021

colbuilder: optimize IS DISTINCT FROM NULL when null is casted #63802

Merged

yuzefovich self-assigned this Apr 16, 2021

craig bot closed this as completed in b32bbb5 Apr 21, 2021

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023
