Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: rewrite const OP ANY(subquery) as EXISTS(subquery WHERE ... OP const) #37241

Closed
justinj opened this issue May 1, 2019 · 6 comments
Closed
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@justinj
Copy link
Contributor

justinj commented May 1, 2019

These two queries are equivalent:

[email protected]:58515/defaultdb> explain select exists(select * from y where b = 3);
       tree      |    field     |             description
+----------------+--------------+--------------------------------------+
  root           |              |
   ├── values    |              |
   │             | size         | 1 column, 1 row
   └── subquery  |              |
        │        | id           | @S1
        │        | original sql | EXISTS (SELECT * FROM y WHERE b = 3)
        │        | exec mode    | exists
        └── scan |              |
                 | table        | y@primary
                 | spans        | /3-/3/#
(10 rows)

Time: 1.24ms

[email protected]:58515/defaultdb> explain (opt) select 3 = any(select b from y);
           text
+-------------------------+
  values
   └── tuple
        └── any: eq
             ├── scan y
             └── const: 3
(5 rows)

Time: 597µs

However, the first is significantly better since it can generate spans.

It might be the case that it's better to address this by instead hoisting the subquery, which would be a more general solution, but wouldn't generate spans at plan time which might be valuable (and is also blocked in the common case by #37240).

Jira issue: CRDB-4443

@justinj justinj added the A-sql-optimizer SQL logical planning and optimizations. label May 1, 2019
@justinj
Copy link
Contributor Author

justinj commented May 1, 2019

Perhaps the cleaner way to do this would be to hoist the subquery and then add a rule that simplifies that kind of join with a VALUES clause into a Select,

@jordanlewis jordanlewis added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label May 2, 2019
@github-actions
Copy link

github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz
Copy link
Contributor

knz commented Jun 5, 2021

still exists in 21.1

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@mgartner
Copy link
Collaborator

mgartner commented Mar 8, 2023

Unfortunately, i = ANY (SELECT a FROM t) is not equivalent to EXISTS (SELECT * FROM t WHERE a = i in all cases. ANY has some unexpected behavior in the presence of NULL either on the LHS of the comparison, or within the results on the RHS:

The Postgres docs explain this:

... The result is “false” if no true result is found (including the case where the array has zero elements)...

... If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.

Here's how this plays out in practice:

CREATE TABLE t_without_null (a INT);
INSERT INTO t_without_null VALUES (1);

CREATE TABLE t_with_null (a INT);
INSERT INTO t_with_null VALUES (NULL), (1);

CREATE TABLE t_empty (a INT);

SELECT
  i,
  i = ANY (SELECT a FROM t_without_null) AS "any - t_without_null",
  EXISTS (SELECT a FROM t_without_null WHERE a = i) AS "exists - t_without_null",
  i = ANY (SELECT a FROM t_with_null) AS "any - t_with_null",
  EXISTS (SELECT a FROM t_with_null WHERE a = i) AS "exists - t_with_null",
  i = ANY (SELECT a FROM t_empty) AS "any - t_empty",
  EXISTS (SELECT a FROM t_empty WHERE a = i) AS "exists - t_empty"
FROM (VALUES (1), (2), (NULL)) v(i);

  i   | any - t_without_null | exists - t_without_null | any - t_with_null | exists - t_with_null | any - t_empty | exists - t_empty
------+----------------------+-------------------------+-------------------+----------------------+---------------+------------------
    1 | t                    | t                       | t                 | t                    | f             | f
    2 | f                    | f                       | NULL              | f                    | f             | f
 NULL | NULL                 | f                       | NULL              | f                    | f             | f
(3 rows)

I'm motivated to make a rewrite like this work - it'll vastly simplify the optimizer and execution engine if we can get rid of bespoke support for ANY entirely. But it's proving to be difficult.

One of the most difficult hurdles to overcome is supporting this bizarre behavior (which I've extracted from the results above):

CREATE TABLE t_empty (a INT);

SELECT NULL = ANY (SELECT * FROM t_empty);

 ?column?
---------
 f
(1 row)

If the contents of the ANY are empty, the result is false even if the LHS of the comparison is NULL.

@mgartner
Copy link
Collaborator

mgartner commented Mar 8, 2023

I think the following is a valid rewrite:

i = ANY (SELECT a FROM t)
=>
(SELECT count(*) > 0 AND
  (bool_or(a = i)  OR (
    CASE
    WHEN bool_or(a IS NULL) THEN NULL
    ELSE (CASE WHEN i IS NULL THEN NULL ELSE false END)
    END
  ))
FROM t)

Here's some tests showing its equivalence (in Postgres): https://gist.github.com/mgartner/9dac0b802c087329a8e2ee4ff0204bbd

@mgartner
Copy link
Collaborator

I used this transformation in #98375. I don't think we should use it yet universally, though, because it can lead to plans with slow apply-joins.

Also, it doesn't provide the benefit originally described in this issue - it won't allow constrained scans to be performed. If columns can be prove to be non-NULL then a simpler transformation may be possible. For now I'm going to close this issue.

@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

No branches or pull requests

5 participants