Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for wider range of scalar subqueries #3725

Open
Tracked by #822
andygrove opened this issue Oct 5, 2022 · 1 comment
Open
Tracked by #822

Add support for wider range of scalar subqueries #3725

andygrove opened this issue Oct 5, 2022 · 1 comment
Labels
enhancement New feature or request optimizer Optimizer rules

Comments

@andygrove
Copy link
Member

andygrove commented Oct 5, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Setup

$ export DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false
$ echo "1,2" > test.csv
$ cd datafusion-cli
$ cargo run
❯ create external table test (a int, b int) stored as csv location 'test.csv';

Test

This query works:

❯ select * from test where a = (select max(a) from test t2);
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+

These non-aggregate subqueries that return a single row do not work:

❯ select * from test where a = (select distinct a from test);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must have a projection at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:185\ncaused by\nError during planning: Could not coerce into Projection! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1121")

❯ select * from test where a = (select a from test limit 1);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must have a projection at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:185\ncaused by\nError during planning: Could not coerce into Projection! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1121")

In both cases, the subqueries do have a projection, but they are wrapped with a Distinct or Limit node.

This query fails with a different error. In this case there is a Filter wrapping the Projection.

❯ select * from test where a = (select a from test where a is not null);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must aggregate a value at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:193\ncaused by\nError during planning: Could not coerce into Aggregate! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1368")

Describe the solution you'd like
Support these subqueries

Describe alternatives you've considered
None

Additional context
None

@andygrove andygrove added enhancement New feature or request optimizer Optimizer rules labels Oct 5, 2022
@andygrove andygrove changed the title Add support for non-aggregate subqueries in scalar_subquery_to_join Add support for wider range of scalar subqueries in scalar_subquery_to_join Oct 6, 2022
@andygrove
Copy link
Member Author

This is going to be harder than I realized. We need to check that scalar subqueries return a single row at some point during execution. I am not sure that we can translate queries to joins if there is the possibility of more than one row being returned.

Postgres example:

postgres=# select * from test t1 where a = (select distinct b from test where b > 7);
 a | b 
---+---
(0 rows)

postgres=# select * from test t1 where a = (select distinct b from test where b > 3);
ERROR:  more than one row returned by a subquery used as an expression

@andygrove andygrove changed the title Add support for wider range of scalar subqueries in scalar_subquery_to_join Add support for wider range of scalar subqueries Oct 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request optimizer Optimizer rules
Projects
None yet
Development

No branches or pull requests

1 participant