You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Setup
$ export DATAFUSION_OPTIMIZER_SKIP_FAILED_RULES=false
$ echo "1,2" > test.csv
$ cd datafusion-cli
$ cargo run
❯ create external table test (a int, b int) stored as csv location 'test.csv';
Test
This query works:
❯ select * from test where a = (select max(a) from test t2);
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
These non-aggregate subqueries that return a single row do not work:
❯ select * from test where a = (select distinct a from test);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must have a projection at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:185\ncaused by\nError during planning: Could not coerce into Projection! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1121")
❯ select * from test where a = (select a from test limit 1);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must have a projection at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:185\ncaused by\nError during planning: Could not coerce into Projection! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1121")
In both cases, the subqueries do have a projection, but they are wrapped with a Distinct or Limit node.
This query fails with a different error. In this case there is a Filter wrapping the Projection.
❯ select * from test where a = (select a from test where a is not null);
Internal("Optimizer rule 'scalar_subquery_to_join' failed due to unexpected error: scalar subqueries must aggregate a value at /home/andy/git/apache/arrow-datafusion/datafusion/optimizer/src/scalar_subquery_to_join.rs:193\ncaused by\nError during planning: Could not coerce into Aggregate! at /home/andy/git/apache/arrow-datafusion/datafusion/expr/src/logical_plan/plan.rs:1368")
Describe the solution you'd like
Support these subqueries
Describe alternatives you've considered
None
Additional context
None
The text was updated successfully, but these errors were encountered:
andygrove
changed the title
Add support for non-aggregate subqueries in scalar_subquery_to_join
Add support for wider range of scalar subqueries in scalar_subquery_to_joinOct 6, 2022
This is going to be harder than I realized. We need to check that scalar subqueries return a single row at some point during execution. I am not sure that we can translate queries to joins if there is the possibility of more than one row being returned.
Postgres example:
postgres=# select * from test t1 where a = (select distinct b from test where b > 7);
a | b
---+---
(0 rows)
postgres=# select * from test t1 where a = (select distinct b from test where b > 3);
ERROR: more than one row returned by a subquery used as an expression
andygrove
changed the title
Add support for wider range of scalar subqueries in scalar_subquery_to_join
Add support for wider range of scalar subqueries
Oct 10, 2022
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Setup
Test
This query works:
These non-aggregate subqueries that return a single row do not work:
In both cases, the subqueries do have a projection, but they are wrapped with a
Distinct
orLimit
node.This query fails with a different error. In this case there is a
Filter
wrapping theProjection
.Describe the solution you'd like
Support these subqueries
Describe alternatives you've considered
None
Additional context
None
The text was updated successfully, but these errors were encountered: