Filters/limit are not pushed down during optimization for table with alias #2270

Closed
mateuszkj opened this issue Apr 18, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@mateuszkj
Contributor

mateuszkj commented Apr 18, 2022

Describe the bug
Filters are not pushed down through SubqueryAlias to TableScan during logical plan optimization. This can cause unnecessary I/O, for example when the filters would otherwise be used to prune Parquet files.

To Reproduce
Steps to reproduce the behavior:

Prepare data and run datafusion-cli with logs:

echo "1,2" > data.csv
export RUST_LOG=info,datafusion=debug
datafusion-cli

Run query without alias (partial_filters is added for TableScan):

SELECT b FROM foo WHERE a=1;
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #foo.b
      Filter: #foo.a = Int64(1)
        TableScan: foo projection=None
    
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #foo.b
      Filter: #foo.a = Int64(1)
        TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]

Run query with alias (partial_filters is not added for TableScan):

SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=None
    
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=Some([0, 1])

Expected behavior
partial_filters should be pushed down to TableScan:

SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=None
    
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]

Additional context

Tested with master branch 5f0b61b. I think the SubqueryAlias case is not handled in this file: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/optimizer/filter_push_down.rs#L299=
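
To make the expected behavior concrete, here is a minimal sketch of pushing a filter through an alias node. It is a toy model, not DataFusion's code: `Plan`, `Column`, `EqPredicate`, `push_down`, `offer`, and `relation_name` are all hypothetical names invented for this illustration, and only predicates of the form `column = literal` are modeled. The key step is rewriting the column qualifier from the alias (`a`) to the underlying relation (`foo`) before offering the predicate to the scan.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's LogicalPlan.

/// A column qualified by the relation (table or alias) it belongs to.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    relation: String,
    name: String,
}

/// A predicate of the form `column = literal`.
#[derive(Debug, Clone, PartialEq)]
struct EqPredicate {
    column: Column,
    value: i64,
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String, filters: Vec<EqPredicate> },
    SubqueryAlias { alias: String, input: Box<Plan> },
    Filter { predicate: EqPredicate, input: Box<Plan> },
}

/// Name under which the output columns of `plan` are qualified.
fn relation_name(plan: &Plan) -> &str {
    match plan {
        Plan::TableScan { table, .. } => table,
        Plan::SubqueryAlias { alias, .. } => alias,
        Plan::Filter { input, .. } => relation_name(input),
    }
}

/// Walk the plan; at every Filter, keep the node and also offer its
/// predicate to the nodes below it.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => {
            let input = offer(push_down(*input), &predicate);
            Plan::Filter { predicate, input: Box::new(input) }
        }
        Plan::SubqueryAlias { alias, input } => {
            Plan::SubqueryAlias { alias, input: Box::new(push_down(*input)) }
        }
        scan => scan,
    }
}

/// Offer `pred` to `plan`, rewriting the qualifier when crossing an alias.
fn offer(plan: Plan, pred: &EqPredicate) -> Plan {
    match plan {
        Plan::TableScan { table, mut filters } => {
            // Accept the predicate only if it refers to this table.
            if pred.column.relation == table {
                filters.push(pred.clone());
            }
            Plan::TableScan { table, filters }
        }
        Plan::SubqueryAlias { alias, input } => {
            if pred.column.relation != alias {
                return Plan::SubqueryAlias { alias, input };
            }
            // `a.a` above the alias becomes `foo.a` below it.
            let rewritten = EqPredicate {
                column: Column {
                    relation: relation_name(&input).to_string(),
                    name: pred.column.name.clone(),
                },
                value: pred.value,
            };
            let input = offer(*input, &rewritten);
            Plan::SubqueryAlias { alias, input: Box::new(input) }
        }
        Plan::Filter { predicate, input } => {
            Plan::Filter { predicate, input: Box::new(offer(*input, pred)) }
        }
    }
}
```

Calling `push_down` on a plan shaped like `Filter(a.a = 1) -> SubqueryAlias(a) -> TableScan(foo)` leaves the Filter and SubqueryAlias nodes in place and records `foo.a = 1` in the scan's `filters`, which mirrors the expected optimized plan above.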

The same happens with limits and aliases:

Limit is pushed down to TableScan:

SELECT b FROM foo LIMIT 10;
[2022-04-18T21:41:42Z DEBUG datafusion::execution::context] Input logical plan:
    Limit: 10
      Projection: #foo.b
        TableScan: foo projection=None
    
[2022-04-18T21:41:42Z DEBUG datafusion::execution::context] Optimized logical plan:
    Limit: 10
      Projection: #foo.b
        TableScan: foo projection=Some([1]), limit=10

Limit is not pushed down to TableScan when the table has an alias:

SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=None
    
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=Some([0, 1])
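
For the limit case, a similar sketch shows why an alias should not block the pushdown: neither a projection nor an alias changes the number of rows, so the limit can pass through both, whereas a filter must stop it. Again this is a toy model under stated assumptions, not DataFusion's implementation: the `Plan` enum and `push_limit` are hypothetical, predicates are plain strings, and offsets are not modeled.

```rust
// Toy model only: hypothetical types, NOT DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String, limit: Option<usize> },
    SubqueryAlias { alias: String, input: Box<Plan> },
    Projection { columns: Vec<String>, input: Box<Plan> },
    Filter { predicate: String, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
}

/// Push the tightest enclosing limit (`upper`) down towards the scan.
/// The Limit node itself is kept, as in the optimized plans in the logs.
fn push_limit(plan: Plan, upper: Option<usize>) -> Plan {
    match plan {
        Plan::Limit { n, input } => {
            // Nested limits: only the smaller one matters below this point.
            let n = upper.map_or(n, |u| u.min(n));
            Plan::Limit { n, input: Box::new(push_limit(*input, Some(n))) }
        }
        // Projection and alias keep the row count, so the limit passes through.
        Plan::Projection { columns, input } => {
            Plan::Projection { columns, input: Box::new(push_limit(*input, upper)) }
        }
        Plan::SubqueryAlias { alias, input } => {
            Plan::SubqueryAlias { alias, input: Box::new(push_limit(*input, upper)) }
        }
        // A filter may drop rows, so a limit above it must not reach the scan.
        Plan::Filter { predicate, input } => {
            Plan::Filter { predicate, input: Box::new(push_limit(*input, None)) }
        }
        Plan::TableScan { table, limit } => {
            let limit = match (limit, upper) {
                (Some(a), Some(b)) => Some(a.min(b)),
                (a, b) => a.or(b),
            };
            Plan::TableScan { table, limit }
        }
    }
}
```

With `push_limit(plan, None)` on a plan shaped like `Limit(10) -> Projection -> SubqueryAlias(a) -> TableScan(foo)`, the scan ends up with `limit: Some(10)`; inserting a Filter anywhere between the Limit and the scan leaves the scan's limit unset.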
@mateuszkj mateuszkj added the bug Something isn't working label Apr 18, 2022
@mateuszkj mateuszkj changed the title Partial filers are not pushdown druing optimalization for table with alias Filters/limit are not pushdown druing optimalization for table with alias Apr 18, 2022
@jackwener
Member

Yes, push_down just doesn't handle SubqueryAlias -> TableScan.

I fixed the limit, but my fix for projection failed because I couldn't handle the schema limitation. It's in #2244

@jackwener
Member

This bug will be fixed once #2213 and #2212 are finished, because finishing those issues requires fixing this bug.

@Jefffrey
Contributor

@alamb this can be closed as complete; limit was done by #4425.

The filter case has a test confirming that the behaviour works:

https://github.com/apache/arrow-datafusion/blob/f75d25fec2c1a5581eeb8ce73a890e5792df02c7/datafusion/optimizer/src/push_down_filter.rs#L2250-L2281

@alamb
Contributor

alamb commented Feb 12, 2023

Thanks @Jefffrey
