DeltaScanBuilder does not respect datafusion context's datafusion.execution.parquet.pushdown_filters #2739

Closed
adamfaulkner-at opened this issue Aug 6, 2024 · 3 comments
Labels
binding/rust Issues for the Rust crate bug Something isn't working

Comments

@adamfaulkner-at
Contributor

Environment

Delta-rs version: 0.18.1

Binding: ?

Environment: MacOS & Linux

  • Cloud provider: AWS
  • OS: MacOS & Linux
  • Other:

Bug

What happened:

When I set up a datafusion context with parquet filter pushdown enabled, I expect it to propagate the filters to the parquet scan. However, this does not happen.

let config = SessionConfig::default().set_bool("datafusion.execution.parquet.pushdown_filters", true);
let ctx = SessionContext::new_with_config(config);
ctx.register_table("table", Arc::new(delta_table))?;
let table = ctx.table("table").await?;
let result_batches = table.filter(some_filter_expr)?.collect().await?;

When running this with RUST_LOG=debug, I see the following log line, indicating that no predicate was pushed down:

[2024-08-06T21:47:54Z DEBUG datafusion::datasource::physical_plan::parquet] Creating ParquetExec, files: [[PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "part-00000-7090b947-7f4f-4b3f-867e-60f070089207-c000.snappy.parquet" }, last_modified: 2024-08-05T22:39:04.322Z, size: 419156, e_tag: None, version: None }, partition_values: [], range: None, statistics: None, extensions: None }, PartitionedFile { object_meta: ObjectMeta { location: Path { raw: "part-00000-c9b00314-b854-4e65-baf4-1df2384c23cb-c000.snappy.parquet" }, last_modified: 2024-08-05T22:39:00.136Z, size: 3620924, e_tag: None, version: None }, partition_values: [], range: None, statistics: None, extensions: None }]], projection Some([0, 1]), predicate: None, limit: None

(Note the "predicate: None")

What you expected to happen:

I expected predicates to be pushed down.

How to reproduce it:

From inspecting the code in DeltaScanBuilder and the implementation of TableProvider, it seems the only way to enable pushdown is to use DeltaTableProvider to set the scan config, rather than registering the DeltaTable directly with DataFusion. However, due to #2602, this is not possible either, so I don't think any use of delta-rs can do filter pushdown right now.

More details:

@adamfaulkner-at adamfaulkner-at added the bug Something isn't working label Aug 6, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Aug 9, 2024

The default scan config sets enable_parquet_pushdown: true, so this should always execute whenever a filter is present:

        if let Some(predicate) = logical_filter {
            if config.enable_parquet_pushdown {
                exec_plan_builder = exec_plan_builder.with_predicate(predicate);
            }
        };
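To make the gating in that snippet easier to reason about, here is a minimal, self-contained sketch of the same control flow. It does not use the real delta-rs or DataFusion types: `Predicate`, `ScanConfig`, `ExecPlanBuilder`, and `build_scan` are hypothetical stand-ins, and only the field name `enable_parquet_pushdown`, its default of `true`, and the `with_predicate` call are taken from the comment above.

```rust
// Hypothetical stand-ins for the delta-rs / DataFusion types.
// Only the control flow mirrors the snippet quoted in the comment.

#[derive(Debug, Clone, PartialEq)]
struct Predicate(String);

#[derive(Debug, Clone)]
struct ScanConfig {
    enable_parquet_pushdown: bool,
}

impl Default for ScanConfig {
    fn default() -> Self {
        // Per the comment above, the default scan config enables pushdown.
        ScanConfig {
            enable_parquet_pushdown: true,
        }
    }
}

#[derive(Debug, Default)]
struct ExecPlanBuilder {
    predicate: Option<Predicate>,
}

impl ExecPlanBuilder {
    // Attach a predicate to the (hypothetical) parquet scan.
    fn with_predicate(mut self, predicate: Predicate) -> Self {
        self.predicate = Some(predicate);
        self
    }
}

// The predicate reaches the scan only if a filter exists AND the
// scan config has pushdown enabled.
fn build_scan(config: &ScanConfig, logical_filter: Option<Predicate>) -> ExecPlanBuilder {
    let mut exec_plan_builder = ExecPlanBuilder::default();
    if let Some(predicate) = logical_filter {
        if config.enable_parquet_pushdown {
            exec_plan_builder = exec_plan_builder.with_predicate(predicate);
        }
    }
    exec_plan_builder
}
```

In this model, a "predicate: None" in the scan means either that no filter reached `build_scan` or that the config flag was off. The sketch says nothing about which of the two happened in 0.18.1.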

Taking a look btw

@rtyler rtyler added the binding/rust Issues for the Rust crate label Aug 9, 2024
@adamfaulkner-at
Contributor Author

adamfaulkner-at commented Aug 20, 2024

Thanks! I just realized that 0.18.1 is now an old version of delta-rs.

This seems to have been fixed in 0.18.2 by PR #2637; I'll give it a shot.

It looks like another change in 0.19.0 (#2702) addresses exactly my comment about not respecting the DataFusion session's option.
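For readers unfamiliar with what "respecting the session's option" means here, the following is a rough, self-contained model of the idea: the scan derives its pushdown setting from the session configuration instead of ignoring it. This is not the code from #2702. `SessionOptions`, `ParquetScanOptions`, and `scan_options_from_session` are hypothetical names, and the fallback of `false` for an unset key is an assumption of this sketch. Only the option key comes from the issue.

```rust
use std::collections::HashMap;

// Option key quoted in the issue title and reproduction.
const PUSHDOWN_KEY: &str = "datafusion.execution.parquet.pushdown_filters";

// Hypothetical, simplified model of a session's boolean options.
#[derive(Debug, Default, Clone)]
struct SessionOptions {
    bools: HashMap<String, bool>,
}

impl SessionOptions {
    // Builder-style setter, loosely modeled on the set_bool call in the issue.
    fn set_bool(mut self, key: &str, value: bool) -> Self {
        self.bools.insert(key.to_string(), value);
        self
    }

    // Unset keys fall back to false (an assumption of this sketch).
    fn get_bool(&self, key: &str) -> bool {
        self.bools.get(key).copied().unwrap_or(false)
    }
}

// Hypothetical options handed to the parquet scan.
#[derive(Debug, PartialEq)]
struct ParquetScanOptions {
    pushdown_filters: bool,
}

// The scan reads the session's setting rather than a hard-coded value.
fn scan_options_from_session(session: &SessionOptions) -> ParquetScanOptions {
    ParquetScanOptions {
        pushdown_filters: session.get_bool(PUSHDOWN_KEY),
    }
}
```

With this shape, calling `set_bool(PUSHDOWN_KEY, true)` on the session is enough for the scan to see `pushdown_filters: true`, which is the behavior the issue asked for.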

@adamfaulkner-at
Contributor Author

I've confirmed that 0.19.0 fixes this, sorry for the noise.
