Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Push-down filtering in Java #14782

Open
noahfrn opened this issue Nov 30, 2022 · 8 comments
Open

[Java] Push-down filtering in Java #14782

noahfrn opened this issue Nov 30, 2022 · 8 comments

Comments

@noahfrn
Copy link
Contributor

noahfrn commented Nov 30, 2022

Push-down filtering (Java) questions

Hey all,

I'm exploring adding push-down filtering to the Java Datasets API, as it currently only supports push-down projection.
Just had a couple of questions before I start this work:

  • Is this planned on any current roadmap? I'd like to avoid duplicating work that's already begun
  • Does anyone have any smart ideas as to how to pass the filter predicate into ScanOptions? As far as I can tell, the only complication to make this happen would be figuring out some way to call into the Expression object from Java without creating future maintenance effort.

Thanks!

Component(s)

C++, Java

@noahfrn noahfrn added the Type: usage Issue is a user question label Nov 30, 2022
@lidavidm
Copy link
Member

CC @lwhite1 @davisusanibar

An alternative might be to accept Substrait plans instead, binding to Acero as a whole instead of just Dataset. This would give you the full power of the query engine and avoid having to create bindings to individual C++ components. But of course this is more complex and Substrait is a bit of a moving target.

Another alternative is to track the discussion about a text format for expressions. See https://lists.apache.org/thread/7vch27t3gfz1hmv7d8w69n50gfc1nswf and #14287.

@noahfrn
Copy link
Contributor Author

noahfrn commented Dec 1, 2022

Thanks for your help @lidavidm - that expression parsing PR looks like exactly what I wanted!

@ianmcook
Copy link
Member

There is a discussion here about passing Substrait expressions to the Dataset Project and Filter methods: #33985 (comment)

If this gets implemented in C++, can it be exposed to Java through the JNI bindings?

@ianmcook ianmcook changed the title Push-down filtering in Java [Java] Push-down filtering in Java Feb 17, 2023
@lidavidm
Copy link
Member

I think this is just about having a convenient user-facing API to the existing Dataset functionality, a Substrait API to Acero in Java would be a separate project

@ianmcook
Copy link
Member

Ok, thanks. To clarify: my question is not about Substrait plans, it's about Substrait expressions which we now have a way to represent independent of plans. I opened a separate issue to request this feature: #34252

@lidavidm
Copy link
Member

I think without a convenient API to build Substrait expressions in Java, it'd still not quite meet the goals here right?

@noahfrn
Copy link
Contributor Author

noahfrn commented Feb 20, 2023

Agreed @lidavidm - Something like substrait-io/substrait-java#128 would provide the necessary functionality here.

@ianmcook
Copy link
Member

#34252 is now complete, and the Arrow Java Datasets API now supports pushdown projection and filtering using Substrait expressions.

More details here: https://github.com/apache/arrow/blob/main/docs/source/java/dataset.rst#projection-produce-new-columns-and-filters

Example here: https://github.com/apache/arrow/blob/main/docs/source/java/substrait.rst#executing-projections-and-filters-using-extended-expressions

However: this capability is not really ready for practical applications yet, because we do not yet have any user-friendly tools to create Substrait expressions in Java. I hope we can achieve that in substrait-io/substrait-java#128.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants