Skip to content

Commit

Permalink
apacheGH-34252: [Java] Support ScannerBuilder::Project or ScannerBuil…
Browse files Browse the repository at this point in the history
…der::Filter as a Substrait proto extended expression (apache#35570)

### Rationale for this change

To close apache#34252

### What changes are included in this PR?

This is a proposal to try to solve:
1. Receive a list of Substrait scalar expressions and use them to Project a Dataset
- [x] Draft a Substrait Extended Expression to test (this will be generated by 3rd party project such as Isthmus)
- [x] Use C++ draft PR to Serialize/Deserialize Extended Expression proto messages
- [x] Create JNI Wrapper for ScannerBuilder::Project 
- [x] Create JNI API
- [x] Testing coverage
- [x] Documentation

Current problem is: `java.lang.RuntimeException: Inferring column projection from FieldRef FieldRef.FieldPath(0)`. Not able to infer by column position by able to infer by colum name. This problem is solved by apache#35798

This PR needs/use this PRs/Issues:
- apache#34834
- apache#34227
- apache#35579

2. Receive a Boolean-valued Substrait scalar expression and use it to filter a Dataset
- [x] Working to identify activities

### Are these changes tested?

Initial unit test added.

### Are there any user-facing changes?

No
* Closes: apache#34252

Lead-authored-by: david dali susanibar arce <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: benibus <[email protected]>
Co-authored-by: David Li <[email protected]>
Co-authored-by: Dane Pitkin <[email protected]>
Signed-off-by: David Li <[email protected]>
  • Loading branch information
5 people authored and dgreiss committed Feb 17, 2024
1 parent 9bba001 commit 5becae0
Show file tree
Hide file tree
Showing 7 changed files with 770 additions and 24 deletions.
29 changes: 24 additions & 5 deletions docs/source/java/dataset.rst
Original file line number Diff line number Diff line change
Expand Up @@ -132,12 +132,10 @@ within method ``Scanner::schema()``:
.. _java-dataset-projection:

Projection
==========
Projection (Subset of Columns)
==============================

User can specify projections in ScanOptions. For ``FileSystemDataset``, only
column projection is allowed for now, which means, only column names
in the projection list will be accepted. For example:
User can specify projections in ScanOptions. For example:

.. code-block:: Java
Expand All @@ -159,6 +157,27 @@ Or use shortcut construtor:
Then all columns will be emitted during scanning.

Projection (Produce New Columns) and Filters
============================================

User can specify projections (new columns) or filters in ScanOptions using Substrait. For example:

.. code-block:: Java
ByteBuffer substraitExpressionFilter = getSubstraitExpressionFilter();
ByteBuffer substraitExpressionProject = getSubstraitExpressionProjection();
// Use Substrait APIs to create an Expression and serialize to a ByteBuffer
ScanOptions options = new ScanOptions.Builder(/*batchSize*/ 32768)
.columns(Optional.empty())
.substraitExpressionFilter(substraitExpressionFilter)
.substraitExpressionProjection(getSubstraitExpressionProjection())
.build();
.. seealso::

:doc:`Executing Projections and Filters Using Extended Expressions <substrait>`
Projections and Filters using Substrait.

Read Data from HDFS
===================

Expand Down
Loading

0 comments on commit 5becae0

Please sign in to comment.