Support parquet page filtering for more types: String, Binary(Decimal), Int96 #3833

Closed
Tracked by #3462
Ted-Jiang opened this issue Oct 14, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@Ted-Jiang
Member

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

https://github.com/apache/arrow-datafusion/blob/144dcca360fd95c9bf33f38555fffcb2338bd2c1/datafusion/core/src/physical_plan/file_format/parquet.rs#L853-L857


@Ted-Jiang Ted-Jiang added the enhancement New feature or request label Oct 14, 2022
@alamb alamb changed the title Support more types for getting min max from page index Support parquet page filtering for more types (for getting min max from from strings) Nov 7, 2022
@alamb
Contributor

alamb commented Nov 7, 2022

Here is a PR for filtering on strings: #4132
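To illustrate what page filtering on a string column involves, here is a minimal sketch of pruning pages with their min/max statistics for an equality predicate. The `PageStats` type and the `pages_to_keep` function are hypothetical names made up for this example; they are not DataFusion's API and not necessarily how the PR above does it.

```rust
/// Min/max statistics for one data page of a string column
/// (hypothetical type for illustration, not DataFusion's).
#[derive(Debug, Clone)]
struct PageStats {
    min: Option<String>,
    max: Option<String>,
}

/// Returns one flag per page for the predicate `col = target`:
/// `true` means the page may contain `target` and must be read,
/// `false` means the page can be skipped.
fn pages_to_keep(pages: &[PageStats], target: &str) -> Vec<bool> {
    pages
        .iter()
        .map(|p| match (&p.min, &p.max) {
            // Rust compares `str` values byte-wise, which matches the
            // unsigned byte order Parquet uses for UTF-8 string statistics.
            (Some(min), Some(max)) => min.as_str() <= target && target <= max.as_str(),
            // Statistics are missing: keep the page so results stay correct.
            _ => true,
        })
        .collect()
}
```

With pages covering `apple..fig`, `grape..mango`, and one page without statistics, a lookup for `kiwi` would skip only the first page.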

@alamb alamb changed the title Support parquet page filtering for more types (for getting min max from from strings) Support parquet page filtering for more types: String, Binary(Decimal), Int96 Nov 7, 2022
@alamb
Contributor

alamb commented Nov 26, 2022

Int96 appears to be the only unsupported type left

https://github.com/apache/arrow-datafusion/blob/ba73c8180ebd874614cabc33be8cbb0d1db52518/datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs#L478-L481

I don't think Int96 is a widely used type, nor is it supported by most other implementations, so I suggest we close this issue.

@Ted-Jiang
Member Author

> Int96 appears to be the only unsupported type left
>
> https://github.com/apache/arrow-datafusion/blob/ba73c8180ebd874614cabc33be8cbb0d1db52518/datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs#L478-L481
>
> I don't think Int96 is a widely used type, nor is it supported by most other implementations, so I suggest we close this issue.

@alamb Agreed, Int96 is also not supported in row group pruning:

https://github.com/alamb/arrow-datafusion/blob/d7a7fb61afe9ce2824aae737f65aec12d9513f7f/datafusion/core/src/physical_plan/file_format/parquet/row_groups.rs#L124
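
For context on why Int96 is awkward for min/max pruning, here is a minimal sketch of decoding one value. It assumes the layout conventionally used by Impala and Hive for Int96 timestamps: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes of Julian day number. The function name `int96_to_unix_nanos` and the constants are illustrative and are not taken from DataFusion or the parquet crate. Because the day number sits in the trailing bytes, comparing the raw 12 bytes lexicographically does not order values chronologically.

```rust
/// Julian day number of 1970-01-01.
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;

/// Decodes a 12-byte Int96 timestamp into nanoseconds since the Unix epoch.
/// Assumed layout: bytes 0..8 = nanoseconds within the day (little-endian),
/// bytes 8..12 = Julian day number (little-endian).
/// Sketch only: no overflow handling for dates far from the epoch.
fn int96_to_unix_nanos(raw: [u8; 12]) -> i64 {
    let mut nanos_bytes = [0u8; 8];
    nanos_bytes.copy_from_slice(&raw[..8]);
    let mut day_bytes = [0u8; 4];
    day_bytes.copy_from_slice(&raw[8..]);

    let nanos_of_day = i64::from_le_bytes(nanos_bytes);
    let julian_day = i64::from(u32::from_le_bytes(day_bytes));

    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day
}
```

Any min/max comparison would have to run on the decoded value rather than on the stored bytes.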
