Config the length of list when using In_list on parquet, rather than a const of 20. #8609

Open
karellincoln opened this issue Dec 21, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@karellincoln

Is your feature request related to a problem or challenge?

When I use an In_list Expr with a list of length 19, the query takes 6 ms, but when the length grows to 20, it takes 200 ms.

Describe the solution you'd like

In build_predicate_expression, an InListExpr is pushed down for pruning only when in_list.list().len() < 20.

I would like this value to be configurable.

Describe alternatives you've considered

I think one option is to add a config to ParquetOptions and ParquetExec,

but I also think that is ugly. Is there a more elegant implementation?
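
To make the request concrete, here is a minimal sketch of what a configurable threshold could look like. `PruningOptions`, `max_in_list_len`, and `should_rewrite_in_list` are hypothetical names used only for illustration; they are not existing DataFusion APIs, and the default of 20 simply mirrors the constant described above.

```rust
/// Hypothetical options struct for illustration only;
/// this is NOT the real DataFusion `ParquetOptions`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PruningOptions {
    /// IN lists with at least this many values are not rewritten for pruning.
    max_in_list_len: usize,
}

impl Default for PruningOptions {
    fn default() -> Self {
        // 20 mirrors the hard-coded constant described in this issue.
        Self { max_in_list_len: 20 }
    }
}

/// Same shape as the `in_list.list().len() < 20` check,
/// with the constant replaced by a configurable value.
fn should_rewrite_in_list(list_len: usize, opts: &PruningOptions) -> bool {
    list_len < opts.max_in_list_len
}
```

With the default, a list of 19 values would still be rewritten and a list of 20 would not; setting `max_in_list_len` to a larger value would move that cutoff.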

Additional context

No response

@karellincoln karellincoln added the enhancement New feature or request label Dec 21, 2023
@alamb
Contributor

alamb commented Dec 22, 2023

I think a more elegant solution would be to implement direct support in pruning for large IN lists -- the parameter you refer to is effectively rewriting such predicates into OR chains so the existing min/max based evaluation can work on them.
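
As a rough illustration of the rewrite described above (not the actual PruningPredicate code), the sketch below treats `col IN (v1, v2, ...)` as `col = v1 OR col = v2 OR ...` and checks each equality against a container's min/max statistics. The `MinMax` type and both function names are assumptions made for this example, and values are plain `i64` for simplicity.

```rust
/// Min/max statistics for one container (for example a Parquet row group).
/// Hypothetical type for illustration.
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: i64,
    max: i64,
}

/// `col = v` can only be true for some row if min <= v <= max.
fn eq_may_match(stats: MinMax, v: i64) -> bool {
    stats.min <= v && v <= stats.max
}

/// `col IN (list)` evaluated as an OR chain of equality checks.
/// Returns false only when the statistics prove that no row can match,
/// which means the container can be skipped.
fn in_list_may_match(stats: MinMax, list: &[i64]) -> bool {
    list.iter().any(|&v| eq_may_match(stats, v))
}
```

The work per container grows with the number of list values, which is presumably why the rewrite is capped at a fixed list length.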

A config parameter is probably fine for the near term.

We have been recently improving the code in this area -- see #8440 for example. Maybe we can update the PruningPredicate logic to use the contained api more to rule out containers based on their min/max values

Specifically, we could figure out the min and max values in the list for contains and then compare the actual min/max values in the columns 🤔
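
A minimal sketch of that idea, assuming integer values and hypothetical helper names (`list_bounds`, `may_overlap`): compute the list's own min and max once, then keep a container only if the two ranges overlap.

```rust
/// Smallest and largest value in the IN list, or None if the list is empty.
fn list_bounds(list: &[i64]) -> Option<(i64, i64)> {
    let min = *list.iter().min()?;
    let max = *list.iter().max()?;
    Some((min, max))
}

/// Returns false only when the list range and the column range are disjoint,
/// so no list value can appear in the container.
fn may_overlap(col_min: i64, col_max: i64, list: &[i64]) -> bool {
    match list_bounds(list) {
        Some((list_min, list_max)) => list_min <= col_max && col_min <= list_max,
        // Assumption for this sketch: with no bounds available,
        // conservatively keep the container.
        None => true,
    }
}
```

This check is coarser than testing every value: a list such as `[1, 100]` overlaps a column range of 10 to 20 even though neither value falls inside it. In exchange, the cost per container no longer depends on the list length.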

@karellincoln
Author

Thanks for your advice.
Can't wait for #8440.

@alamb
Contributor

alamb commented Dec 30, 2023

> We have been recently improving the code in this area -- see #8440 for example. Maybe we can update the PruningPredicate logic to use the contained api more to rule out containers based on their min/max values

FYI I think @yahoNanJing is in the process of implementing this feature in #8669
