I think a more elegant solution would be to support large IN lists directly in pruning. The limit you refer to effectively governs when such predicates are rewritten into OR chains, so that the existing min/max-based evaluation can work on them.
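As a rough illustration of what the OR-chain rewrite provides: `x IN (a, b, c)` is treated like `x = a OR x = b OR x = c`, and each equality can only be true in a container whose min/max range includes that literal. This is a minimal sketch that assumes a hypothetical `ColumnStats` type holding `i64` min/max statistics for one container; it is not DataFusion's actual API.

```rust
/// Hypothetical per-container statistics for one column (not a DataFusion type).
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// `x = v` can only match in a container whose range [min, max] contains v.
fn eq_may_match(stats: ColumnStats, v: i64) -> bool {
    stats.min <= v && v <= stats.max
}

/// `x IN (list)` evaluated as `x = v1 OR x = v2 OR ...`:
/// the container must be kept if any single equality may match.
/// Returns false only when the container can safely be pruned.
pub fn in_list_or_chain_may_match(stats: ColumnStats, list: &[i64]) -> bool {
    list.iter().any(|&v| eq_may_match(stats, v))
}
```

In this sketch the work per container grows linearly with the length of the list.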
A config parameter is probably fine for the near term.
We have recently been improving the code in this area (see #8440 for example). Maybe we can update the PruningPredicate logic to make more use of the contained API to rule out containers based on their min/max values.
Specifically, we could compute the minimum and maximum values in the IN list once, and then compare them against the actual min/max statistics of the column in each container 🤔
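A minimal sketch of that idea, assuming hypothetical `ColumnStats` and `ListBounds` types with `i64` values (neither is a DataFusion type): the list bounds are computed once per predicate, and each container is then checked with a constant-time range-overlap test.

```rust
/// Hypothetical per-container statistics for one column (not a DataFusion type).
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical bounds of the IN list, computed once per predicate.
#[derive(Debug, Clone, Copy)]
pub struct ListBounds {
    pub min: i64,
    pub max: i64,
}

/// Returns None for an empty list (the predicate can never be true).
pub fn list_bounds(list: &[i64]) -> Option<ListBounds> {
    let min = *list.iter().min()?;
    let max = *list.iter().max()?;
    Some(ListBounds { min, max })
}

/// Conservative check: the container may contain a match only if the
/// column range [stats.min, stats.max] overlaps [bounds.min, bounds.max].
/// A `false` result means the container can be pruned.
pub fn in_list_bounds_may_match(stats: ColumnStats, bounds: Option<ListBounds>) -> bool {
    match bounds {
        // Empty IN list: no row can satisfy it.
        None => false,
        Some(b) => !(b.max < stats.min || b.min > stats.max),
    }
}
```

The check does not depend on the list length, but it is weaker than the OR chain: for the list (1, 100) and a container with range [10, 20], the ranges overlap, so the container is kept even though neither value can appear in it.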
FYI, I think @yahoNanJing is in the process of implementing this feature (see #8669).
Is your feature request related to a problem or challenge?
When I use an InList expression with a list of 19 values, the query takes 6 ms, but when the list grows to 20 values, it takes 200 ms.
Describe the solution you'd like
In build_predicate_expression, an InList expression is only rewritten for pruning when in_list.list().len() < 20.
I would like to make this threshold configurable.
Describe alternatives you've considered
One option is to add a config setting in ParquetOptions and ParquetExec, but I think that is ugly. Is there a more elegant implementation?
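The config alternative amounts to replacing the hard-coded limit of 20 with a setting. This is a hedged sketch only: `PruningOptions` and `max_in_list_rewrite_len` are hypothetical names, not existing DataFusion options, and `ColumnStats` is a hypothetical stand-in for per-container `i64` min/max statistics. Lists at or above the limit are not rewritten, so nothing can be proven and every container is kept, which would be consistent with the jump in query time described in this issue.

```rust
/// Hypothetical per-container statistics for one column (not a DataFusion type).
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical option; the name and placement are assumptions, not an existing setting.
#[derive(Debug, Clone, Copy)]
pub struct PruningOptions {
    pub max_in_list_rewrite_len: usize,
}

impl Default for PruningOptions {
    fn default() -> Self {
        // Mirrors the hard-coded `len < 20` check described in this issue.
        Self { max_in_list_rewrite_len: 20 }
    }
}

/// Returns false only if the container can be pruned.
pub fn in_list_may_match(stats: ColumnStats, list: &[i64], opts: &PruningOptions) -> bool {
    if list.len() < opts.max_in_list_rewrite_len {
        // Short list: evaluate as an OR chain of equalities against min/max.
        list.iter().any(|&v| stats.min <= v && v <= stats.max)
    } else {
        // Long list: no rewrite, so the container cannot be ruled out.
        true
    }
}
```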
Additional context
No response