[Minor] extract const and add doc and more tests for in_list pruning #8815
/// Translate logical filter expression into pruning predicate
/// expression that will evaluate to FALSE if it can be determined no
/// rows between the min/max values could pass the predicates.
///
/// Returns the pruning predicate as an [`PhysicalExpr`]
///
/// Notice: For [`InListExpr`] if in list values more than 20, it will be rewritten to TRUE
Extract a const and add an explanation for InListExpr.
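To make the rule in the doc comment concrete, here is a minimal std-only sketch of how an IN list might be checked against min/max statistics. It is not DataFusion's actual code: the helper name `may_contain_in_list` and the plain `i32` statistics are assumptions for illustration, negated lists are ignored, and only the constant name `MAX_LIST_VALUE_SIZE_REWRITE` and its value 20 come from this PR.

```rust
/// Threshold named in this PR; longer lists are not analyzed.
const MAX_LIST_VALUE_SIZE_REWRITE: usize = 20;

/// Hypothetical helper: returns `false` only when the min/max statistics
/// prove that no row in the container can satisfy `col IN (values)`.
fn may_contain_in_list(col_min: i32, col_max: i32, values: &[i32]) -> bool {
    // Empty or oversized lists are not analyzed, so the answer is TRUE
    // (the container is kept and must be scanned).
    if values.is_empty() || values.len() > MAX_LIST_VALUE_SIZE_REWRITE {
        return true;
    }
    // `col IN (v1, v2, ...)` behaves like `col = v1 OR col = v2 OR ...`,
    // and each equality can only match if `col_min <= v && v <= col_max`.
    values.iter().any(|&v| col_min <= v && v <= col_max)
}
```

With statistics `min = 5, max = 10`, the list `(1, 2)` lets the container be skipped, while a 21-value list always keeps it, whatever the values are.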
@@ -1958,6 +1964,90 @@ mod tests {
        Ok(())
    }

    #[test]
    fn row_group_predicate_between() -> Result<()> {
Add some test cases.
ID IN (46402206,
201143645,
1147370581,
....
242375670,
38453705)
Before I found MAX_LIST_VALUE_SIZE_REWRITE, I thought it was a bug that this filter was not pushed down 😭
Looks like a nice improvement to me -- thank you @Ted-Jiang
I left some potential improvements, but I don't think they are necessary to merge this PR
@@ -960,7 +964,9 @@ fn build_predicate_expression(
         }
     }
     if let Some(in_list) = expr_any.downcast_ref::<phys_expr::InListExpr>() {
-        if !in_list.list().is_empty() && in_list.list().len() < 20 {
+        if !in_list.list().is_empty()
+            && in_list.list().len() <= MAX_LIST_VALUE_SIZE_REWRITE
👍
        Field::new("c2", DataType::Int32, false),
    ]);
    // test c1 in(1, 2)
    let expr1 = Expr::InList(InList::new(
Could potentially use https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#method.in_list
let expr1 = col("c1").in_list(vec![lit(1), lit(2)], false);
    fn row_group_predicate_in_list_to_many_values() -> Result<()> {
        let schema = Schema::new(vec![Field::new("c1", DataType::Int32, false)]);
        // test c1 in(1..21)
        // in pruning.rs has MAX_LIST_VALUE_SIZE_REWRITE = 20, more than this value will be rewrite
👍
It is good to document the current behavior in a test, but I feel like we can do better -- like #7869 says even if we can't prove the expression is not true due to a large inlist, we shouldn't just disable pruning entirely
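As an illustration of what "doing better" could look like for oversized lists, here is a hedged sketch. The approach is an assumption made for this example and is not taken from this PR or from #7869: instead of giving up, compare the container's min/max range with the overall range spanned by the list values. The helper name `may_contain_large_in_list` and the `i32` statistics are hypothetical.

```rust
/// Hypothetical fallback for lists too long to expand value by value:
/// compare the container's [col_min, col_max] range with the range
/// spanned by the list's smallest and largest values.
fn may_contain_large_in_list(col_min: i32, col_max: i32, values: &[i32]) -> bool {
    match (values.iter().min(), values.iter().max()) {
        // The two ranges overlap unless one lies entirely below the other.
        (Some(&list_min), Some(&list_max)) => list_min <= col_max && col_min <= list_max,
        // Empty list: this sketch does not reason about it and keeps the container.
        _ => true,
    }
}
```

This check is coarser than testing every value, since a container can fall inside a gap between list values and still be kept, but it costs two comparisons regardless of list length.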
Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?