-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't error in simplify_expressions rule #8957
Conversation
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
cc @jackwener @alamb. This is the first step to solving #8910; the main change in this PR is that it throws an error in runtime, not in simplify_expression. I open #8959 to discuss how to deal with the short-circuit function.
after this pr
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @haohuaijin -- I think this is looking really good to me.
I think we should avoid the clone if possible, add a few more tests, and update the expected error messages, but all in all this PR is looking really close.
cc @jackwener
@@ -138,28 +138,6 @@ mod tests { | |||
ExprSchemable, JoinType, | |||
}; | |||
|
|||
/// A macro to assert that one string is contained within another with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 for reusing the existing one
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
@@ -281,7 +280,13 @@ impl<'a> TreeNodeRewriter for ConstEvaluator<'a> { | |||
|
|||
fn mutate(&mut self, expr: Expr) -> Result<Expr> { | |||
match self.can_evaluate.pop() { | |||
Some(true) => Ok(Expr::Literal(self.evaluate_to_scalar(expr)?)), | |||
Some(true) => { | |||
let lit = self.evaluate_to_scalar(expr.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a real shame that this has to clone the expression even for the common case where there is no error
maybe we could use something like the following:
enum SimplifyResult {
// expr was simplifed and contains the new expression
Simplified(Expr),
// Evalaution encountered an error, contains the original expression
SimplifyRuntimeError(DataFusionError, Expr),
}
Thanks @alamb for the great reviews; I apply your suggestions, re-added the deleted tests and add some new tests. |
SELECT not(1), not(0) | ||
|
||
query ?B | ||
SELECT null, not(null) | ||
---- | ||
NULL NULL | ||
|
||
query error DataFusion error: Optimizer rule 'simplify_expressions' failed\ncaused by\nInternal error: NOT 'Literal \{ value: Utf8\("hi"\) \}' can't be evaluated because the expression's type is Utf8, not boolean or NULL | ||
query error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the detailed error message because this query return internal error, that make ci failed
❯ select not('hi');
Internal error: NOT 'Literal { value: Utf8("hi") }' can't be evaluated because the expression's type is Utf8, not boolean or NULL.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯ select 1 and 2;
Error during planning: Cannot infer common argument type for logical boolean operation Int64 AND Int64
I think the return error message should like And
operator, they both only work for boolean value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
track in #8965
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @haohuaijin -- this PR looks great to me. Thank you
FYI @jackwener
#[test] | ||
fn test_simplify_divide_zero_by_zero() { | ||
// because divide by 0 maybe occur in short-circuit expression | ||
// so we should not simplify this, and throw error in runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
#[test] | ||
fn test_expression_partial_simplify_2() { | ||
// (1 > 2) and (4 / 0) -> false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very nice job! Thanks @haohuaijin @alamb ❤️
// Certain expressions such as `CASE` and `COALESCE` are short circuiting | ||
// and may not evalute all their sub expressions. Thus if | ||
// if any error is countered during simplification, return the original | ||
// so that normal evaluation can occur |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
Closes #8909
Rationale for this change
see #8910 and #8909 ,
What changes are included in this PR?
before this pr, in simplify_espression we try to fold the expression, but this can result some error, for example
4/0
never reach in runtime.so in this PR, if we encounter an error during fold expression, return the origin expression, and the error will occur in runtime if it is a real error(not the short-circuit problem).
Note: this pr doesn't solve #8910 , due to #8927
Are these changes tested?
test by existing tests
Are there any user-facing changes?