-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: don't extract common sub expr in CASE WHEN
clause
#8833
Conversation
cc @liukun4515 |
fn pre_visit(&mut self, expr: &Expr) -> Result<VisitRecursion> { | ||
// related to https://github.com/apache/arrow-datafusion/issues/8814 | ||
// If the expr contain volatile expression or is a case expression, skip it. | ||
if matches!(expr, Expr::Case(..)) || is_volatile_expression(expr)? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check volatile expression is this place is because the previous method only can deal with a single random()
function, the below query wil return true in main branch
SELECT r FROM (SELECT r1 == r2 r, r1, r2 FROM (SELECT random()+1 r1, random()+1 r2) WHERE r1 > 0 AND r2 > 0)
@@ -92,6 +94,21 @@ pub fn log_plan(description: &str, plan: &LogicalPlan) { | |||
trace!("{description}::\n{}\n", plan.display_indent_schema()); | |||
} | |||
|
|||
/// check whether the expression is volatile predicates | |||
pub(crate) fn is_volatile_expression(e: &Expr) -> Result<bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move is_volatile_expression
to utils for reuse.
thanks @haohuaijin I will help to review this pr this week. |
datafusion/optimizer/src/utils.rs
Outdated
VisitRecursion::Continue | ||
}) | ||
}) | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use the unwrap
? or the ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the result of is_volatile_expression
is result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I forgot to change it when I moved the code.
@@ -1112,3 +1112,22 @@ SELECT abs(x), abs(x) + abs(y) FROM t; | |||
|
|||
statement ok | |||
DROP TABLE t; | |||
|
|||
# related to https://github.com/apache/arrow-datafusion/issues/8814 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
@jackwener could you please help to take a look this issue? |
fn pre_visit(&mut self, expr: &Expr) -> Result<VisitRecursion> { | ||
// related to https://github.com/apache/arrow-datafusion/issues/8814 | ||
// If the expr contain volatile expression or is a case expression, skip it. | ||
if matches!(expr, Expr::Case(..)) || is_volatile_expression(expr)? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this problem could exist in other function/expression like COALESCE
| OR
It can be tracked as a future ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inner expressions in these expressions may be short-circuited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
track in #8874
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I comment in the #8874, Do we have any method for this rule to make sure the new function/expr or other cases we can't find will not bring the same bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, look great to me, but some details need improve.
Thanks @haohuaijin & @liukun4515
@@ -739,7 +740,9 @@ impl OptimizerRule for PushDownFilter { | |||
|
|||
(field.qualified_name(), expr) | |||
}) | |||
.partition(|(_, value)| is_volatile_expression(value)); | |||
.partition(|(_, value)| { | |||
is_volatile_expression(value).unwrap_or(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use unwrap_or(true)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't know whether the scalar function is volatile, default set it to a volatile function.
Thanks @jackwener and @liukun4515 for reviews. |
Thanks for @haohuaijin to fix this issue. |
@alamb i will merge this pr. |
* fix: don't extract common sub expr in CASE WHEN clause * fix ci * fix
… to v34 (#227) * fix: don't extract common sub expr in `CASE WHEN` clause (apache#8833) * fix: don't extract common sub expr in CASE WHEN clause * fix ci * fix * fix: common_subexpr_eliminate rule should not apply to short-circuit expression (apache#8928) * fix: common_subexpr_eliminate rule should not apply to short-circuit expression * add more tests * format * minor * apply reviews * add some commont * fmt --------- Co-authored-by: Huaijin <[email protected]>
Which issue does this PR close?
Closes #8814
Rationale for this change
as shown in #8814, in
CASE WHEN A THEN B ELSE C END
expr, the B and C only have one can run,so for query
the
A.column1/B.column1
actually should not run, but we extract it as a common sub expr(see below plan), which results inA.column1/B.column1
running and getting a divide zero error.What changes are included in this PR?
Are these changes tested?
yes, add a test.
Are there any user-facing changes?