Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't extract common sub expr in CASE WHEN clause #8833

Merged
merged 3 commits into from
Jan 16, 2024

Conversation

haohuaijin
Copy link
Contributor

Which issue does this PR close?

Closes #8814

Rationale for this change

as shown in #8814, in CASE WHEN A THEN B ELSE C END expr, the B and C only have one can run,
so for query

select 
case when B.column1 > 0 then A.column1/B.column1 else 0 end as value1,
case when B.column1 > 0 and B.column2 > 0 then A.column1/B.column1 else 0 end as value3
from (select column1, column2 from users) as A ,  (select column1, column2 from users where column1 = 0) as B;

the A.column1/B.column1 actually should not run, but we extract it as a common sub expr(see below plan), which results in A.column1/B.column1 running and getting a divide zero error.

| initial_logical_plan                                       | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                                                                                                        |
|                                                            |   CrossJoin:                                                                                                                                                                                                                                                                                                                                    |
|                                                            |     SubqueryAlias: a                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         TableScan: users                                                                                                                                                                                                                                                                                                                        |
|                                                            |     SubqueryAlias: b                                                                                                                                                                                                                                                                                                                            |
|                                                            |       Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                  |
|                                                            |         Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                        |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| logical_plan after common_sub_expression_eliminate         | Projection: CASE WHEN b.column1 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value1, CASE WHEN b.column2 > Int64(0) THEN a.column1 / b.column1b.column1a.column1 AS a.column1 / b.column1 ELSE Int64(0) END AS value3                                                                  |
|                                                            |   Projection: a.column1 / b.column1 AS a.column1 / b.column1b.column1a.column1, a.column1, a.column2, b.column1, b.column2                                                                                                                                                                                                                      |
|                                                            |     CrossJoin:                                                                                                                                                                                                                                                                                                                                  |
|                                                            |       SubqueryAlias: a                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           TableScan: users                                                                                                                                                                                                                                                                                                                      |
|                                                            |       SubqueryAlias: b                                                                                                                                                                                                                                                                                                                          |
|                                                            |         Projection: users.column1, users.column2                                                                                                                                                                                                                                                                                                |
|                                                            |           Filter: users.column1 = Int64(0)                                                                                                                                                                                                                                                                                                      |
|                                                            |             TableScan: users 

What changes are included in this PR?

Are these changes tested?

yes, add a test.

Are there any user-facing changes?

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jan 11, 2024
@haohuaijin
Copy link
Contributor Author

cc @liukun4515

fn pre_visit(&mut self, expr: &Expr) -> Result<VisitRecursion> {
// related to https://github.com/apache/arrow-datafusion/issues/8814
// If the expr contain volatile expression or is a case expression, skip it.
if matches!(expr, Expr::Case(..)) || is_volatile_expression(expr)? {
Copy link
Contributor Author

@haohuaijin haohuaijin Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check volatile expression is this place is because the previous method only can deal with a single random() function, the below query wil return true in main branch

SELECT r FROM (SELECT r1 == r2 r, r1, r2 FROM (SELECT random()+1 r1, random()+1 r2) WHERE r1 > 0 AND r2 > 0)

@@ -92,6 +94,21 @@ pub fn log_plan(description: &str, plan: &LogicalPlan) {
trace!("{description}::\n{}\n", plan.display_indent_schema());
}

/// check whether the expression is volatile predicates
pub(crate) fn is_volatile_expression(e: &Expr) -> Result<bool> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move is_volatile_expression to utils for reuse.

@liukun4515
Copy link
Contributor

cc @liukun4515

thanks @haohuaijin I will help to review this pr this week.

VisitRecursion::Continue
})
})
.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the unwrap? or the ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the result of is_volatile_expression is result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I forgot to change it when I moved the code.

@@ -1112,3 +1112,22 @@ SELECT abs(x), abs(x) + abs(y) FROM t;

statement ok
DROP TABLE t;

# related to https://github.com/apache/arrow-datafusion/issues/8814
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@liukun4515
Copy link
Contributor

@jackwener could you please help to take a look this issue?

@jackwener jackwener self-requested a review January 15, 2024 09:05
fn pre_visit(&mut self, expr: &Expr) -> Result<VisitRecursion> {
// related to https://github.com/apache/arrow-datafusion/issues/8814
// If the expr contain volatile expression or is a case expression, skip it.
if matches!(expr, Expr::Case(..)) || is_volatile_expression(expr)? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this problem could exist in other function/expression like COALESCE | OR

It can be tracked as a future ticket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner expressions in these expressions may be short-circuited.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

track in #8874

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I comment in the #8874, Do we have any method for this rule to make sure the new function/expr or other cases we can't find will not bring the same bug

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, look great to me, but some details need improve.

Thanks @haohuaijin & @liukun4515

@@ -739,7 +740,9 @@ impl OptimizerRule for PushDownFilter {

(field.qualified_name(), expr)
})
.partition(|(_, value)| is_volatile_expression(value));
.partition(|(_, value)| {
is_volatile_expression(value).unwrap_or(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use unwrap_or(true)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't know whether the scalar function is volatile, default set it to a volatile function.

@haohuaijin
Copy link
Contributor Author

Thanks @jackwener and @liukun4515 for reviews.

@liukun4515
Copy link
Contributor

Thanks for @haohuaijin to fix this issue.
Thanks for @jackwener to help review this and point out other cases/issues.

@liukun4515
Copy link
Contributor

@alamb i will merge this pr.

@liukun4515 liukun4515 merged commit 4cde998 into apache:main Jan 16, 2024
22 checks passed
@haohuaijin haohuaijin deleted the fix_case branch January 16, 2024 02:25
joroKr21 pushed a commit to coralogix/arrow-datafusion that referenced this pull request Apr 4, 2024
* fix: don't extract common sub expr in CASE WHEN clause

* fix ci

* fix
joroKr21 added a commit to coralogix/arrow-datafusion that referenced this pull request Apr 4, 2024
… to v34 (#227)

* fix: don't extract common sub expr in `CASE WHEN` clause (apache#8833)

* fix: don't extract common sub expr in CASE WHEN clause

* fix ci

* fix

* fix: common_subexpr_eliminate rule should not apply to short-circuit expression (apache#8928)

* fix: common_subexpr_eliminate rule should not apply to short-circuit expression

* add more tests

* format

* minor

* apply reviews

* add some commont

* fmt

---------

Co-authored-by: Huaijin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

throw Divide by zero error when use case when
3 participants