Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate ConstantFolding and SimplifyExpression #1375

Merged
merged 3 commits into from
Dec 2, 2021

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 28, 2021

Which issue does this PR close?

Builds on #1374

Re #1160 (more sophisticated expression simplification) and #1316 don't change names during constant folding

Rationale for this change

ConstantFolding and SimplifyExpression both simplify expressions, in slightly different ways, which is confusing and redundant.

It also means bug fixes such as #1319 didn't get applied to SimplifyExpression.

For example, on master (not the column name is column1, due to the simplification that has occured):

❯ select column1 * 1 from (values (1, NULL), (2, 2), (3,3)) as sq;
+---------+
| column1 |
+---------+
| 1       |
| 2       |
| 3       |
+---------+

On this branch the column is correctly named:

select column1 * 1 from (values (1, NULL), (2, 2), (3,3)) as sq;
+-----------------------+
| sq.column1 * Int64(1) |
+-----------------------+
| 1                     |
| 2                     |
| 3                     |
+-----------------------+

What changes are included in this PR?

  1. Remove independent ConstantFolding optimizer pass by compiling ConstantFolding into SimplifyExpression optimizer pass
  2. Move tests

Note:
I plan more PRs to combine simplify and Simplifier logic but to keep reviews easier I am going to do that in another PR

@alamb alamb marked this pull request as draft November 28, 2021 13:15
@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Nov 28, 2021
@alamb alamb changed the title (WIP) Consolidate ConstantFolding and SimplifyExpression (WIP) Consolidate ConstantFolding and SimplifyExpression Nov 28, 2021
@alamb alamb changed the title (WIP) Consolidate ConstantFolding and SimplifyExpression Consolidate ConstantFolding and SimplifyExpression Nov 30, 2021
@alamb alamb marked this pull request as ready for review November 30, 2021 11:46
@alamb alamb added the api change Changes the API exposed to users of the crate label Nov 30, 2021
// projected columns. With just the projected schema, it's not possible to infer types for
// expressions that references non-projected columns within the same project plan or its
// children plans.
let mut simplifier =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is moved from ConstantFolding

Comment on lines +304 to +305
// TODO combine simplify into Simplifier
let e = simplify(&e);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the call into the existing simplify function that is in the SimplifyExpressions pass (originally written by @jgoday)

let new_name = &new_e.name(plan.schema());

// TODO simplify this logic
if let (Ok(expr_name), Ok(new_expr_name)) = (name, new_name) {
Copy link
Contributor Author

@alamb alamb Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that now the fix for alias names (introduced by @viirya in #1319) is applied to the result of simplify as well -- previously it was not 🎉

@@ -1562,4 +1556,388 @@ mod tests {
.unwrap(),
)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are just moved from constant_folding.rs

@alamb alamb requested review from houqp and removed request for houqp November 30, 2021 11:58
@alamb alamb marked this pull request as draft November 30, 2021 12:05
@alamb
Copy link
Contributor Author

alamb commented Nov 30, 2021

The new tpch query test added in #1368 by @viirya is already coming in handy to prevent regressions -- it found a problem with an earlier version of this pr where simplification wasn't done until after the other optimizations. ❤️

Arc::new(ConstantFolding::new()),
// Simplify expressions first to maximize the chance
// of applying other optimizations
Arc::new(SimplifyExpressions::new()),
Copy link
Contributor Author

@alamb alamb Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has the additional benefit of allowing even more filters to be pushed down (e.g. column * 1 < 5 which is simplified by the code in simplify to column < 5 can now also be pushed down

@alamb alamb requested a review from houqp November 30, 2021 14:00
@alamb alamb marked this pull request as ready for review November 30, 2021 14:00
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice simplification and tests :)

@alamb alamb merged commit e73a7a0 into apache:master Dec 2, 2021
@alamb alamb deleted the alamb/simplify_simplify branch December 2, 2021 22:42
@alamb alamb added enhancement New feature or request and removed api change Changes the API exposed to users of the crate labels Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants