-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better Equivalence (ordering and exact equivalence) Propagation through ProjectionExec #8484
Better Equivalence (ordering and exact equivalence) Propagation through ProjectionExec #8484
Conversation
# Conflicts: # datafusion/physical-expr/src/equivalence.rs # datafusion/sqllogictest/test_files/groupby.slt
# Conflicts: # datafusion/core/src/physical_optimizer/enforce_distribution.rs # datafusion/core/src/physical_optimizer/replace_with_order_preserving_variants.rs
# Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
* Discover ordering of complex expressions in group by and window partition by * Remove unnecessary tests * Update comments * Minor changes * Better projection support complex expression support * Fix failing test * Simplifications * Simplifications * Add is end flag * Simplifications * Simplifications * Simplifications * Minor changes * Minor changes * Minor changes * All tests pass * Change implementation of find_longest_permutation * Minor changes * Minor changes * Remove projection section * Remove projection implementation * Fix linter errors * Remove projection sections * Minor changes * Add docstring comments * Add comments * Minor changes * Minor changes * Add comments * simplifications
# Conflicts: # datafusion/physical-expr/src/equivalence.rs # datafusion/physical-expr/src/sort_properties.rs
# Conflicts: # datafusion/physical-expr/src/equivalence.rs # datafusion/sqllogictest/test_files/groupby.slt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mustafasrepo and @metesynnada
Let me know if you want an extra review of this PR
One thing I noticed is that datafusion/physical-expr/src/equivalence.rs
is getting quite massive (over 5000 lines long now). What do you think about breaking it up into smaller modules? I think that would make the code easier to reason about as well as help solidify the APIs a bit more.
I did some version of that in #8235 but @ozankabak rightly pointed out that such a refactor would likely cause non trivial merge conflicts if there was other work underway.
I also reviewed this so it may be a better use of your time to review other PRs, but if you have some extra time on your hands, why not 🙂
This was the last major work item on this code, it will be good to go for a refactor after we merge this 🚀 |
😆 I don't think it needs any more review if it has already been reviewed by three committers |
…gh ProjectionExec (apache#8484) * Better projection support complex expression support --------- Co-authored-by: metesynnada <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]>
…gh ProjectionExec (apache#8484) * Better projection support complex expression support --------- Co-authored-by: metesynnada <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]>
…gh ProjectionExec (apache#8484) * Better projection support complex expression support --------- Co-authored-by: metesynnada <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Which issue does this PR close?
Improves the situation on #8064.
Rationale for this change
In the PR8281 we added support for ordering requirement analysis for complex expressions.
However, in current codebase some ordering information got lost during the projection of complex expressions.
What changes are included in this PR?
This PR adds support for better ordering information propagation support for complex expressions through
ProjectionExec
.Consider the case where we have table with columns
a,b,c,d,e
and assume that[a ASC, b ASC]
,[c ASC]
is valid orderings for this table.If were to apply the projection (
a
,c
,b+c
) on top of this table valid orderings after projection are:[a ASC, b+c ASC]
,[c ASC]
. However, in current code base we can propagate only[a ASC]
,[c ASC]
. This PR enables us to propagate complex expression orderings better throughProjectionExec
. You can examine unit tests for other examples of this PR's support.Are these changes tested?
Yes extensive unit and random test is added. Most of the changes comes from test or test util codes.
Are there any user-facing changes?