Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Equivalence (ordering and exact equivalence) Propagation through ProjectionExec #8484

Merged
merged 50 commits into from
Dec 11, 2023
Merged

Better Equivalence (ordering and exact equivalence) Propagation through ProjectionExec #8484

merged 50 commits into from
Dec 11, 2023

Conversation

mustafasrepo
Copy link
Contributor

@mustafasrepo mustafasrepo commented Dec 9, 2023

Which issue does this PR close?

Improves the situation on #8064.

Rationale for this change

In the PR8281 we added support for ordering requirement analysis for complex expressions.

However, in current codebase some ordering information got lost during the projection of complex expressions.

What changes are included in this PR?

This PR adds support for better ordering information propagation support for complex expressions through ProjectionExec.
Consider the case where we have table with columns a,b,c,d,e and assume that [a ASC, b ASC], [c ASC] is valid orderings for this table.

If were to apply the projection (a, c, b+c) on top of this table valid orderings after projection are: [a ASC, b+c ASC], [c ASC]. However, in current code base we can propagate only [a ASC], [c ASC]. This PR enables us to propagate complex expression orderings better through ProjectionExec. You can examine unit tests for other examples of this PR's support.

Are these changes tested?

Yes extensive unit and random test is added. Most of the changes comes from test or test util codes.

Are there any user-facing changes?

mustafasrepo and others added 30 commits November 6, 2023 16:06
# Conflicts:
#	datafusion/physical-expr/src/equivalence.rs
#	datafusion/sqllogictest/test_files/groupby.slt
# Conflicts:
#	datafusion/core/src/physical_optimizer/enforce_distribution.rs
#	datafusion/core/src/physical_optimizer/replace_with_order_preserving_variants.rs
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
* Discover ordering of complex expressions in group by and window partition by

* Remove unnecessary tests

* Update comments

* Minor changes

* Better projection support complex expression support

* Fix failing test

* Simplifications

* Simplifications

* Add is end flag

* Simplifications

* Simplifications

* Simplifications

* Minor changes

* Minor changes

* Minor changes

* All tests pass

* Change implementation of find_longest_permutation

* Minor changes

* Minor changes

* Remove projection section

* Remove projection implementation

* Fix linter errors

* Remove projection sections

* Minor changes

* Add docstring comments

* Add comments

* Minor changes

* Minor changes

* Add comments

* simplifications
# Conflicts:
#	datafusion/physical-expr/src/equivalence.rs
#	datafusion/physical-expr/src/sort_properties.rs
@github-actions github-actions bot added the physical-expr Physical Expressions label Dec 9, 2023
Copy link
Contributor

@metesynnada metesynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mustafasrepo and @metesynnada

Let me know if you want an extra review of this PR

One thing I noticed is that datafusion/physical-expr/src/equivalence.rs is getting quite massive (over 5000 lines long now). What do you think about breaking it up into smaller modules? I think that would make the code easier to reason about as well as help solidify the APIs a bit more.

I did some version of that in #8235 but @ozankabak rightly pointed out that such a refactor would likely cause non trivial merge conflicts if there was other work underway.

@ozankabak
Copy link
Contributor

Let me know if you want an extra review of this PR

I also reviewed this so it may be a better use of your time to review other PRs, but if you have some extra time on your hands, why not 🙂

I did some version of that in #8235 but @ozankabak rightly pointed out that such a refactor would likely cause non trivial merge conflicts if there was other work underway.

This was the last major work item on this code, it will be good to go for a refactor after we merge this 🚀

@alamb
Copy link
Contributor

alamb commented Dec 11, 2023

but if you have some extra time on your hands, why not 🙂

😆 I don't think it needs any more review if it has already been reviewed by three committers

@mustafasrepo mustafasrepo merged commit 95ba48b into apache:main Dec 11, 2023
22 checks passed
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
…gh ProjectionExec (apache#8484)


* Better projection support complex expression support


---------

Co-authored-by: metesynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Jan 3, 2024
…gh ProjectionExec (apache#8484)


* Better projection support complex expression support


---------

Co-authored-by: metesynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Jan 3, 2024
…gh ProjectionExec (apache#8484)


* Better projection support complex expression support


---------

Co-authored-by: metesynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants