
Preserve all of the valid orderings during merging. #8169

Merged
merged 3 commits into from
Nov 15, 2023


@mustafasrepo (Contributor) commented Nov 14, 2023

Which issue does this PR close?

Partially closes #8064.

Rationale for this change

Currently, during merging (SortPreservingMergeExec), we can preserve only one of the table's valid orderings.
Consider the case where we have table1, which satisfies both [a ASC] and [b ASC]:

a b
0 0
0 1
1 1
1 2
2 2
2 3
3 3

and table2, which also satisfies both [a ASC] and [b ASC]:

a b
0 0
0 1
1 1
1 2
2 2
2 3
3 3

If we merge these two tables according to the constraint [a ASC] alone, the resulting table may be:

a b
0 0
0 1
0 0
0 1
1 1
1 2
1 1
1 2
2 2
2 3
2 2
2 3
3 3
3 3

Here the [a ASC] property still holds, but the [b ASC] property is lost.
This PR fixes this problem: after SortPreservingMergeExec, both the [a ASC] and [b ASC] properties are preserved. With this PR, the result would be:

a b
0 0
0 0
0 1
0 1
1 1
1 1
1 2
1 2
2 2
2 2
2 3
2 3
3 3
3 3

where both [a ASC] and [b ASC] are satisfied.

The algorithm is as follows:

  • During the merge, we preserve not just one of the orderings (either [a ASC] or [b ASC]) but the concatenation of all valid orderings ([a ASC, b ASC]). This lets us keep the existing valid orderings through the merge.
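To make the merge step concrete, here is a toy two-way merge that reproduces both result tables above. It is a sketch under simplifying assumptions, not DataFusion's SortPreservingMergeExec: rows are plain (a, b) integer tuples instead of Arrow record batches, the helper names (`merge`, `cmp_by_key`, `is_sorted_on`) are hypothetical, and ties are broken in favour of the left input, which is only one possible choice. With that tie-breaking, merging on [a ASC] alone yields exactly the first merged table, and merging on [a ASC, b ASC] yields the second.

```rust
// Toy model only (NOT DataFusion's implementation): each input is a slice
// of (a, b) rows that is already sorted; columns are addressed by index.
use std::cmp::Ordering;

type Row = (i32, i32);

// Hypothetical helper: read column `idx` (0 = a, 1 = b) from a row.
fn col(row: &Row, idx: usize) -> i32 {
    if idx == 0 { row.0 } else { row.1 }
}

// Compare two rows lexicographically on `key`, all columns ascending.
fn cmp_by_key(x: &Row, y: &Row, key: &[usize]) -> Ordering {
    for &idx in key {
        match col(x, idx).cmp(&col(y, idx)) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    Ordering::Equal
}

// Two-way sort-preserving merge; on ties the row from `left` goes first.
fn merge(left: &[Row], right: &[Row], key: &[usize]) -> Vec<Row> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(left.len() + right.len());
    while i < left.len() && j < right.len() {
        if cmp_by_key(&left[i], &right[j], key) != Ordering::Greater {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    // One side is exhausted; append whatever remains of the other.
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}

// True if column `idx` is non-decreasing over `rows`.
fn is_sorted_on(rows: &[Row], idx: usize) -> bool {
    rows.windows(2).all(|w| col(&w[0], idx) <= col(&w[1], idx))
}

fn main() {
    // table1 and table2 from the description are identical.
    let table: Vec<Row> = vec![(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)];

    // Merge key [a ASC] only.
    let by_a = merge(&table, &table, &[0]);
    // Merge key [a ASC, b ASC]: the concatenation of both valid orderings.
    let by_a_b = merge(&table, &table, &[0, 1]);

    println!("by [a]:    a sorted = {}, b sorted = {}", is_sorted_on(&by_a, 0), is_sorted_on(&by_a, 1));
    println!("by [a, b]: a sorted = {}, b sorted = {}", is_sorted_on(&by_a_b, 0), is_sorted_on(&by_a_b, 1));
}
```

For this example data, the first line reports that b is no longer sorted after merging on [a] alone, while the second reports that both columns stay sorted when the concatenated key is used.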

Most of the changes in this PR are tests or test-related utilities.

What changes are included in this PR?

Are these changes tested?

Yes.

Are there any user-facing changes?

@mustafasrepo marked this pull request as draft on November 14, 2023 11:50
@github-actions bot added the labels physical-expr (Physical Expressions), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) on Nov 14, 2023
@mustafasrepo marked this pull request as ready for review on November 14, 2023 12:04
@ozankabak (Contributor) left a comment


Left a few minor comments. Thanks @mustafasrepo for this small but powerful change. @alamb, PTAL, as IOx also leverages orderings and related optimizations extensively.

Two review threads on datafusion/physical-expr/src/equivalence.rs (outdated, resolved)
Comment on lines -475 to -477
- if self.preserve_order {
-     result = result.with_reorder(self.sort_exprs().unwrap_or_default().to_vec())
- }
Contributor


For other reviewers: This is the key change that avoids losing alternative orderings.

@alamb (Contributor) left a comment


I started looking at this -- the code looks good to me. I didn't quite have enough time to study the test code to understand what it is doing.

However, please don't feel like you need to wait for my review to merge this if it is blocking something

- self.orderings.first().cloned()
+ let output_ordering =
+     self.orderings.iter().flatten().cloned().collect::<Vec<_>>();
+ let output_ordering = collapse_lex_ordering(output_ordering);
Contributor


this is another core change, right? It preserves all the possible orderings

Contributor Author


Yes, this is the place where we concatenate all valid orderings so that they can be preserved during merge
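The concatenation described here can be modeled in a few lines. In this hedged sketch, a sort key is a hypothetical `SortKey { column, descending }` standing in for DataFusion's `PhysicalSortExpr`, and the de-duplication step (keep only the first occurrence of each column) is an assumption about how `collapse_lex_ordering` treats repeated expressions, not a copy of its source.

```rust
// Toy stand-in for one entry of a lexicographic ordering.
// This is NOT DataFusion's PhysicalSortExpr.
#[derive(Clone, Debug, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

// Hypothetical helper: an ascending key on `column`.
fn asc(column: &str) -> SortKey {
    SortKey { column: column.to_string(), descending: false }
}

// Concatenate all valid orderings into one key (cf. `iter().flatten()`),
// keeping only the first occurrence of each column (assumed behaviour of
// `collapse_lex_ordering`).
fn concat_and_collapse(orderings: &[Vec<SortKey>]) -> Vec<SortKey> {
    let mut out: Vec<SortKey> = Vec::new();
    for key in orderings.iter().flatten() {
        if !out.iter().any(|k| k.column == key.column) {
            out.push(key.clone());
        }
    }
    out
}

fn main() {
    // The two valid orderings from the PR description: [a ASC] and [b ASC].
    let orderings = vec![vec![asc("a")], vec![asc("b")]];
    let merge_key = concat_and_collapse(&orderings);
    let names: Vec<&str> = merge_key.iter().map(|k| k.column.as_str()).collect();
    println!("{:?}", names); // ["a", "b"]

    // A column that appears in several orderings is kept once, at its first position.
    let overlapping = vec![vec![asc("a"), asc("b")], vec![asc("b"), asc("c")]];
    println!("{}", concat_and_collapse(&overlapping).len()); // 3
}
```

Collapsing matters because a repeated column later in the key can never break a tie that its first occurrence left unresolved, so it adds comparison work without changing the order.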

@alamb (Contributor) commented Nov 14, 2023

Thank you @mustafasrepo -- this is very neat

@ozankabak (Contributor)

Thanks @alamb, I will go ahead and merge this then. If you encounter any issue related to this, please let us know so we can promptly fix it

@ozankabak merged commit 6ecb6cd into apache:main on Nov 15, 2023 (22 checks passed)
@tustvold (Contributor) commented Nov 15, 2023

This PR appears to have logic merge conflicts that have broken CI - #8186

@ozankabak could you perhaps take a look?

Development

Successfully merging this pull request may close these issues.

Improve Sort Based Optimizations
4 participants