Detect when filters on unique constraints make subqueries scalar #8312
Thank you @Jesse-Bakker -- I plan to review this carefully tomorrow
Thank you @Jesse-Bakker -- this looks great. I had a few suggestions, mostly about comments, and code reuse. But otherwise I think this PR is pretty much ready
cc @liukun4515 and @jackwener who may have some more thoughts about subquery rewrites
The only thing I think is strictly needed to approve this PR is docstrings for "is_scalar" -- the rest is "nice to have" from my perspective / could be done as a follow on PR
I will review this PR and think more about it. If this PR isn't urgent, please wait for me.
Thanks @Jesse-Bakker
/// `Filter(b = 2).is_scalar() == false`
/// and
/// `Filter(a = 2 OR b = 2).is_scalar() == false`
fn is_scalar(&self) -> bool {
I suggest using `uniform slot` and `unique slot` to describe functional dependencies (FDs) instead of `scalar`.
Commonly used functional dependencies include:
- uniform slot, which means ndv <= 1 (ndv is the number of distinct values)
- unique slot, which means ndv = row count
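To make the two definitions concrete, here is a minimal sketch that classifies a column by counting distinct values. The names `SlotKind` and `classify_slot` are hypothetical and are not part of DataFusion; an optimizer would normally infer these properties from constraints and predicates rather than by scanning data.

```rust
use std::collections::HashSet;

/// Hypothetical classification of a column ("slot") by its number of
/// distinct values (ndv), following the definitions in the comment above.
#[derive(Debug, PartialEq)]
enum SlotKind {
    /// ndv <= 1: every row carries the same value, or there are no rows.
    Uniform,
    /// ndv == row count: no value repeats.
    Unique,
    /// Neither property holds.
    Neither,
}

/// Classify a materialized column of integers.
/// A column with zero or one row satisfies both definitions; this sketch
/// reports it as `Uniform` because that check runs first.
fn classify_slot(values: &[i64]) -> SlotKind {
    let ndv = values.iter().collect::<HashSet<_>>().len();
    if ndv <= 1 {
        SlotKind::Uniform
    } else if ndv == values.len() {
        SlotKind::Unique
    } else {
        SlotKind::Neither
    }
}
```

In these terms, a filter that pins every column of a unique key with an equality yields at most one row, so each output column has ndv <= 1; that is the sense in which the suggestion prefers "uniform" over "scalar".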
Do I understand correctly that the suggestion is to rename this method to `Filter::is_uniform()`?
Yes, and add these definitions to doc/
Thank you @Jesse-Bakker - let's see if @jackwener has a chance to clarify. If we haven't heard by tomorrow I'll merge this PR
Thanks @Jesse-Bakker @alamb
I have no other question.
I took the liberty of merging up from main to resolve a conflict in this PR
Thanks again @Jesse-Bakker and @jackwener
Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Part of #3725
Rationale for this change
In some cases, it is possible to prove that a Filter only ever produces one
row. In those cases, such a filter may be used in a scalar subquery to ensure
it is in fact scalar, allowing for more flexible use of scalar subqueries.
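As an illustration of the idea, the following is a minimal sketch of such a check. It is not the PR's implementation: `Pred`, `eq_columns`, and the free function `is_scalar` are hypothetical stand-ins for DataFusion's expression and plan types, and the sketch assumes predicates consist only of `column = literal` comparisons combined with `AND`/`OR`.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a filter predicate (hypothetical, not DataFusion's `Expr`).
#[derive(Debug)]
enum Pred {
    /// `column = <literal>`
    Eq(&'static str, i64),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Collect the columns that are pinned by an equality in the top-level conjunction.
fn eq_columns(pred: &Pred, out: &mut HashSet<&'static str>) {
    match pred {
        Pred::Eq(col, _) => {
            out.insert(*col);
        }
        Pred::And(left, right) => {
            eq_columns(left, out);
            eq_columns(right, out);
        }
        // A disjunction can match several rows, so it pins nothing.
        Pred::Or(_, _) => {}
    }
}

/// Returns true if some unique key has every one of its columns pinned by equality,
/// in which case the filter can produce at most one row.
fn is_scalar(pred: &Pred, unique_keys: &[&[&'static str]]) -> bool {
    let mut pinned = HashSet::new();
    eq_columns(pred, &mut pinned);
    unique_keys
        .iter()
        .any(|key| !key.is_empty() && key.iter().all(|col| pinned.contains(col)))
}
```

Assuming a table whose only unique key is the single column `a`, this sketch reports `a = 2` and `a = 2 AND b = 2` as scalar, while `b = 2` and `a = 2 OR b = 2` are not, which matches the docstring examples quoted in the review.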
What changes are included in this PR?
This adds an `is_scalar()` method to `Filter`, which will check if there is a unique functional dependence which is covered by the `Filter`'s predicate.
This is used in `LogicalPlan::max_rows()` to provide a tighter bound on the maximum number of rows returned in the presence of `Filter`s.
Are these changes tested?
This is directly tested with a unit test and with a `sqllogictest` that exercises the more flexible use of scalar subqueries.
Are there any user-facing changes?
Scalar subqueries are more flexible. Previous constraints were not documented
as far as I can tell
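
The effect on `LogicalPlan::max_rows()` described under "What changes are included in this PR?" can be sketched as follows. This is a pared-down model under stated assumptions, not DataFusion's code: the `Plan` enum, its `scalar` flag (standing in for the result of `Filter::is_scalar()`), and `usable_as_scalar_subquery` are hypothetical, and how the optimizer actually consumes the bound is assumed here.

```rust
/// Hypothetical, pared-down plan tree (not DataFusion's `LogicalPlan`).
enum Plan {
    /// A table scan with no known row bound.
    Scan,
    /// A filter; `scalar` records whether its predicate covers a unique key.
    Filter { scalar: bool, input: Box<Plan> },
    /// `LIMIT fetch`.
    Limit { fetch: usize, input: Box<Plan> },
}

/// Upper bound on the number of rows a plan can produce, or `None` if unknown.
fn max_rows(plan: &Plan) -> Option<usize> {
    match plan {
        Plan::Scan => None,
        Plan::Filter { scalar, input } => {
            let inner = max_rows(input);
            if *scalar {
                // A scalar filter emits at most one row, even over an unbounded input.
                Some(inner.map_or(1, |n| n.min(1)))
            } else {
                // An ordinary filter can only keep or drop rows, so the input bound carries over.
                inner
            }
        }
        Plan::Limit { fetch, input } => Some(match max_rows(input) {
            Some(n) => n.min(*fetch),
            None => *fetch,
        }),
    }
}

/// Assumed consumer of the bound: a subquery is provably scalar
/// when it is known to return at most one row.
fn usable_as_scalar_subquery(plan: &Plan) -> bool {
    matches!(max_rows(plan), Some(n) if n <= 1)
}
```

Without the scalar case, a filter over a scan has no known bound and the subquery cannot be proven scalar; with it, the same plan is bounded by one row.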