fix: Incorrect LEFT JOIN evaluation result on OR conditions #11203
@@ -441,11 +442,11 @@ fn push_down_all_join(

     // Extract from OR clause, generate new predicates for both side of join if possible.
     // We only track the unpushable predicates above.
-    if left_preserved {
+    if on_left_preserved {
`keep_predicates` and `join_conditions` are join filters, not predicates from the top Filter, so we should use `on_lr_is_preserved` instead of `lr_is_preserved`.
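For readers unfamiliar with the `extract_or_clauses_for_join` calls in the hunk above, here is a minimal sketch of the general idea of extracting a one-sided predicate from an OR clause. It is not DataFusion's implementation: the `Expr` enum and `extract_for_table` are hypothetical names, and the sketch assumes every leaf predicate references exactly one table.

```rust
/// Hypothetical, simplified predicate tree (not DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    /// Leaf predicate: the single table it references plus its text.
    Pred { table: &'static str, text: &'static str },
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Returns a predicate that mentions only `table` and is implied by `expr`,
/// or `None` if no such predicate can be derived.
pub fn extract_for_table(expr: &Expr, table: &str) -> Option<Expr> {
    match expr {
        Expr::Pred { table: t, .. } => {
            if *t == table {
                Some(expr.clone())
            } else {
                None
            }
        }
        // AND: dropping a conjunct only weakens the predicate, so keep
        // whichever sides mention the table.
        Expr::And(l, r) => match (extract_for_table(l, table), extract_for_table(r, table)) {
            (Some(l), Some(r)) => Some(Expr::And(Box::new(l), Box::new(r))),
            (Some(e), None) | (None, Some(e)) => Some(e),
            (None, None) => None,
        },
        // OR: every branch must contribute; a branch that places no
        // constraint on the table means nothing can be inferred.
        Expr::Or(l, r) => {
            let l = extract_for_table(l, table)?;
            let r = extract_for_table(r, table)?;
            Some(Expr::Or(Box::new(l), Box::new(r)))
        }
    }
}
```

For `(a.x > 5 AND b.y = 1) OR (a.x > 10 AND b.y = 2)` and table `a`, the sketch yields `a.x > 5 OR a.x > 10`; for `a.x > 5 OR b.y = 1` it yields nothing. The extracted predicate is weaker than the original, which is why the original still has to be kept above, as the code comment in the hunk notes.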
Thank you @viirya (and everyone else who reviewed and commented on this bug).
I went through the code carefully and pushed a few more tests to this branch. I think it looks great and makes sense to me.
I will also make a new PR to tweak some comments.
         left_push.extend(extract_or_clauses_for_join(&keep_predicates, left_schema));
         left_push.extend(extract_or_clauses_for_join(&join_conditions, left_schema));
     }
-    if right_preserved {
+    if on_right_preserved {
@ozankabak and I discussed the conditions under which filter predicates and ON expressions can be pushed down.
We think the correct handling is:

    if left_preserved {
        left_push.extend(extract_or_clauses_for_join(&keep_predicates, left_schema));
    }
    if on_left_preserved {
        left_push.extend(extract_or_clauses_for_join(&join_conditions, left_schema));
    }
    if right_preserved {
        right_push.extend(extract_or_clauses_for_join(&keep_predicates, right_schema));
    }
    if on_right_preserved {
        right_push.extend(extract_or_clauses_for_join(&join_conditions, right_schema));
    }

where the flag used for filter predicates differs from the one used for join conditions. What do you think?
I am not sure I understand what is different about your suggestion.
My understanding of the code is that it breaks the predicates into:
- `keep_predicates`: predicates that must be left as a filter (above the join)
- `join_conditions`: predicates that must be left in the join (e.g. the `ON` clause)
- `left_push`: predicates that can be pushed down to the left input
- `right_push`: predicates that can be pushed down to the right input

I think the code, as written, does push predicates from the ON clause down correctly when possible.
So in other words, I am not sure what changes you are suggesting.
We could be missing something, but it seems like `keep_predicates` and `join_conditions` should be using `preserved` vs. `on_preserved` (respectively), since the former applies as a post-join filter.
For a left join, `left_preserved=true` and `on_left_preserved=false`.
In the current version of the code, nothing from `keep_predicates` or `join_conditions` can be inserted into `left_push`, since the `on_left_preserved` flag is `false`. However, if we used the `left_preserved` flag to decide whether something from `keep_predicates` can be inserted into `left_push`, we could insert additional conditions into `left_push`. In this sense, the current code works correctly but behaves suboptimally in some cases.
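The two flag pairs discussed here can be summarised in a small sketch. This is a simplified model written for illustration, not the `lr_is_preserved` / `on_lr_is_preserved` code in DataFusion: `JoinKind`, `filter_preserved`, and `on_preserved` are hypothetical names, and semi and anti joins are deliberately left out. The LEFT JOIN row matches the values quoted in the comment above; the other rows follow from standard outer-join semantics.

```rust
/// Hypothetical, simplified join-type enum (semi/anti joins omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
}

/// (left, right): may a predicate from a Filter *above* the join be
/// pushed into that input?
pub fn filter_preserved(kind: JoinKind) -> (bool, bool) {
    match kind {
        JoinKind::Inner => (true, true),
        // Left rows pass through a LEFT JOIN unchanged, so filtering them
        // before or after the join gives the same result.
        JoinKind::Left => (true, false),
        JoinKind::Right => (false, true),
        JoinKind::Full => (false, false),
    }
}

/// (left, right): may a predicate from the join's ON clause be pushed
/// into that input?
pub fn on_preserved(kind: JoinKind) -> (bool, bool) {
    match kind {
        JoinKind::Inner => (true, true),
        // An ON predicate never removes left rows from a LEFT JOIN, so it
        // must not be turned into a filter on the left input.
        JoinKind::Left => (false, true),
        JoinKind::Right => (true, false),
        JoinKind::Full => (false, false),
    }
}
```

Note that for a LEFT JOIN the two pairs are mirror images of each other, which is why using one flag for both kinds of predicate is either incorrect or overly conservative.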
Actually, we wrongly mixed join filter predicates and filter predicates into `join_conditions`. I have separated them now so that only `on_filter_join_conditions` will be considered with `on_preserved`.
Thank you for the clarifications.

> In current version of the code, nothing from keep_predicates and join_conditions can be inserted to the left_push since on_left_preserved flag is false

I agree, but I think this is required for correctness. At least for `join_conditions`, it is important not to push them down to the preserved side.
For example, in this query I am pretty sure it is not correct to push the predicate on `A.a` below the join (the intuition is that every row from `A` has to produce at least one row out of the join). I believe this is what this PR fixes.

    SELECT .. FROM A LEFT JOIN B ON (A.a > 5)

I think it is correct to push predicates on `B` below the join (as any non-matching rows would have been filtered in the join anyway):

    SELECT .. FROM A LEFT JOIN B ON (B.b > 5)

> We could have inserted additional conditions to the left_push. In this sense, current code works correctly, however behaves suboptimal in some cases.

Can you help me write an example? I can add it to this PR and then we can handle improving the case in a future PR.
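The intuition behind the two queries above can be checked with a toy model. The following is a minimal sketch, assuming single-column integer tables and a nested-loop join; `left_join` and `filter_rows` are hypothetical helpers written for this illustration and have nothing to do with DataFusion's execution engine.

```rust
/// Nested-loop LEFT JOIN over single-column integer tables.
/// `on(a, b)` plays the role of the ON clause. Every row of `left` yields
/// at least one output row; unmatched rows are paired with `None` (NULL).
pub fn left_join(
    left: &[i32],
    right: &[i32],
    on: impl Fn(i32, i32) -> bool,
) -> Vec<(i32, Option<i32>)> {
    let mut out = Vec::new();
    for &a in left {
        let mut matched = false;
        for &b in right {
            if on(a, b) {
                out.push((a, Some(b)));
                matched = true;
            }
        }
        if !matched {
            // Preserved side: keep the row even without a match.
            out.push((a, None));
        }
    }
    out
}

/// Stand-in for a Filter pushed below the join onto one input.
pub fn filter_rows(rows: &[i32], pred: impl Fn(i32) -> bool) -> Vec<i32> {
    rows.iter().copied().filter(|&v| pred(v)).collect()
}
```

With `A = [3, 7]` and `B = [4, 9]`, `left_join(&a, &b, |a, _| a > 5)` keeps the row `(3, NULL)`, whereas filtering `A` first with `filter_rows(&a, |v| v > 5)` and then joining with an always-true ON clause drops it, so the pushdown changes the result. Doing the same with the predicate on `B` (`b > 5`) produces identical output either way.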
I tried to add a test case with it.
EDIT: Seems it doesn't test against the corner case. 🤔
I think we found one example -- @mustafasrepo will post shortly
Looks like my comment #11203 (comment) was in progress when @viirya changed the code in 17b2d43. Can someone help me understand what additional case this would catch (and I'll go and write some tests that cover it)?
I think you made a typo in your last commit (left an inline comment for it), but other than that I think it is now logically equivalent to @mustafasrepo's suggestion on #11203 (comment) (although his code seems to use a single `join_conditions` vector and seems more concise).
         left_push.extend(extract_or_clauses_for_join(&keep_predicates, left_schema));
         left_push.extend(extract_or_clauses_for_join(&join_conditions, left_schema));
     }
-    if on_right_preserved {
+    if left_preserved {
I think you meant `right_preserved` here.
Oops
I have constructed an example that triggers this test case (query is a bit weird. I have constructed it to trigger
this query generates following plan with recent changes
previously it was generating
where both are correct. However, the first plan is a bit better.
Yea, I think it is correct. I made the change to make it more obvious to more people. Because they need to figure out that I can use the suggestion if you think it is better.
Thank you @mustafasrepo - that makes sense to me. Let's also add that example to the tests (I can do so in a follow-on PR or if we are going to change this PR again we can do so).
As long as we have the functionality, both versions are fine.
Thanks @mustafasrepo @ozankabak @alamb. I added the test case.
1 Alice HR
3 Carol Engineering

# Push down OR conditions on Filter through LEFT JOIN if possible
pretty fancy 🚀
Great collaboration!
Should we go ahead and merge? Are we waiting for more feedback?
🚀 I agree it was time to merge. Thanks again everyone.
…1203)
* fix: Incorrect LEFT JOIN evaluation result on OR conditions
* Add a few more test cases
* Don't push join filter predicates into join_conditions
* Add test case and fix typo
* Add test case
---------
Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #10881.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?