-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23564][SQL] infer additional filters from constraints for join's children #21083
Conversation
…dren" This reverts commit cdcccd7.
cc @mgaido91 @maryannxue @KaiXinXiaoLei @gatorsmile @jiangxb1987 @gengliangwang: I do not think this is the right way to do things, @cloud-fan. Looks like you have been aware of my and others' work like #20816, you could have, or I'd say, should have, given your input/suggestions on related PRs. People who have worked on this should deserve more credit than being mentioned as "inspired" here. |
Test build #89431 has finished for PR 21083 at commit
|
Test build #89432 has finished for PR 21083 at commit
|
It's a totally different approach to do additional filters inference from constraints for join, and I'm not sure if it works until I finished this PR. That's the main reason I didn't post my suggestion to the related PRs. I actually don't care about the credites and was planning to give credites to @mgaido91 since it's mostly inspired by the discussion with him. However someone mentioned your PR to me at the last minute, I pulled in your tests and found it's already covered. Then I was a little hesitate about who should get the credites and didn't mention it in the PR desription. If you don't mind, are you OK with giving credites to @mgaido91 ? Thanks for your understanding! |
@cloud-fan the approach itself seems OK to me. Indeed, I prefer this one over the previous status where we had As far as the credit is regarded, there is no issue for me. You can give credit to @maryannxue and @KaiXinXiaoLei. I think I can close #20717 then, since it is going to be covered here, am I right? |
Test build #89435 has finished for PR 21083 at commit
|
@@ -54,13 +51,42 @@ trait QueryPlanConstraints { self: LogicalPlan => | |||
* See [[Canonicalize]] for more details. | |||
*/ | |||
protected def validConstraints: Set[Expression] = Set.empty | |||
} | |||
|
|||
object ConstraintsUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in order to follow the pattern I see also for PredicateHelper
, what about having a ConstraintHelper
trait instead of this object?
baseConstraints.union(ConstraintsUtils.inferAdditionalConstraints(baseConstraints)) | ||
} | ||
|
||
private def inferNewFilter(plan: LogicalPlan, constraints: Set[Expression]): LogicalPlan = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may we return here the additional constraints instead of the new plan, so that in L669 and similar we can copy the plan only if it is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copying the plan is very cheap.
Thank you for you reply, @cloud-fan! I was not clear when you had become aware of the effort on SPARK-21479 so it might be a misunderstanding on my side and I apologize. Anyway, if you had had a closer look at the PR, you would have probably got the idea that it's basically the same approach as what you have here, only that you have covered more join types.
|
Previously the |
Test build #89480 has finished for PR 21083 at commit
|
retest this please |
Test build #89701 has finished for PR 21083 at commit
|
retest this please |
|
||
private def inferNewFilter(plan: LogicalPlan, constraints: Set[Expression]): LogicalPlan = { | ||
val newPredicates = constraints | ||
.union(constructIsNotNullConstraints(constraints, plan.output)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put the code
constraints
.union(inferAdditionalConstraints(constraints))
.union(constructIsNotNullConstraints(constraints, output))
.filter { c =>
c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
}
into the a function in ConstraintHelper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case _: InnerLike | LeftSemi =>
val allConstraints = getAllConstraints(left, right, conditionOpt)
val newLeft = inferNewFilter(left, allConstraints)
val newRight = inferNewFilter(right, allConstraints)
join.copy(left = newLeft, right = newRight)
For this pattern, if we reuse the code you mentioned, we need to do constraints expanding twice, for left and right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except one comment.
LGTM |
lgtm |
Test build #89705 has finished for PR 21083 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
The existing query constraints framework has 2 steps:
For step 2, it mostly helps with Join, because we can connect the constraints from children to the join condition and infer powerful filters to prune the data of the join sides. e.g., the left side has constraints
a = 1
, the join condition isleft.a = right.a
, then we can inferright.a = 1
to the right side and prune the right side a lot.However, the current logic of inferring filters from constraints for Join is pretty weak. It infers the filters from Join's constraints. Some joins like left semi/anti exclude output from right side and the right side constraints will be lost here.
This PR propose to check the left and right constraints individually, expand the constraints with join condition and add filters to children of join directly, instead of adding to the join condition.
This reverts #20670 , covers #20717 and #20816
This is inspired by the original PRs and the tests are all from these PRs. Thanks to the authors @mgaido91 @maryannxue @KaiXinXiaoLei !
How was this patch tested?
new tests