-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32360][SQL] Add MaxMinBy to support eliminate sorts #29142
Conversation
Test build #126034 has finished for PR 29142 at commit
|
cc @maropu |
@@ -1004,7 +1004,7 @@ object EliminateSorts extends Rule[LogicalPlan] { | |||
|
|||
private def isOrderIrrelevantAggs(aggs: Seq[NamedExpression]): Boolean = { | |||
def isOrderIrrelevantAggFunction(func: AggregateFunction): Boolean = func match { | |||
case _: Min | _: Max | _: Count => true | |||
case _: Min | _: Max | _: Count | _: CountIf | _: MaxMinBy => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
count_if has replaced with count and if, so needn't add it.
Test build #126101 has finished for PR 29142 at commit
|
Retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache Spark community really appreciates your continuously contributions. Here are some advices for your contributions.
- Your contribution is good. However, please be aware of the non-code Apache Spark community policy. For example, release cycle and versioning. For example, SPARK-29343 is already released at 3.0.0 and your improvement patch will be applied only at 3.1.0. You need to use a new JIRA ID.
- You had better choose the most specific PR title. For example, this PR only add
MaxMinBy
. If then, just sayAdd MaxMinBy
instead ofAdd more aggregate function
. - Please provide a test coverage when you add a new code path. More specifically, we need a test case which fails at
master
branch and succeeds at your PR. The test coverage is very crucial to protect your contribution from accidental removal at future releases.
The above is a general guideline for you. We want to help you grow in the Apache Spark community and to go with us further. In addition, the above guideline will help you work in another Apache community, too. Thank you always, @ulysses-you .
Test build #126135 has finished for PR 29142 at commit
|
Thank you @dongjoon-hyun , your advice is really helpful to me. I will follow up the policy that you said and keep contribute continuously. Thanks again! |
Test build #126137 has finished for PR 29142 at commit
|
Thank you for your update. Could you fix the UT failure? The newly add UT is failing now.
|
@@ -1004,7 +1004,7 @@ object EliminateSorts extends Rule[LogicalPlan] { | |||
|
|||
private def isOrderIrrelevantAggs(aggs: Seq[NamedExpression]): Boolean = { | |||
def isOrderIrrelevantAggFunction(func: AggregateFunction): Boolean = func match { | |||
case _: Min | _: Max | _: Count => true | |||
case _: Min | _: Max | _: Count | _: MaxMinBy => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this itself looks fine to me.
Test build #126145 has finished for PR 29142 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @ulysses-you and @maropu . Unfortunately, MaxBy
and MinBy
is order-sensitive. I'm -1 for this optimization.
scala> sql("SELECT max_by(x, y) FROM (SELECT * FROM VALUES (('a', 50)), (('b', 50)), (('c', 50)) AS tab(x, y) ORDER BY x)").show
+------------+
|max_by(x, y)|
+------------+
| c|
+------------+
scala> sql("SELECT max_by(x, y) FROM (SELECT * FROM VALUES (('a', 50)), (('b', 50)), (('c', 50)) AS tab(x, y) ORDER BY x DESC)").show
+------------+
|max_by(x, y)|
+------------+
| a|
+------------+
Could you close this PR please, @ulysses-you ? Or, could you let me know what I missed in your proposal? |
Thanks for the negative case, seems I missed something. |
Thank you for closing, @ulysses-you . |
cc @gatorsmile |
What changes were proposed in this pull request?
Add
MaxMinBy
aggregate function and make these case support eliminate sorts.Why are the changes needed?
Make
EliminateSorts
match more case.Does this PR introduce any user-facing change?
Yes, if match case user will see the different execution plan.
before
after
How was this patch tested?
manual test.