-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the performance queries with a single distinct aggregate #1315
Conversation
schema: _, | ||
group_expr, | ||
} => { | ||
match is_single_agg(plan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be using if/else
.
Great idea and results! @ic4y |
Thanks for your contribution @ic4y, I'll take a look tomorrow. |
LogicalPlan::Aggregate { | ||
input, | ||
aggr_expr, | ||
schema: _, | ||
group_expr, | ||
} => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogicalPlan::Aggregate { | |
input, | |
aggr_expr, | |
schema: _, | |
group_expr, | |
} => { | |
LogicalPlan::Aggregate { | |
input, | |
aggr_expr, | |
group_expr, | |
.. | |
} => { |
fn is_single_agg(plan: &LogicalPlan) -> bool { | ||
match plan { | ||
LogicalPlan::Aggregate { | ||
input: _, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, you can also check other places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks i fixed it
} => { | ||
match is_single_agg(plan) { | ||
true => { | ||
let mut all_group_args: Vec<Expr> = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut all_group_args: Vec<Expr> = Vec::new(); | |
let mut all_group_args = Vec::with_capacity(group_expr.len()); |
// remove distinct and collection args | ||
let mut new_aggr_expr = aggr_expr | ||
.iter() | ||
.map(|aggfunc| match aggfunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggfunc
is still an Expr
, so it's better to have a name with expr
not func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question: if all exprs in aggr_expr
are Expr::AggregateFunction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because there is judgment in is_single_distinct_agg()
aggr_expr: Vec::new(), | ||
schema: grouped_schema, | ||
}; | ||
let mut expres = group_expr.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: expres
?
let expr = plan.expressions(); | ||
// apply the optimization to all inputs of the plan | ||
let inputs = plan.inputs(); | ||
|
||
let new_inputs = inputs | ||
.iter() | ||
.map(|plan| optimize(plan, execution_props)) | ||
.collect::<Result<Vec<_>>>()?; | ||
|
||
utils::from_plan(plan, &expr, &new_inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is redundant from the 113~119 lines, so if we can eliminate duplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks i fixed it
BTW, some tests can't pass due to |
@xudong963 Thank you for reviewing!I prioritize the problem of failing the tests |
Thanks @ic4y -- this looks really cool. I plan to review this PR carefully shortly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ic4y ❤️ I went through this carefully, and I think it could be merged. This is a very nice first contribution,. I left a few stylistic comments and it looks like there are some test failures (looks like some output needs to be updated).
Also, I think this transformation is also valid for multiple distinct aggregates (if they share the same argument ). For example
SELECT F1(DISTINCT s), F2(DISTINCT s), k
...
GROUP BY k
Rewritten to
SELECT F1(s), F2(s)
FROM (
SELECT s, k ... GROUP BY s, k
)
GROUP BY k
/// </pre> | ||
/// <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These and <p>
tags seem out of place.
I think you could use something like
/// ```text
/// - Aggregation
/// GROUP BY (k)
/// F1(s)
/// - Aggregation
/// GROUP BY (k, s)
/// - X
/// ```
If you wanted to use monospaced fonts to illustrate the transformation
use std::sync::Arc; | ||
|
||
/// single distinct to group by optimizer rule | ||
/// - Aggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another way to display this transformation
SELECT F1(DISTINCT s)
...
GROUP BY k
Rewritten to
SELECT F1(s)
FROM (
SELECT s, k ... GROUP BY s, k
)
GROUP BY k
let mut all_group_args = Vec::with_capacity(group_expr.len()); | ||
all_group_args.append(&mut group_expr.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut all_group_args = Vec::with_capacity(group_expr.len()); | |
all_group_args.append(&mut group_expr.clone()); | |
let mut all_group_args = group_expr.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this really matters, but I figured I would point out a slightly shorter way to do the same thing.
.iter() | ||
.map(|agg_expr| match agg_expr { | ||
Expr::AggregateFunction { fun, args, .. } => { | ||
all_group_args.append(&mut args.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here you can do something like this (and avoid a mut
):
all_group_args.append(&mut args.clone()); | |
all_group_args.extend(args.iter().cloned()); |
input: Arc::new(grouped_agg.unwrap()), | ||
aggr_expr: new_aggr_expr, | ||
schema: final_agg_schema.clone(), | ||
group_expr: group_expr.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group_expr: group_expr.clone(), | |
group_expr, |
group_expr: group_expr.clone(), | ||
}; | ||
|
||
let mut alias_expr: Vec<Expr> = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might help here to explain the rationale for adding this alias (so the aggregates are displayed in the same way even after the rewrite). It is important but may not be obvious to other readers
} | ||
|
||
#[test] | ||
fn distinct_and_common() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@alamb Thank you for your help. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @ic4y -- I started CI and when it passes I'll merge this one in. 🎉
}) | ||
.count() | ||
== aggr_expr.len() | ||
&& fields_set.len() == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
)? | ||
.build()?; | ||
// Should work | ||
let expected = "Projection: #test.a AS a, #COUNT(test.b) AS COUNT(DISTINCT test.b), #MAX(test.b) AS MAX(DISTINCT test.b) [a:UInt32, COUNT(DISTINCT test.b):UInt64;N, MAX(DISTINCT test.b):UInt32;N]\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
Thanks for this great contribution 🎉 |
Very cool optimization, thanks @ic4y ! |
Which issue does this PR close?
resolves #1282
related to #1312
Rationale for this change
Improve the performance of single_distinct_agg
a single distinct aggregation optimization method as follows:
I used a test data set of 60 million to test datafunshion before and after using the optimizer. After optimization, the performance has double improvement and the execution time has been reduced from 12 seconds to 6 seconds.
The test results and the logical plan before and after optimization are as follows:
What changes are included in this PR?
Are there any user-facing changes?
nothing