-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support ORDER BY an aliased column #5067
Conversation
2cd0a0c
to
dbbbc25
Compare
@@ -36,6 +36,12 @@ use datafusion_sql::planner::{ContextProvider, ParserOptions, SqlToRel}; | |||
|
|||
use rstest::rstest; | |||
|
|||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to enable debug logging in the sql tests
/// Remember that: | ||
/// 1. given a projection with exprs: [a, b + c] | ||
/// 2. t produces an output schema with two columns "a", "b + c" | ||
fn rewrite_in_terms_of_projection(expr: Expr, projection: &Projection) -> Result<Expr> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the fix -- it has slightly different semantics than the raw aggregate case.
/// Note The SQL planner always puts a `Projection` at the output of | ||
/// an aggregate, the other paths such as LogicalPlanBuilder can | ||
/// create a Sort directly above an Aggregate | ||
fn rewrite_in_terms_of_aggregate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the same logic, just extracted into a function and using rewrite_expr
to make it easier to read
Ok -- I think this PR is ready to review once the other PRs it depends on are ready |
dbbbc25
to
e11a032
Compare
907964b
to
6d85c1f
Compare
aggr_expr, | ||
distinct_group_exprs: &distinct_group_exprs, | ||
}) | ||
/// Rewrites a sort expression in terms of the output of the previous [`LogicalPlan`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the new more general algorithm that works for LogicalPlan::Projection
and LogicalPlan::Aggregation
// if that doesn't work, try to match the expression as an | ||
// output column -- however first it must be "normalized" | ||
// (e.g. "c1" --> "t.c1") because that normalization is done | ||
// at the input of the aggregate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand why this is needed, but several tests fail if this logic is removed and the previous logic did it as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe(Just to offer a possibility) it's related with qualifier
.
In the comment:
/// 1. given a projection with exprs: [a, b + c]
/// 2. t produces an output schema with two columns "a", "b + c"
But in fact, in the plan, [a, b + c]
can be [table1.a, "table2.b+c"].
output schema also may be with qualifier like above.
Due to qualifier, some equations are not true, like 'a' == 't1.a' will be false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in fact, in the plan, [a, b + c] can be [table1.a, "table2.b+c"].
Yes I think you are correct. I think what I found was confusing is that the qualifier is added "on demand" in several places and there isn't a clear cut line between "as written" and "qualified"
@@ -136,24 +162,24 @@ mod test { | |||
|
|||
let cases = vec![ | |||
TestCase { | |||
desc: "c1 --> t.c1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why direct aggregate creation doesn't need this rewrite anymore. All the other tests in DataFusion CI pass and I wrote the unit tests last week to document the existing behavior in #5088
If reviewers prefer to keep the old behavior here, that is easy as the first commit in this PR, 0c16ef1 actually keeps all these tests passing with the existing "aggregate" code.
// expect that this is not an ambiguous reference | ||
let expected = | ||
"Sort: birth_date ASC NULLS LAST\ | ||
\n Projection: AVG(person.age) AS value, datetrunc(Utf8(\"month\"), person.birth_date) AS birth_date\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shows the plan is using the expression datetrunc
rather than the original birth_date
grouping column
This PR is now ready for review. I don't know if @houqp or @andygrove have some time to review it too as I think they were involved with the order by / group by rewrites |
fn select_groupby_orderby() { | ||
let sql = r#"SELECT | ||
avg(age) AS "value", | ||
date_trunc('month', birth_date) AS "birth_date" | ||
FROM person GROUP BY birth_date ORDER BY birth_date; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest add more test about qualified column, I'm worried that it may cause some problem.
Because I also try to fix this problem, but meet some barrier about qualified column.
such as use date_trunc('month', person.birth_date) AS "birth_date"
instead of
date_trunc('month', birth_date) AS "birth_date"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good idea -- Added in ba563ad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review @jackwener
fn select_groupby_orderby() { | ||
let sql = r#"SELECT | ||
avg(age) AS "value", | ||
date_trunc('month', birth_date) AS "birth_date" | ||
FROM person GROUP BY birth_date ORDER BY birth_date; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good idea -- Added in ba563ad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very nice job🚀 I reviewed this PR intermittently for a few days.
Look great to me.
cc @mingmwang @liukun4515
Thank you for the review @jackwener I will plan to merge this PR tomorrow (in about 24 hours) unless anyone else would like time to review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 🥳
Benchmark runs are scheduled for baseline = b09c1ee and contender = 6eb0e36. 6eb0e36 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Draft:rewrite_expr
convenience method for rewritingExpr
s #5092Which issue does this PR close?
Closes #4854
Rationale for this change
Column resolution for ORDER BY expressions are incorrect
What changes are included in this PR?
Are these changes tested?
Yes,
Are there any user-facing changes?
bug fix