Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using display_name for Expr::Aggregation #11020

Merged
merged 1 commit into from
Jun 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions datafusion/core/src/physical_planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1916,6 +1916,7 @@ pub fn create_aggregate_expr_and_maybe_filter(
// unpack (nested) aliased logical expressions, e.g. "sum(col) as total"
let (name, e) = match e {
Expr::Alias(Alias { expr, name, .. }) => (name.clone(), expr.as_ref()),
Expr::AggregateFunction(_) => (e.display_name().unwrap_or(physical_name(e)?), e),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call display_name in physical_name like what ScalarFunction does.
And even more, I think there are no such difference between logical name and physical name, maybe we can have one name for these two 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be nice if we also use display_name in impl fmt::Display for Expr

Copy link
Contributor

@jayzhan211 jayzhan211 Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems CastExpr is different in write_name and Expr's Display 🤔

And, there is an issue that expect CastExpr be shown in write name

#10274

CastExpr is removed in #3222, but I didn't find the rationale of doing this

If the issue is due to the schema change, then it will be quite tricky #10276

We can reuse display_name in physical_name to fix the regression first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems CastExpr is different in write_name and Expr's Display 🤔

And, there is an issue that expect CastExpr be shown in write name

#10274

CastExpr is removed in #3222, but I didn't find the rationale of doing this

If the issue is due to the schema change, then it will be quite tricky #10276

We can reuse display_name in physical_name to fix the regression first

Yes, I think calling display_name like physical name is a good idea, I tried it before, but there are some tests failed, there might be some reason that the display_name is different from physical name, I think for this issue, only change AggregateFunction is enough, I can take other related issue and fix them in the following PR, Thanks for review !

_ => (physical_name(e)?, e),
};

Expand Down
37 changes: 35 additions & 2 deletions datafusion/sqllogictest/test_files/expr.slt
Original file line number Diff line number Diff line change
Expand Up @@ -2473,7 +2473,7 @@ host2 202
host3 303

# TODO: Issue tracked in https://github.com/apache/datafusion/issues/10364
query error
query TR
select
t2.server['c3'] as host,
sum((
Expand All @@ -2488,6 +2488,10 @@ select
) t2
where t2.server['c3'] IS NOT NULL
group by t2.server['c3'] order by host;
----
host1 101
host2 202
host3 303

# can have 2 projections with aggr(short_circuited), with different short-circuited expr
query TRR
Expand Down Expand Up @@ -2559,7 +2563,7 @@ host2 2.2 202
host3 3.3 303

# TODO: Issue tracked in https://github.com/apache/datafusion/issues/10364
query error
query TRR
select
t2.server['c3'] as host,
sum((
Expand All @@ -2579,6 +2583,10 @@ select
) t2
where t2.server['c3'] IS NOT NULL
group by t2.server['c3'] order by host;
----
host1 1.1 101
host2 2.2 202
host3 3.3 303

# can have 2 projections with aggr(short_circuited), with the same short-circuited expr (e.g. coalesce)
query TRR
Expand All @@ -2587,3 +2595,28 @@ select t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] as host, sum(coalesc
host1 1.1 101
host2 2.2 202
host3 3.3 303

statement ok
set datafusion.sql_parser.dialect = 'Postgres';

statement ok
create table t (a float) as values (1), (2), (3);

query TT
explain select min(a) filter (where a > 1) as x from t;
----
logical_plan
01)Projection: MIN(t.a) FILTER (WHERE t.a > Int64(1)) AS x
02)--Aggregate: groupBy=[[]], aggr=[[MIN(t.a) FILTER (WHERE t.a > Float32(1)) AS MIN(t.a) FILTER (WHERE t.a > Int64(1))]]
03)----TableScan: t projection=[a]
physical_plan
01)ProjectionExec: expr=[MIN(t.a) FILTER (WHERE t.a > Int64(1))@0 as x]
02)--AggregateExec: mode=Single, gby=[], aggr=[MIN(t.a) FILTER (WHERE t.a > Int64(1))]
03)----MemoryExec: partitions=1, partition_sizes=[1]


statement ok
drop table t;

statement ok
set datafusion.sql_parser.dialect = 'Generic';