[Fix](nereids) fix normalize repeat alias rewrite #38166
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 39808 ms
TPC-DS: Total hot run time: 173341 ms
ClickBench: Total hot run time: 30.51 s
run buildall
TPC-H: Total hot run time: 39674 ms
TPC-DS: Total hot run time: 174194 ms
ClickBench: Total hot run time: 30.58 s
run buildall
TPC-H: Total hot run time: 39899 ms
add desc
TPC-DS: Total hot run time: 173083 ms
ClickBench: Total hot run time: 30.82 s
run buildall
TPC-H: Total hot run time: 39637 ms
TPC-DS: Total hot run time: 174428 ms
ClickBench: Total hot run time: 30.95 s
run cloud_p0
run cloud_p1
"""
qt_alias "select id as c1 from grouping_alias_test_t group by grouping sets((id,value2),(id)) order by 1;"
qt_alias_grouping_scalar "select id as c1, grouping(id) from grouping_alias_test_t group by grouping sets((id,value2),(id)) order by 1,2;"
We can add more test cases like the one below, where the aliases are used in the grouping sets:
select id as c1, id, id as c3, grouping(id) from table1 group by grouping sets((c1, value2),(c3));
run buildall
TPC-H: Total hot run time: 39981 ms
TPC-DS: Total hot run time: 175108 ms
ClickBench: Total hot run time: 30.86 s
run p0
run external
PR approved by at least one committer and no changes requested.
run p0
Induced by #34196.

In NormalizeRepeat, when NormalizeToSlot is called, aggregate function parameters, grouping scalar function parameters, and all expressions in grouping sets (including columns and column aliases) are pushed down to the lower-level project output.

PR #34196 split the context into two, but the two contexts were not kept consistent. The triplets in one context could hold (id, c1, id as c1) while the triplets in the other held (id, id, id). As a result, id as c1 was pushed down while the upper-level LogicalRepeat still referenced id, so the slot could not be found.

This PR fixes the inconsistency. If the same slot has different aliases in the projection columns, for example

select id as c1, id, id as c3, grouping(id) from table1 group by grouping sets((id, value2),(id));

then id as c1 (the first alias) is pushed down to the project. Both the LogicalRepeat operator and the LogicalAggregate operator reference c1 as the input slot; id and c3 are not used as input slots.

Before NormalizeRepeat:
LogicalResultSink[32] ( outputExprs=[c1#3, id#0, c3#4, __grouping_3#5] )
+--LogicalRepeat ( groupingSets=[[id#0, value2#2], [id#0]], outputExpressions=[id#0 AS `c1`#3, id#0, id#0 AS `c3`#4, Grouping(id#0) AS `Grouping(id)`#5] )
   +--LogicalOlapScan (qualified=table1)

After NormalizeRepeat:
LogicalResultSink[33] (outputExprs=[c1#3, id#0, c3#4, __grouping_3#5])
+--LogicalAggregate[30] (groupByExpr=[c1#3, value2#2, GROUPING_ID#7, GROUPING_PREFIX_c1#6 originExpression=Grouping(c1#3)], outputExpr=[c1#3, c1#3 AS `id`#0, c1#3 AS `c3`#4, GROUPING_PREFIX_c1#6 originExpression=Grouping(c1#3) AS `GROUPING_PREFIX_c1`#5], hasRepeat=true )
   +--LogicalRepeat (groupingSets=[[c1#3, value2#2], [c1#3]], outputExpressions=[c1#3, value2#2, GROUPING_ID#7, GROUPING_PREFIX_c1#6 originExpression=Grouping(c1#3)] )
      +--LogicalProject[28] (projects=[id#0 AS `c1`#3, value2#2])
         +--LogicalOlapScan (qualified=table1)
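The "first alias wins" rule described above can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not the Nereids implementation: slots are plain strings without expression IDs, the grouping() function and GROUPING_ID are ignored, and the names `choose_pushed_down_names` and `normalize` are hypothetical helpers made up for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# One SELECT-list entry: (source slot, alias or None), in query order.
OutputExpr = Tuple[str, Optional[str]]


def choose_pushed_down_names(outputs: List[OutputExpr],
                             grouping_sets: List[List[str]]) -> Dict[str, str]:
    """Map every source slot to the single name the lower project exposes."""
    chosen: Dict[str, str] = {}
    # The first alias seen for a slot wins; later aliases are ignored here.
    for slot, alias in outputs:
        if alias is not None and slot not in chosen:
            chosen[slot] = alias
    # Slots that never get an alias keep their own name.
    for slot, _ in outputs:
        chosen.setdefault(slot, slot)
    for grouping_set in grouping_sets:
        for slot in grouping_set:
            chosen.setdefault(slot, slot)
    return chosen


def normalize(outputs: List[OutputExpr],
              grouping_sets: List[List[str]]) -> Dict[str, list]:
    """Build the three pieces of the rewritten plan from ONE shared mapping."""
    chosen = choose_pushed_down_names(outputs, grouping_sets)
    # Lower project: push down each slot once, under its chosen name.
    project = [slot if name == slot else f"{slot} AS {name}"
               for slot, name in chosen.items()]
    # Repeat: grouping sets are rewritten to the pushed-down names.
    new_sets = [[chosen[slot] for slot in gs] for gs in grouping_sets]
    # Aggregate: every original output is re-derived from the chosen name.
    agg_outputs = []
    for slot, alias in outputs:
        name = chosen[slot]
        out_name = alias if alias is not None else slot
        agg_outputs.append(name if out_name == name else f"{name} AS {out_name}")
    return {"project": project,
            "grouping_sets": new_sets,
            "aggregate_outputs": agg_outputs}


# select id as c1, id, id as c3 ... group by grouping sets((id, value2),(id))
plan = normalize([("id", "c1"), ("id", None), ("id", "c3")],
                 [["id", "value2"], ["id"]])
print(plan)
```

Because the project, the grouping sets, and the aggregate outputs are all derived from the same mapping, the upper operators can only reference names the project actually produces, which is the consistency the two split contexts lacked.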