[Fix](nereids)make agg output unchanged after normalized repeat #36207
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
// Make the output ExprId unchanged
if (!e.getExprId().equals(originalAggOutput.get(i).getExprId())) {
    e = new Alias(originalAggOutput.get(i).getExprId(), e, originalAggOutput.get(i).getName());
}
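The hunk above restores the original output identity whenever normalization has swapped it. A minimal, self-contained sketch of that idea follows. The `Expr` class and the `keepOriginalIds` helper are hypothetical stand-ins written for illustration; they are not the Nereids `Alias`, `ExprId`, or `NamedExpression` classes, and the real rule may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical types, not the Doris Nereids classes.
final class KeepOutputIdsSketch {
    /** A named output; child == null means a plain slot, otherwise an alias over child. */
    static final class Expr {
        final int exprId;
        final String name;
        final Expr child;

        Expr(int exprId, String name, Expr child) {
            this.exprId = exprId;
            this.name = name;
            this.child = child;
        }

        @Override
        public String toString() {
            return child == null
                    ? name + "#" + exprId
                    : child + " AS `" + name + "`#" + exprId;
        }
    }

    /** Re-alias every normalized output whose id differs from the original one. */
    static List<Expr> keepOriginalIds(List<Expr> original, List<Expr> normalized) {
        if (original.size() != normalized.size()) {
            throw new IllegalArgumentException("output lists must have the same size");
        }
        List<Expr> result = new ArrayList<>();
        for (int i = 0; i < normalized.size(); i++) {
            Expr e = normalized.get(i);
            Expr o = original.get(i);
            if (e.exprId != o.exprId) {
                // Wrap the new expression so the parent still sees the original id and name.
                e = new Expr(o.exprId, o.name, e);
            }
            result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Expr> original = Arrays.asList(
                new Expr(7, "C1", null),
                new Expr(1, "col_int_undef_signed2", null));
        Expr c1 = new Expr(7, "C1", null);
        // After normalization both outputs point at C1#7.
        List<Expr> normalized = Arrays.asList(c1, c1);
        // prints [C1#7, C1#7 AS `col_int_undef_signed2`#1]
        System.out.println(keepOriginalIds(original, normalized));
    }
}
```

Outputs whose id is already unchanged are passed through untouched, so the wrapping only adds an alias where the parent would otherwise lose a slot.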
Why is it changed in `normalizeToUseSlotRef`? Could we ensure it is not changed in `normalizeToUseSlotRef`?
run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 40006 ms
TPC-DS: Total hot run time: 173295 ms
ClickBench: Total hot run time: 30.49 s
run feut
The NormalizeRepeat rule can change the output of the agg. For example:

```sql
SELECT
    col_int_undef_signed2 AS C1,
    col_int_undef_signed2
FROM normalize_repeat_name_unchanged
GROUP BY GROUPING SETS ((col_int_undef_signed2), (col_int_undef_signed2))
```

Before fixing the bug, the plan is:

```sql
LogicalResultSink[97] ( outputExprs=[C1#7, col_int_undef_signed2#1] )
+--LogicalProject[94] ( distinct=false, projects=[C1#7, C1#7], excepts=[] )
   +--LogicalAggregate[93] ( groupByExpr=[C1#7, GROUPING_ID#8], outputExpr=[C1#7, GROUPING_ID#8], hasRepeat=true )
      +--LogicalRepeat ( groupingSets=[[C1#7], [C1#7]], outputExpressions=[C1#7, GROUPING_ID#8] )
         +--LogicalProject[91] ( distinct=false, projects=[col_int_undef_signed2#1 AS `C1`#7], excepts=[] )
            +--LogicalOlapScan ( )
```

This can lead to a column not being found in LogicalResultSink, which reports the error:

Input slot(s) not in childs output: col_int_undef_signed2#1 in plan: LogicalResultSink[97] ( outputExprs=[C1#7, col_int_undef_signed2#1] )
child output is: [C1#7]

This PR keeps the agg output unchanged after the repeat is normalized. After fixing, the plan is:

```sql
LogicalResultSink[97] ( outputExprs=[C1#7, col_int_undef_signed2#1] )
+--LogicalProject[94] ( distinct=false, projects=[C1#7, C1#7 as `col_int_undef_signed2`#1], excepts=[] )
   +--LogicalAggregate[93] ( groupByExpr=[C1#7, GROUPING_ID#8], outputExpr=[C1#7, GROUPING_ID#8], hasRepeat=true )
      +--LogicalRepeat ( groupingSets=[[C1#7], [C1#7]], outputExpressions=[C1#7, GROUPING_ID#8] )
         +--LogicalProject[91] ( distinct=false, projects=[col_int_undef_signed2#1 AS `C1`#7], excepts=[] )
            +--LogicalOlapScan ( )
```

---------

Co-authored-by: feiniaofeiafei <[email protected]>
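The "Input slot(s) not in childs output" error quoted in the commit message above comes down to a containment check between what a parent node reads and what its child produces. The sketch below shows that check on the slot names from the two plans. `SlotCheckSketch` and `missingSlots` are hypothetical names used for illustration, and slots are modeled as plain strings; the actual check in Doris is not shown in this PR page and may work differently.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: slots are modeled as strings such as "C1#7".
final class SlotCheckSketch {
    /** Returns the slots the parent needs that the child does not output. */
    static Set<String> missingSlots(List<String> required, List<String> childOutput) {
        Set<String> missing = new LinkedHashSet<>(required);
        missing.removeAll(childOutput);
        return missing;
    }

    public static void main(String[] args) {
        List<String> sinkInputs = Arrays.asList("C1#7", "col_int_undef_signed2#1");

        // Before the fix the project exposes only C1#7.
        List<String> before = Arrays.asList("C1#7");
        // After the fix the second output is re-aliased back to col_int_undef_signed2#1.
        List<String> after = Arrays.asList("C1#7", "col_int_undef_signed2#1");

        System.out.println(missingSlots(sinkInputs, before)); // prints [col_int_undef_signed2#1]
        System.out.println(missingSlots(sinkInputs, after));  // prints []
    }
}
```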
cherry-pick #36207 to branch-2.0
cherry-pick #36207 to branch-2.1 Co-authored-by: feiniaofeiafei <[email protected]>
…he#36369) cherry-pick apache#36207 to branch-2.0