-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor: support new aggregate framework for HashAggExec #7268
Conversation
@XuHuaiyu This PR is too big. Would you please split it into a few small ones? |
I'll split this PR into small ones:
|
@XuHuaiyu All the dependency are satisfied, is this PR ready to be reviewed? |
yes, I'm removing the useless code. @zz-jason |
/run-all-tests |
/run-all-tests |
/run-common-test tidb-test=pr/599 |
for _, arg := range e.args { | ||
v, isNull, err = arg.EvalString(sctx, row) | ||
if err != nil { | ||
return errors.Trace(err) | ||
} | ||
if isNull { | ||
continue | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bug-fix.
Test case is here. https://github.com/pingcap/tidb/pull/7268/files#diff-ab83e4a8f21f295ae805f30d7e889703R332
executor/aggregate.go
Outdated
@@ -31,16 +30,16 @@ import ( | |||
"golang.org/x/net/context" | |||
) | |||
|
|||
type aggCtxsMapper map[string][]*aggregation.AggEvaluateContext | |||
type aggCtxsMapper map[string][]aggfuncs.PartialResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change its name to something like partialResultMapper
?
1 similar comment
@@ -51,9 +51,9 @@ func canProjectionBeEliminatedStrict(p *PhysicalProjection) bool { | |||
func resolveColumnAndReplace(origin *expression.Column, replace map[string]*expression.Column) { | |||
dst := replace[string(origin.HashCode(nil))] | |||
if dst != nil { | |||
colName := origin.ColName | |||
colName, retType := origin.ColName, origin.RetType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g.
tidb> desc select distinct a from t2 ;
+---------------------+----------+------+-------------------------------------------------------------+
| id | count | task | operator info |
+---------------------+----------+------+-------------------------------------------------------------+
| HashAgg_4 | 8000.00 | root | group by:test.t2.a, funcs:firstrow(test.t2.a) |
| └─TableReader_8 | 10000.00 | root | data:TableScan_7 |
| └─TableScan_7 | 10000.00 | cop | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+---------------------+----------+------+-------------------------------------------------------------+
3 rows in set (0.00 sec)
There will be a Projection between HashAgg4 and TableReader_8 during logical plan building,
whose schema is [a (type enum)]
.
We reset the return type of origin
(which is from HashAgg4's schema) here to make sure the return type of HashAgg_4 is the result of type inferring rather than the return type of Projection.
@@ -594,6 +594,11 @@ func (b *planBuilder) buildDistinct(child LogicalPlan, length int) *LogicalAggre | |||
} | |||
plan4Agg.SetChildren(child) | |||
plan4Agg.SetSchema(child.Schema().Clone()) | |||
// Distinct will be rewritten as first_row, we reset the type here since the return type | |||
// of first_row is not always the same as the column arg of first_row. | |||
for i, col := range plan4Agg.schema.Columns { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make the same change for all the places that create the schema of aggregate operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We reset the type of aggregate here since this is not correct to set the schema of plan4Agg
as the child's schema in line 596.
If the schema of aggregate is set correctly, we do not need to make the same change.
@@ -345,6 +345,9 @@ func (a *AggFuncDesc) typeInfer4MaxMin(ctx sessionctx.Context) { | |||
a.Args[0] = expression.BuildCastFunction(ctx, a.Args[0], tp) | |||
} | |||
a.RetTp = a.Args[0].GetType() | |||
if a.RetTp.Tp == mysql.TypeEnum || a.RetTp.Tp == mysql.TypeSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also check TypeBit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeBit
has been checked when buildMaxMin and buildFirstRow.
I tried to check it here, but it caused other problems,
I'll try to move the check of TypeBit
here individually.
executor/aggregate.go
Outdated
return rows, false | ||
// getPartialResultBatch fetches a batch of partial results from HashAggIntermData. | ||
func (d *HashAggIntermData) getPartialResultBatch(sc *stmtctx.StatementContext, prs [][]aggfuncs.PartialResult, aggFuncs []aggfuncs.AggFunc, maxChunkSize int) (_ [][]aggfuncs.PartialResult, groupKeys [][]byte, reachEnd bool) { | ||
if len(prs) == maxChunkSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this function returns all the (group key, partial result)
pairs back to the caller, this check can be removed?
@@ -73,7 +72,7 @@ type HashAggFinalWorker struct { | |||
|
|||
rowBuffer []types.Datum | |||
mutableRow chunk.MutRow | |||
aggCtxsMap aggCtxsMapper | |||
partialResultMap aggPartialResultMapper | |||
groupSet *mvmap.MVMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems groupSet
can be declared as map[string]struct{}
, or just use executor.aggfuncs.stringSet
?
We can leave it to the future PRs.
@XuHuaiyu Please address the comments. |
/run-all-tests tidb-test=pr/599 |
/run-common-test tidb-test=pr/599 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PTAL @winoros |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
After this pr, if we group_concat on a sorted source. Will it still be no order?
@winoros group_concat([ORDER BY {unsigned_integer | col_name | expr}
[ASC | DESC] [,col_name ...]]) to make it orderly. |
What have you changed? (mandatory)
This PR supports the new aggregate framework for HashAggExec.
Since HashAggExec has supported parallel execution, we introduce
MergePartialResult
to merge the partial results in the final phase,which would avoid the cost of converting PartialResult to chunk.
Please review the following PRs before this PR:
What is the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
exist tests
Does this PR affect documentation (docs/docs-cn) update? (mandatory)
no
Does this PR affect tidb-ansible update? (mandatory)
no
Does this PR need to be added to the release notes? (mandatory)
#6952
Refer to a related PR or issue link (optional)
Benchmark result if necessary (optional)
Add a few positive/negative examples (optional)