planner/core: change agg cost factor #25210

Merged
merged 10 commits into from
Jun 8, 2021

hanfei1991
Member

What problem does this PR solve?

Problem Summary:

Right now the cost factor for group-by-only aggregations (typically produced by rewriting select distinct ... statements) is 1.0, which prevents the aggregation from being pushed down in pseudo-stats scenarios. However, 1.0 is not a reasonable value when the cost factor of first-row is only 0.1. This results in some unexpected behaviours, such as:

mysql> explain select a, b from t1 group by a;
+---------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+
| id                        | estRows  | task      | access object | operator info                                                                                 |
+---------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+
| HashAgg_9                 | 8000.00  | root      |               | group by:test.t1.a, funcs:firstrow(test.t1.a)->test.t1.a, funcs:firstrow(Column#5)->test.t1.b |
| └─TableReader_10          | 8000.00  | root      |               | data:HashAgg_5                                                                                |
|   └─HashAgg_5             | 8000.00  | cop[tikv] |               | group by:test.t1.a, funcs:firstrow(test.t1.b)->Column#5                                       |
|     └─TableFullScan_8     | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                                |
+---------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

mysql> explain select a from t1 group by a;
+--------------------------+----------+-----------+---------------+----------------------------------------------------------+
| id                       | estRows  | task      | access object | operator info                                            |
+--------------------------+----------+-----------+---------------+----------------------------------------------------------+
| HashAgg_7                | 8000.00  | root      |               | group by:test.t1.a, funcs:firstrow(test.t1.a)->test.t1.a |
| └─TableReader_12         | 10000.00 | root      |               | data:TableFullScan_11                                    |
|   └─TableFullScan_11     | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                           |
+--------------------------+----------+-----------+---------------+----------------------------------------------------------+
3 rows in set (0.00 sec)

It's hard to say which plan is better because stats info is missing. But keeping the behaviour consistent should at least do no harm. In some MPP scenarios, such as very complicated SQL, the operators closer to the root (TiDB) might lack stats info and choose 1-phase aggregation. Since 2-phase aggregation is better than 1-phase in most cases, I think this change is beneficial.
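To make the 1-phase versus 2-phase distinction concrete, here is a toy sketch in Go. It is not TiDB code: `Row`, `firstRowByA`, `onePhase`, and `twoPhase` are hypothetical names, and the partitions stand in for data held by separate storage nodes. It only illustrates that a partial aggregation on each partition can reduce the number of rows shipped to the root while producing the same result.

```go
package main

import "fmt"

// Row is a hypothetical two-column row (a, b) standing in for table t1.
type Row struct {
	A, B int
}

// firstRowByA keeps the first row seen for each distinct value of A,
// preserving first-seen order. It mimics `group by a` with firstrow(b).
func firstRowByA(rows []Row) []Row {
	seen := make(map[int]bool)
	var out []Row
	for _, r := range rows {
		if !seen[r.A] {
			seen[r.A] = true
			out = append(out, r)
		}
	}
	return out
}

// onePhase ships every row to the root and aggregates there.
// It returns the result and the number of rows transferred.
func onePhase(parts [][]Row) ([]Row, int) {
	var all []Row
	for _, p := range parts {
		all = append(all, p...)
	}
	return firstRowByA(all), len(all)
}

// twoPhase runs a partial aggregation on each partition, ships only the
// partial results, and runs a final aggregation at the root.
func twoPhase(parts [][]Row) ([]Row, int) {
	var partial []Row
	for _, p := range parts {
		partial = append(partial, firstRowByA(p)...)
	}
	return firstRowByA(partial), len(partial)
}

func main() {
	// Two made-up partitions with overlapping group keys.
	parts := [][]Row{
		{{1, 10}, {1, 11}, {2, 20}},
		{{2, 21}, {3, 30}, {3, 31}},
	}
	r1, sent1 := onePhase(parts)
	r2, sent2 := twoPhase(parts)
	fmt.Println("1-phase:", r1, "rows transferred:", sent1)
	fmt.Println("2-phase:", r2, "rows transferred:", sent2)
}
```

In this made-up data set both strategies return the same three groups, but the 2-phase variant transfers 4 rows instead of 6. The saving depends on how many duplicate keys each partition holds, which is exactly what the optimizer cannot know with pseudo stats.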

However, we still need some experiments and reliable tests to calibrate the cost factors. This work should be done in the near future.

What is changed and how it works?

What's Changed:
Change the cost factor of group-by-only aggregation from 1.0 to 0.1, the same as first-row.

Tests

  • Unit test. Over 100 lines of relevant tests are changed. Some tests involving in-subqueries generate a "distinct" group by. I have checked the test changes and made sure they are safe.

Release note

  • planner/core: change agg cost factor

@hanfei1991 hanfei1991 requested review from a team as code owners June 7, 2021 08:51
@hanfei1991 hanfei1991 requested review from lzmhhh123 and removed request for a team June 7, 2021 08:51
@ti-chi-bot ti-chi-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 7, 2021
@@ -1058,7 +1058,7 @@ func (p *basePhysicalAgg) getAggFuncCostFactor() (factor float64) {
}
}
if factor == 0 {
- factor = 1.0
+ factor = 0.1
Member

Maybe you could add some comments and directly use aggFuncFactor[firstrow].

Member Author

ok, using aggFuncFactor[firstrow] is a good idea!
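The suggestion above can be sketched roughly as follows. This is a simplified stand-in, not the actual planner code: the contents of `aggFuncFactor` other than first-row's 0.1, the `defaultFactor` constant, the string keys, and the summing loop are all assumptions made for illustration. The point is only that the fallback reads the first-row entry from the table instead of repeating the literal 0.1.

```go
package main

import "fmt"

// firstRow is a hypothetical key name used only in this sketch.
const firstRow = "firstrow"

// aggFuncFactor is an illustrative factor table. Only the 0.1 for
// first-row comes from the PR description; the rest is assumed.
var aggFuncFactor = map[string]float64{
	firstRow: 0.1,
	"count":  1.0, // assumed value, for illustration only
}

// defaultFactor is an assumed factor for functions missing from the table.
const defaultFactor = 1.0

// aggCostFactor sums the factors of the given aggregate functions.
// When there are none (a group-by-only aggregation), it falls back to
// the first-row factor taken from the table rather than a hard-coded 0.1.
func aggCostFactor(funcs []string) float64 {
	factor := 0.0
	for _, name := range funcs {
		if f, ok := aggFuncFactor[name]; ok {
			factor += f
		} else {
			factor += defaultFactor
		}
	}
	if factor == 0 {
		factor = aggFuncFactor[firstRow]
	}
	return factor
}

func main() {
	fmt.Println(aggCostFactor(nil))                // group-by only
	fmt.Println(aggCostFactor([]string{firstRow})) // explicit first-row
	fmt.Println(aggCostFactor([]string{"count"}))
}
```

Looking the value up keeps the two factors in sync: if the first-row factor is recalibrated later, the group-by-only fallback follows automatically.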

@@ -1058,7 +1058,11 @@ func (p *basePhysicalAgg) getAggFuncCostFactor() (factor float64) {
}
}
if factor == 0 {
- factor = 1.0
+ if isMPP {
Member

Add comments and TODO for this if branch.

Member Author

done

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 8, 2021
@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • qw4990
  • winoros

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 8, 2021
@hanfei1991
Member Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 1a5ed62

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 8, 2021
@hanfei1991 hanfei1991 deleted the hanfei/change-agg-factor branch June 8, 2021 07:46
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jun 8, 2021
@ti-srebot
Contributor

cherry pick to release-5.0 in PR #25241

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jun 8, 2021
@ti-srebot
Contributor

cherry pick to release-5.1 in PR #25242

Labels
needs-cherry-pick-release-5.0 needs-cherry-pick-release-5.1 size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
5 participants