planner: update the correlation adjustment rule of Limit/TopN for TableScan (#26445) #26653
Signed-off-by: ti-srebot <[email protected]>
After discussion, we decided not to pick it.
cherry-pick #26445 to release-5.0
You can switch your code base to this Pull Request by using git-extras:
# In tidb repo:
git pr https://github.com/pingcap/tidb/pull/26653
After applying modifications, you can push your changes to this PR via:
What problem does this PR solve?
Issue Number: close #26088
Problem Summary: planner: update the correlation adjustment rule about Limit/TopN for TableScan
What is changed and how it works?
For `TableScan` with `Limit`, the original estimation formula is `LimitNum / Selectivity`.
For `select * from t where year=2003 limit 1`, its `est-row` is `1/est(year=2003)`.
The formula is based on the uniform assumption that we can get one row meeting the condition `year=2003` after scanning `1/est(year=2003)` rows.
This assumption might be broken when the column `year` has a high correlation with the `PK`.
For example, this table has 8 rows: `(PK, year): (1, 2002), (2, 2002), (3, 2002), (4, 2002), (5, 2003), (6, 2003), (7, 2003), (8, 2003)`.
The `est-row` calculated by the formula is `1/est(year=2003) = 1/0.5 = 2`, but the `act-row` is actually `4 + 1 = 5`, since we have to scan through 4 rows with `year=2002` before accessing a row with `year=2003`.
To mitigate this problem, a correlation adjustment rule was introduced before.
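The gap between the estimate and the actual scan count in the 8-row example can be reproduced with a small simulation. This is only an illustrative sketch, not TiDB's planner code: the `row` type and the helpers `selectivity`, `uniformEstimate`, and `rowsScanned` are hypothetical names introduced here, and the selectivity is computed exactly from the data rather than from statistics.

```go
package main

import "fmt"

// row is a hypothetical (PK, year) pair. Rows are kept in PK order,
// which is the order in which a TableScan reads them.
type row struct {
	pk   int
	year int
}

// selectivity returns the fraction of rows whose year equals the target.
func selectivity(rows []row, year int) float64 {
	matched := 0
	for _, r := range rows {
		if r.year == year {
			matched++
		}
	}
	return float64(matched) / float64(len(rows))
}

// uniformEstimate is the original formula: LimitNum / Selectivity.
func uniformEstimate(limit int, sel float64) float64 {
	return float64(limit) / sel
}

// rowsScanned counts how many rows a PK-ordered scan reads before it
// has found `limit` rows with the target year.
func rowsScanned(rows []row, year, limit int) int {
	found := 0
	for i, r := range rows {
		if r.year == year {
			found++
			if found == limit {
				return i + 1
			}
		}
	}
	return len(rows) // fewer than `limit` matches: the whole table is read
}

func main() {
	rows := []row{
		{1, 2002}, {2, 2002}, {3, 2002}, {4, 2002},
		{5, 2003}, {6, 2003}, {7, 2003}, {8, 2003},
	}
	sel := selectivity(rows, 2003)
	fmt.Println("est-row:", uniformEstimate(1, sel))  // est-row: 2
	fmt.Println("act-row:", rowsScanned(rows, 2003, 1)) // act-row: 5
}
```

Because `year` is clustered along the PK, the first matching row sits behind all four `year=2002` rows, which the uniform formula cannot see.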
In the case above, the `est-row` will be set to `count(year<2003) + 1`.
But this rule also brings the risk that it may under-estimate `TableScan`'s row count.
For example, suppose the rows in the table are `(PK, year): (0, 2000), (1, 2001), (2, 2002), (3, 2003), (4, 2004), (5, 2005), (6, 2006), (7, 1999)`.
In this table, `year` is not strictly ordered, but it still has a high correlation with `PK`, and the correlation value is larger than our threshold.
Then the `est-row` of `where year=1999 limit 1` will be set to `count(year<1999) + 1`, which is 1, a severe under-estimate: the only matching row is the last one in PK order, so the scan actually has to read all 8 rows.
This under-estimation may mislead the optimizer to use `TableScan` instead of `IndexScan(year)`.
Multiple issues like this case have been found, so for safety, we decided to only allow this rule to adjust the upper bound of `TableScan`; in other words, the `est-row` is `max(LimitNum/Selectivity, AdjustedCount)`.
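A minimal sketch of how the `max` rule behaves on the two example tables, assuming the correlation threshold has already been exceeded and taking `AdjustedCount` to be `count(year<X) + 1` as in the examples above. The function names (`countLess`, `uniformEstimate`, `adjustedCount`, `finalEstimate`) are hypothetical and do not correspond to TiDB's actual planner functions; selectivity is computed exactly from the data instead of from statistics.

```go
package main

import (
	"fmt"
	"math"
)

// countLess returns how many rows have a year strictly below the target.
func countLess(years []int, year int) int {
	n := 0
	for _, y := range years {
		if y < year {
			n++
		}
	}
	return n
}

// uniformEstimate is LimitNum / Selectivity for the predicate year = target.
func uniformEstimate(years []int, year, limit int) float64 {
	matched := 0
	for _, y := range years {
		if y == year {
			matched++
		}
	}
	sel := float64(matched) / float64(len(years))
	return float64(limit) / sel
}

// adjustedCount is the correlation-adjusted estimate used in the examples:
// count(year < target) + 1.
func adjustedCount(years []int, year int) float64 {
	return float64(countLess(years, year) + 1)
}

// finalEstimate applies the rule from this PR: the adjustment may raise
// the uniform estimate but never lower it.
func finalEstimate(years []int, year, limit int) float64 {
	return math.Max(uniformEstimate(years, year, limit), adjustedCount(years, year))
}

func main() {
	// years are listed in PK order for both tables.
	clustered := []int{2002, 2002, 2002, 2002, 2003, 2003, 2003, 2003}
	skewed := []int{2000, 2001, 2002, 2003, 2004, 2005, 2006, 1999}

	fmt.Printf("clustered: uniform=%v adjusted=%v final=%v\n",
		uniformEstimate(clustered, 2003, 1), adjustedCount(clustered, 2003),
		finalEstimate(clustered, 2003, 1)) // clustered: uniform=2 adjusted=5 final=5
	fmt.Printf("skewed: uniform=%v adjusted=%v final=%v\n",
		uniformEstimate(skewed, 1999, 1), adjustedCount(skewed, 1999),
		finalEstimate(skewed, 1999, 1)) // skewed: uniform=8 adjusted=1 final=8
}
```

On the clustered table the adjustment still corrects the estimate upward to 5, while on the skewed table the under-estimate of 1 is discarded in favor of the uniform estimate of 8.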
Check List
Tests
Release note