planner: fix the inappropriate heuristic rule to estimate the EQ selectivity when out of range #18543
No release note. Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md
Codecov Report
@@             Coverage Diff             @@
##             master    #18543    +/-   ##
===========================================
  Coverage   79.4399%  79.4399%
===========================================
  Files           546       546
  Lines        148098    148098
===========================================
  Hits         117649    117649
  Misses        20971     20971
  Partials       9478      9478
I don't understand the rationale behind this new formula. The old formula, selectivity = 1 / NDV * (ModifyRows / TotalRows), assumes that the un-analyzed rows are uniformly distributed and follow a distribution similar to that of the analyzed rows. This assumption sounds reasonable to me. The new formula, 1 / ndv, implies that the analyzed rows contain the target value as well, which contradicts the fact that the target value is out of range / not in the CMSketch.
Which formula is more rational actually depends on the specific case. If all … Since I can't find any specific case that makes …
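To make the comparison in the comment above concrete, here is a minimal Go sketch that evaluates both rules on made-up statistics. The function names `oldOutOfRangeSel` and `newOutOfRangeSel`, the zero guards, and the sample numbers are assumptions for illustration only; this is not the code in TiDB's statistics package, and the new rule is taken to be `1 / NDV` only because that is how the comment describes it.

```go
package main

import "fmt"

// oldOutOfRangeSel is a hypothetical helper that mirrors the old rule quoted
// in this PR: sel = 1/NDV * (modifyRows/totalRows).
// The zero guards are an assumption of this sketch, not taken from TiDB.
func oldOutOfRangeSel(ndv, modifyRows, totalRows float64) float64 {
	if ndv <= 0 || totalRows <= 0 {
		return 0
	}
	return 1 / ndv * (modifyRows / totalRows)
}

// newOutOfRangeSel is a hypothetical helper for the rule the review comment
// refers to as the new formula: sel = 1/NDV.
func newOutOfRangeSel(ndv float64) float64 {
	if ndv <= 0 {
		return 0
	}
	return 1 / ndv
}

func main() {
	// Made-up statistics: 10000 analyzed rows, 100 distinct values,
	// and only 10 rows modified since the last ANALYZE.
	ndv, modifyRows, totalRows := 100.0, 10.0, 10000.0

	oldSel := oldOutOfRangeSel(ndv, modifyRows, totalRows)
	newSel := newOutOfRangeSel(ndv)

	// Estimated row count = selectivity * total rows.
	fmt.Printf("old rule: sel=%g, estimated rows=%g\n", oldSel, oldSel*totalRows)
	fmt.Printf("1/NDV:    sel=%g, estimated rows=%g\n", newSel, newSel*totalRows)
}
```

With these sample numbers the old rule estimates roughly 0.1 rows, while `1 / NDV` estimates roughly 100 rows. The first figure is the unexpectedly low selectivity reported in the issue; the second is the reviewer's concern, since it treats an out-of-range value as if it were as common as a typical analyzed value.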
LGTM
lgtm
/run-all-tests
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-3.0 in PR #18994
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-3.1 in PR #18995
Signed-off-by: ti-srebot <[email protected]>
cherry pick to release-4.0 in PR #18997
…ctivity when out of range (#18543) (#18997) Signed-off-by: ti-srebot <[email protected]>
What problem does this PR solve?
Issue Number: close #18461
Problem Summary: If the estimated value is out of range, an inappropriate heuristic rule, sel = 1/NDV*(modifyRows/totalRows), is used, which may cause an unexpectedly low sel when only a few rows are modified.
What is changed and how it works?
Change this rule to:
Check List
Tests
Release note