planner: fix the inappropriate heuristic rule to estimate the EQ selectivity when out of range #18543

qw4990 · 2020-07-14T07:44:19Z

What problem does this PR solve?

Issue Number: close #18461

Problem Summary: When the value being estimated falls outside the histogram's range, the heuristic rule sel = 1/NDV*(modifyRows/totalRows) is applied. This rule can produce an unexpectedly low selectivity when only a few rows have been modified.

What is changed and how it works?

Change this rule to:

func outOfRangeEQSelectivity(ndv, modifyRows, totalRows int64) float64 {
	if modifyRows == 0 {
		return 0 // it must be 0 since the histogram contains the whole data
	}
	if ndv < outOfRangeBetweenRate {
		ndv = outOfRangeBetweenRate // avoid inaccurate selectivity caused by small NDV
	}
	selectivity := 1 / float64(ndv) // TODO: After extracting TopN from histograms, we can subtract the TopN fraction here.
	if selectivity*float64(totalRows) > float64(modifyRows) {
		selectivity = float64(modifyRows) / float64(totalRows)
	}
	return selectivity
}
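
To see how the two rules differ in practice, here is a minimal, runnable sketch that evaluates both on the same inputs. It makes two assumptions that are not taken from this PR: the value 100 for outOfRangeBetweenRate (the constant's definition is not shown here), and the name oldOutOfRangeEQSelectivity, a hypothetical helper that simply writes out the old formula from the problem summary.

```go
package main

import "fmt"

// Assumed value for illustration only; the PR text does not show this constant's definition.
const outOfRangeBetweenRate = 100

// oldOutOfRangeEQSelectivity is a hypothetical name for the previous rule,
// sel = 1/NDV * (modifyRows/totalRows), as stated in the problem summary.
func oldOutOfRangeEQSelectivity(ndv, modifyRows, totalRows int64) float64 {
	return 1 / float64(ndv) * (float64(modifyRows) / float64(totalRows))
}

// outOfRangeEQSelectivity repeats the new rule proposed in this PR.
func outOfRangeEQSelectivity(ndv, modifyRows, totalRows int64) float64 {
	if modifyRows == 0 {
		return 0 // the histogram covers all data, so nothing can be out of range
	}
	if ndv < outOfRangeBetweenRate {
		ndv = outOfRangeBetweenRate // guard against a tiny NDV inflating the estimate
	}
	selectivity := 1 / float64(ndv)
	// Never estimate more matching rows than were modified since the last analyze.
	if selectivity*float64(totalRows) > float64(modifyRows) {
		selectivity = float64(modifyRows) / float64(totalRows)
	}
	return selectivity
}

func main() {
	const ndv, totalRows = 1000, 1000000
	for _, modifyRows := range []int64{0, 10, 5000, 500000} {
		oldSel := oldOutOfRangeEQSelectivity(ndv, modifyRows, totalRows)
		newSel := outOfRangeEQSelectivity(ndv, modifyRows, totalRows)
		fmt.Printf("modifyRows=%-7d old rows=%-10.4f new rows=%.4f\n",
			modifyRows, oldSel*totalRows, newSel*totalRows)
	}
}
```

With few modified rows, the old rule's row estimate falls well below one row, while the new rule's estimate is bounded above by modifyRows and otherwise follows totalRows/NDV.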

Check List

Tests

Unit test

Release note

planner: fix the inappropriate heuristic rule to estimate the EQ selectivity when out of range

sre-bot · 2020-07-14T07:44:41Z

No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md

sre-bot · 2020-07-14T08:01:11Z

No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md

codecov · 2020-07-14T08:24:13Z

Codecov Report

Merging #18543 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18543   +/-   ##
===========================================
  Coverage   79.4399%   79.4399%           
===========================================
  Files           546        546           
  Lines        148098     148098           
===========================================
  Hits         117649     117649           
  Misses        20971      20971           
  Partials       9478       9478

qw4990 · 2020-07-21T06:42:35Z

PTAL @eurekaka @winoros

eurekaka

I don't understand the rationale behind the new formula. The old formula, selectivity = 1 / NDV * (ModifyRows / TotalRows), assumes that the un-analyzed rows are uniformly distributed and follow a distribution similar to that of the analyzed rows, which sounds reasonable to me. The new formula, 1 / ndv, implies that the analyzed rows contain the target value as well, which contradicts the fact that the target value is out of range / not in the CMSketch.

qw4990 · 2020-07-22T08:17:11Z

Which formula is more reasonable depends on the specific case.

If all ModifyRows rows are newly inserted and all of them are out of range, then 1/NDV is exactly what your assumption implies (that the un-analyzed rows are uniformly distributed and follow a distribution similar to that of the analyzed rows), because every newly inserted distinct value should then have about Tot/NDV rows.

Since I can't find a specific case in which 1/NDV * (Modi/Tot) is more reasonable than 1/NDV, and given the strategies other databases use (#18461), can we agree that 1/NDV is more robust than 1/NDV * (Modi/Tot)?
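
The argument in this comment can be written out as a short derivation. It is only a sketch under the comment's own assumption that every modified row is a new insert whose value lies outside the histogram; Tot, Modi and NDV abbreviate totalRows, modifyRows and the number of distinct values.

```latex
\begin{align*}
\text{rows per distinct value} &\approx \frac{\mathit{Tot}}{\mathit{NDV}} \\
\mathrm{sel}_{\text{new}} &= \frac{\mathit{Tot}/\mathit{NDV}}{\mathit{Tot}} = \frac{1}{\mathit{NDV}},
  \qquad \text{capped at } \frac{\mathit{Modi}}{\mathit{Tot}} \\
\mathrm{sel}_{\text{old}} &= \frac{1}{\mathit{NDV}} \cdot \frac{\mathit{Modi}}{\mathit{Tot}}
  \;\Longrightarrow\; \text{estimated rows} = \frac{\mathit{Modi}}{\mathit{NDV}}
\end{align*}
```

For example, with Tot = 1,000,000, NDV = 1,000 and Modi = 10, the old rule estimates 10/1,000 = 0.01 rows, while the new rule estimates min(1,000, 10) = 10 rows.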

eurekaka

LGTM

winoros

lgtm

qw4990 · 2020-08-05T05:59:22Z

/run-all-tests

ti-srebot · 2020-08-05T06:12:29Z

cherry pick to release-3.0 in PR #18994

ti-srebot · 2020-08-05T06:14:44Z

cherry pick to release-3.1 in PR #18995

ti-srebot · 2020-08-05T06:17:03Z

cherry pick to release-4.0 in PR #18997

update a heuristic rule

5ff6bc1

qw4990 requested a review from a team as a code owner July 14, 2020 07:44

qw4990 requested review from SunRunAway and removed request for a team July 14, 2020 07:44

fix CI

394f5f1

qw4990 requested review from eurekaka and winoros July 14, 2020 08:04

qw4990 added the sig/planner and type/enhancement labels Jul 14, 2020

fix CI

b3b8bc7

github-actions bot added the component/statistics label Jul 14, 2020

fix CI

413462a

qw4990 added needs-cherry-pick-3.0 labels Jul 14, 2020

eurekaka reviewed Jul 21, 2020

qw4990 added 2 commits July 27, 2020 17:12

address comments

9eecd96

Merge branch 'master' into fix-heuristic-method

5eadafd

eurekaka reviewed Jul 27, 2020

ti-srebot added the status/LGT1 label Jul 27, 2020

winoros approved these changes Aug 5, 2020

ti-srebot removed the status/LGT1 label Aug 5, 2020

ti-srebot approved these changes Aug 5, 2020

ti-srebot added the status/LGT2 label Aug 5, 2020

Merge branch 'master' into fix-heuristic-method

60828ab

qw4990 merged commit aeee152 into pingcap:master Aug 5, 2020

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Aug 5, 2020

cherry pick pingcap#18543 to release-3.0

0c3971e

Signed-off-by: ti-srebot <[email protected]>

ti-srebot mentioned this pull request Aug 5, 2020

planner: fix the inappropriate heuristic rule to estimate the EQ selectivity when out of range (#18543) #18994

Closed

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Aug 5, 2020

cherry pick pingcap#18543 to release-3.1

24ef70d

Signed-off-by: ti-srebot <[email protected]>

ti-srebot mentioned this pull request Aug 5, 2020

planner: fix the inappropriate heuristic rule to estimate the EQ selectivity when out of range (#18543) #18995

Closed

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Aug 5, 2020

cherry pick pingcap#18543 to release-4.0

c9ed082

Signed-off-by: ti-srebot <[email protected]>

ti-srebot mentioned this pull request Aug 5, 2020

planner: fix the inappropriate heuristic rule to estimate the EQ selectivity when out of range (#18543) #18997

Merged

qw4990 mentioned this pull request Aug 14, 2020

fix inappropriate heuristic method of estimating out-of-range values #19200

Merged

ti-srebot added a commit that referenced this pull request Sep 1, 2020

planner: fix the inappropriate heuristic rule to estimate the EQ sele…

83fc2d8

…ctivity when out of range (#18543) (#18997) Signed-off-by: ti-srebot <[email protected]>

winoros mentioned this pull request Nov 5, 2020

Not so good row count estimation formula when the queried value is out of histogram's bounds #20875

Open
