release-21.1: opt: add cost penalty for scans with large cardinality #67388
Backport 2/2 commits from #66979.
/cc @cockroachdb/release
opt: ensure we prefer a reverse scan to sorting a forward scan
This commit fixes an issue where in some edge cases the optimizer would
prefer sorting the output of a forward scan over performing a reverse scan
(when there is no need to sort the output of the reverse scan).
Release note (performance improvement): The optimizer now prefers
performing a reverse scan over a forward scan + sort if the reverse
scan eliminates the need for a sort and the plans are otherwise
equivalent. This was already true in most cases, but some edge
cases with a small number of rows have been fixed.
opt: add cost penalty for scans with large cardinality
This commit adds a new cost function, largeCardinalityRowCountPenalty,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows'
worth of cost helps prevent surprising plans for very small tables or
when stats are stale.
Fixes #64570
Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.