-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: ensure validation of unique constraints is efficient #65355
Conversation
Please ignore the first commit; it's from #65350. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/xform/limit_funcs.go, line 215 at r2 (raw file):
// SplitScanIntoUnionScans returns a UnionAll of Scan operators (with an // optional hard limit) that each scan over a single key from the original // Scan's constraints. If no such UnionAll of Scans can be found, ok=false is
[nit] returns a UnionAll [..] if the scans can provide the given ordering, and if the statistics suggest that splitting the scan could be beneficial.
pkg/sql/opt/xform/limit_funcs.go, line 263 at r2 (raw file):
if scan.Relational().Stats.Available && float64(scanCount*limit*threshold) >= scan.Relational().Stats.RowCount {
[nit] I would put all the code that is limit specific inside an if limit > 0
. It's more or less luck that it all works out with 0, better to not have to think about it.
pkg/sql/opt/xform/limit_funcs.go, line 272 at r2 (raw file):
// The index ordering must have a prefix of columns of length keyLength // followed by the ordering columns either in order or in reverse order. hasLimitOrderingSeq, reverse := indexHasOrderingSequence(
[nit] hasOrderingSeq
pkg/sql/opt/xform/rules/groupby.opt, line 163 at r2 (raw file):
) => ((OpName) $unionScans $aggs $private)
After this rule runs, we are relying on the existing exploration rules to generate a streaming GroupBy, correct?
a1b3ca6
to
7b9ad1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/opt/xform/limit_funcs.go, line 215 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] returns a UnionAll [..] if the scans can provide the given ordering, and if the statistics suggest that splitting the scan could be beneficial.
Done.
pkg/sql/opt/xform/limit_funcs.go, line 263 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] I would put all the code that is limit specific inside an
if limit > 0
. It's more or less luck that it all works out with 0, better to not have to think about it.
Done.
pkg/sql/opt/xform/limit_funcs.go, line 272 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] hasOrderingSeq
Done.
pkg/sql/opt/xform/rules/groupby.opt, line 163 at r2 (raw file):
Previously, RaduBerinde wrote…
After this rule runs, we are relying on the existing exploration rules to generate a streaming GroupBy, correct?
Yep! Added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] the PR description is missing the second commit
Reviewed 5 of 7 files at r2, 2 of 2 files at r3, 3 of 3 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/xform/groupby_funcs.go, line 213 at r3 (raw file):
scan memo.RelExpr, sp *memo.ScanPrivate, private *memo.GroupingPrivate, ) (_ memo.RelExpr, ok bool) { cons, ok := c.getKnownScanConstraint(sp)
[nit] move the definition of getKnownScanConstraint
from limit_funcs.go
to general_funcs.go
(probably in a separate commit)
pkg/sql/opt/xform/rules/groupby.opt, line 185 at r3 (raw file):
# inefficient and memory-intensive hash group by operation. Note that the # previous rule, SplitGroupByScanIntoUnionScans, is also needed since it would # apply in cases where a and b are not nullable.
[nit] It might be nice to expand this sentence a bit more—"are not nullable" confused me a bit when I read it because I thought it should read "are nullable". Something like "it would apply in cases where a and b are not nullable (the IS NOT NULL
filters would be eliminated during normalization)". (Or is it that the IS NOT NULL
filters are not part of the query to begin with?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really cool!
It might be nice to have a comment explaining why zero is passed as the limit to SplitScanIntoUnionScans
.
This commit adds two new exploration rules: SplitGroupByScanIntoUnionScans and SplitGroupByFilteredScanIntoUnionScans. SplitGroupByScanIntoUnionScans splits a non-inverted scan under a GroupBy, DistinctOn, or EnsureUpsertDistinctOn into a union-all of scans, where each scan is ordered on the grouping columns. This ordering is then maintained by the union-all operation and passed on to the grouping operation. Ordering on the grouping columns is important since it enables the grouping operation to execute in a streaming fashion, which is more efficient. Example: CREATE TABLE tab ( region STRING NOT NULL CHECK (region IN ('ASIA', 'EUROPE')), data INT NOT NULL, INDEX (region, data) ); SELECT DISTINCT data FROM tab; => SELECT DISTINCT data FROM (SELECT * FROM tab WHERE region='ASIA') UNION ALL (SELECT * FROM tab WHERE region='EUROPE'); This rule does not actually build the streaming grouping operation, but it allows another rule, GenerateStreamingGroupBy, to fire and use the new interesting orderings provided by the UnionAll of scans to build a streaming operation. SplitGroupByFilteredScanIntoUnionScans is like SplitGroupByScanIntoUnionScans, but the scan is wrapped in a Select. These transformations are important for ensuring that validation of the unique constraint in an implicitly-partitioned unique index is efficient. The validation query to verify that (a, b) is UNIQUE on table tbl looks like this: SELECT a, b FROM tbl WHERE a IS NOT NULL AND b IS NOT NULL GROUP BY a, b HAVING count(*) > 1 LIMIT 1; Without SplitGroupByFilteredScanIntoUnionScans, this query would require an inefficient and memory-intensive hash group by operation. Note that the previous rule, SplitGroupByScanIntoUnionScans, is also needed since it would apply in cases where a and b are not nullable. Fixes cockroachdb#56201 Release note (performance improvement): Validation of a new UNIQUE index in a REGIONAL BY ROW table no longer requires an inefficient and memory-intensive hash aggregation query. The optimizer can now plan the validation query so that it uses all streaming operations, which are much more efficient.
This commit moves splitScanIntoUnionScans and all helper functions into general_funcs.go since it's now used in both limit_funcs.go and groupby_funcs.go. This is a separate commit for ease of reviewing. Release note: None
7b9ad1f
to
35b1ec6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
[nit] the PR description is missing the second commit
Done.
It might be nice to have a comment explaining why zero is passed as the limit to SplitScanIntoUnionScans.
Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner)
pkg/sql/opt/xform/groupby_funcs.go, line 213 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
[nit] move the definition of
getKnownScanConstraint
fromlimit_funcs.go
togeneral_funcs.go
(probably in a separate commit)
It was moved in the second commit.
pkg/sql/opt/xform/rules/groupby.opt, line 185 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
[nit] It might be nice to expand this sentence a bit more—"are not nullable" confused me a bit when I read it because I thought it should read "are nullable". Something like "it would apply in cases where a and b are not nullable (the
IS NOT NULL
filters would be eliminated during normalization)". (Or is it that theIS NOT NULL
filters are not part of the query to begin with?)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r5, 3 of 3 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/opt/xform/groupby_funcs.go, line 213 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It was moved in the second commit.
Ahh I missed that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
Build succeeded: |
opt: ensure validation of unique constraints is efficient
This commit adds two new exploration rules:
SplitGroupByScanIntoUnionScans
and
SplitGroupByFilteredScanIntoUnionScans
.SplitGroupByScanIntoUnionScans
splits a non-inverted scan under aGroupBy
,DistinctOn
, orEnsureUpsertDistinctOn
into a union-all of scans, where eachscan is ordered on the grouping columns. This ordering is then maintained by
the union-all operation and passed on to the grouping operation. Ordering on
the grouping columns is important since it enables the grouping operation to
execute in a streaming fashion, which is more efficient. Example:
This rule does not actually build the streaming grouping operation, but it
allows another rule,
GenerateStreamingGroupBy
, to fire and use the newinteresting orderings provided by the
UnionAll
of scans to build a streamingoperation.
SplitGroupByFilteredScanIntoUnionScans
is likeSplitGroupByScanIntoUnionScans
,but the scan is wrapped in a Select.
These transformations are important for ensuring that validation of the
unique constraint in an implicitly-partitioned unique index is efficient. The
validation query to verify that
(a, b)
is UNIQUE on tabletbl
looks like this:Without
SplitGroupByFilteredScanIntoUnionScans
, this query would require aninefficient and memory-intensive hash group by operation. Note that the
previous rule,
SplitGroupByScanIntoUnionScans
, is also needed since it wouldapply in cases where
a
andb
are not nullable.Fixes #56201
Release note (performance improvement): Validation of a new UNIQUE index in a
REGIONAL BY ROW table no longer requires an inefficient and memory-intensive
hash aggregation query. The optimizer can now plan the validation query so
that it uses all streaming operations, which are much more efficient.
opt: move splitScanIntoUnionScans to general_funcs.go
This commit moves
splitScanIntoUnionScans
and all helper functionsinto
general_funcs.go
since it's now used in bothlimit_funcs.go
andgroupby_funcs.go
. This is a separate commit for ease of reviewing.Release note: None