Commit
opt: ensure validation of unique constraints is efficient
This commit adds two new exploration rules: SplitGroupByScanIntoUnionScans and SplitGroupByFilteredScanIntoUnionScans.

SplitGroupByScanIntoUnionScans splits a non-inverted scan under a GroupBy, DistinctOn, or EnsureUpsertDistinctOn into a union-all of scans, where each scan is ordered on the grouping columns. This ordering is then maintained by the union-all operation and passed on to the grouping operation. Ordering on the grouping columns is important since it enables the grouping operation to execute in a streaming fashion, which is more efficient.

Example:

    CREATE TABLE tab (
      region STRING NOT NULL CHECK (region IN ('ASIA', 'EUROPE')),
      data INT NOT NULL,
      INDEX (region, data)
    );

    SELECT DISTINCT data
    FROM tab;

    =>

    SELECT DISTINCT data
    FROM (SELECT * FROM tab WHERE region='ASIA')
    UNION ALL (SELECT * FROM tab WHERE region='EUROPE');

This rule does not actually build the streaming grouping operation, but it allows another rule, GenerateStreamingGroupBy, to fire and use the new interesting orderings provided by the UnionAll of scans to build a streaming operation.

SplitGroupByFilteredScanIntoUnionScans is like SplitGroupByScanIntoUnionScans, but the scan is wrapped in a Select.

These transformations are important for ensuring that validation of the unique constraint in an implicitly-partitioned unique index is efficient. The validation query to verify that (a, b) is UNIQUE on table tbl looks like this:

    SELECT a, b
    FROM tbl
    WHERE a IS NOT NULL AND b IS NOT NULL
    GROUP BY a, b
    HAVING count(*) > 1
    LIMIT 1;

Without SplitGroupByFilteredScanIntoUnionScans, this query would require an inefficient and memory-intensive hash group by operation. Note that the previous rule, SplitGroupByScanIntoUnionScans, is also needed since it would apply in cases where a and b are not nullable.

Fixes #56201

Release note (performance improvement): Validation of a new UNIQUE index in a REGIONAL BY ROW table no longer requires an inefficient and memory-intensive hash aggregation query. The optimizer can now plan the validation query so that it uses all streaming operations, which are much more efficient.
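
As an illustration of why ordering on the grouping columns matters, the following is a minimal Python sketch of the plan shape this commit message describes. It is not CockroachDB code. The `INDEX_BY_REGION` data, the function names, and the use of `heapq.merge` to model an order-preserving union-all are assumptions made for the example, and the NULL filter from the validation query is omitted for brevity.

```python
import heapq
from itertools import groupby

# Hypothetical stand-in for an index on (region, a, b). Within each region the
# entries are already sorted by (a, b), which is what a scan constrained to a
# single region value would return.
INDEX_BY_REGION = {
    "ASIA": [(1, 10), (2, 20), (4, 40)],
    "EUROPE": [(1, 11), (2, 20), (3, 30)],
}


def scan_region(region):
    """Yield (a, b) pairs for one region in index order."""
    yield from INDEX_BY_REGION[region]


def ordered_union_all(regions):
    """Merge the per-region scans lazily while preserving the (a, b) ordering."""
    return heapq.merge(*(scan_region(r) for r in regions))


def first_duplicate(rows):
    """Model GROUP BY a, b HAVING count(*) > 1 LIMIT 1 over sorted input.

    Because equal keys are adjacent, only one group is counted at a time,
    so no hash table over the whole input is needed.
    """
    for key, group in groupby(rows):
        if sum(1 for _ in group) > 1:
            return key  # LIMIT 1: stop at the first violating key
    return None


if __name__ == "__main__":
    print(first_duplicate(ordered_union_all(["ASIA", "EUROPE"])))
```

In this sketch the merged stream is consumed one row at a time and the search stops as soon as a duplicate key is found, which mirrors the streaming behavior the commit message contrasts with a hash group by.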
Showing 7 changed files with 751 additions and 66 deletions.