opt: Locality optimized search #55185

nvanbenschoten · 2020-10-02T19:28:41Z

In #41535, we introduced the concept of partitioned uniqueness checks, coupled with partitioned unique indexes. This ensures that all data for a given row in a REGIONAL table is stored in a single region while continuing to allow the use of UNIQUE constraints.

Partitioned unique indexes have the interesting property that if a value is found in one partition of the index, it is known that no others exist in any other partition. This presents an opportunity to optimize lookups on these indexes. By default, we would expect that lookups would have to fanout to each partition when searching for a match to the unique column(s) – this is how hash-sharded index lookups work. But this naive fanout doesn't acknowledge the disparity of lookup cost between different partitions of the index. In the case of REGIONAL tables, one partition is local and all others are across a WAN hop. It also doesn't acknowledge that given a workload with some degree of access locality, the partition stored in the local region is more likely to hold the matching row than any other partition.

For these reasons, we would be well served to optimize these lookups for cases where the matching row is in the local partition. So instead of fanning out to all partitions immediately, we should search for the row in the local partition first, and only fan out to all other partitions if that initial lookup missed.

Example:

Picture a schema that looks like the following, using a partitioned unique index.

CREATE TABLE data (
    _region STRING NOT NULL CHECK (_region IN ('us-east', 'us-west', 'eu-west', 'asia-south')),
    id UUID UNIQUE NOT NULL,
    PRIMARY KEY (_region, id)
)

This would map to a single index on (_region, id), because a separate index on id would not be necessary with partitioned uniqueness checks.

The question becomes: "how do we execute the following query from a us-east gateway"?

SELECT * FROM data WHERE id = '0123-4567-89ab-cdef'

The naive approach would be to evaluate it like:

-- query all four regions in parallel
SELECT * FROM data WHERE id = '0123-4567-89ab-cdef' AND _region IN ('us-east', 'us-west', 'eu-west', 'asia-south')

A better approach using a locality optimized search algorithm would evaluate it like:

--- query the local region first
SELECT * FROM data WHERE id = '0123-4567-89ab-cdef' AND _region = 'us-east'
--- if this returns a row, return that single row and short circuit
--- if not, proceed with the following query of the remaining three regions
SELECT * FROM data WHERE id = '0123-4567-89ab-cdef' AND _region IN ('us-west', 'eu-west', 'asia-south')

Extension: Stale Read Hints

The approach listed above is a huge improvement. It's optimal for a local region search hit and correct for a remote region search hit or a miss. However, we can do better for remote region search hits by trying to avoid immediately fanning out to all remote regions on a local search miss. In REGIONAL tables, we expect that each region will have at least a non-voting replica of all data. That means that each region has fast access to a stale copy of the entire dataset. We can use this stale data as a hint for where to look if a row is not in the local region. Note that because this is stale, we still need a fallback plan in case the row has been "re-homed" recently.

Using this approach, the locality optimized search algorithm would become:

--- query the local region first
SELECT * FROM data WHERE id = '0123-4567-89ab-cdef' AND _region = 'us-east'
--- if this returns a row, return that single row and short circuit
--- if not, perform a stale read on the local copy of this data to locate the expected home region
SELECT * FROM data AS OF SYSTEM TIME follower_read_timestamp() WHERE id = '0123-4567-89ab-cdef' AND _region IN ('us-west', 'eu-west', 'asia-south')
--- if this returns a row, query only that remote region next
--- otherwise, perform the same fanout query transactionally and return
SELECT * FROM data WHERE id = '0123-4567-89ab-cdef' AND _region = 'us-west'
--- if this returns a row, return that single row and short circuit
--- if not, proceed with the following query of the remaining two regions
SELECT * FROM data WHERE id = '0123-4567-89ab-cdef' AND _region IN ('eu-west', 'asia-south')

This extension draws some parallels to the asynchronously updated "Lookup Master" index present in SLOG. With the proposed approach here, we avoid the need for a new index maintained on the side in a specialized manner at the expense of a local fanout during the stale lookup. That seems like a very reasonable trade-off.

Extension: Ignore Empty Regions

With the current proposal, adding a new region alternative to a table (broadening the CHECK constraint) immediately increases the fan-out factor of queries that miss the current region. This is mildly undesirable because users might want to add a new region to a database for replication purposes without homing any data in that region, at least for some tables.

To alleviate this problem, we could maintain some form of consistent table statistics (probably on the schema object itself because existing table stats are stale and therefore unsuitable) denoting which regions are empty and can be ignored for the purposes of queries to the table. This marker would need to be updated on the first time that a row is inserted into the empty region to instruct queries to begin searching in it.

This would allow users to more effectively "pay for what they use". They would avoid searching in regions that don't contain rows. The most important instance of this is that when all rows are homed in a single region and a query hits a gateway in that region, there would be no remote fan out at all.

Epic CRDB-2528

The text was updated successfully, but these errors were encountered:

nvanbenschoten · 2020-10-03T21:35:00Z

One thing that I didn't mention above but that deserves a callout here is that we'll want this search behavior to apply even when the lookup is only a part of a larger query plan. For instance, the initial row scan of a DELETE statement (e.g. DELETE FROM data WHERE id = '0123-4567-89ab-cdef') or an UPDATE statement (e.g. UPDATE data SET ... WHERE id = '0123-4567-89ab-cdef') should benefit from this optimization as well. This will ensure that these operations are local if the row they reference is homed in the gateway's region.

Given the design of the SQL optimizer, I expect that we'll get this for free, but we should make sure to keep this in mind when building this out.

nvanbenschoten · 2020-10-05T04:38:58Z

Two other notes:

this lookup approach should be used for foreign key checks. As in the previous comment, I expect that we'll get this for free now that fk checks have been pulled into the optimizer, but we should ensure that.
this lookup approach should apply to other forms of point lookups. For instance, it should apply to multi-point lookups like SELECT * FROM data WHERE id IN ('0123-4567-89ab-cdef', '1123-4567-89ab-cdef', '2123-4567-89ab-cdef'). I'm less confident that this will come for free without any additional work. If this needs more then it can be viewed as an extension.

This commit teaches the optimizer that columns with a valid UNIQUE WITHOUT INDEX constraint form a key, and the functional dependencies should reflect that. This will be necessary to support locality optimized search. Fixes cockroachdb#58944 Informs cockroachdb#55185 Release note (performance improvement): The optimizer now knows that the unique columns in an implicitly partitioned unique index form a key. This can be used to enable certain optimizations and may result in better plans.

60591: opt: create key for UNIQUE WITHOUT INDEX constraints r=rytaft a=rytaft **opt: don't add unique constraints with an index to test catalog** This commit updates the test catalog so that it is consistent with the real catalog and doesn't add unique constraints with an index to the catalog table. This also prevents some tests from failing unnecessarily. Release note: None **opt: create key for UNIQUE WITHOUT INDEX constraints** This commit teaches the optimizer that columns with a valid `UNIQUE WITHOUT INDEX` constraint form a key, and the functional dependencies should reflect that. This will be necessary to support locality optimized search. Fixes #58944 Informs #55185 Release note (performance improvement): The optimizer now knows that the unique columns in an implicitly partitioned unique index form a key. This can be used to enable certain optimizations and may result in better plans. Co-authored-by: Rebecca Taft <[email protected]>

This commit adds a new session setting locality_optimized_search and corresponding cluster default setting sql.defaults.locality_optimized_search.enabled. In future commits, these settings will be used to enable or disable locality optimized search. If enabled, the optimizer will try to plan scans and lookup joins in which local nodes are searched for matching rows before remote nodes, in the hope that the execution engine can avoid visiting remote nodes. Informs cockroachdb#55185 Release note (sql change): Added a new session setting locality_optimized_search and corresponding cluster default setting sql.defaults.locality_optimized_search.enabled. Both are currently disabled by default, and are currently unused. In the future, these settings will be used to enable or disable locality optimized search. If enabled, the optimizer will try to search locally for rows in REGIONAL BY ROW tables before searching remote nodes.

Prior to this commit, the opt catalog did not include zone information or prefixes specific to each partition of an index. This commit adds this information since it will be necessary to support locality optimized search in a future commit. Informs cockroachdb#55185 Release note: None

This commit adds a new transformation rule, GenerateLocalityOptimizedAntiJoin. GenerateLocalityOptimizedAntiJoin converts an anti join into a locality optimized anti join if possible. A locality optimized anti join is implemented as a nested pair of anti lookup joins and is designed to avoid communicating with remote nodes (relative to the gateway region) if at all possible. A locality optimized anti join can be planned under the following conditions: - The anti join can be planned as a lookup join. - The lookup join scans multiple spans in the lookup index for each input row, with some spans targeting partitions on local nodes (relative to the gateway region), and some targeting partitions on remote nodes. It is not known which span(s) will contain the matching row(s). The result of GenerateLocalityOptimizedAntiJoin will be a nested pair of anti lookup joins in which the first lookup join is an anti join targeting the local values from the original join, and the second lookup join is an anti join targeting the remote values. Because of the way anti join is defined, a row will only be returned by the first anti join if a match is *not* found locally. If a match is found, no row will be returned and therefore the second lookup join will not need to search the remote nodes. This nested pair of anti joins is logically equivalent to the original, single anti join. This is a useful optimization if there is locality of access in the workload, such that rows tend to be accessed from the region where they are located. If there is no locality of access, using a locality optimized anti join could be a slight pessimization, since rows residing in remote regions will be found slightly more slowly than they would be otherwise. For example, suppose we have a multi-region database with regions 'us-east1', 'us-west1' and 'europe-west1', and we have the following tables and query, issued from 'us-east1': CREATE TABLE parent ( p_id INT PRIMARY KEY ) LOCALITY REGIONAL BY ROW; CREATE TABLE child ( c_id INT PRIMARY KEY, c_p_id INT REFERENCES parent (p_id) ) LOCALITY REGIONAL BY ROW; SELECT * FROM child WHERE NOT EXISTS ( SELECT * FROM parent WHERE p_id = c_p_id ) AND c_id = 10; Normally, this would produce the following plan: anti-join (lookup parent) ├── lookup columns are key ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-east1', 'us-west1')) ├── scan child │ └── constraint: /7/5 │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ ├── [/'us-east1'/10 - /'us-east1'/10] │ └── [/'us-west1'/10 - /'us-west1'/10] └── filters (true) but if the session setting locality_optimized_partitioned_index_scan is enabled, the optimizer will produce this plan, using locality optimized search, both for the scan of child and for the lookup join with parent. See the rule GenerateLocalityOptimizedScan for details about how the optimization is applied for scans. anti-join (lookup parent) ├── lookup columns are key ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-west1')) ├── anti-join (lookup parent) │ ├── lookup columns are key │ ├── lookup expr: (p_id = c_p_id) AND (crdb_region = 'us-east1') │ ├── locality-optimized-search │ │ ├── scan child │ │ │ └── constraint: /13/11: [/'us-east1'/10 - /'us-east1'/10] │ │ └── scan child │ │ └── constraint: /18/16 │ │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ │ └── [/'us-west1'/10 - /'us-west1'/10] │ └── filters (true) └── filters (true) As long as child.c_id = 10 and the matching row in parent are both located in 'us-east1', the second plan will be much faster. But if they are located in one of the other regions, the first plan would be slightly faster. Informs cockroachdb#55185 Release note (performance improvement): The optimizer will now try to plan anti lookup joins using "locality optimized search". This optimization applies for anti lookup joins into REGIONAL BY ROW tables (i.e., the right side of the join is a REGIONAL BY ROW table), and if enabled, it means that the execution engine will first search locally for matching rows before searching remote nodes. If a matching row is found in a local node, remote nodes will not be searched. This optimization may improve the performance of foreign key checks when rows are inserted or updated in a table that references a foreign key in a REGIONAL BY ROW table.

@RaduBerinde

63044: opt: support locality optimized anti join r=rytaft a=rytaft **opt: fix a few omissions for locality optimized search** This commit fixes a few omissions where locality optimized search was not included when it should have been. These omissions only have a small impact on the cost so they were not noticeable. **opt: refactor and move some functions for locality optimized search** This commit refactors some functions previously in `xform/scan_funcs.go` for `GenerateLocalityOptimizedScan` and moves them to `xform/general_funcs.go` so they can be used to generate locality optimized anti joins in a future commit. **opt: support locality optimized anti join** Shout out to @RaduBerinde for this idea! This commit adds a new transformation rule, `GenerateLocalityOptimizedAntiJoin`. `GenerateLocalityOptimizedAntiJoin` converts an anti join into a locality optimized anti join if possible. A locality optimized anti join is implemented as a nested pair of anti lookup joins and is designed to avoid communicating with remote nodes (relative to the gateway region) if at all possible. A locality optimized anti join can be planned under the following conditions: - The anti join can be planned as a lookup join. - The lookup join scans multiple spans in the lookup index for each input row, with some spans targeting partitions on local nodes (relative to the gateway region), and some targeting partitions on remote nodes. It is not known which span(s) will contain the matching row(s). The result of `GenerateLocalityOptimizedAntiJoin` will be a nested pair of anti lookup joins in which the first lookup join is an anti join targeting the local values from the original join, and the second lookup join is an anti join targeting the remote values. Because of the way anti join is defined, a row will only be returned by the first anti join if a match is *not* found locally. If a match is found, no row will be returned and therefore the second lookup join will not need to search the remote nodes. This nested pair of anti joins is logically equivalent to the original, single anti join. This is a useful optimization if there is locality of access in the workload, such that rows tend to be accessed from the region where they are located. If there is no locality of access, using a locality optimized anti join could be a slight pessimization, since rows residing in remote regions will be found slightly more slowly than they would be otherwise. For example, suppose we have a multi-region database with regions 'us-east1', 'us-west1' and 'europe-west1', and we have the following tables and query, issued from 'us-east1': ``` CREATE TABLE parent ( p_id INT PRIMARY KEY ) LOCALITY REGIONAL BY ROW; CREATE TABLE child ( c_id INT PRIMARY KEY, c_p_id INT REFERENCES parent (p_id) ) LOCALITY REGIONAL BY ROW; SELECT * FROM child WHERE NOT EXISTS ( SELECT * FROM parent WHERE p_id = c_p_id ) AND c_id = 10; ``` Normally, this would produce the following plan: ``` anti-join (lookup parent) ├── lookup columns are key ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-east1', 'us-west1')) ├── scan child │ └── constraint: /7/5 │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ ├── [/'us-east1'/10 - /'us-east1'/10] │ └── [/'us-west1'/10 - /'us-west1'/10] └── filters (true) ``` but if the session setting `locality_optimized_partitioned_index_scan` is enabled, the optimizer will produce this plan, using locality optimized search, both for the scan of `child` and for the lookup join with `parent`. See the rule `GenerateLocalityOptimizedScan` for details about how the optimization is applied for scans. ``` anti-join (lookup parent) ├── lookup columns are key ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-west1')) ├── anti-join (lookup parent) │ ├── lookup columns are key │ ├── lookup expr: (p_id = c_p_id) AND (crdb_region = 'us-east1') │ ├── locality-optimized-search │ │ ├── scan child │ │ │ └── constraint: /13/11: [/'us-east1'/10 - /'us-east1'/10] │ │ └── scan child │ │ └── constraint: /18/16 │ │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ │ └── [/'us-west1'/10 - /'us-west1'/10] │ └── filters (true) └── filters (true) ``` As long as `child.c_id = 10` and the matching row in `parent` are both located in 'us-east1', the second plan will be much faster. But if they are located in one of the other regions, the first plan would be slightly faster. Informs #55185 Release note (performance improvement): The optimizer will now try to plan anti lookup joins using "locality optimized search". This optimization applies for anti lookup joins into `REGIONAL BY ROW` tables (i.e., the right side of the join is a `REGIONAL BY ROW` table), and if enabled, it means that the execution engine will first search locally for matching rows before searching remote nodes. If a matching row is found in a local node, remote nodes will not be searched. This optimization may improve the performance of foreign key checks when rows are inserted or updated in a table that references a foreign key in a `REGIONAL BY ROW` table. 63094: bazel: upgrade gazelle to latest release version r=rickystewart a=rickystewart Also move the declaration of `in_gopkg_yaml_v2` and `org_golang_x_xerrors` so Gazelle doesn't override them. Release note: None Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>

awoods187 · 2021-05-05T13:42:16Z

Can we close this now cc @rytaft @ajstorm ?

rytaft · 2021-05-05T13:56:59Z

Not yet -- we still need to support locality optimized inner, left and semi lookup joins (currently only anti is supported), and we need to support locality optimized scans with more than a single row.

nvanbenschoten · 2021-05-05T13:57:03Z

If we do, let's make sure we open issues for any extensions that we still may want to do in the future. For instance, we'll eventually want to support this for multi-point reads (e.g. WHERE id IN (...)). Anything other query plans? Also, I think the "Extension: Stale Read Hints" part is very cool and worthy of its own issue.

awoods187 · 2021-05-05T14:17:32Z

I'm a plus one to breaking this up into smaller issues. It will also make it easier to track progress internally.

ajstorm · 2021-05-06T12:43:41Z

Splitting it up works for me too. It should make it easier to track that way.

…

On Wed, May 5, 2021 at 10:17 AM Andy Woods ***@***.***> wrote: I'm a plus one to breaking this up into smaller issues. It will also make it easier to track progress internally. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#55185 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEMXVOTISHFS42Q7ATDCPO3TMFHRHANCNFSM4SCCH37A> .

Note: I do not think this commit should be merged, because it still has several issues. For example, it currently causes vectorized execution errors on mutations because it runs buffered subqueries that use mutation output *before* the mutation query, rather than as part of the postqueries that use them. I'm not sure how difficult it is to change this behavior, but it seems risky to try to squeeze in to the end of the release cycle. It is also less important than originally thought to use locality optimized search with lookup joins, since it does not guarantee local plans for foreign key checks in the common case. Foreign key checks might be able to use this optimization to avoid remote nodes in case of a foreign key violation, but not if there is no violation. Intead, I am posting this PR to document my decision not to pursue this path, and will also use it as a starting point for a more comprehensive change (targeted for the next release) to support locality optimized lookup joins with a new joinReader strategy. Feel free to comment if you have an idea about how to make the current approach work for release 21.1 with minimal risk. Informs cockroachdb#55185 ---- This commit adds support for a new rule, GenerateLocalityOptimizedLookupJoin. GenerateLocalityOptimizedLookupJoin plans a LocalityOptimizedSearch operation if possible. LocalityOptimizedSearch is similar to UnionAll, but it is designed to avoid communicating with remote nodes (relative to the gateway region) if at all possible. LocalityOptimizedSearch can be planned under the following conditions: - A lookup join is known to produce at most one row. - The lookup join scans multiple spans in the lookup index for each input row, with some spans targeting partitions on local nodes (relative to the gateway region), and some targeting partitions on remote nodes. It is not known which span will produce the row. The result of GenerateLocalityOptimizedLookupJoin will be a LocalityOptimizedSearch in which the left child contains a new lookup join operator targeting the local values from the original lookup join, and the right child contains a new lookup join operator targeting the remote values. The LocalityOptimizedSearch operator ensures that the right child (targeting remote values) is only executed if the left child (targeting local values) does not return any rows. To avoid recalculating the input for each new lookup join, the input is buffered and stored as a common table expression. Each of the two child lookup joins scan the buffer instead of recalculating the input. This is a useful optimization if there is locality of access in the workload, such that rows tend to be accessed from the region where they are located. If there is no locality of access, using LocalityOptimizedSearch could be a slight pessimization, since rows residing in remote regions will be fetched slightly more slowly than they would be otherwise. For example, suppose we have a multi-region database with regions 'us-east1', 'us-west1' and 'europe-west1', and we have the following tables and query, issued from 'us-east1': CREATE TABLE parent ( p_id INT PRIMARY KEY ) LOCALITY REGIONAL BY ROW; CREATE TABLE child ( c_id INT PRIMARY KEY, c_p_id INT REFERENCES parent (p_id) ) LOCALITY REGIONAL BY ROW; SELECT * FROM parent, child WHERE c_id = 10 AND p_id = c_p_id; Normally, this would produce the following plan: inner-join (lookup parent) ├── lookup columns are key ├── scan child │ └── constraint: /7/5 │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ ├── [/'us-east1'/10 - /'us-east1'/10] │ └── [/'us-west1'/10 - /'us-west1'/10] └── filters (true) but if the session setting locality_optimized_partitioned_index_scan is enabled, the optimizer will produce this plan, using locality optimized search, both for the scan of child and for the lookup join with parent. See the rule GenerateLocalityOptimizedScan for details about how the optimization is applied for scans. project └── with &1 ├── project │ ├── locality-optimized-search │ │ ├── scan child │ │ │ └── constraint: /13/11: [/'us-east1'/10 - /'us-east1'/10] │ │ └── scan child │ │ └── constraint: /18/16 │ │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ │ └── [/'us-west1'/10 - /'us-west1'/10] │ └── projections │ ├── child.c_id │ └── child.c_p_id └── locality-optimized-search ├── inner-join (lookup parent) │ ├── lookup columns are key │ ├── with-scan &1 │ └── filters (true) └── inner-join (lookup parent) ├── lookup columns are key ├── with-scan &1 └── filters (true) As long as child.c_id = 10 and the matching row in parent are both located in 'us-east1', the second plan will be much faster. But if they are located in one of the other regions, the first plan would be slightly faster. Release note: None

rytaft · 2021-05-07T12:34:29Z

Done. Opened #64862, #64870, #64872, and #64874, so I'll close this now.

This commit adds a new transformation rule, GenerateLocalityOptimizedAntiJoin. GenerateLocalityOptimizedAntiJoin converts an anti join into a locality optimized anti join if possible. A locality optimized anti join is implemented as a nested pair of anti lookup joins and is designed to avoid communicating with remote nodes (relative to the gateway region) if at all possible. A locality optimized anti join can be planned under the following conditions: - The anti join can be planned as a lookup join. - The lookup join scans multiple spans in the lookup index for each input row, with some spans targeting partitions on local nodes (relative to the gateway region), and some targeting partitions on remote nodes. It is not known which span(s) will contain the matching row(s). The result of GenerateLocalityOptimizedAntiJoin will be a nested pair of anti lookup joins in which the first lookup join is an anti join targeting the local values from the original join, and the second lookup join is an anti join targeting the remote values. Because of the way anti join is defined, a row will only be returned by the first anti join if a match is *not* found locally. If a match is found, no row will be returned and therefore the second lookup join will not need to search the remote nodes. This nested pair of anti joins is logically equivalent to the original, single anti join. This is a useful optimization if there is locality of access in the workload, such that rows tend to be accessed from the region where they are located. If there is no locality of access, using a locality optimized anti join could be a slight pessimization, since rows residing in remote regions will be found slightly more slowly than they would be otherwise. For example, suppose we have a multi-region database with regions 'us-east1', 'us-west1' and 'europe-west1', and we have the following tables and query, issued from 'us-east1': CREATE TABLE parent ( p_id INT PRIMARY KEY ) LOCALITY REGIONAL BY ROW; CREATE TABLE child ( c_id INT PRIMARY KEY, c_p_id INT REFERENCES parent (p_id) ) LOCALITY REGIONAL BY ROW; SELECT * FROM child WHERE NOT EXISTS ( SELECT * FROM parent WHERE p_id = c_p_id ) AND c_id = 10; Normally, this would produce the following plan: anti-join (lookup parent) ├── lookup columns are key ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-east1', 'us-west1')) ├── scan child │ └── constraint: /7/5 │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ ├── [/'us-east1'/10 - /'us-east1'/10] │ └── [/'us-west1'/10 - /'us-west1'/10] └── filters (true) but if the session setting locality_optimized_partitioned_index_scan is enabled, the optimizer will produce this plan, using locality optimized search, both for the scan of child and for the lookup join with parent. See the rule GenerateLocalityOptimizedScan for details about how the optimization is applied for scans. anti-join (lookup parent) ├── lookup columns are key ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-west1')) ├── anti-join (lookup parent) │ ├── lookup columns are key │ ├── lookup expr: (p_id = c_p_id) AND (crdb_region = 'us-east1') │ ├── locality-optimized-search │ │ ├── scan child │ │ │ └── constraint: /13/11: [/'us-east1'/10 - /'us-east1'/10] │ │ └── scan child │ │ └── constraint: /18/16 │ │ ├── [/'europe-west1'/10 - /'europe-west1'/10] │ │ └── [/'us-west1'/10 - /'us-west1'/10] │ └── filters (true) └── filters (true) As long as child.c_id = 10 and the matching row in parent are both located in 'us-east1', the second plan will be much faster. But if they are located in one of the other regions, the first plan would be slightly faster. Informs cockroachdb#55185 Release note (performance improvement): The optimizer will now try to plan anti lookup joins using "locality optimized search". This optimization applies for anti lookup joins into REGIONAL BY ROW tables (i.e., the right side of the join is a REGIONAL BY ROW table), and if enabled, it means that the execution engine will first search locally for matching rows before searching remote nodes. If a matching row is found in a local node, remote nodes will not be searched. This optimization may improve the performance of foreign key checks when rows are inserted or updated in a table that references a foreign key in a REGIONAL BY ROW table.

nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-partitioning labels Oct 2, 2020

nvanbenschoten mentioned this issue Oct 2, 2020

opt: Support partitioned uniqueness checks #41535

Closed

rytaft self-assigned this Nov 10, 2020

ajstorm added the A-multiregion Related to multi-region label Dec 2, 2020

kenliu added the T-multiregion label Dec 6, 2020

nvanbenschoten removed the T-multiregion label Dec 10, 2020

rytaft added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Feb 1, 2021

rytaft mentioned this issue Feb 15, 2021

opt: create key for UNIQUE WITHOUT INDEX constraints #60591

Merged

rytaft mentioned this issue Feb 19, 2021

sql,opt: add cluster and session settings for locality optimized search #60771

Merged

rytaft mentioned this issue Feb 19, 2021

opt: add partition info to the opt catalog #60818

Merged

rytaft mentioned this issue Apr 2, 2021

opt: support locality optimized anti join #63044

Merged

rytaft mentioned this issue Apr 5, 2021

release-21.1: opt: support locality optimized anti join #63118

Merged

rytaft closed this as completed May 7, 2021

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: Locality optimized search #55185

opt: Locality optimized search #55185

nvanbenschoten commented Oct 2, 2020 •

edited by rytaft

Loading

nvanbenschoten commented Oct 3, 2020

nvanbenschoten commented Oct 5, 2020

awoods187 commented May 5, 2021

rytaft commented May 5, 2021

nvanbenschoten commented May 5, 2021

awoods187 commented May 5, 2021

ajstorm commented May 6, 2021 via email

rytaft commented May 7, 2021

opt: Locality optimized search #55185

opt: Locality optimized search #55185

Comments

nvanbenschoten commented Oct 2, 2020 • edited by rytaft Loading

Example:

Extension: Stale Read Hints

Extension: Ignore Empty Regions

nvanbenschoten commented Oct 3, 2020

nvanbenschoten commented Oct 5, 2020

awoods187 commented May 5, 2021

rytaft commented May 5, 2021

nvanbenschoten commented May 5, 2021

awoods187 commented May 5, 2021

ajstorm commented May 6, 2021 via email

rytaft commented May 7, 2021

nvanbenschoten commented Oct 2, 2020 •

edited by rytaft

Loading