opt: infer lookup join equality conditions from unique key and foreign key #69617
Comments
Note the explicit
Thanks for the detailed write-up. It perfectly captures how we would like to use regional tables. One clarification is that the explicit join to
Thanks @dain and @electrum for the comments. @dain, it looks like @nvanbenschoten updated the issue description so it mentions that
Note that we do not plan uniqueness checks in this case since they are not necessary due to the uniqueness of
If you decide to go down this route, I will be very interested to hear about your experience. @electrum, thanks for pointing out the opportunity for join elimination in this case. There is an issue for join elimination when there is a foreign key (#47391), but it is closed. Clearly we are not handling all cases, though, so I will reopen it and try to identify which transformation rules we're missing.
@rytaft Good to know about
@electrum, transparently introducing a join on a table that is not otherwise included in a query in order to make a lookup more efficient is a pretty wild idea. In some sense, it's analogous to secondary index selection on a single table, but taken to the next level. Given your prior experience with SQL engines and optimizers, do you know of any prior art in this area? Have you seen cases where systems, even OLAP systems, consider such query plans?
Agreed, very interesting idea for an optimization! So just to confirm, @electrum, this is what the original query actually should have looked like:
and the idea is that the optimizer should add the join with
@nvanbenschoten I don’t have any specific references, but I remember that Oracle definitely took advantage of constraints and foreign keys in its optimizer.

@rytaft exactly! Given the foreign key relationship, it might be helpful to think about this as a primary key lookup via a scalar subquery (not correlated):

```
SELECT tweet_id, message
FROM tweets
WHERE (region, account_id) =
      (SELECT region, account_id
       FROM accounts
       WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8');
```

It is a single row that is guaranteed to exist in the parent table if any child rows exist, not an arbitrary join that could change the cardinality.
Interesting! Thanks for the additional info.
Note: maybe open a second issue to add a new transformation rule that adds a join between a global table and a regional-by-row table.
Fixes cockroachdb#69617

When a unique constraint exists on a subset of the referenced columns in a foreign key constraint, the remaining columns in the constraint can be used to generate equijoin predicates which may enable more efficient use of an index on the lookup side of a lookup join. If the index is a multiregion index, a join predicate may be derived which could potentially eliminate reads of remote rows.

Example:

```
CREATE TABLE accounts (
    account_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        STRING NOT NULL,
    crdb_region crdb_internal_region NOT NULL,
    UNIQUE INDEX acct_id_crdb_region_idx (account_id, crdb_region)
) LOCALITY GLOBAL;

drop table if exists tweets;
CREATE TABLE tweets (
    account_id  UUID NOT NULL,
    tweet_id    UUID DEFAULT gen_random_uuid(),
    message     STRING NOT NULL,
    crdb_region crdb_internal_region NOT NULL,
    PRIMARY KEY (crdb_region, account_id, tweet_id),
    -- The PK of accounts is a subset of the referenced columns in
    -- the FK constraint.
    FOREIGN KEY (account_id, crdb_region)
        REFERENCES accounts (account_id, crdb_region)
        ON DELETE CASCADE ON UPDATE CASCADE
) LOCALITY REGIONAL BY ROW as crdb_region;

-- Join on account_id uses the uniqueness of accounts_pkey and the FK
-- constraint to derive tweets.crdb_region = accounts.crdb_region
EXPLAIN SELECT *
FROM tweets
INNER LOOKUP JOIN accounts@acct_id_crdb_region_idx USING (account_id)
WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8';
-------------------------------------------------------------------------
distribution: local
vectorized: true

• lookup join
│ table: accounts@accounts_pkey
│ equality: (account_id) = (account_id)
│ equality cols are key
│
└── • lookup join
    │ table: accounts@acct_id_crdb_region_idx
    │ equality: (account_id, crdb_region) = (account_id,crdb_region)
    │ equality cols are key
    │ pred: account_id = '6f781502-4936-43cc-b384-04e5cf292cc8'
    │
    └── • scan
          missing stats
          table: tweets@tweets_pkey
          spans: [/'ca'/'6f781502-4936-43cc-b384-04e5cf292cc8' - /'ca'/'6f781502-4936-43cc-b384-04e5cf292cc8'] [/'eu'/'6f781502-4936-43cc-b384-04e5cf292cc8' - /'eu'/'6f781502-4936-43cc-b384-04e5cf292cc8'] [/'us'/'6f781502-4936-43cc-b384-04e5cf292cc8' - /'us'/'6f781502-4936-43cc-b384-04e5cf292cc8']
```

Release note (performance improvement): This patch enables more efficient lookup joins by deriving new join constraints when equijoin predicates exist on the column(s) of a unique constraint on one table which are a proper subset of the referencing columns of a foreign key constraint on the other table. If an index exists on those FK constraint referencing columns, equijoin predicates are derived between the PK and FK columns not currently bound by ON clause predicates.
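The derivation described in this PR can be illustrated with a minimal Python sketch. It is not the optimizer's actual Go code: the function name `derive_fk_equalities` and the representation of constraints as (child column, parent column) pairs are hypothetical. The sketch assumes every foreign key column is `NOT NULL`, as in the example schema, and does not model NULL handling.

```python
def derive_fk_equalities(on_pairs, fk_pairs, parent_unique_keys):
    """Hypothetical sketch: infer extra join equalities from a unique key
    on the parent (referenced) table plus a foreign key on the child.

    on_pairs:           set of (child_col, parent_col) equalities in ON.
    fk_pairs:           list of (child_col, parent_col) pairs of the FK.
    parent_unique_keys: iterable of column sets unique on the parent.

    Assumes all FK columns are NOT NULL (true in the example schema).
    """
    fk_set = set(fk_pairs)
    referenced = {parent for _, parent in fk_pairs}
    # Referenced columns already equated to their FK counterparts in ON.
    bound = {parent for child, parent in on_pairs if (child, parent) in fk_set}
    for key in parent_unique_keys:
        key = set(key)
        # If a unique key that is a proper subset of the referenced columns
        # is fully bound, the joined parent row must be the one the FK points
        # at, so the remaining FK column pairs must be equal as well.
        if key < referenced and key <= bound:
            return sorted(fk_set - set(on_pairs))
    return []


fk = [("tweets.account_id", "accounts.account_id"),
      ("tweets.crdb_region", "accounts.crdb_region")]
on = {("tweets.account_id", "accounts.account_id")}
print(derive_fk_equalities(on, fk, [{"accounts.account_id"}]))
# [('tweets.crdb_region', 'accounts.crdb_region')]
```

Passing the columns of `accounts_pkey` as the unique key yields the `tweets.crdb_region = accounts.crdb_region` predicate named in the example's comment. A unique key covering all referenced columns is not a proper subset, so in this sketch nothing new is derived from it.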
Fixes cockroachdb#69617

This commit amends the PruneJoinLeftCols and PruneJoinRightCols normalization rules to include potential derived ON clause predicates so that columns not present in the SELECT list are not pruned away from the involved Scans before predicates are derived. Derived ON clause predicate columns are excluded from the set of columns to use for equijoin selectivity estimation.

Release note: None
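The pruning change can be pictured with another small Python sketch. The helper names are hypothetical, and the real normalization rules operate on column sets of optimizer expressions rather than lists of names. The sketch only shows why `crdb_region`, which is absent from the SELECT list, has to survive pruning so that a derived predicate can still reference it, and how derived columns are left out of selectivity estimation.

```python
def prune_join_input_cols(input_cols, needed_above, on_cols, derived_on_cols):
    """Hypothetical sketch of a PruneJoin*Cols-style decision for one join
    input: keep a column only if something still needs it, including a
    predicate that may be derived later from an FK constraint."""
    keep = set(needed_above) | set(on_cols) | set(derived_on_cols)
    return {col for col in input_cols if col in keep}


def selectivity_eq_cols(on_eq_cols, derived_on_cols):
    """Equijoin columns used for selectivity estimation, excluding derived
    ones, presumably because a derived predicate is implied by the FK and
    the existing ON clause, so counting it again would underestimate the
    join's row count."""
    return set(on_eq_cols) - set(derived_on_cols)


tweets_cols = ["account_id", "tweet_id", "message", "crdb_region"]
# SELECT tweet_id, message ... JOIN accounts USING (account_id)
print(sorted(prune_join_input_cols(tweets_cols, ["tweet_id", "message"],
                                   ["account_id"], [])))
# ['account_id', 'message', 'tweet_id']  (crdb_region pruned too early)
print(sorted(prune_join_input_cols(tweets_cols, ["tweet_id", "message"],
                                   ["account_id"], ["crdb_region"])))
# ['account_id', 'crdb_region', 'message', 'tweet_id']
```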
90599: xform: derive implicit predicates from FK constraint for lookup join r=rytaft a=msirek

Fixes #69617

norm: do not prune scan columns which may show up in derived join terms

Fixes #69617

Co-authored-by: Mark Sirek <[email protected]>
Consider the following example, using `./cockroach demo --global --empty --nodes=9`:

Ideally, the following query would be able to use a lookup join and visit only a single region, but it currently uses a merge join:

We can force a lookup join, but it still requires visiting all regions:

In order for the optimizer to plan a lookup join that visits a single region, it would need to infer that the join condition `USING (account_id)` is actually equivalent to `USING (account_id, crdb_region)`. This inference should be possible using the fact that `account_id` is the primary key of `accounts`, as well as the fact that there is a foreign key in `tweets` that references `accounts (account_id, crdb_region)`. Including `crdb_region` would allow the optimizer to change the lookup condition from `(crdb_region IN ('europe-west1', 'us-east1', 'us-west1'))` to another equality condition `(crdb_region = crdb_region)` (that is, `tweets.crdb_region = accounts.crdb_region`), and thus visit only the single region that contains `'ab887bbc-1a83-4324-8998-d18ebe448fa7'`. The resulting plan should look like the following:

cc @nvanbenschoten
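The effect of the inferred equality on the lookup condition described in the issue can be modeled with a toy Python sketch. It assumes, as in the PR's example schema, that the `tweets` index being looked up is prefixed by `(crdb_region, account_id)`. The function name and the region assigned to the account are hypothetical.

```python
# Regions listed in the issue's original lookup condition.
ALL_REGIONS = ("europe-west1", "us-east1", "us-west1")


def tweets_lookup_spans(input_row, region_is_bound):
    """Hypothetical sketch: key prefixes a lookup join reads from tweets,
    assuming its index is prefixed by (crdb_region, account_id)."""
    account_id = input_row["account_id"]
    if region_is_bound:
        # Derived tweets.crdb_region = accounts.crdb_region: one region.
        return [(input_row["crdb_region"], account_id)]
    # No region equality: every region is probed (crdb_region IN (...)).
    return [(region, account_id) for region in ALL_REGIONS]


# Hypothetical input row; the region value is made up for illustration.
row = {"account_id": "ab887bbc-1a83-4324-8998-d18ebe448fa7",
       "crdb_region": "us-east1"}
print(len(tweets_lookup_spans(row, region_is_bound=False)))  # 3
print(len(tweets_lookup_spans(row, region_is_bound=True)))   # 1
```

Without the region equality, each input row fans out to one probe per region, and in a multi-region cluster most of those probes would go to remote regions; with it, each row needs only one probe.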
Epic CRDB-26292
Jira issue: CRDB-9686