Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: add implicit join with global table using FK constraint to avoid scanning remote rows #91084

Open
msirek opened this issue Nov 1, 2022 · 2 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team

Comments

@msirek
Copy link
Contributor

msirek commented Nov 1, 2022

This issue is originally described towards the end of #69617.

Describe the solution you'd like
Given a query on a REGIONAL BY ROW table that has a foreign key constraint on a global table, when the query contains a predicate on the key of the global table and the crdb_region column value is not specified via an equality filter, add an implicit join with the global table to potentially avoid scanning rows in remote regions.
Example:

SET experimental_enable_unique_without_index_constraints = true;

CREATE TABLE accounts (
    account_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        STRING NOT NULL,
    crdb_region crdb_internal_region NOT NULL,
    UNIQUE WITHOUT INDEX (account_id, crdb_region)
) LOCALITY GLOBAL;

CREATE TABLE tweets (
    account_id UUID NOT NULL,
    tweet_id   UUID DEFAULT gen_random_uuid(),
    message    STRING NOT NULL,
    PRIMARY KEY (account_id, tweet_id),
    FOREIGN KEY (account_id, crdb_region) REFERENCES accounts (account_id, crdb_region) ON DELETE CASCADE ON UPDATE CASCADE
) LOCALITY REGIONAL BY ROW;

SELECT tweet_id, message
FROM tweets
WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8';

This query in its current form must access all regions because the primary key of tweets is not fully specified, so locality-optimized search cannot be used. The global table can be accessed locally, and specifies the crdb_region column value of the row in tweets via the FK constraint, so transforming the query into the following avoids the unconditional scan from all regions:

EXPLAIN
SELECT tweet_id, message
FROM tweets
WHERE (crdb_region, account_id) IN
  (SELECT crdb_region, account_id
   FROM accounts
   WHERE account_id = '6f781502-4936-43cc-b384-04e5cf292cc8');
                                                  info
--------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • lookup join
  │ estimated row count: 1
  │ table: tweets@tweets_pkey
  │ equality: (crdb_region, lookup_join_const_col_@1) = (crdb_region,account_id)
  │
  └── • render
      │
      └── • scan
            estimated row count: 1 (100% of the table; stats collected 8 minutes ago)
            table: accounts@accounts_pkey
            spans: [/'6f781502-4936-43cc-b384-04e5cf292cc8' - /'6f781502-4936-43cc-b384-04e5cf292cc8']

It's still possible the crdb_region column value is a remote region, but this plan at least provides the option of skipping scans from remote regions when crdb_region is the local region, or if no row with the specified account_id exists in accounts.

A second issue is that UNIQUE WITHOUT INDEX (account_id, crdb_region) should not be required to create the foreign key constraint as the primary key column, account_id, implies the uniqueness of (account_id, crdb_region). This issue could optionally do away with this unnecessary requirement.

Describe alternatives you've considered
None

Jira issue: CRDB-21100

@msirek msirek added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team labels Nov 1, 2022
@ajwerner ajwerner changed the title Add implicit join with global table using FK constraint to avoid scanning remote rows opt: add implicit join with global table using FK constraint to avoid scanning remote rows Dec 6, 2022
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023
@yuzefovich yuzefovich moved this from Backlog (DO NOT ADD NEW ISSUES) to Triage in SQL Queries Oct 10, 2023
@yuzefovich
Copy link
Member

Some context.

@yuzefovich yuzefovich moved this from Triage to 24.1 Release in SQL Queries Oct 24, 2023
@mgartner mgartner moved this from 24.1 Release to 24.2 Release in SQL Queries Nov 27, 2023
@mgartner
Copy link
Collaborator

More context.

@mgartner mgartner added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Mar 28, 2024
@mgartner mgartner moved this from 24.2 Release to New Backlog in SQL Queries Mar 28, 2024
@rytaft rytaft added the P-3 Issues/test failures with no fix SLA label Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

4 participants