sql: fast path uniqueness checks for single-row insert
This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

- a single-row VALUES clause into a REGIONAL BY ROW table with a
PRIMARY KEY whose UUID column is generated by default, e.g.
`id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the
crdb_region column is not specified in the VALUES clause; either
the gateway region is used or it is computed based on other column
values.
- a single-row VALUES clause into a REGIONAL BY ROW table with a
hash-sharded unique index where the crdb_region column is not specified
in the VALUES clause.

In optbuild, when creating a uniqueness check for rows that are added
to a table, a fast-path index check relation is also built when the
mutation source is a single-row Values expression or a WithScan of
a single-row Values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If a
uniqueness check has a partial index predicate, the fast path is
disallowed.
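
A minimal Go sketch of the eligibility test and filter construction described above. The types `ValuesRow`, `UniqueCheck`, and `EqFilter` and the function `buildFastPathFilters` are hypothetical simplifications, not optbuilder APIs: values are kept as SQL text, and the sketch assumes each unique check column has a constant or placeholder in the single row.

```go
package main

import "fmt"

// ValuesRow is a hypothetical model of a single-row VALUES clause: each
// column maps to the constant or placeholder text supplied for it.
type ValuesRow map[string]string

// UniqueCheck is a hypothetical stand-in for one uniqueness check.
type UniqueCheck struct {
	Cols             []string // columns that must be unique together
	PartialPredicate string   // non-empty if the check has a partial index predicate
}

// EqFilter is one `col = value` conjunct of the fast-path Select's filters.
type EqFilter struct {
	Col, Val string
}

// buildFastPathFilters equates every unique check column with its value
// from the single VALUES row. ok is false when the fast path is disallowed.
func buildFastPathFilters(numRows int, row ValuesRow, check UniqueCheck) (filters []EqFilter, ok bool) {
	// Only single-row sources qualify, and partial-index checks are excluded.
	if numRows != 1 || check.PartialPredicate != "" {
		return nil, false
	}
	for _, col := range check.Cols {
		val, found := row[col]
		if !found {
			// Sketch-only assumption: without a value we cannot build `col = value`.
			return nil, false
		}
		filters = append(filters, EqFilter{Col: col, Val: val})
	}
	return filters, true
}

func main() {
	row := ValuesRow{"id": "4321", "part": "'seattle'"}
	filters, ok := buildFastPathFilters(1, row, UniqueCheck{Cols: []string{"id"}})
	fmt.Println(filters, ok) // [{id 4321}] true
}
```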

A new exploration rule called InsertFastPath is added. It walks the memo
group members created during exploration of the `FastPathUniqueChecks` of
the `InsertExpr` to find any that have been rewritten as a constrained
`ScanExpr`. If one is found, that Scan fully represents the lookup
needed to check for duplicate entries, and its constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast-path check.
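
The search for a constrained scan can be pictured as a loop over the members of one memo group. This is a sketch under stated assumptions: a group is modeled as a plain slice, and `Expr`, `ScanExpr`, `SelectExpr`, and `findConstrainedScan` are hypothetical stand-ins, not the optimizer's real memo types or rule code.

```go
package main

import "fmt"

// Expr is a hypothetical, minimal stand-in for a memo expression.
type Expr interface{ Op() string }

// ScanExpr is a hypothetical scan; an empty Constraint means unconstrained.
type ScanExpr struct {
	Index      string
	Constraint []string // constrained spans, e.g. `/"seattle"/11/4321`
}

func (*ScanExpr) Op() string { return "scan" }

// SelectExpr is a hypothetical filtered Select over an input expression.
type SelectExpr struct{ Input Expr }

func (*SelectExpr) Op() string { return "select" }

// findConstrainedScan returns the first group member that exploration has
// rewritten into a constrained scan, if any.
func findConstrainedScan(group []Expr) (*ScanExpr, bool) {
	for _, e := range group {
		if s, ok := e.(*ScanExpr); ok && len(s.Constraint) > 0 {
			return s, true
		}
	}
	return nil, false
}

func main() {
	// Two equivalent members of one group: the original Select over a full
	// scan, and the rewritten constrained scan.
	group := []Expr{
		&SelectExpr{Input: &ScanExpr{Index: "t_pkey"}},
		&ScanExpr{Index: "t_pkey", Constraint: []string{`/"seattle"/11/4321`}},
	}
	if s, ok := findConstrainedScan(group); ok {
		fmt.Println(s.Index, s.Constraint) // t_pkey [/"seattle"/11/4321]
	}
}
```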

Function `CanUseUniqueChecksForInsertFastPath` walks the expressions
generated during exploration of the `FastPathUniqueChecks.Check`
relation. If a constrained scan is found, it is used to build elements
of the `FastPathUniqueChecksItemPrivate` structure, which tell the
execbuilder the table and index to use for the check, and the column IDs
in the insert row to use for building the fast-path KV request. In
addition, a new `DatumsFromConstraint` field is added, consisting of a
`ScalarListExpr` of `TupleExpr`s specifying the index key, which allows an
index key column to be matched with multiple Datums. As a result, one insert
row may result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the `InsertFastPathFKUniqCheck` structure
in the execbuilder. Each new `FastPathUniqueChecksItemPrivate` is built
into the corresponding `FastPathUniqueChecksItem` of a new
`FastPathUniqueChecksExpr`, which is returned to the caller as
`newFastPathUniqueChecks`.

A small adjustment is made in the coster to make the fast-path uniqueness
check slightly cheaper, so it should always be chosen over the original
non-fast-path check.
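
As a hedged illustration of the tie-breaking idea only, the sketch below gives a fast-path alternative a small fixed discount when plans are compared. `plan`, `cost`, `cheapest`, and `fastPathDiscount` are hypothetical; the real coster change and its constant are not shown in this commit message.

```go
package main

import "fmt"

// plan is a hypothetical costed alternative for the same INSERT.
type plan struct {
	name     string
	baseCost float64
	fastPath bool
}

// fastPathDiscount is an illustrative constant, not the coster's real value.
const fastPathDiscount = 0.001

// cost discounts fast-path alternatives slightly, so that when base costs
// tie, the fast-path plan is cheaper.
func cost(p plan) float64 {
	if p.fastPath {
		return p.baseCost - fastPathDiscount
	}
	return p.baseCost
}

// cheapest returns the lowest-cost alternative; plans must be non-empty.
func cheapest(plans []plan) plan {
	best := plans[0]
	for _, p := range plans[1:] {
		if cost(p) < cost(best) {
			best = p
		}
	}
	return best
}

func main() {
	plans := []plan{
		{name: "insert + constraint-check", baseCost: 10},
		{name: "insert fast path", baseCost: 10, fastPath: true},
	}
	fmt.Println(cheapest(plans).name) // insert fast path
}
```

With equal base costs, the fast-path plan wins regardless of the order in which the alternatives are compared.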

Epic: CRDB-26290
Fixes: cockroachdb#58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This reduces latency
for single-row inserts into REGIONAL BY ROW tables and hash-sharded
REGIONAL BY ROW tables with unique indexes.
Mark Sirek committed Oct 7, 2023
1 parent 4637e51 commit 851d230
Showing 24 changed files with 2,051 additions and 422 deletions.
@@ -338,54 +338,18 @@ EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_pk (id, part) VALUES (4321, 'seattle
distribution: local
vectorized: true
·
-• root
-│ columns: ()
-├── • insert
-│ │ columns: ()
-│ │ estimated row count: 0 (missing stats)
-│ │ into: t_unique_hash_pk(crdb_internal_id_shard_16, id, part)
-│ │
-│ └── • values
-│ columns: (crdb_internal_id_shard_16_comp, column1, column2, check1, check2)
-│ size: 5 columns, 1 row
-│ row 0, expr 0: 4321
-│ row 0, expr 1: 'seattle'
-│ row 0, expr 2: 11
-│ row 0, expr 3: true
-│ row 0, expr 4: true
-└── • constraint-check
-└── • error if rows
-│ columns: ()
-└── • project
-│ columns: (id)
-└── • cross join (inner)
-│ columns: (id, crdb_internal_id_shard_16, id, part)
-│ estimated row count: 1 (missing stats)
-├── • values
-│ columns: (id)
-│ size: 1 column, 1 row
-│ row 0, expr 0: 4321
-└── • limit
-│ columns: (crdb_internal_id_shard_16, id, part)
-│ count: 1
-└── • filter
-│ columns: (crdb_internal_id_shard_16, id, part)
-│ estimated row count: 6 (missing stats)
-│ filter: (crdb_internal_id_shard_16 != 11) OR (part != 'seattle')
-└── • scan
-columns: (crdb_internal_id_shard_16, id, part)
-estimated row count: 0 (missing stats)
-table: t_unique_hash_pk@t_unique_hash_pk_pkey
-spans: /"new york"/11/4321/0 /"seattle"/11/4321/0
+• insert fast path
+columns: ()
+estimated row count: 0 (missing stats)
+into: t_unique_hash_pk(crdb_internal_id_shard_16, id, part)
+auto commit
+uniqueness check: t_unique_hash_pk@t_unique_hash_pk_pkey
+size: 5 columns, 1 row
+row 0, expr 0: 11
+row 0, expr 1: 4321
+row 0, expr 2: 'seattle'
+row 0, expr 3: true
+row 0, expr 4: true

query T
EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_pk (id, part) VALUES (4321, 'seattle') ON CONFLICT DO NOTHING;
@@ -946,79 +910,20 @@ EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_sec_key (id, email, part) VALUES (43
distribution: local
vectorized: true
·
-• root
-│ columns: ()
-├── • insert
-│ │ columns: ()
-│ │ estimated row count: 0 (missing stats)
-│ │ into: t_unique_hash_sec_key(id, email, part, crdb_internal_email_shard_16)
-│ │
-│ └── • values
-│ columns: (column1, column2, column3, crdb_internal_email_shard_16_comp, check1, check2)
-│ size: 6 columns, 1 row
-│ row 0, expr 0: 4321
-│ row 0, expr 1: 'some_email'
-│ row 0, expr 2: 'seattle'
-│ row 0, expr 3: 1
-│ row 0, expr 4: true
-│ row 0, expr 5: true
-├── • constraint-check
-│ │
-│ └── • error if rows
-│ │ columns: ()
-│ │
-│ └── • project
-│ │ columns: (id)
-│ │
-│ └── • cross join (inner)
-│ │ columns: (id, id, part)
-│ │ estimated row count: 1 (missing stats)
-│ │
-│ ├── • values
-│ │ columns: (id)
-│ │ size: 1 column, 1 row
-│ │ row 0, expr 0: 4321
-│ │
-│ └── • scan
-│ columns: (id, part)
-│ estimated row count: 1 (missing stats)
-│ table: t_unique_hash_sec_key@t_unique_hash_sec_key_pkey
-│ spans: /"new york"/4321/0
-│ limit: 1
-└── • constraint-check
-└── • error if rows
-│ columns: ()
-└── • project
-│ columns: (email)
-└── • cross join (inner)
-│ columns: (email, id, email, part)
-│ estimated row count: 1 (missing stats)
-├── • values
-│ columns: (email)
-│ size: 1 column, 1 row
-│ row 0, expr 0: 'some_email'
-└── • limit
-│ columns: (id, email, part)
-│ count: 1
-└── • filter
-│ columns: (id, email, part)
-│ estimated row count: 6 (missing stats)
-│ filter: (id != 4321) OR (part != 'seattle')
-└── • scan
-columns: (id, email, part)
-estimated row count: 0 (missing stats)
-table: t_unique_hash_sec_key@idx_uniq_hash_email
-spans: /"new york"/1/"some_email"/0 /"seattle"/1/"some_email"/0
+• insert fast path
+columns: ()
+estimated row count: 0 (missing stats)
+into: t_unique_hash_sec_key(id, email, part, crdb_internal_email_shard_16)
+auto commit
+uniqueness check: t_unique_hash_sec_key@t_unique_hash_sec_key_pkey
+uniqueness check: t_unique_hash_sec_key@idx_uniq_hash_email
+size: 6 columns, 1 row
+row 0, expr 0: 4321
+row 0, expr 1: 'some_email'
+row 0, expr 2: 'seattle'
+row 0, expr 3: 1
+row 0, expr 4: true
+row 0, expr 5: true

query T
EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_sec_key (id, email, part) VALUES (4321, 'some_email', 'seattle') ON CONFLICT DO NOTHING;
@@ -359,65 +359,18 @@ EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_pk (id) VALUES (4321);
distribution: local
vectorized: true
·
-• root
-│ columns: ()
-├── • insert
-│ │ columns: ()
-│ │ estimated row count: 0 (missing stats)
-│ │ into: t_unique_hash_pk(crdb_internal_id_shard_16, id, crdb_region)
-│ │
-│ └── • values
-│ columns: (crdb_internal_id_shard_16_comp, column1, crdb_region_default, check1, check2)
-│ size: 5 columns, 1 row
-│ row 0, expr 0: 4321
-│ row 0, expr 1: 'ap-southeast-2'
-│ row 0, expr 2: 11
-│ row 0, expr 3: true
-│ row 0, expr 4: true
-└── • constraint-check
-└── • error if rows
-│ columns: ()
-└── • project
-│ columns: (id)
-└── • cross join (inner)
-│ columns: (id, crdb_internal_id_shard_16, id, crdb_region)
-│ estimated row count: 1 (missing stats)
-├── • values
-│ columns: (id)
-│ size: 1 column, 1 row
-│ row 0, expr 0: 4321
-└── • limit
-│ columns: (crdb_internal_id_shard_16, id, crdb_region)
-│ count: 1
-└── • filter
-│ columns: (crdb_internal_id_shard_16, id, crdb_region)
-│ estimated row count: 6 (missing stats)
-│ filter: (crdb_internal_id_shard_16 != 11) OR (crdb_region != 'ap-southeast-2')
-└── • union all
-│ columns: (crdb_internal_id_shard_16, id, crdb_region)
-│ estimated row count: 1 (missing stats)
-│ limit: 3
-├── • scan
-│ columns: (crdb_internal_id_shard_16, id, crdb_region)
-│ estimated row count: 1 (missing stats)
-│ table: t_unique_hash_pk@t_unique_hash_pk_pkey
-│ spans: /"@"/11/4321/0
-└── • scan
-columns: (crdb_internal_id_shard_16, id, crdb_region)
-estimated row count: 1 (missing stats)
-table: t_unique_hash_pk@t_unique_hash_pk_pkey
-spans: /"\x80"/11/4321/0 /"\xc0"/11/4321/0
+• insert fast path
+columns: ()
+estimated row count: 0 (missing stats)
+into: t_unique_hash_pk(crdb_internal_id_shard_16, id, crdb_region)
+auto commit
+uniqueness check: t_unique_hash_pk@t_unique_hash_pk_pkey
+size: 5 columns, 1 row
+row 0, expr 0: 11
+row 0, expr 1: 4321
+row 0, expr 2: 'ap-southeast-2'
+row 0, expr 3: true
+row 0, expr 4: true

query T
EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_pk (id) VALUES (4321) ON CONFLICT DO NOTHING;
@@ -1024,106 +977,20 @@ EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_sec_key (id, email) VALUES (4321, 's
distribution: local
vectorized: true
·
-• root
-│ columns: ()
-├── • insert
-│ │ columns: ()
-│ │ estimated row count: 0 (missing stats)
-│ │ into: t_unique_hash_sec_key(id, email, crdb_region, crdb_internal_email_shard_16)
-│ │
-│ └── • values
-│ columns: (column1, column2, crdb_region_default, crdb_internal_email_shard_16_comp, check1, check2)
-│ size: 6 columns, 1 row
-│ row 0, expr 0: 4321
-│ row 0, expr 1: 'some_email'
-│ row 0, expr 2: 'ap-southeast-2'
-│ row 0, expr 3: 1
-│ row 0, expr 4: true
-│ row 0, expr 5: true
-├── • constraint-check
-│ │
-│ └── • error if rows
-│ │ columns: ()
-│ │
-│ └── • project
-│ │ columns: (id)
-│ │
-│ └── • cross join (inner)
-│ │ columns: (id, id, crdb_region)
-│ │ estimated row count: 1 (missing stats)
-│ │
-│ ├── • values
-│ │ columns: (id)
-│ │ size: 1 column, 1 row
-│ │ row 0, expr 0: 4321
-│ │
-│ └── • scan
-│ columns: (id, crdb_region)
-│ estimated row count: 1 (missing stats)
-│ table: t_unique_hash_sec_key@t_unique_hash_sec_key_pkey
-│ spans: /"\x80"/4321/0 /"\xc0"/4321/0
-│ limit: 1
-└── • constraint-check
-└── • error if rows
-│ columns: ()
-└── • project
-│ columns: (email)
-└── • cross join (inner)
-│ columns: (email, id, email, crdb_region)
-│ estimated row count: 1 (missing stats)
-├── • values
-│ columns: (email)
-│ size: 1 column, 1 row
-│ row 0, expr 0: 'some_email'
-└── • limit
-│ columns: (id, email, crdb_region)
-│ count: 1
-└── • distinct
-│ columns: (id, email, crdb_region)
-│ estimated row count: 6 (missing stats)
-│ distinct on: id, crdb_region
-└── • union all
-│ columns: (id, email, crdb_region)
-│ estimated row count: 2 (missing stats)
-├── • filter
-│ │ columns: (id, email, crdb_region)
-│ │ estimated row count: 1 (missing stats)
-│ │ filter: id != 4321
-│ │
-│ └── • union all
-│ │ columns: (id, email, crdb_region)
-│ │ estimated row count: 1 (missing stats)
-│ │ limit: 1
-│ │
-│ ├── • scan
-│ │ columns: (id, email, crdb_region)
-│ │ estimated row count: 1 (missing stats)
-│ │ table: t_unique_hash_sec_key@idx_uniq_hash_email
-│ │ spans: /"@"/1/"some_email"/0
-│ │
-│ └── • scan
-│ columns: (id, email, crdb_region)
-│ estimated row count: 1 (missing stats)
-│ table: t_unique_hash_sec_key@idx_uniq_hash_email
-│ spans: /"\x80"/1/"some_email"/0 /"\xc0"/1/"some_email"/0
-│ parallel
-└── • scan
-columns: (id, email, crdb_region)
-estimated row count: 1 (missing stats)
-table: t_unique_hash_sec_key@idx_uniq_hash_email
-spans: /"\x80"/1/"some_email"/0 /"\xc0"/1/"some_email"/0
+• insert fast path
+columns: ()
+estimated row count: 0 (missing stats)
+into: t_unique_hash_sec_key(id, email, crdb_region, crdb_internal_email_shard_16)
+auto commit
+uniqueness check: t_unique_hash_sec_key@t_unique_hash_sec_key_pkey
+uniqueness check: t_unique_hash_sec_key@idx_uniq_hash_email
+size: 6 columns, 1 row
+row 0, expr 0: 4321
+row 0, expr 1: 'some_email'
+row 0, expr 2: 'ap-southeast-2'
+row 0, expr 3: 1
+row 0, expr 4: true
+row 0, expr 5: true

query T
EXPLAIN (VERBOSE) INSERT INTO t_unique_hash_sec_key (id, email) VALUES (4321, 'some_email') ON CONFLICT DO NOTHING;