Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fast path uniqueness checks for single-row insert #111822

Merged
merged 2 commits into from
Oct 9, 2023

Conversation

msirek
Copy link
Contributor

@msirek msirek commented Oct 5, 2023

This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

  • a single-row VALUES clause into a REGIONAL BY ROW table with a
    PRIMARY KEY which has a UUID column generated by default, ie.
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(), where the
    crdb_region column is not specified in the VALUES clause; either
    the gateway region is used or it is computed based on other column
    values.
  • a single-row VALUES clause into a REGIONAL BY ROW table with a
    hash sharded unique index where the crdb_region column is not specified
    in the VALUES clause

In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.

A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in FastPathUniqueChecks of
the InsertExpr, to find any which have been rewritten as a constrained
ScanExpr. If found, that means that Scan fully represents the lookup
needed to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.

Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the FastPathUniqueChecks.Check
relation. If a constrained scan is found, it is used to build elements
of the FastPathUniqueChecksItemPrivate structure to communicate to the
execbuilder the table and index to use for the check, and the column ids
in the insert row to use for building the fast path KV request. In
addition, a new DatumsFromConstraint field is added, consisting of a
ScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the InsertFastPathFKUniqCheck structure
in the execbuilder. The new FastPathUniqueChecksItemPrivate is built
into a new the corresponding FastPathUniqueChecksItems of a new
FastPathUniqueChecksExpr and communicated to the caller via return
value newFastPathUniqueChecks.

A small adjustment is made in the coster to make the fast path unique
constraint slightly cheaper, so it should always be chosen over the
original non-fast path check.

Epic: CRDB-26290
Fixes: #58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.


optbuilder: build insert fast-path uniq checks only if all checks can be built

This commit avoids unnecessary processing of insert fast path uniqueness
checks by modifying the optbuilder to avoid building a
FastPathUniqueChecksExpr in the Insert expression if one or more
of the required FastPathUniqueChecksItems could not be built.

Epic: CRDB-26290
Informs: #58047

Release note: None

@msirek msirek requested a review from a team as a code owner October 5, 2023 07:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msirek
Copy link
Contributor Author

msirek commented Oct 5, 2023

This PR contains the main commit enabling insert fast-path uniqueness checks, split off from #102995.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 20 files at r1, 2 of 2 files at r2, 5 of 5 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)


-- commits line 21 at r1:
nit: I'm not sure it's necessary to make this point—it's equivalent to the first point.


pkg/sql/opt/exec/execbuilder/mutation.go line 182 at r1 (raw file):

			execFastPathCheck.InsertCols[j] = exec.TableColumnOrdinal(md.ColumnMeta(insertCol).Table.ColumnOrdinal(insertCol))
		}
		execFastPathCheck.MatchMethod = tree.MatchFull

I beleive match method is only relevant for foriegn key checks. Why is it needed here?


pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

		}
		if !found {
			panic(errors.AssertionFailedf(

It's probably not guaranteed that a panic will be caught in the context that execFastPathCheck.MkErr is used. There's no benefit to panicking here if we've already plumbed a return error. So to avoid crashing the node, let'd just return the error here.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 553 at r3 (raw file):

		newPossibleScan := newScanScope.expr
		if skipProjectExpr, ok := newPossibleScan.(*memo.ProjectExpr); ok {
			newPossibleScan = skipProjectExpr.Input

It looks like you're lacking test coverage of this case. Do you have an example of when it is needed?


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

		} else {
			// Don't build a fast-path check if we failed to create a new ScanExpr.
			buildFastPathCheck = false

Looks like there isn't coverage here either. Is it needed?


pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

		return memo.FastPathUniqueChecksExpr{}, false
	}
	if c.e.evalCtx == nil {

Can evalCtx be nil?


pkg/sql/opt/xform/insert_funcs.go line 88 at r1 (raw file):

		// appendNewUniqueChecksItem handles building of the FastPathCheck into a
		// new UniqueChecksItem for the new InsertExpr being constructed.
		appendNewUniqueChecksItem := func(private *memo.FastPathUniqueChecksItemPrivate) {

nit: I don't see much benefit in encapsulating this logic in an anonymous function when it's only used once below and it's fairly simple.


pkg/sql/opt/xform/insert_funcs.go line 125 at r1 (raw file):

		if indexJoinExpr, isIndexJoin := expr.(*memo.IndexJoinExpr); isIndexJoin {
			expr = indexJoinExpr.Input

Can't we only skip over the index join if there the filter is true?


pkg/sql/opt/xform/insert_funcs.go line 129 at r1 (raw file):

		if scan, isScan := expr.(*memo.ScanExpr); isScan {
			// We need a scan with a constraint for analysis, and it should not be
			// a contradiction (having no spans).

In what case would it be a contradiction?


pkg/sql/opt/xform/insert_funcs.go line 166 at r1 (raw file):

		span := scanExpr.Constraint.Spans.Get(j)
		// Verify there is a single key...
		if span.Prefix(scanExpr.Memo().EvalContext()) != span.StartKey().Length() {

I don't think this is sufficient. You should use span.HasSingleKey here instead.


pkg/sql/opt/xform/insert_funcs.go line 170 at r1 (raw file):

		}
		// ... and that the span has the same number of columns as the index key.
		if span.StartKey().Length() != numKeyCols {

nit: Length is likely a faster check than checking that it's a single key, so you could move this up.


pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):

# only applies to an insert of values, and not other forms like INSERT SELECT.
# If fast path information has already been built, this rule will not try to
# rewrite the insert a second time.

I know we've spoken about this before, but can you explain why this logic is done during the exploration phase? Is there any cases where we don't want to do a fast path because there is a more efficient alternative?


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 1640 at r1 (raw file):

                          table: regional_by_row_table@new_idx
                          spans: [/'ca-central-1'/1/1 - /'ca-central-1'/1/1] [/'us-east-1'/1/1 - /'us-east-1'/1/1]

nit: unnecessary new line


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3758 at r1 (raw file):

statement error pq: duplicate key value violates unique constraint "users2_email_key"\nDETAIL: Key \(email\)=\('[email protected]'\) already exists\.
INSERT INTO users2 (crdb_region, name, email)
VALUES ('ca-central-1', 'Craig Roacher', '[email protected]')

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table.


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

query T
EXPLAIN INSERT INTO users2 (name, email)
VALUES ('Bill Roacher', '[email protected]'), ('Jill Roacher', '[email protected]')

We typically don't put EXPLAIN queries in logictests because they can flake due to the random column families that logic tests can assign. EXPLAIN queries can go in logic tests within execbuilder tests where those mutations are not made. But since these require CCL features, I'd move these to a separate query_behavior test file and make the column families explicit so they don't flake. Example: https://github.com/cockroachdb/cockroach/blob/b34827ad7e26169dbc7213ecb542194e06376624/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior

Don't forget the other EXPLAIN tests below.


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3863 at r1 (raw file):

ALTER TABLE users2 ADD CONSTRAINT users4_fk FOREIGN KEY (name) REFERENCES users4 (name) ON DELETE CASCADE

# Verify multi-row insert fast path detects FK constraint violations.

nit "FK" should be "unique", correct?


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 4144 at r1 (raw file):

statement ok
CREATE TABLE hash_sharded_rbr_computed (
	region_id STRING(10) NOT NULL,

nit: replace tabs with spaces here


pkg/sql/logictest/testdata/logic_test/unique line 965 at r1 (raw file):

  INDEX (b,a) STORING(pk2, j),
  INVERTED INDEX (j),
  FAMILY (pk, pk2, a, b)

Don't you want these tables to mimic regional by row tables with a UNIQUE(prefix_col, other_cols...) and UNIQUE WITHOUT INDEX (other_cols...)? Or is there another reason for these tests?


pkg/sql/logictest/testdata/logic_test/unique line 1009 at r1 (raw file):

statement error pq: duplicate key value violates unique constraint "unique_b_c"\nDETAIL: Key \(b, c\)=\(2, 3\) already exists\.
INSERT INTO multiple_uniq (a,c,b,d)
VALUES (11,3,2,14), (15,16,17,18)

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table here too.


pkg/sql/opt/exec/execbuilder/testdata/unique line 4800 at r1 (raw file):

  INDEX (b,a) STORING(pk2, j),
  INVERTED INDEX (j),
  FAMILY (pk, pk2, a, b)

This should mimic a RBR table, shouldn't it?

@msirek msirek force-pushed the insert-fast-path-uniqueness-checks branch from 1a95cc8 to 0e9e42f Compare October 5, 2023 23:40
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


-- commits line 21 at r1:

Previously, mgartner (Marcus Gartner) wrote…

nit: I'm not sure it's necessary to make this point—it's equivalent to the first point.

OK, removed that point


pkg/sql/opt/exec/execbuilder/mutation.go line 182 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I beleive match method is only relevant for foriegn key checks. Why is it needed here?

You're right. Removed it.


pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It's probably not guaranteed that a panic will be caught in the context that execFastPathCheck.MkErr is used. There's no benefit to panicking here if we've already plumbed a return error. So to avoid crashing the node, let'd just return the error here.

Done


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 553 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It looks like you're lacking test coverage of this case. Do you have an example of when it is needed?

Hash-sharded RBR tables sometimes need to do this, like the following case:
https://github.com/msirek/cockroach/blob/6ab911a9d6c8efe420487224d8c5125ec4741001/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior#L4156-L4171

Added a code comment.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Looks like there isn't coverage here either. Is it needed?

Even if a test case cannot be found, this code is needed because if the new scan cannot be constructed or found, there is no way to build a fast path check for this case.


pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can evalCtx be nil?

Yes. I believe the errors I saw were in tests using hypothetical indexes.


pkg/sql/opt/xform/insert_funcs.go line 88 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: I don't see much benefit in encapsulating this logic in an anonymous function when it's only used once below and it's fairly simple.

OK, removed the function.


pkg/sql/opt/xform/insert_funcs.go line 125 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can't we only skip over the index join if there the filter is true?

As far as I can tell, index join has no filter. It just unconditionally looks up the primary index row, right?


pkg/sql/opt/xform/insert_funcs.go line 129 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

In what case would it be a contradiction?

Maybe if there is a check constraint on the table that contradicts the unique check filters. That case is not optimized right now, e.g., by allowing the fast path checks, but skipping that particular check, so fast path is just disabled.


pkg/sql/opt/xform/insert_funcs.go line 166 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I don't think this is sufficient. You should use span.HasSingleKey here instead.

Modified as suggested


pkg/sql/opt/xform/insert_funcs.go line 170 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Length is likely a faster check than checking that it's a single key, so you could move this up.

Done


pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):
One reason is that placeholders are not filled in during optbuild, so for normalization vs. exploration, exploration phase is required. For execbuilder vs. exploration, building of check constraint filters, derivation of hash-sharded column constraints, invisible index handling and costing occur during exploration. Re-implementing these in the execbuilder is possible, but means we'd have essentially duplicated logic for every new feature which is added needing fast-path support.

Is there any cases where we don't want to do a fast path because there is a more efficient alternative?

It's true, exploration has some processing overhead. The overhead may be very low compared to latencies of performing the actual KV lookups as it's just exploring a single Select and Scan. Though, it may be possible to further optimize it. For example, instead of keeping the fast path check relation as a child of FastPathUniqueChecksItem, push it into FastPathUniqueChecksItemPrivate and change this rule to only call GenerateConstrainedScans on the check relation. This way less possible exploration functions are fired.

I guess the point is that there are benefits to keeping the logic in the optimizer/exploration phase, the overhead of exploring a single Select/Scan is low compared to KV lookup times, and there are ways to reduce exploration costs by limiting exploration to a single rule, if need be.


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 1640 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: unnecessary new line

Removed


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3863 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit "FK" should be "unique", correct?

Fixed


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 4144 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: replace tabs with spaces here

Done

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3758 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table.

Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.


pkg/sql/logictest/testdata/logic_test/unique line 965 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Don't you want these tables to mimic regional by row tables with a UNIQUE(prefix_col, other_cols...) and UNIQUE WITHOUT INDEX (other_cols...)? Or is there another reason for these tests?

No, logic tests for REGIONAL BY ROW tables don't need to use mimicking, and are in file regional_by_row_query_behavior. These are additional tests that check that for general functionality of UNIQUE WITHOUT INDEX with insert fast path.


pkg/sql/logictest/testdata/logic_test/unique line 1009 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table here too.

Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.

@msirek msirek force-pushed the insert-fast-path-uniqueness-checks branch from 0e9e42f to 4fba820 Compare October 6, 2023 08:02
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

We typically don't put EXPLAIN queries in logictests because they can flake due to the random column families that logic tests can assign. EXPLAIN queries can go in logic tests within execbuilder tests where those mutations are not made. But since these require CCL features, I'd move these to a separate query_behavior test file and make the column families explicit so they don't flake. Example: https://github.com/cockroachdb/cockroach/blob/b34827ad7e26169dbc7213ecb542194e06376624/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior

Don't forget the other EXPLAIN tests below.

Thanks for the tip. OK, I've moved the EXPLAIN queries where the tables don't define column families to a new file, and kept the actual query runs in place so that random column families will provide better test coverage. For tables already defining column families, I left those EXPLAIN queries in place so you can see the query plan alongside the actual execution.


pkg/sql/opt/exec/execbuilder/testdata/unique line 4800 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This should mimic a RBR table, shouldn't it?

No, these are general UNIQUE WITHOUT INDEX cases. I've gone ahead and added a test which mimics an RBR table though.

@msirek msirek requested a review from mgartner October 6, 2023 14:37
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r4, 10 of 10 files at r6, 5 of 5 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)


pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Done

This should still be an assertion error if the key values cannot be found, like it was before, but without the panic, because it is unexpected that we would be unable to find the key values and because mkUniqueCheckErr has not been proven to work with partial or no key values.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

Previously, msirek (Mark Sirek) wrote…

Even if a test case cannot be found, this code is needed because if the new scan cannot be constructed or found, there is no way to build a fast path check for this case.

Any reason not to just return uniqueChecks, memo.FastPathUniqueChecksItem{} here then?


pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Yes. I believe the errors I saw were in tests using hypothetical indexes.

That's surprising because there are many places where c.e.evalCtx is accessed without checking if it's nil, like here:

if !c.e.evalCtx.SessionData().LocalityOptimizedSearch {

Do you think it can now be nil due to some change in this PR? If this is now required we need to understand why and verify that other accesses of c.e.evalCtx are not affected.


pkg/sql/opt/xform/insert_funcs.go line 125 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

As far as I can tell, index join has no filter. It just unconditionally looks up the primary index row, right?

Right, sorry I was confused.


pkg/sql/opt/xform/insert_funcs.go line 118 at r6 (raw file):

	var scanExpr *memo.ScanExpr
	for check = check.FirstExpr(); check != nil; check = check.NextExpr() {

Iterating through members of the memo group is an odd thing to do in an exploration rule. Typically (maybe always) we rely on the code generated for all exploration rules to explore each equivalent expression in a group—I believe you'd need to move some of the pattern matching into the optgen portion of the rule to achieve this. Can you add an explanation why this iteration is done instead, if there is a specific reason.


pkg/sql/opt/xform/insert_funcs.go line 168 at r6 (raw file):

		}
		// ... and that the span has a single key.
		if !span.HasSingleKey(scanExpr.Memo().EvalContext()) {

nit: use c.e.evalCtx for consistency instead of scanExpr.Memo().EvalContext().


pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

One reason is that placeholders are not filled in during optbuild, so for normalization vs. exploration, exploration phase is required. For execbuilder vs. exploration, building of check constraint filters, derivation of hash-sharded column constraints, invisible index handling and costing occur during exploration. Re-implementing these in the execbuilder is possible, but means we'd have essentially duplicated logic for every new feature which is added needing fast-path support.

Is there any cases where we don't want to do a fast path because there is a more efficient alternative?

It's true, exploration has some processing overhead. The overhead may be very low compared to latencies of performing the actual KV lookups as it's just exploring a single Select and Scan. Though, it may be possible to further optimize it. For example, instead of keeping the fast path check relation as a child of FastPathUniqueChecksItem, push it into FastPathUniqueChecksItemPrivate and change this rule to only call GenerateConstrainedScans on the check relation. This way less possible exploration functions are fired.

I guess the point is that there are benefits to keeping the logic in the optimizer/exploration phase, the overhead of exploring a single Select/Scan is low compared to KV lookup times, and there are ways to reduce exploration costs by limiting exploration to a single rule, if need be.

The exploration phase's purpose is to explore alternative query plans, so the question I was asking is "are there any cases where we would prefer a query plan without an insert fast path over a query with a fast path insert"? If the answer is "no", this rule needs some explanation for why this rule breaks from the normal pattern of exploration rules.

I agree there is some benefit to relying on exsiting xform logic for check constraint filters, derivation of hash-sharded column constraints—but my comment here isn't related to that logic, it's related to the logic of this rule specifically.


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3758 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.

Ahh good point.


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Thanks for the tip. OK, I've moved the EXPLAIN queries where the tables don't define column families to a new file, and kept the actual query runs in place so that random column families will provide better test coverage. For tables already defining column families, I left those EXPLAIN queries in place so you can see the query plan alongside the actual execution.

Thanks. You've switched the file names though—"query_behavior" should be the file with the EXPLAIN plans.


pkg/sql/opt/exec/execbuilder/testdata/unique line 5370 at r6 (raw file):

  INDEX (b,a) STORING(pk2, j),
  INVERTED INDEX (j),
  FAMILY (pk, pk2, a, b)

This does not mimic a RBR table. A unique index in a RBR table is syntactic sugar:

CREATE TABLE t (
  a INT,
  UNIQUE (a)
) REGIONAL BY ROW;
-- Is equivalent to:
CREATE TABLE t (
  region STRING, -- Using STRING, but would be crdb_region in practice.
  a INT,
  UNIQUE (region, a),
  UNIQUE WITHOUT INDEX (a),
  CHECK (region IN (...))
);

@msirek msirek force-pushed the insert-fast-path-uniqueness-checks branch from 4fba820 to e33ad40 Compare October 6, 2023 17:31
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


pkg/sql/opt/exec/execbuilder/mutation.go line 881 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This should still be an assertion error if the key values cannot be found, like it was before, but without the panic, because it is unexpected that we would be unable to find the key values and because mkUniqueCheckErr has not been proven to work with partial or no key values.

Makes sense. Made the change.


pkg/sql/opt/optbuilder/mutation_builder_unique.go line 561 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Any reason not to just return uniqueChecks, memo.FastPathUniqueChecksItem{} here then?

Yes, that's cleaner. I've moved the building of fastPathChecks below uniqueChecks.


pkg/sql/opt/xform/insert_funcs.go line 43 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

That's surprising because there are many places where c.e.evalCtx is accessed without checking if it's nil, like here:

if !c.e.evalCtx.SessionData().LocalityOptimizedSearch {

Do you think it can now be nil due to some change in this PR? If this is now required we need to understand why and verify that other accesses of c.e.evalCtx are not affected.

I'll comment this out and see which, if any, tests fail. Then we can either remove it or add a comment on it.


pkg/sql/opt/xform/insert_funcs.go line 168 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: use c.e.evalCtx for consistency instead of scanExpr.Memo().EvalContext().

Done


pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior line 3763 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Thanks. You've switched the file names though—"query_behavior" should be the file with the EXPLAIN plans.

Ahh, corrected this. Thanks!


pkg/sql/opt/exec/execbuilder/testdata/unique line 5370 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This does not mimic a RBR table. A unique index in a RBR table is syntactic sugar:

CREATE TABLE t (
  a INT,
  UNIQUE (a)
) REGIONAL BY ROW;
-- Is equivalent to:
CREATE TABLE t (
  region STRING, -- Using STRING, but would be crdb_region in practice.
  a INT,
  UNIQUE (region, a),
  UNIQUE WITHOUT INDEX (a),
  CHECK (region IN (...))
);

I think I'm doing about the same thing, but in your version you have UNIQUE (region, a) while I have INDEX (b,a) STORING(pk2, j). Maybe the underlying index should be unique too, so I've updated the test case to use UNIQUE INDEX.

@msirek msirek force-pushed the insert-fast-path-uniqueness-checks branch from e33ad40 to 7edd6d2 Compare October 7, 2023 18:05
@msirek
Copy link
Contributor Author

msirek commented Oct 7, 2023

Can evalCtx be nil?

Yes. I believe the errors I saw were in tests using hypothetical indexes.

That's surprising because there are many places where c.e.evalCtx is accessed without checking if it's nil, like here:

if !c.e.evalCtx.SessionData().LocalityOptimizedSearch {

Do you think it can now be nil due to some change in this PR? If this is now required we need to understand why and verify that other accesses of c.e.evalCtx are not affected.

I retested this with the nil pointer check removed, and no errors occurred, so I have removed it.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


pkg/sql/opt/xform/insert_funcs.go line 118 at r6 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Iterating through members of the memo group is an odd thing to do in an exploration rule. Typically (maybe always) we rely on the code generated for all exploration rules to explore each equivalent expression in a group—I believe you'd need to move some of the pattern matching into the optgen portion of the rule to achieve this. Can you add an explanation why this iteration is done instead, if there is a specific reason.

As discussed offline it doesn't appear that the optgen language provides pattern matching functionality with state, meaning we'd have to generate a new expression for each matched FastPathUniqueChecksItem.
As agreed, keeping this as-is for now.

Possible alternative mentioned offline: 'A normalization rule that just "fills in" the fast path unique checks when a constrained scan is built within it'


pkg/sql/opt/xform/rules/insert.opt line 16 at r1 (raw file):
No, I don't think non-fast path would ever be preferred over fast path. There should be less overhead and more parallelization possibilities with fast path.

If the answer is "no", this rule needs some explanation for why this rule breaks from the normal pattern of exploration rules.

The optgen language doesn't seem to have a component which allows pattern matching with state (e.g. collection of a list of items which all satisfy a given property, with one resulting transformation). So, the only alternative is logic in a custom function.

@msirek msirek requested a review from mgartner October 7, 2023 18:21
@msirek msirek force-pushed the insert-fast-path-uniqueness-checks branch from 7edd6d2 to 21d1e8b Compare October 7, 2023 19:07
@msirek
Copy link
Contributor Author

msirek commented Oct 7, 2023

I've added a check to disallow fast path uniqueness checks involving volatile expressions, as discussed offline.
This could potentially be supported in the future by grabbing the value from the insert row itself, instead of the constraint span, in the execbuilder.

Mark Sirek added 2 commits October 7, 2023 14:49
This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

- a single-row VALUES clause into a REGIONAL BY ROW table with a
PRIMARY KEY which has a UUID column generated by default, ie.
`id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the
crdb_region column is not specified in the VALUES clause; either
the gateway region is used or it is computed based on other column
values.
- a single-row VALUES clause into a REGIONAL BY ROW table with a
hash sharded unique index where the crdb_region column is not specified
in the VALUES clause

In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.

A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in `FastPathUniqueChecks` of
the `InsertExpr`, to find any which have been rewritten as a constrained
`ScanExpr`. If found, that means that Scan fully represents the lookup
needed to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.

Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the `FastPathUniqueChecks.Check`
relation.  If a constrained scan is found, it is used to build elements
of the `FastPathUniqueChecksItemPrivate` structure to communicate to the
execbuilder the table and index to use for the check, and the column ids
in the insert row to use for building the fast path KV request. In
addition, a new `DatumsFromConstraint` field is added, consisting of a
ScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the `InsertFastPathFKUniqCheck` structure
in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built
into a new the corresponding `FastPathUniqueChecksItem`s of a new
`FastPathUniqueChecksExpr` and communicated to the caller via return
value `newFastPathUniqueChecks`.

A small adjustment is made in the coster to make the fast path unique
constraint slightly cheaper, so it should always be chosen over the
original non-fast path check.

Epic: CRDB-26290
Fixes: cockroachdb#58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.
… be built

This commit avoids unnecessary processing of insert fast path uniqueness
checks by modifying the optbuilder to avoid building a
`FastPathUniqueChecksExpr` in the Insert expression if one or more
of the required `FastPathUniqueChecksItem`s could not be built.

Epic: CRDB-26290
Informs: cockroachdb#58047

Release note: None
@msirek msirek force-pushed the insert-fast-path-uniqueness-checks branch from 21d1e8b to 2fefefc Compare October 7, 2023 21:49
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 9 files at r8, 1 of 6 files at r10, 5 of 5 files at r14, 5 of 5 files at r15, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)


pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

# random() because the evaluation used in the check relation may differ from the
# value used in the insert row.
opt format=show-fastpathchecks expect-not=InsertFastPath

This check is within optbuilder, so it'd make more sense to put the test in an optbuilder test—in addition optbuilder tests print the fast path check expressions by default, which is nice.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!
bors r=mgartner

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This check is within optbuilder, so it'd make more sense to put the test in an optbuilder test—in addition optbuilder tests print the fast path check expressions by default, which is nice.

Perhaps, but this test also has expect and expect-not checks to test when the rule fires, so we'd have to keep a duplicate of the test as a rules test to keep that functionality.

@craig
Copy link
Contributor

craig bot commented Oct 9, 2023

Build failed (retrying...):

@msirek
Copy link
Contributor Author

msirek commented Oct 9, 2023

bors retry

@craig
Copy link
Contributor

craig bot commented Oct 9, 2023

Already running a review

@craig
Copy link
Contributor

craig bot commented Oct 9, 2023

Build succeeded:

@craig craig bot merged commit 69731b1 into cockroachdb:master Oct 9, 2023
3 checks passed
@mgartner
Copy link
Collaborator

mgartner commented Oct 9, 2023

pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

Previously, msirek (Mark Sirek) wrote…

Perhaps, but this test also has expect and expect-not checks to test when the rule fires, so we'd have to keep a duplicate of the test as a rules test to keep that functionality.

The expect-not isn't really relevant here—you're testing that optbuilder doesn't build a fk-fast-path-check if an expression in the VALUES clause of the insert is volatile, which is not related to the exploration rule.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/opt/xform/testdata/rules/insert line 344 at r14 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

The expect-not isn't really relevant here—you're testing that optbuilder doesn't build a fk-fast-path-check if an expression in the VALUES clause of the insert is volatile, which is not related to the exploration rule.

Sorry, I thought you meant change the whole test file to be an optbuilder test. Sure, I can open a PR to move this one test somewhere else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: add support for using the insert fast path with uniqueness checks
3 participants