-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: fast path uniqueness checks for single-row insert #111822
sql: fast path uniqueness checks for single-row insert #111822
Conversation
This PR contains the main commit enabling insert fast-path uniqueness checks, split off from #102995. |
2e69f03
to
1a95cc8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 20 of 20 files at r1, 2 of 2 files at r2, 5 of 5 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)
-- commits
line 21 at r1:
nit: I'm not sure it's necessary to make this point—it's equivalent to the first point.
pkg/sql/opt/exec/execbuilder/mutation.go
line 182 at r1 (raw file):
execFastPathCheck.InsertCols[j] = exec.TableColumnOrdinal(md.ColumnMeta(insertCol).Table.ColumnOrdinal(insertCol)) } execFastPathCheck.MatchMethod = tree.MatchFull
I beleive match method is only relevant for foriegn key checks. Why is it needed here?
pkg/sql/opt/exec/execbuilder/mutation.go
line 881 at r1 (raw file):
} if !found { panic(errors.AssertionFailedf(
It's probably not guaranteed that a panic will be caught in the context that execFastPathCheck.MkErr
is used. There's no benefit to panicking here if we've already plumbed a return error
. So to avoid crashing the node, let'd just return the error here.
pkg/sql/opt/optbuilder/mutation_builder_unique.go
line 553 at r3 (raw file):
newPossibleScan := newScanScope.expr if skipProjectExpr, ok := newPossibleScan.(*memo.ProjectExpr); ok { newPossibleScan = skipProjectExpr.Input
It looks like you're lacking test coverage of this case. Do you have an example of when it is needed?
pkg/sql/opt/optbuilder/mutation_builder_unique.go
line 561 at r3 (raw file):
} else { // Don't build a fast-path check if we failed to create a new ScanExpr. buildFastPathCheck = false
Looks like there isn't coverage here either. Is it needed?
pkg/sql/opt/xform/insert_funcs.go
line 43 at r1 (raw file):
return memo.FastPathUniqueChecksExpr{}, false } if c.e.evalCtx == nil {
Can evalCtx be nil
?
pkg/sql/opt/xform/insert_funcs.go
line 88 at r1 (raw file):
// appendNewUniqueChecksItem handles building of the FastPathCheck into a // new UniqueChecksItem for the new InsertExpr being constructed. appendNewUniqueChecksItem := func(private *memo.FastPathUniqueChecksItemPrivate) {
nit: I don't see much benefit in encapsulating this logic in an anonymous function when it's only used once below and it's fairly simple.
pkg/sql/opt/xform/insert_funcs.go
line 125 at r1 (raw file):
if indexJoinExpr, isIndexJoin := expr.(*memo.IndexJoinExpr); isIndexJoin { expr = indexJoinExpr.Input
Can't we only skip over the index join if there the filter is true
?
pkg/sql/opt/xform/insert_funcs.go
line 129 at r1 (raw file):
if scan, isScan := expr.(*memo.ScanExpr); isScan { // We need a scan with a constraint for analysis, and it should not be // a contradiction (having no spans).
In what case would it be a contradiction?
pkg/sql/opt/xform/insert_funcs.go
line 166 at r1 (raw file):
span := scanExpr.Constraint.Spans.Get(j) // Verify there is a single key... if span.Prefix(scanExpr.Memo().EvalContext()) != span.StartKey().Length() {
I don't think this is sufficient. You should use span.HasSingleKey
here instead.
pkg/sql/opt/xform/insert_funcs.go
line 170 at r1 (raw file):
} // ... and that the span has the same number of columns as the index key. if span.StartKey().Length() != numKeyCols {
nit: Length is likely a faster check than checking that it's a single key, so you could move this up.
pkg/sql/opt/xform/rules/insert.opt
line 16 at r1 (raw file):
# only applies to an insert of values, and not other forms like INSERT SELECT. # If fast path information has already been built, this rule will not try to # rewrite the insert a second time.
I know we've spoken about this before, but can you explain why this logic is done during the exploration phase? Is there any cases where we don't want to do a fast path because there is a more efficient alternative?
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 1640 at r1 (raw file):
table: regional_by_row_table@new_idx spans: [/'ca-central-1'/1/1 - /'ca-central-1'/1/1] [/'us-east-1'/1/1 - /'us-east-1'/1/1]
nit: unnecessary new line
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3758 at r1 (raw file):
statement error pq: duplicate key value violates unique constraint "users2_email_key"\nDETAIL: Key \(email\)=\('[email protected]'\) already exists\. INSERT INTO users2 (crdb_region, name, email) VALUES ('ca-central-1', 'Craig Roacher', '[email protected]')
Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3763 at r1 (raw file):
query T EXPLAIN INSERT INTO users2 (name, email) VALUES ('Bill Roacher', '[email protected]'), ('Jill Roacher', '[email protected]')
We typically don't put EXPLAIN queries in logictests because they can flake due to the random column families that logic tests can assign. EXPLAIN queries can go in logic tests within execbuilder tests where those mutations are not made. But since these require CCL features, I'd move these to a separate query_behavior test file and make the column families explicit so they don't flake. Example: https://github.com/cockroachdb/cockroach/blob/b34827ad7e26169dbc7213ecb542194e06376624/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
Don't forget the other EXPLAIN tests below.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3863 at r1 (raw file):
ALTER TABLE users2 ADD CONSTRAINT users4_fk FOREIGN KEY (name) REFERENCES users4 (name) ON DELETE CASCADE # Verify multi-row insert fast path detects FK constraint violations.
nit "FK" should be "unique", correct?
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 4144 at r1 (raw file):
statement ok CREATE TABLE hash_sharded_rbr_computed ( region_id STRING(10) NOT NULL,
nit: replace tabs with spaces here
pkg/sql/logictest/testdata/logic_test/unique
line 965 at r1 (raw file):
INDEX (b,a) STORING(pk2, j), INVERTED INDEX (j), FAMILY (pk, pk2, a, b)
Don't you want these tables to mimic regional by row tables with a UNIQUE(prefix_col, other_cols...)
and UNIQUE WITHOUT INDEX (other_cols...)
? Or is there another reason for these tests?
pkg/sql/logictest/testdata/logic_test/unique
line 1009 at r1 (raw file):
statement error pq: duplicate key value violates unique constraint "unique_b_c"\nDETAIL: Key \(b, c\)=\(2, 3\) already exists\. INSERT INTO multiple_uniq (a,c,b,d) VALUES (11,3,2,14), (15,16,17,18)
Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table here too.
pkg/sql/opt/exec/execbuilder/testdata/unique
line 4800 at r1 (raw file):
INDEX (b,a) STORING(pk2, j), INVERTED INDEX (j), FAMILY (pk, pk2, a, b)
This should mimic a RBR table, shouldn't it?
1a95cc8
to
0e9e42f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
Previously, mgartner (Marcus Gartner) wrote…
nit: I'm not sure it's necessary to make this point—it's equivalent to the first point.
OK, removed that point
pkg/sql/opt/exec/execbuilder/mutation.go
line 182 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I beleive match method is only relevant for foriegn key checks. Why is it needed here?
You're right. Removed it.
pkg/sql/opt/exec/execbuilder/mutation.go
line 881 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It's probably not guaranteed that a panic will be caught in the context that
execFastPathCheck.MkErr
is used. There's no benefit to panicking here if we've already plumbed a returnerror
. So to avoid crashing the node, let'd just return the error here.
Done
pkg/sql/opt/optbuilder/mutation_builder_unique.go
line 553 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It looks like you're lacking test coverage of this case. Do you have an example of when it is needed?
Hash-sharded RBR tables sometimes need to do this, like the following case:
https://github.com/msirek/cockroach/blob/6ab911a9d6c8efe420487224d8c5125ec4741001/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior#L4156-L4171
Added a code comment.
pkg/sql/opt/optbuilder/mutation_builder_unique.go
line 561 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Looks like there isn't coverage here either. Is it needed?
Even if a test case cannot be found, this code is needed because if the new scan cannot be constructed or found, there is no way to build a fast path check for this case.
pkg/sql/opt/xform/insert_funcs.go
line 43 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Can evalCtx be
nil
?
Yes. I believe the errors I saw were in tests using hypothetical indexes.
pkg/sql/opt/xform/insert_funcs.go
line 88 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: I don't see much benefit in encapsulating this logic in an anonymous function when it's only used once below and it's fairly simple.
OK, removed the function.
pkg/sql/opt/xform/insert_funcs.go
line 125 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Can't we only skip over the index join if there the filter is
true
?
As far as I can tell, index join has no filter. It just unconditionally looks up the primary index row, right?
pkg/sql/opt/xform/insert_funcs.go
line 129 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
In what case would it be a contradiction?
Maybe if there is a check constraint on the table that contradicts the unique check filters. That case is not optimized right now, e.g., by allowing the fast path checks, but skipping that particular check, so fast path is just disabled.
pkg/sql/opt/xform/insert_funcs.go
line 166 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I don't think this is sufficient. You should use
span.HasSingleKey
here instead.
Modified as suggested
pkg/sql/opt/xform/insert_funcs.go
line 170 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Length is likely a faster check than checking that it's a single key, so you could move this up.
Done
pkg/sql/opt/xform/rules/insert.opt
line 16 at r1 (raw file):
One reason is that placeholders are not filled in during optbuild, so for normalization vs. exploration, exploration phase is required. For execbuilder vs. exploration, building of check constraint filters, derivation of hash-sharded column constraints, invisible index handling and costing occur during exploration. Re-implementing these in the execbuilder is possible, but means we'd have essentially duplicated logic for every new feature which is added needing fast-path support.
Is there any cases where we don't want to do a fast path because there is a more efficient alternative?
It's true, exploration has some processing overhead. The overhead may be very low compared to latencies of performing the actual KV lookups as it's just exploring a single Select and Scan. Though, it may be possible to further optimize it. For example, instead of keeping the fast path check relation as a child of FastPathUniqueChecksItem
, push it into FastPathUniqueChecksItemPrivate
and change this rule to only call GenerateConstrainedScans
on the check relation. This way less possible exploration functions are fired.
I guess the point is that there are benefits to keeping the logic in the optimizer/exploration phase, the overhead of exploring a single Select/Scan is low compared to KV lookup times, and there are ways to reduce exploration costs by limiting exploration to a single rule, if need be.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 1640 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: unnecessary new line
Removed
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3863 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit "FK" should be "unique", correct?
Fixed
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 4144 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: replace tabs with spaces here
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3758 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table.
Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.
pkg/sql/logictest/testdata/logic_test/unique
line 965 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Don't you want these tables to mimic regional by row tables with a
UNIQUE(prefix_col, other_cols...)
andUNIQUE WITHOUT INDEX (other_cols...)
? Or is there another reason for these tests?
No, logic tests for REGIONAL BY ROW tables don't need to use mimicking, and are in file regional_by_row_query_behavior. These are additional tests that check that for general functionality of UNIQUE WITHOUT INDEX with insert fast path.
pkg/sql/logictest/testdata/logic_test/unique
line 1009 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Can you add a test where the two insert rows conflict with each other, but do not conflict with existing rows in the table here too.
Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.
0e9e42f
to
4fba820
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3763 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
We typically don't put EXPLAIN queries in logictests because they can flake due to the random column families that logic tests can assign. EXPLAIN queries can go in logic tests within execbuilder tests where those mutations are not made. But since these require CCL features, I'd move these to a separate query_behavior test file and make the column families explicit so they don't flake. Example: https://github.com/cockroachdb/cockroach/blob/b34827ad7e26169dbc7213ecb542194e06376624/pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
Don't forget the other EXPLAIN tests below.
Thanks for the tip. OK, I've moved the EXPLAIN queries where the tables don't define column families to a new file, and kept the actual query runs in place so that random column families will provide better test coverage. For tables already defining column families, I left those EXPLAIN queries in place so you can see the query plan alongside the actual execution.
pkg/sql/opt/exec/execbuilder/testdata/unique
line 4800 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This should mimic a RBR table, shouldn't it?
No, these are general UNIQUE WITHOUT INDEX cases. I've gone ahead and added a test which mimics an RBR table though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r4, 10 of 10 files at r6, 5 of 5 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)
pkg/sql/opt/exec/execbuilder/mutation.go
line 881 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Done
This should still be an assertion error if the key values cannot be found, like it was before, but without the panic
, because it is unexpected that we would be unable to find the key values and because mkUniqueCheckErr
has not been proven to work with partial or no key values.
pkg/sql/opt/optbuilder/mutation_builder_unique.go
line 561 at r3 (raw file):
Previously, msirek (Mark Sirek) wrote…
Even if a test case cannot be found, this code is needed because if the new scan cannot be constructed or found, there is no way to build a fast path check for this case.
Any reason not to just return uniqueChecks, memo.FastPathUniqueChecksItem{}
here then?
pkg/sql/opt/xform/insert_funcs.go
line 43 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Yes. I believe the errors I saw were in tests using hypothetical indexes.
That's surprising because there are many places where c.e.evalCtx
is accessed without checking if it's nil
, like here:
cockroach/pkg/sql/opt/xform/scan_funcs.go
Line 115 in 452a2de
if !c.e.evalCtx.SessionData().LocalityOptimizedSearch { |
Do you think it can now be nil
due to some change in this PR? If this is now required we need to understand why and verify that other accesses of c.e.evalCtx
are not affected.
pkg/sql/opt/xform/insert_funcs.go
line 125 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
As far as I can tell, index join has no filter. It just unconditionally looks up the primary index row, right?
Right, sorry I was confused.
pkg/sql/opt/xform/insert_funcs.go
line 118 at r6 (raw file):
var scanExpr *memo.ScanExpr for check = check.FirstExpr(); check != nil; check = check.NextExpr() {
Iterating through members of the memo group is an odd thing to do in an exploration rule. Typically (maybe always) we rely on the code generated for all exploration rules to explore each equivalent expression in a group—I believe you'd need to move some of the pattern matching into the optgen portion of the rule to achieve this. Can you add an explanation why this iteration is done instead, if there is a specific reason.
pkg/sql/opt/xform/insert_funcs.go
line 168 at r6 (raw file):
} // ... and that the span has a single key. if !span.HasSingleKey(scanExpr.Memo().EvalContext()) {
nit: use c.e.evalCtx
for consistency instead of scanExpr.Memo().EvalContext()
.
pkg/sql/opt/xform/rules/insert.opt
line 16 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
One reason is that placeholders are not filled in during optbuild, so for normalization vs. exploration, exploration phase is required. For execbuilder vs. exploration, building of check constraint filters, derivation of hash-sharded column constraints, invisible index handling and costing occur during exploration. Re-implementing these in the execbuilder is possible, but means we'd have essentially duplicated logic for every new feature which is added needing fast-path support.
Is there any cases where we don't want to do a fast path because there is a more efficient alternative?
It's true, exploration has some processing overhead. The overhead may be very low compared to latencies of performing the actual KV lookups as it's just exploring a single Select and Scan. Though, it may be possible to further optimize it. For example, instead of keeping the fast path check relation as a child of
FastPathUniqueChecksItem
, push it intoFastPathUniqueChecksItemPrivate
and change this rule to only callGenerateConstrainedScans
on the check relation. This way less possible exploration functions are fired.I guess the point is that there are benefits to keeping the logic in the optimizer/exploration phase, the overhead of exploring a single Select/Scan is low compared to KV lookup times, and there are ways to reduce exploration costs by limiting exploration to a single rule, if need be.
The exploration phase's purpose is to explore alternative query plans, so the question I was asking is "are there any cases where we would prefer a query plan without an insert fast path over a query with a fast path insert"? If the answer is "no", this rule needs some explanation for why this rule breaks from the normal pattern of exploration rules.
I agree there is some benefit to relying on exsiting xform logic for check constraint filters, derivation of hash-sharded column constraints—but my comment here isn't related to that logic, it's related to the logic of this rule specifically.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3758 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Multirow insert doesn't currently use fast path, so that would just be testing pre-existing functionality.
Ahh good point.
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3763 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Thanks for the tip. OK, I've moved the EXPLAIN queries where the tables don't define column families to a new file, and kept the actual query runs in place so that random column families will provide better test coverage. For tables already defining column families, I left those EXPLAIN queries in place so you can see the query plan alongside the actual execution.
Thanks. You've switched the file names though—"query_behavior" should be the file with the EXPLAIN plans.
pkg/sql/opt/exec/execbuilder/testdata/unique
line 5370 at r6 (raw file):
INDEX (b,a) STORING(pk2, j), INVERTED INDEX (j), FAMILY (pk, pk2, a, b)
This does not mimic a RBR table. A unique index in a RBR table is syntactic sugar:
CREATE TABLE t (
a INT,
UNIQUE (a)
) REGIONAL BY ROW;
-- Is equivalent to:
CREATE TABLE t (
region STRING, -- Using STRING, but would be crdb_region in practice.
a INT,
UNIQUE (region, a),
UNIQUE WITHOUT INDEX (a),
CHECK (region IN (...))
);
4fba820
to
e33ad40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/opt/exec/execbuilder/mutation.go
line 881 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This should still be an assertion error if the key values cannot be found, like it was before, but without the
panic
, because it is unexpected that we would be unable to find the key values and becausemkUniqueCheckErr
has not been proven to work with partial or no key values.
Makes sense. Made the change.
pkg/sql/opt/optbuilder/mutation_builder_unique.go
line 561 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Any reason not to just
return uniqueChecks, memo.FastPathUniqueChecksItem{}
here then?
Yes, that's cleaner. I've moved the building of fastPathChecks below uniqueChecks.
pkg/sql/opt/xform/insert_funcs.go
line 43 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
That's surprising because there are many places where
c.e.evalCtx
is accessed without checking if it'snil
, like here:cockroach/pkg/sql/opt/xform/scan_funcs.go
Line 115 in 452a2de
if !c.e.evalCtx.SessionData().LocalityOptimizedSearch { Do you think it can now be
nil
due to some change in this PR? If this is now required we need to understand why and verify that other accesses ofc.e.evalCtx
are not affected.
I'll comment this out and see which, if any, tests fail. Then we can either remove it or add a comment on it.
pkg/sql/opt/xform/insert_funcs.go
line 168 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: use
c.e.evalCtx
for consistency instead ofscanExpr.Memo().EvalContext()
.
Done
pkg/ccl/logictestccl/testdata/logic_test/regional_by_row_query_behavior
line 3763 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Thanks. You've switched the file names though—"query_behavior" should be the file with the EXPLAIN plans.
Ahh, corrected this. Thanks!
pkg/sql/opt/exec/execbuilder/testdata/unique
line 5370 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This does not mimic a RBR table. A unique index in a RBR table is syntactic sugar:
CREATE TABLE t ( a INT, UNIQUE (a) ) REGIONAL BY ROW; -- Is equivalent to: CREATE TABLE t ( region STRING, -- Using STRING, but would be crdb_region in practice. a INT, UNIQUE (region, a), UNIQUE WITHOUT INDEX (a), CHECK (region IN (...)) );
I think I'm doing about the same thing, but in your version you have UNIQUE (region, a)
while I have INDEX (b,a) STORING(pk2, j). Maybe the underlying index should be unique too, so I've updated the test case to use UNIQUE INDEX.
e33ad40
to
7edd6d2
Compare
I retested this with the nil pointer check removed, and no errors occurred, so I have removed it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/opt/xform/insert_funcs.go
line 118 at r6 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Iterating through members of the memo group is an odd thing to do in an exploration rule. Typically (maybe always) we rely on the code generated for all exploration rules to explore each equivalent expression in a group—I believe you'd need to move some of the pattern matching into the optgen portion of the rule to achieve this. Can you add an explanation why this iteration is done instead, if there is a specific reason.
As discussed offline it doesn't appear that the optgen language provides pattern matching functionality with state, meaning we'd have to generate a new expression for each matched FastPathUniqueChecksItem
.
As agreed, keeping this as-is for now.
Possible alternative mentioned offline: 'A normalization rule that just "fills in" the fast path unique checks when a constrained scan is built within it'
pkg/sql/opt/xform/rules/insert.opt
line 16 at r1 (raw file):
No, I don't think non-fast path would ever be preferred over fast path. There should be less overhead and more parallelization possibilities with fast path.
If the answer is "no", this rule needs some explanation for why this rule breaks from the normal pattern of exploration rules.
The optgen language doesn't seem to have a component which allows pattern matching with state (e.g. collection of a list of items which all satisfy a given property, with one resulting transformation). So, the only alternative is logic in a custom function.
7edd6d2
to
21d1e8b
Compare
I've added a check to disallow fast path uniqueness checks involving volatile expressions, as discussed offline. |
This adds support for building and executing simple INSERT statements with a single-row VALUES clause where any required uniqueness constraint checks can be handled via a constrained scan on an index. This includes INSERT cases such as: - a single-row VALUES clause into a REGIONAL BY ROW table with a PRIMARY KEY which has a UUID column generated by default, ie. `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the crdb_region column is not specified in the VALUES clause; either the gateway region is used or it is computed based on other column values. - a single-row VALUES clause into a REGIONAL BY ROW table with a hash sharded unique index where the crdb_region column is not specified in the VALUES clause In optbuild, when creating a uniqueness check for rows which are added to a table, a fast path index check relation is also built when the mutation source is a single-row values expression or WithScan from a single-row values expression. That relation is a filtered Select of a Scan from the target table, where the filters equate all of the unique check columns with their corresponding constants or placeholders from the Values expression. If there is a uniqueness check with a partial index predicate, fast path is disallowed. A new exploration rule called InsertFastPath is added to walk the memo group members created during exploration in `FastPathUniqueChecks` of the `InsertExpr`, to find any which have been rewritten as a constrained `ScanExpr`. If found, that means that Scan fully represents the lookup needed to check for duplicate entries and the Scan constraint can be used to identify the constants to use in a KV lookup on the scanned index in a fast path check. Function CanUseUniqueChecksForInsertFastPath walks the expressions generated during exploration of the `FastPathUniqueChecks.Check` relation. If a constrained scan is found, it is used to build elements of the `FastPathUniqueChecksItemPrivate` structure to communicate to the execbuilder the table and index to use for the check, and the column ids in the insert row to use for building the fast path KV request. In addition, a new `DatumsFromConstraint` field is added, consisting of a ScalarListExpr of TupleExprs specifying the index key, which allows an index key column to be matched with multiple Datums. One insert row may result in more than one KV lookup for a given uniqueness constraint. These items are used to build the `InsertFastPathFKUniqCheck` structure in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built into a new the corresponding `FastPathUniqueChecksItem`s of a new `FastPathUniqueChecksExpr` and communicated to the caller via return value `newFastPathUniqueChecks`. A small adjustment is made in the coster to make the fast path unique constraint slightly cheaper, so it should always be chosen over the original non-fast path check. Epic: CRDB-26290 Fixes: cockroachdb#58047 Release note (performance improvement): This patch adds support for insert fast-path uniqueness checks on REGIONAL BY ROW tables where the source is a VALUES clause with a single row. This results in a reduction in latency for single-row inserts to REGIONAL BY ROW tables and hash-sharded REGIONAL BY ROW tables with unique indexes.
… be built This commit avoids unnecessary processing of insert fast path uniqueness checks by modifying the optbuilder to avoid building a `FastPathUniqueChecksExpr` in the Insert expression if one or more of the required `FastPathUniqueChecksItem`s could not be built. Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None
21d1e8b
to
2fefefc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r8, 1 of 6 files at r10, 5 of 5 files at r14, 5 of 5 files at r15, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @msirek and @rharding6373)
pkg/sql/opt/xform/testdata/rules/insert
line 344 at r14 (raw file):
# random() because the evaluation used in the check relation may differ from the # value used in the insert row. opt format=show-fastpathchecks expect-not=InsertFastPath
This check is within optbuilder, so it'd make more sense to put the test in an optbuilder test—in addition optbuilder tests print the fast path check expressions by default, which is nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=mgartner
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/opt/xform/testdata/rules/insert
line 344 at r14 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This check is within optbuilder, so it'd make more sense to put the test in an optbuilder test—in addition optbuilder tests print the fast path check expressions by default, which is nice.
Perhaps, but this test also has expect
and expect-not
checks to test when the rule fires, so we'd have to keep a duplicate of the test as a rules test to keep that functionality.
Build failed (retrying...): |
bors retry |
Already running a review |
Build succeeded: |
Previously, msirek (Mark Sirek) wrote…
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/sql/opt/xform/testdata/rules/insert
line 344 at r14 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
The
expect-not
isn't really relevant here—you're testing that optbuilder doesn't build a fk-fast-path-check if an expression in the VALUES clause of the insert is volatile, which is not related to the exploration rule.
Sorry, I thought you meant change the whole test file to be an optbuilder test. Sure, I can open a PR to move this one test somewhere else.
This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.
This includes INSERT cases such as:
PRIMARY KEY which has a UUID column generated by default, ie.
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
, where thecrdb_region column is not specified in the VALUES clause; either
the gateway region is used or it is computed based on other column
values.
hash sharded unique index where the crdb_region column is not specified
in the VALUES clause
In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.
A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in
FastPathUniqueChecks
ofthe
InsertExpr
, to find any which have been rewritten as a constrainedScanExpr
. If found, that means that Scan fully represents the lookupneeded to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.
Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the
FastPathUniqueChecks.Check
relation. If a constrained scan is found, it is used to build elements
of the
FastPathUniqueChecksItemPrivate
structure to communicate to theexecbuilder the table and index to use for the check, and the column ids
in the insert row to use for building the fast path KV request. In
addition, a new
DatumsFromConstraint
field is added, consisting of aScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the
InsertFastPathFKUniqCheck
structurein the execbuilder. The new
FastPathUniqueChecksItemPrivate
is builtinto a new the corresponding
FastPathUniqueChecksItem
s of a newFastPathUniqueChecksExpr
and communicated to the caller via returnvalue
newFastPathUniqueChecks
.A small adjustment is made in the coster to make the fast path unique
constraint slightly cheaper, so it should always be chosen over the
original non-fast path check.
Epic: CRDB-26290
Fixes: #58047
Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.
optbuilder: build insert fast-path uniq checks only if all checks can be built
This commit avoids unnecessary processing of insert fast path uniqueness
checks by modifying the optbuilder to avoid building a
FastPathUniqueChecksExpr
in the Insert expression if one or moreof the required
FastPathUniqueChecksItem
s could not be built.Epic: CRDB-26290
Informs: #58047
Release note: None