-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: add support for using the insert fast path with uniqueness checks #58047
Comments
This has come up again as something that would be important for a customer using RBR tables. It would allow us to cut the number of round-trips required for an insert from 3 down to 1 or 2, which impacts latency a lot. Let's investigate how hard it would be to support this. |
In addition to an explicit fast path, a simpler option may be to simply parallelize checks with the mutation, as described in #58942:
|
@rytaft To clarify, is this issue open for just making insert fast-path uniqueness checks for REGIONAL BY ROW tables faster, or all uniqueness checks in general (including non-RBR tables)? The former may be easier to do. We could just build GET requests for all the possible regions besides the region being inserted to, and make them part of the batched KV requests (including PUTs) sent by the For general uniqueness checks, we probably can't make any assumptions about the form of the uniqueness checks and would have to keep them in their current form (a separate |
The issue is for uniqueness checks in general, but I assume that we won't be able to create a fast path for all of them. In the same way we have a fast path for certain types of foreign key checks, I think we should be able to support certain types of uniqueness checks. If we need to limit to RBR that's ok, but as always, it would be better to limit the optimizer's reliance on the RBR syntax and instead use lower level building blocks to make it more general. This function is what needs to be changed in the optimizer:
|
This adds support for building and executing simple INSERT statements with a single-row VALUES clause where any required uniqueness constraint checks can be handled via a constrained scan on an index. This includes INSERT cases such as: - a single-row VALUES clause into a REGIONAL BY ROW table with a PRIMARY KEY which has a UUID column generated by default, ie. `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the crdb_region column is not specified in the VALUES clause; either the gateway region is used or it is computed based on other column values. - a single-row VALUES clause into a REGIONAL BY ROW table with a hash sharded unique index where the crdb_region column is not specified in the VALUES clause In optbuild, when creating a uniqueness check for rows which are added to a table, a fast path index check relation is also built when the mutation source is a single-row values expression or WithScan from a single-row values expression. That relation is a filtered Select of a Scan from the target table, where the filters equate all of the unique check columns with their corresponding constants or placeholders from the Values expression. If there is a uniqueness check with a partial index predicate, fast path is disallowed. A new exploration rule called InsertFastPath is added to walk the memo group members created during exploration in `FastPathUniqueChecks` of the `InsertExpr`, to find any which have been rewritten as a constrained `ScanExpr`. If found, that means that Scan fully represents the lookup needed to check for duplicate entries and the Scan constraint can be used to identify the constants to use in a KV lookup on the scanned index in a fast path check. Function CanUseUniqueChecksForInsertFastPath walks the expressions generated during exploration of the `FastPathUniqueChecks.Check` relation. If a constrained scan is found, it is used to build elements of the `FastPathUniqueChecksItemPrivate` structure to communicate to the execbuilder the table and index to use for the check, and the column ids in the insert row to use for building the fast path KV request. In addition, a new `DatumsFromConstraint` field is added, consisting of a ScalarListExpr of TupleExprs specifying the index key, which allows an index key column to be matched with multiple Datums. One insert row may result in more than one KV lookup for a given uniqueness constraint. These items are used to build the `InsertFastPathFKUniqCheck` structure in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built into a new the corresponding `FastPathUniqueChecksItem`s of a new `FastPathUniqueChecksExpr` and communicated to the caller via return value `newFastPathUniqueChecks`. A small adjustment is made in the coster to make the fast path unique constraint slightly cheaper, so it should always be chosen over the original non-fast path check. Epic: CRDB-26290 Fixes: cockroachdb#58047 Release note (performance improvement): This patch adds support for insert fast-path uniqueness checks on REGIONAL BY ROW tables where the source is a VALUES clause with a single row. This results in a reduction in latency for single-row inserts to REGIONAL BY ROW tables and hash-sharded REGIONAL BY ROW tables with unique indexes.
… be built This commit avoids unnecessary processing of insert fast path uniqueness checks by modifying the optbuilder to avoid building a `FastPathUniqueChecksExpr` in the Insert expression if one or more of the required `FastPathUniqueChecksItem`s could not be built. Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None
This adds support for building and executing simple INSERT statements with a single-row VALUES clause where any required uniqueness constraint checks can be handled via a constrained scan on an index. This includes INSERT cases such as: - a single-row VALUES clause into a REGIONAL BY ROW table with a PRIMARY KEY which has a UUID column generated by default, ie. `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the crdb_region column is not specified in the VALUES clause; either the gateway region is used or it is computed based on other column values. - a single-row VALUES clause into a REGIONAL BY ROW table with a hash sharded unique index where the crdb_region column is not specified in the VALUES clause In optbuild, when creating a uniqueness check for rows which are added to a table, a fast path index check relation is also built when the mutation source is a single-row values expression or WithScan from a single-row values expression. That relation is a filtered Select of a Scan from the target table, where the filters equate all of the unique check columns with their corresponding constants or placeholders from the Values expression. If there is a uniqueness check with a partial index predicate, fast path is disallowed. A new exploration rule called InsertFastPath is added to walk the memo group members created during exploration in `FastPathUniqueChecks` of the `InsertExpr`, to find any which have been rewritten as a constrained `ScanExpr`. If found, that means that Scan fully represents the lookup needed to check for duplicate entries and the Scan constraint can be used to identify the constants to use in a KV lookup on the scanned index in a fast path check. Function CanUseUniqueChecksForInsertFastPath walks the expressions generated during exploration of the `FastPathUniqueChecks.Check` relation. If a constrained scan is found, it is used to build elements of the `FastPathUniqueChecksItemPrivate` structure to communicate to the execbuilder the table and index to use for the check, and the column ids in the insert row to use for building the fast path KV request. In addition, a new `DatumsFromConstraint` field is added, consisting of a ScalarListExpr of TupleExprs specifying the index key, which allows an index key column to be matched with multiple Datums. One insert row may result in more than one KV lookup for a given uniqueness constraint. These items are used to build the `InsertFastPathFKUniqCheck` structure in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built into a new the corresponding `FastPathUniqueChecksItem`s of a new `FastPathUniqueChecksExpr` and communicated to the caller via return value `newFastPathUniqueChecks`. A small adjustment is made in the coster to make the fast path unique constraint slightly cheaper, so it should always be chosen over the original non-fast path check. Epic: CRDB-26290 Fixes: cockroachdb#58047 Release note (performance improvement): This patch adds support for insert fast-path uniqueness checks on REGIONAL BY ROW tables where the source is a VALUES clause with a single row. This results in a reduction in latency for single-row inserts to REGIONAL BY ROW tables and hash-sharded REGIONAL BY ROW tables with unique indexes.
… be built This commit avoids unnecessary processing of insert fast path uniqueness checks by modifying the optbuilder to avoid building a `FastPathUniqueChecksExpr` in the Insert expression if one or more of the required `FastPathUniqueChecksItem`s could not be built. Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None
… be built This commit avoids unnecessary processing of insert fast path uniqueness checks by modifying the optbuilder to avoid building a `FastPathUniqueChecksExpr` in the Insert expression if one or more of the required `FastPathUniqueChecksItem`s could not be built. Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None
110870: sql: anonymize function names for error reporting and telemetry r=rharding6373 a=rharding6373 Previously, we did not anonymize/redact function names because we only supported built-in, CRDB-named functions. However, since adding support for UDFs, function names may be user-provided. We need to omit user-provided names in error reporting and telemetry. We fix this oversight in this PR by no longer overriding the anonymizing format flag and redaction flag for FuncExprs. Unfortunately, since any functions in a query are not always resolved when the error occurs, we may also redact built-in function names. Epic: None Fixes: None Release note (sql change): Anonymizes function names in error reporting and telemetry so that user-provided names for UDFs are not included. 111447: kvcoord: provide local node ID to DistSender in shared-process mode r=yuzefovich a=yuzefovich This commit makes it so that the DistSender of SQL pods running in shared-process mode have access to the node ID of the local KV node. The node ID is used to preferentially route requests to replicas available on the local node. Previously, this optimization was only available in single-tenant mode, and now it'll be also available in shared-process multi-tenant. The optimization doesn't exist in separate-process multi-tenant, so in that mode we continue to return `0` as the node ID which makes it so that replicas are picked randomly. For testing, this commit extends existing `TestSecondaryTenantFollowerReadsRouting` to also have a shared-process test configuration. There were a few gotchas in that adjustment: - for simplicity, we need to create a single region cluster so that latency and locality don't come into consideration when choosing the replica (we want to make sure that the local replica is used if available) - starting shared-process virtual cluster on a multi node cluster can be racy if we want to provide testing knobs, so we have to start the service on the gateway node first. (If we don't start with the gateway, then the server controller on the gateway could start the tenant server with empty knobs _before_ we explicitly start the service there.) - previously the test relied on retrieving the follower reads metric counters from all nodes and asserting only for `n2` it increased. However, in shared-process config it is now possible for an internal query (e.g. issued by auto stats collection) to be served via follower reads too, which would increment the counter confusing the test. As a result, this commit adjusts the test to audit the trace of the query and verify that the single follower read was served specifically by `n2`. Fixes: #109522. Release note: None 111798: cli,server: allow user to specify port range for secondary VCs r=knz a=stevendanna Previously, the user could set a port range offset that would instruct the tenant controller to choose a gRPC port starting at the KV server's RPC port + the offset + an index. While this provides the user some predictability, there is no upper bound on the port that could be picked. Here, we add a new option --secondary-tenant-port-range That allows the user to specify an explicit range. Fixes #111848. Epic: CRDB-26691 Release note: None 111822: sql: fast path uniqueness checks for single-row insert r=mgartner a=msirek This adds support for building and executing simple INSERT statements with a single-row VALUES clause where any required uniqueness constraint checks can be handled via a constrained scan on an index. This includes INSERT cases such as: - a single-row VALUES clause into a REGIONAL BY ROW table with a PRIMARY KEY which has a UUID column generated by default, ie. `id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the crdb_region column is not specified in the VALUES clause; either the gateway region is used or it is computed based on other column values. - a single-row VALUES clause into a REGIONAL BY ROW table with a hash sharded unique index where the crdb_region column is not specified in the VALUES clause In optbuild, when creating a uniqueness check for rows which are added to a table, a fast path index check relation is also built when the mutation source is a single-row values expression or WithScan from a single-row values expression. That relation is a filtered Select of a Scan from the target table, where the filters equate all of the unique check columns with their corresponding constants or placeholders from the Values expression. If there is a uniqueness check with a partial index predicate, fast path is disallowed. A new exploration rule called InsertFastPath is added to walk the memo group members created during exploration in `FastPathUniqueChecks` of the `InsertExpr`, to find any which have been rewritten as a constrained `ScanExpr`. If found, that means that Scan fully represents the lookup needed to check for duplicate entries and the Scan constraint can be used to identify the constants to use in a KV lookup on the scanned index in a fast path check. Function CanUseUniqueChecksForInsertFastPath walks the expressions generated during exploration of the `FastPathUniqueChecks.Check` relation. If a constrained scan is found, it is used to build elements of the `FastPathUniqueChecksItemPrivate` structure to communicate to the execbuilder the table and index to use for the check, and the column ids in the insert row to use for building the fast path KV request. In addition, a new `DatumsFromConstraint` field is added, consisting of a ScalarListExpr of TupleExprs specifying the index key, which allows an index key column to be matched with multiple Datums. One insert row may result in more than one KV lookup for a given uniqueness constraint. These items are used to build the `InsertFastPathFKUniqCheck` structure in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built into a new the corresponding `FastPathUniqueChecksItem`s of a new `FastPathUniqueChecksExpr` and communicated to the caller via return value `newFastPathUniqueChecks`. A small adjustment is made in the coster to make the fast path unique constraint slightly cheaper, so it should always be chosen over the original non-fast path check. Epic: CRDB-26290 Fixes: #58047 Release note (performance improvement): This patch adds support for insert fast-path uniqueness checks on REGIONAL BY ROW tables where the source is a VALUES clause with a single row. This results in a reduction in latency for single-row inserts to REGIONAL BY ROW tables and hash-sharded REGIONAL BY ROW tables with unique indexes. ---- optbuilder: build insert fast-path uniq checks only if all checks can be built This commit avoids unnecessary processing of insert fast path uniqueness checks by modifying the optbuilder to avoid building a `FastPathUniqueChecksExpr` in the Insert expression if one or more of the required `FastPathUniqueChecksItem`s could not be built. Epic: CRDB-26290 Informs: #58047 Release note: None 112051: bulk: disable as-we-fill scatters for PCR r=stevendanna a=adityamaru In #111952 we saw log lines indicating that the SST batcher was attempting to scatter ranges that were bigger than our "max scatter size". This is likely because of the out-of-order nature of PCR's ingestion which means that the batcher is not guaranteed to have an empty RHS when it splits a soon-to-be overfull range. This change disable the as-we-fill scatters for PCR similar to how we disable them for restore. This is okay since the process already pre-splits and scatters the ranges into which the processors will ingest data. Informs: #111952 Release note: None Co-authored-by: rharding6373 <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Mark Sirek <[email protected]> Co-authored-by: adityamaru <[email protected]>
This commit combines the foreign key checks of insert fast-path with the uniqueness checks into a single batch so that they may be processed in parallel for reduced latency. Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None
This commit adds a small benchmark for single-row insert fast path with a UNIQUE WITHOUT INDEX constraint. ``` BenchmarkUniqInsert/SingleRow BenchmarkUniqInsert/SingleRow/NoFastPath BenchmarkUniqInsert/SingleRow/NoFastPath-10 1623 634627 ns/op BenchmarkUniqInsert/SingleRow/FastPath BenchmarkUniqInsert/SingleRow/FastPath-10 3058 399215 ns/op ``` Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None
This commit moves an insert fast path uniqueness check test case involving a volatile expression to a more applicable BuilderTest. Epic: CRDB-26290 Informs: cockroachdb#58047 Release note: None
110593: norm: push Select into Ordinality r=msirek a=msirek A join query with LIMIT 1 and a WHERE clause filter which could enable constrained scan on a table index may perform a full scan the table, and the optimizer may never consider query plans where the table is read via a constrained index scan. Similar queries which project more columns or expressions do consider constrained scans, because normalization of the LIMIT portion of the query is delayed in that case until the input to the LIMIT is a join expression. The issue is that normalization rules such as TryDecorrelateLimitOne inject an Ordinality operation above a join, which prevents rules which push a Select into a join from firing, such as PushSelectIntoJoinLeft, which matches on a Select as the immediate parent of a join. The expression tree may look like this: expression tree may look like this: ``` ├── select │ ├── columns: tr.id:12!null tr.trid:13!null tr.str16:15!null rownum:25!null │ ├── key: (25) │ ├── fd: ()-->(15), (12)-->(13), (25)-->(12,13) │ ├── ordinality │ │ ├── columns: tr.id:12!null tr.trid:13!null tr.str16:15 rownum:25!null │ │ ├── key: (25) │ │ ├── fd: (12)-->(13,15), (25)-->(12,13,15) │ │ └── project │ │ ├── columns: tr.id:12!null tr.trid:13!null tr.str16:15 │ │ ├── fd: (12)-->(13,15) │ │ └── left-join (hash) │ │ ├── columns: tr.id:12!null tr.trid:13!null tr.str16:15 trid:18 │ │ ├── multiplicity: left-rows(one-or-more), right-rows(zero-or-one) │ │ ├── fd: (12)-->(13,15) │ │ ├── scan trrec [as=tr] │ │ │ ├── columns: tr.id:12!null tr.trid:13!null tr.str16:15 │ │ │ ├── key: (12) │ │ │ └── fd: (12)-->(13,15) │ │ ├── project │ │ │ ├── columns: trid:18!null ... │ │ │ └── projections │ │└── tq.trid:8 [as=trid:18, outer=(8)] │ │ └── filters │ │ └── tr.id:12 = trid:18 [outer=(12,18), constraints=(/12: (/NULL - ]; /18: (/NULL - ]), fd=(12)==(18), (18)==(12)] │ └── filters │ └── tr.str16:15 = '12345' [outer=(15), constraints=(/15: [/'12345' - /'12345']; tight), fd=()-->(15)] ``` The `ordinality` expression, in this case, was created by normalization rule TryDecorrelateLimitOne to tag rows with a unique identifier so duplicates can be removed later. We should be able to push the `select` below this `ordinality`, because besides adding the unique identifier, it is just passing rows through to the next operation. The solution is to do just that, introduce a new normalization rule which pushes a Select operation below an Ordinality operation which was created for duplicate row removal if the selection filter applies to columns in the Ordinality expression's input. A new boolean flag is added to `OrdinalityPrivate` to distinguish an Ordinality built for duplicate row remove, where the values produced don't matter, as long as they're unique, from one built from WITH ORDINALITY clause, where the actual values do matter. Epic: none Fixes: #109751 Release note (bug fix): This patch fixes a performance issue in join queries with a LIMIT clause, where the optimizer may fail to push a WHERE clause filter into a join due to how the Limit operation is internally rewritten, causing a full scan of the table referenced in the filter. 111877: bazci,issues: allow posting GitHub issues from EngFlow r=rail,DarrylWong,srosenberg a=rickystewart To do this, we need to refactor out the TeamCity-specific logic. It all stays roughly in the same place but instead one must opt into either TC logic or EngFlow logic using either `TeamCityOptions` or `EngFlowOptions`. Epic: CRDB-8308 Release note: None Part of: DEVINF-871 111979: constraint: perform cancel checking when combining constraints r=yuzefovich,DrewKimball a=msirek Previously, the combining of constraint spans when one constraint is a suffix of the other could take a long time and cause the `statement_timeout` session setting to not be honored when each constraint has hundreds or thousands of spans. The issue is that `constraint.Combine` has double nested loops to consider every combination of one span from one constraint with one span of the other constraint. The building of possibly millions of spans may take excessive CPU time and allocate excessive amounts of memory. The fix is to maintain a counter in `constraint.Combine` and call the query cancel check function every 16 iterations. The cancel check function itself will check for query timeout every 1024 iterations, so effectively every 16K iterations `constraint.Combine` will perform cancel checking and abort the query if the timeout has been reached. Epic: none Fixes: #111862 Release note (bug fix): This patch fixes an issue where the optimizer fails to honor the `statement_timeout` session setting when generating constrained index scans for queries with large IN lists or `= ANY` predicates on multiple index key columns, which may lead to an out of memory condition on the node. 112059: sql: run insert fast-path FK checks in parallel with uniqueness checks r=yuzefovich,michae2 a=msirek This commit combines the foreign key checks of insert fast-path with the uniqueness checks into a single batch so that they may be processed in parallel for reduced latency. Epic: CRDB-26290 Informs: #58047 Release note: None ---- bench: insert fast path single row unique check benchmark This commit adds a small benchmark for single-row insert fast path with a UNIQUE WITHOUT INDEX constraint. ``` BenchmarkUniqInsert/SingleRow BenchmarkUniqInsert/SingleRow/NoFastPath BenchmarkUniqInsert/SingleRow/NoFastPath-10 1623 634627 ns/op BenchmarkUniqInsert/SingleRow/FastPath BenchmarkUniqInsert/SingleRow/FastPath-10 3058 399215 ns/op ``` Epic: CRDB-26290 Informs: #58047 Release note: None Co-authored-by: Mark Sirek <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
112033: cli: remove redundant hex_ columns in debug.zip output r=rickystewart a=cucaroach In #80398 we fixed the CLI to respect bytea_output session variable and properly output BYTES columns in \xDEADBEEF format. Now we don't need the extra hex_ columns. This makes debug.zips smaller and quicker to create (jobs payloads can get big and so can descriptors for large tables). In order not to break cockroach debug decode-proto extend it to look for and remove the \x prefix when attempting to decode hex. Fixes: #112032 Epic: none Release note (cli change): debug zip's no longer include redundant hex_ columns for system table BYTES columns. 112040: sctest: Skip "crdb_internal" entry in comparator testing from logictest corpus r=Xiang-Gu a=Xiang-Gu Statements collected from `crdb_internal` is of little interest to us for the purpose of schema changer comparator testings, as it contains only `crdb_internal` functions. Furthermore, there are statements that will crash our framework, such as `SELECT crdb_internal.force_panic('foo')`. This commit therefore adds a check to skip it. This check will also home any future skipped entries for whatever reason. Epic: None Release note: None 112067: sql: fix minor bugs and add tests for procedures r=mgartner a=mgartner #### sql: add tests for procedures with anonymous arguments Release note: None Epic: CRDB-25388 #### sql: fix `CREATE OR REPLACE PROCEDURE` and add tests This commit fixes a minor bug with `CREATE OR REPLACE PROCEDURE` that allowed it to replace functions. The bug also allowed `CREATE OR REPLACE FUNCTION` to replace procedures. Tests for `CREATE OR REPLACE PROCEDURE` have also been added. Release note: None #### sql: fix routine-related error messages This commit fixes some minor bugs with the error message of routine-related statements. Release note: None 112074: optbuilder: move insert fast path test to optbuilder r=msirek a=msirek This commit moves an insert fast path uniqueness check test case involving a volatile expression to a more applicable BuilderTest. Epic: CRDB-26290 Informs: #58047 Release note: None 112086: serverutils: use options for SQL connections r=knz a=herkolategan Previously `ApplicationLayerInterface` contained multiple `SQLConn` method variants for passing connection parameters. Various tests throughout the code use `SQLConn` or `PGUrl` to connect, but the latter does not take into account virtual clusters unless informed. This PR is meant to enable `SQLConn` to become a standard replacement for `PGUrl` in order to leverage the probabilistic nature of the tests which select between running a second virtual cluster or not. Numerous tests still use `PGUrl` and pass user, password, or cert details. These tests will soon be updated to start using `SQLConn` and hence should have the same functionality available. This enables the test to not have to deal with querying its probabilistic mode to determine if it should pass a virtual cluster name to `PGUrl`. The `ApplicationLayerInterface` already handles this logic and tests should leverage this. See also: #111134 Epic: CRDB-31933 Release note: None 112096: optbuilder: move finishBuildLastStmt to routine.go r=mgartner a=mgartner When routine-related optbuilder logic was moved to a new file, `routine.go`, the `finishBuildLastStmt` function was missed. This commit corrects that mistake. Epic: CRDB-25388 Release note: None 112097: sql/logictest: add CALL statements to drop_procedure r=mgartner a=mgartner This commit improves the coverage of tests in `drop_procedure` by adding `CALL` statements to ensure that dropped procedures cannot be executed. Epic: CRDB-25388 Release note: None Co-authored-by: Tommy Reilly <[email protected]> Co-authored-by: Xiang Gu <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Mark Sirek <[email protected]> Co-authored-by: Herko Lategan <[email protected]>
It is currently not possible to use the insert fast path if any uniqueness check postqueries are required. Since some
REGIONAL BY ROW
tables will require uniqueness checks on every insert, it would be unfortunate if we cannot ever use the insert fast path for those tables. This issue covers the work to investigate whether it may be possible to support uniqueness checks with the insert fast path, and if so, implement the necessary changes.Epic CRDB-26290
Jira issue: CRDB-3441
The text was updated successfully, but these errors were encountered: