Merge #110870 #111447 #111798 #111822 #112051 · cockroachdb/cockroach@69731b1

Commit

Merge #110870 #111447 #111798 #111822 #112051

110870: sql: anonymize function names for error reporting and telemetry r=rharding6373 a=rharding6373

Previously, we did not anonymize/redact function names because we only supported built-in, CRDB-named functions. However, since adding support for UDFs, function names may be user-provided. We need to omit user-provided names in error reporting and telemetry.

We fix this oversight in this PR by no longer overriding the anonymizing format flag and redaction flag for FuncExprs. Unfortunately, since any functions in a query are not always resolved when the error occurs, we may also redact built-in function names.

Epic: None
Fixes: None

Release note (sql change): Anonymizes function names in error reporting and telemetry so that user-provided names for UDFs are not included.

111447: kvcoord: provide local node ID to DistSender in shared-process mode r=yuzefovich a=yuzefovich

This commit makes it so that the DistSender of SQL pods running in shared-process mode have access to the node ID of the local KV node. The node ID is used to preferentially route requests to replicas available on the local node. Previously, this optimization was only available in single-tenant mode, and now it'll be also available in shared-process multi-tenant. The optimization doesn't exist in separate-process multi-tenant, so in that mode we continue to return `0` as the node ID which makes it so that replicas are picked randomly.

For testing, this commit extends existing `TestSecondaryTenantFollowerReadsRouting` to also have a shared-process test configuration. There were a few gotchas in that adjustment:
- for simplicity, we need to create a single region cluster so that latency and locality don't come into consideration when choosing the replica (we want to make sure that the local replica is used if available)
- starting shared-process virtual cluster on a multi node cluster can be racy if we want to provide testing knobs, so we have to start the service on the gateway node first. (If we don't start with the gateway, then the server controller on the gateway could start the tenant server with empty knobs _before_ we explicitly start the service there.)
- previously the test relied on retrieving the follower reads metric counters from all nodes and asserting only for `n2` it increased. However, in shared-process config it is now possible for an internal query (e.g. issued by auto stats collection) to be served via follower reads too, which would increment the counter confusing the test. As a result, this commit adjusts the test to audit the trace of the query and verify that the single follower read was served specifically by `n2`.

Fixes: #109522.

Release note: None

111798: cli,server: allow user to specify port range for secondary VCs r=knz a=stevendanna

Previously, the user could set a port range offset that would instruct
the tenant controller to choose a gRPC port starting at the KV
server's RPC port + the offset + an index.

While this provides the user some predictability, there is no upper
bound on the port that could be picked.

Here, we add a new option

   --secondary-tenant-port-range

That allows the user to specify an explicit range.

Fixes #111848.
Epic: CRDB-26691

Release note: None

111822: sql: fast path uniqueness checks for single-row insert r=mgartner a=msirek

This adds support for building and executing simple INSERT statements
with a single-row VALUES clause where any required uniqueness constraint
checks can be handled via a constrained scan on an index.

This includes INSERT cases such as:

- a single-row VALUES clause into a REGIONAL BY ROW table with a
PRIMARY KEY which has a UUID column generated by default, ie.
`id UUID PRIMARY KEY DEFAULT gen_random_uuid()`, where the
crdb_region column is not specified in the VALUES clause; either
the gateway region is used or it is computed based on other column
values.
- a single-row VALUES clause into a REGIONAL BY ROW table with a
hash sharded unique index where the crdb_region column is not specified
in the VALUES clause

In optbuild, when creating a uniqueness check for rows which are added
to a table, a fast path index check relation is also built when the
mutation source is a single-row values expression or WithScan from
a single-row values expression. That relation is a filtered
Select of a Scan from the target table, where the filters equate
all of the unique check columns with their corresponding
constants or placeholders from the Values expression. If there is a
uniqueness check with a partial index predicate, fast path is
disallowed.

A new exploration rule called InsertFastPath is added to walk the memo
group members created during exploration in `FastPathUniqueChecks` of
the `InsertExpr`, to find any which have been rewritten as a constrained
`ScanExpr`. If found, that means that Scan fully represents the lookup
needed to check for duplicate entries and the Scan constraint can be
used to identify the constants to use in a KV lookup on the scanned
index in a fast path check.

Function CanUseUniqueChecksForInsertFastPath walks the expressions
generated during exploration of the `FastPathUniqueChecks.Check`
relation.  If a constrained scan is found, it is used to build elements
of the `FastPathUniqueChecksItemPrivate` structure to communicate to the
execbuilder the table and index to use for the check, and the column ids
in the insert row to use for building the fast path KV request. In
addition, a new `DatumsFromConstraint` field is added, consisting of a
ScalarListExpr of TupleExprs specifying the index key, which allows an
index key column to be matched with multiple Datums. One insert row may
result in more than one KV lookup for a given uniqueness constraint.
These items are used to build the `InsertFastPathFKUniqCheck` structure
in the execbuilder. The new `FastPathUniqueChecksItemPrivate` is built
into a new the corresponding `FastPathUniqueChecksItem`s of a new
`FastPathUniqueChecksExpr` and communicated to the caller via return
value `newFastPathUniqueChecks`.

A small adjustment is made in the coster to make the fast path unique
constraint slightly cheaper, so it should always be chosen over the
original non-fast path check.

Epic: CRDB-26290
Fixes: #58047

Release note (performance improvement): This patch adds support for
insert fast-path uniqueness checks on REGIONAL BY ROW tables where
the source is a VALUES clause with a single row. This results in a
reduction in latency for single-row inserts to REGIONAL BY ROW tables
and hash-sharded REGIONAL BY ROW tables with unique indexes.

----
optbuilder: build insert fast-path uniq checks only if all checks can be built

This commit avoids unnecessary processing of insert fast path uniqueness
checks by modifying the optbuilder to avoid building a
`FastPathUniqueChecksExpr` in the Insert expression if one or more
of the required `FastPathUniqueChecksItem`s could not be built.

Epic: CRDB-26290
Informs: #58047

Release note: None

112051: bulk: disable as-we-fill scatters for PCR r=stevendanna a=adityamaru

In #111952 we saw log lines indicating that the SST batcher was attempting to scatter ranges that were bigger than our "max scatter size". This is likely because of the out-of-order nature of PCR's ingestion which means that the batcher is not guaranteed to have an empty RHS when it splits a soon-to-be overfull range.

This change disable the as-we-fill scatters for PCR similar to how we disable them for restore. This is okay since the process already pre-splits and scatters the ranges into which the processors will ingest data.

Informs: #111952
Release note: None

Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: adityamaru <[email protected]>

Loading branch information

6 people committed Oct 9, 2023

6 parents f840e7f + a9af2c7 + c5d847f + 395f018 + 2fefefc + 9680b3e commit 69731b1

pkg/base/addr_validation.go

-Original file line number
+Diff line change
@@ Expand Up @@
     		return invalidFlagErr(err, cliflags.ListenHTTPAddr)
     	}
     	cfg.HTTPAddr = net.JoinHostPort(httpHost, httpPort)
+    	// Validate secondary tenant port configuration.
+    	if cfg.SecondaryTenantPortOffset > 0 && cfg.ApplicationInternalRPCPortMin > 0 {
+    		return errors.Newf("cannot specify both %s and %s", cliflags.ApplicationInternalRPCPortRange.Name, cliflags.SecondaryTenantPortOffset.Name)
+    	}
     	return nil
     }
@@ Expand Down @@

pkg/base/config.go

-Original file line number
+Diff line change
@@ Expand Up / @@ -422,6 +422,13 @@ type Config struct { @@
     	// See: https://github.com/cockroachdb/cockroach/issues/84585
     	SecondaryTenantPortOffset int
+    	// ApplicationInternalRPCPortMin/PortMax define the range of TCP ports
+    	// used to start the internal RPC service for application-level
+    	// servers. This service is used for node-to-node RPC traffic and to
+    	// serve data for 'debug zip'.
+    	ApplicationInternalRPCPortMin int
+    	ApplicationInternalRPCPortMax int
     	// Enables the use of an PTP hardware clock user space API for HLC current time.
     	// This contains the path to the device to be used (i.e. /dev/ptp0)
     	ClockDevicePath string
@@ Expand Down Expand Up / @@ -495,6 +502,8 @@ func (cfg *Config) InitDefaults() { @@
     	cfg.ClockDevicePath = ""
     	cfg.AcceptSQLWithoutTLS = false
     	cfg.SecondaryTenantPortOffset = 0
+    	cfg.ApplicationInternalRPCPortMin = 0
+    	cfg.ApplicationInternalRPCPortMax = 0
     }
     // HTTPRequestScheme returns "http" or "https" based on the value of
@@ Expand Down @@

pkg/base/test_server_args.go

-Original file line number
+Diff line change
@@ Expand Up / @@ -61,10 +61,12 @@ type TestServerArgs struct { @@
     	// DisableTLSForHTTP if set, disables TLS for the HTTP interface.
     	DisableTLSForHTTP bool
-    	// SecondaryTenantPortOffset if non-zero forces the network addresses
-    	// generated for servers started by the serverController to be offset
-    	// from the base addressed by the specified amount.
-    	SecondaryTenantPortOffset int
+    	// ApplicationInternalRPCPortMin/PortMax define the range of TCP ports
+    	// used to start the internal RPC service for application-level
+    	// servers. This service is used for node-to-node RPC traffic and to
+    	// serve data for 'debug zip'.
+    	ApplicationInternalRPCPortMin int
+    	ApplicationInternalRPCPortMax int
     	// JoinAddr is the address of a node we are joining.
     	//
@@ Expand Down @@

pkg/ccl/kvccl/kvfollowerreadsccl/BUILD.bazel

-Original file line number
+Diff line change
@@ Expand Up / @@ -48,7 +48,6 @@ go_test( @@
         deps = [
             "//pkg/base",
             "//pkg/ccl",
-            "//pkg/ccl/kvccl/kvtenantccl",
             "//pkg/ccl/utilccl",
             "//pkg/keys",
             "//pkg/kv",
@@ Expand Down @@

0 comments on commit `69731b1`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `69731b1`

Commit

There are no files selected for viewing

0 comments on commit 69731b1

0 comments on commit `69731b1`