Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
96424: kvserver: factor out replica loading phase r=tbg a=pavelkalinnikov

This commit factors out Replica state loading and invariant checking so that it
is more consolidated rather than interleaved into Replica creation. Replica
creation path was loading this state, to post-factum assert that the in-memory
state matches the state in storage. Now this is a pre-requisite and input into
creating the Replica, and turning it from uninitialized to initialized state.

This change also eliminates the concept of "unloaded" Replica. Now the Replica
is created in a valid state (either initialized or uninitialized).

Touches #93898

96553: rpc,server: use node certs for shared-process tenant RPCs r=stevendanna a=knz

TLDR: This PR makes it possible to use regular node certs with shared-process multi-tenancy (i.e. skip creating client tenant certs). I think this will help `@aadityasondhi` 's work on `debug zip`.

Fixes #96215.
Epic: CRDB-23559

The PR also contains a few cleanup commits; it's a short change away from fixing  #87141 and #77173, which I intend to do in a separate PR.




96565: kv/logstore: avoid heap allocations around non-blocking sync waiter callback r=nvanbenschoten a=nvanbenschoten

This commit structures the non-blocking sync callback provided to the Raft log `SyncWaiterLoop` as a struct with a method that satisfies an interface (i.e. a functor) instead of an anonymous function. This provides more control over the memory layout of the callback and prevents individual fields from escaping to the heap. The change also provides the opportunity to pool the callback to avoid an additional heap allocation.

The change has the following effect on microbenchmarks:
```
name                                                 old time/op    new time/op    delta
ReplicaProposal/bytes=1.0_KiB,withFollower=false-10    40.1µs ± 3%    39.1µs ± 1%   -2.51%  (p=0.000 n=10+9)
ReplicaProposal/bytes=512_B,withFollower=false-10      38.2µs ± 2%    37.3µs ± 4%   -2.48%  (p=0.015 n=10+10)
ReplicaProposal/bytes=256_B,withFollower=false-10      37.1µs ± 1%    36.2µs ± 3%   -2.32%  (p=0.000 n=10+10)
ReplicaProposal/bytes=256_B,withFollower=true-10       52.2µs ± 1%    51.2µs ± 1%   -1.91%  (p=0.000 n=10+10)
ReplicaProposal/bytes=1.0_KiB,withFollower=true-10     58.5µs ± 2%    57.5µs ± 2%   -1.79%  (p=0.001 n=10+10)
ReplicaProposal/bytes=512_B,withFollower=true-10       53.8µs ± 2%    52.8µs ± 1%   -1.74%  (p=0.000 n=10+10)

name                                                 old speed      new speed      delta
ReplicaProposal/bytes=512_B,withFollower=false-10    13.4MB/s ± 2%  13.7MB/s ± 4%   +2.57%  (p=0.016 n=10+10)
ReplicaProposal/bytes=1.0_KiB,withFollower=false-10  25.5MB/s ± 2%  26.2MB/s ± 1%   +2.57%  (p=0.000 n=10+9)
ReplicaProposal/bytes=256_B,withFollower=false-10    6.91MB/s ± 1%  7.07MB/s ± 3%   +2.36%  (p=0.000 n=10+10)
ReplicaProposal/bytes=256_B,withFollower=true-10     4.90MB/s ± 1%  5.00MB/s ± 1%   +1.96%  (p=0.000 n=10+10)
ReplicaProposal/bytes=1.0_KiB,withFollower=true-10   17.5MB/s ± 2%  17.8MB/s ± 1%   +1.82%  (p=0.001 n=10+10)
ReplicaProposal/bytes=512_B,withFollower=true-10     9.52MB/s ± 2%  9.69MB/s ± 1%   +1.76%  (p=0.000 n=10+10)

name                                                 old alloc/op   new alloc/op   delta
ReplicaProposal/bytes=256_B,withFollower=false-10      14.6kB ± 0%    12.8kB ± 0%  -12.73%  (p=0.000 n=10+10)
ReplicaProposal/bytes=256_B,withFollower=true-10       35.0kB ± 1%    30.6kB ± 1%  -12.59%  (p=0.000 n=10+10)
ReplicaProposal/bytes=512_B,withFollower=true-10       42.9kB ± 0%    38.6kB ± 1%  -10.04%  (p=0.000 n=8+10)
ReplicaProposal/bytes=512_B,withFollower=false-10      18.5kB ± 2%    16.8kB ± 1%   -9.19%  (p=0.000 n=10+10)
ReplicaProposal/bytes=1.0_KiB,withFollower=true-10     60.6kB ± 1%    55.9kB ± 1%   -7.76%  (p=0.000 n=10+10)
ReplicaProposal/bytes=1.0_KiB,withFollower=false-10    27.5kB ± 2%    25.6kB ± 2%   -7.06%  (p=0.000 n=10+10)

name                                                 old allocs/op  new allocs/op  delta
ReplicaProposal/bytes=512_B,withFollower=false-10        70.0 ± 0%      61.6 ± 1%  -12.00%  (p=0.000 n=10+10)
ReplicaProposal/bytes=256_B,withFollower=false-10        69.0 ± 0%      61.0 ± 0%  -11.59%  (p=0.000 n=10+10)
ReplicaProposal/bytes=1.0_KiB,withFollower=false-10      73.0 ± 0%      65.0 ± 0%  -10.96%  (p=0.002 n=8+10)
ReplicaProposal/bytes=256_B,withFollower=true-10          179 ± 0%       161 ± 0%  -10.21%  (p=0.000 n=10+7)
ReplicaProposal/bytes=512_B,withFollower=true-10          181 ± 1%       162 ± 0%  -10.11%  (p=0.000 n=9+10)
ReplicaProposal/bytes=1.0_KiB,withFollower=true-10        186 ± 0%       168 ± 0%   -9.84%  (p=0.000 n=9+10)
```

Release note: None
Epic: None

96679: backupccl: only backup active tenants r=dt a=dt

Release note: none.
Epic: none.
Fixes: #89997.

Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: David Taylor <[email protected]>
  • Loading branch information
5 people committed Feb 7, 2023
5 parents 9444613 + 574a4d7 + fb76cc9 + 894216b + 36b4329 commit 75eb88d
Show file tree
Hide file tree
Showing 33 changed files with 717 additions and 383 deletions.
11 changes: 6 additions & 5 deletions pkg/ccl/backupccl/backup_planning_tenant.go
Original file line number Diff line number Diff line change
Expand Up @@ -156,21 +156,22 @@ func retrieveAllTenantsMetadata(
) ([]mtinfopb.TenantInfoWithUsage, error) {
rows, err := txn.QueryBuffered(
ctx, "backupccl.retrieveAllTenantsMetadata", txn.KV(),
// TODO(?): Should we add a `WHERE active`? We require the tenant to be active
// when it is specified.
// See: https://github.com/cockroachdb/cockroach/issues/89997
tenantMetadataQuery(ctx, settings)+` WHERE id != $1`,
roachpb.SystemTenantID.ToUint64(),
)
if err != nil {
return nil, err
}
res := make([]mtinfopb.TenantInfoWithUsage, len(rows))
res := make([]mtinfopb.TenantInfoWithUsage, 0, len(rows))
for i := range rows {
res[i], err = tenantMetadataFromRow(rows[i])
r, err := tenantMetadataFromRow(rows[i])
if err != nil {
return nil, err
}
if r.DataState != mtinfopb.DataStateReady {
continue
}
res = append(res, r)
}
return res, nil
}
3 changes: 0 additions & 3 deletions pkg/ccl/backupccl/testdata/backup-restore/restore-tenants
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,6 @@ query-sql
SELECT id,active,crdb_internal.pb_to_json('cockroach.multitenant.ProtoInfo', info, true) FROM system.tenants;
----
1 true {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
5 false {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "DROP", "deprecatedId": "5", "droppedName": "tenant-5", "tenantReplicationJobId": "0"}
6 false {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "ADD", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}

exec-sql
Expand All @@ -82,7 +81,6 @@ query-sql
SELECT id,name,data_state,service_mode,active,crdb_internal.pb_to_json('cockroach.multitenant.ProtoInfo', info, true) FROM system.tenants;
----
1 system 1 2 true {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
5 tenant-5 2 0 false {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "DROP", "deprecatedId": "5", "droppedName": "tenant-5", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}

exec-sql expect-error-regex=(tenant 6 already exists)
Expand All @@ -109,7 +107,6 @@ SELECT id,name,data_state,service_mode,active,crdb_internal.pb_to_json('cockroac
----
1 system 1 2 true {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "READY", "deprecatedId": "1", "droppedName": "", "tenantReplicationJobId": "0"}
2 newname 1 1 true {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "READY", "deprecatedId": "2", "droppedName": "", "tenantReplicationJobId": "0"}
5 tenant-5 2 0 false {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "DROP", "deprecatedId": "5", "droppedName": "tenant-5", "tenantReplicationJobId": "0"}
6 tenant-6 1 1 true {"capabilities": {"canAdminSplit": false}, "deprecatedDataState": "READY", "deprecatedId": "6", "droppedName": "", "tenantReplicationJobId": "0"}

# Check that another service mode is also preserved.
Expand Down
2 changes: 1 addition & 1 deletion pkg/cli/clierrorplus/decorate_error.go
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ func MaybeDecorateError(
return connInsecureHint()
}

if wErr := (*security.Error)(nil); errors.As(err, &wErr) {
if errors.Is(err, security.ErrCertManagement) {
// Avoid errors.Wrapf here so that we have more control over the
// formatting of the message with error text.
const format = "cannot load certificates.\n" +
Expand Down
58 changes: 31 additions & 27 deletions pkg/cli/democluster/demo_cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -1303,8 +1303,8 @@ func (c *transientCluster) generateCerts(ctx context.Context, certsDir string) (
nodeKeyExists && nodeCertExists &&
rootClientKeyExists && rootClientCertExists &&
demoClientKeyExists && demoClientCertExists &&
tenantSigningKeyExists && tenantSigningCertExists &&
tenantKeyExists && tenantCertExists {
(!c.demoCtx.Multitenant || (tenantSigningKeyExists && tenantSigningCertExists &&
(c.demoCtx.DisableServerController || (tenantKeyExists && tenantCertExists)))) {
// All good.
return nil
}
Expand Down Expand Up @@ -1430,33 +1430,37 @@ func (c *transientCluster) generateCerts(ctx context.Context, certsDir string) (
}
}

if !(tenantKeyExists && tenantCertExists) {
c.infoLog(ctx, "generating tenant server key/cert pair in %q", certsDir)
pair, err := security.CreateTenantPair(
certsDir,
caKeyPath,
c.demoCtx.DefaultKeySize,
c.demoCtx.DefaultCertLifetime,
2,
tlsServerNames,
)
if err != nil {
return err
}
if err := security.WriteTenantPair(certsDir, pair, true /* overwrite */); err != nil {
return err
if c.demoCtx.Multitenant {
if !(tenantSigningKeyExists && tenantSigningCertExists) {
c.infoLog(ctx, "generating tenant signing key/cert pair in %q", certsDir)
if err := security.CreateTenantSigningPair(
certsDir,
c.demoCtx.DefaultCertLifetime,
true, /* overwrite */
2,
); err != nil {
return err
}
}
}

if !(tenantSigningKeyExists && tenantSigningCertExists) {
c.infoLog(ctx, "generating tenant signing key/cert pair in %q", certsDir)
if err := security.CreateTenantSigningPair(
certsDir,
c.demoCtx.DefaultCertLifetime,
true, /* overwrite */
2,
); err != nil {
return err
if c.demoCtx.DisableServerController {
if !(tenantKeyExists && tenantCertExists) {
c.infoLog(ctx, "generating tenant server key/cert pair in %q", certsDir)
pair, err := security.CreateTenantPair(
certsDir,
caKeyPath,
c.demoCtx.DefaultKeySize,
c.demoCtx.DefaultCertLifetime,
2,
tlsServerNames,
)
if err != nil {
return err
}
if err := security.WriteTenantPair(certsDir, pair, true /* overwrite */); err != nil {
return err
}
}
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ eexpect $prompt

# create some cert without an IP address in there.
set db_dir "logs/db"
set certs_dir "logs/my-safe-directory"
set certs_dir "my-safe-directory"
send "mkdir -p $certs_dir\r"
eexpect $prompt

Expand Down
19 changes: 17 additions & 2 deletions pkg/cli/interactive_tests/test_error_hints.tcl
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ end_test
# Check what happens when attempting to connect securely to an
# insecure server.

send "$argv start-single-node --insecure\r"
send "$argv start-single-node --insecure --store=logs/db\r"
eexpect "initialized new cluster"

spawn /bin/bash
Expand Down Expand Up @@ -73,9 +73,24 @@ interrupt
interrupt
eexpect ":/# "

send "$argv start-single-node --listen-addr=localhost --certs-dir=$certs_dir\r"
send "$argv start-single-node --listen-addr=localhost --certs-dir=$certs_dir --store=logs/db\r"
eexpect "restarted pre-existing node"

set spawn_id $client_spawn_id
start_test "Connecting an insecure RPC client to a secure server"
send "$argv node drain 1 --insecure\r"
eexpect "ERROR"
eexpect "failed to connect to the node"
eexpect ":/# "
end_test

start_test "Connecting an insecure SQL client to a secure server"
send "$argv sql -e 'select 1' --insecure\r"
eexpect "ERROR: node is running secure mode, SSL connection required"
eexpect ":/# "
end_test


# Check what happens when attempting to connect to something
# that is not a CockroachDB server.
set spawn_id $shell_spawn_id
Expand Down
2 changes: 1 addition & 1 deletion pkg/cli/testdata/zip/testzip
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,7 @@ debug zip --concurrency=1 --cpu-profile-duration=1s /dev/null
[node 1] requesting heap file list... received response... done
[node ?] ? heap profiles found
[node 1] requesting goroutine dump list... received response... done
[node 1] 0 goroutine dumps found
[node ?] ? goroutine dumps found
[node 1] requesting log file ...
[node 1] 0 log file ...
[node 1] requesting ranges... received response... done
Expand Down
Loading

0 comments on commit 75eb88d

Please sign in to comment.