Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
107297: storage,kvserver: Foundational changes for disaggregated ingestions r=sumeerbhola a=itsbilal

This change contains two commits (split off from the original mega-PR, #105839). The first is a pkg/storage change to add new interface methods to call pebble's db.ScanInternal as well as implement related helper methods in sstable writers/batch readers/writers to be able to do disaggregated snapshot ingestion. The second is a kvserver/rditer change to allow finer-grained control on what replicated spans we iterate on, as well as to be able to specifically opt into skip-shared iteration over the user key span through the use of `ScanInternal`.

---

**storage: Update Engine/Reader/Writer interfaces for ScanInternal**

This change updates pkg/storage interfaces and implementations to allow
the use of ScanInternal in skip-shared iteration mode as well as
writing/reading of internal point keys, range dels and range keys.
Replication / snapshot code will soon rely on these changes to
be able to replicate internal keys in higher levels plus metadata
of shared sstables in lower levels, as opposed to just observed
user keys.

Part of #103028

Epic: none

Release note: None

**kvserver: Add ability to filter replicated spans in Select/Iterate**

This change adds the ability to select for just the replicated
span in rditer.Select and rditer.IterateReplicaKeySpans. Also
adds a new rditer.IterateReplicaKeySpansShared that does a
ScanInternal on just the user key span, to be able to collect
metadata of shared sstables as well as any internal keys above
them.

We only use skip-shared iteration for the replicated user key span
of a range, and in practice, only if it's a non-system range.

Part of #103028.

Epic: none

Release note: None

108336: sql: retry more distributed errors as local r=yuzefovich a=yuzefovich

This PR contains a couple of commits that increase the allow-list of errors that are retried locally. In particular, it allows us to hide some issues we have around using DistSQL and shutting down SQL pods.

Fixes: #106537.
Fixes: #108152.
Fixes: #108271.

Release note: None

108406: server,testutils: remove complexity r=yuzefovich,herkolategan a=knz

There is a saying (paraphrasing) that it always takes more work removing unwanted complexity than it takes to add it. This is an example of that.

Prior to this commit, there was an "interesting" propagation of the flag that decides whether or not to define a test tenant for test servers and clusters. In a nutshell, we had:

- an "input" flag in `base.TestServerArgs`, which remained mostly immutable
- a boolean decided once by `ShouldStartDefaultTestTenant()` either in:
  - `serverutils.StartServerOnlyE`
  - or `testcluster.Start`
- that boolean choice was then propagated to `server.testServer` via _another_ boolean config flag in `server.BaseConfig`
- both the 2nd boolean and the original input flag were then again checked when the time came to do the work (in `maybeStartDefaultTestTenant`).

Additional complexity was then incurred by the need of `TestCluster` to make the determination just once (and not once per server).

This commit cuts through all the layers of complexity by simply propagating the choice of `ShouldStartDefaultTestTenant()` back into the `TestServerArgs` and only ever reading from that subsequently.

Release note: None
Epic: CRDB-18499

108465: cloudccl: allow external connection tests to be run in parallel r=rhu713 a=rhu713

Currently external connection tests read and write to the same path in cloud storage. Add a random uint64 as part of the path so that test runs have unique paths and can be run in parallel.

Fixes: #107407

Release note: None

108481: acceptance: stabilize start-single-node in tcl test r=santamaura a=dhartunian

We've continued to see flakes on this test which contain messages of throttled stores on node startup. The hypothesis is that these are due to leftover data directories from prior startups during the same test.

This change clears the `logs/db` data directory for those invocations and also adds the sql memory flag which the common tcl function also uses.

Resolves #108405
Epic: None

Release note: None

108496: kv: unit test `PrepareTransactionForRetry` and `TransactionRefreshTimestamp` r=miraradeva a=nvanbenschoten

Informs #104233.

This commit adds a pair of new unit tests to verify the behavior of `PrepareTransactionForRetry` and `TransactionRefreshTimestamp`. These functions will be getting more complex for #104233, so it will be helpful to have these tests in place. The tests also serve as good documentation.

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
  • Loading branch information
7 people committed Aug 10, 2023
7 parents eff0185 + e714d52 + 5c09eb1 + 49374d4 + 38f086c + 64d2b3f + a4005e3 commit 1f8fa96
Show file tree
Hide file tree
Showing 76 changed files with 1,302 additions and 294 deletions.
3 changes: 0 additions & 3 deletions pkg/base/test_server_args.go
Original file line number Diff line number Diff line change
Expand Up @@ -281,9 +281,6 @@ var (

// InternalNonDefaultDecision is a sentinel value used inside a
// mechanism in serverutils. Should not be used by tests directly.
//
// TODO(#76378): Investigate how we can remove the need for this
// sentinel value.
InternalNonDefaultDecision = DefaultTestTenantOptions{testBehavior: ttDisabled, allowAdditionalTenants: true}
)

Expand Down
21 changes: 10 additions & 11 deletions pkg/ccl/backupccl/backup_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -337,9 +337,11 @@ func TestBackupRestorePartitioned(t *testing.T) {

// Disabled to run within tenant as certain MR features are not available to tenants.
args := base.TestClusterArgs{
ServerArgs: base.TestServerArgs{
DefaultTestTenant: base.TODOTestTenantDisabled,
},
ServerArgsPerNode: map[int]base.TestServerArgs{
0: {
DefaultTestTenant: base.TODOTestTenantDisabled,
Locality: roachpb.Locality{Tiers: []roachpb.Tier{
{Key: "region", Value: "west"},
// NB: This has the same value as an az in the east region
Expand All @@ -349,7 +351,6 @@ func TestBackupRestorePartitioned(t *testing.T) {
}},
},
1: {
DefaultTestTenant: base.TODOTestTenantDisabled,
Locality: roachpb.Locality{Tiers: []roachpb.Tier{
{Key: "region", Value: "east"},
// NB: This has the same value as an az in the west region
Expand All @@ -359,7 +360,6 @@ func TestBackupRestorePartitioned(t *testing.T) {
}},
},
2: {
DefaultTestTenant: base.TODOTestTenantDisabled,
Locality: roachpb.Locality{Tiers: []roachpb.Tier{
{Key: "region", Value: "east"},
{Key: "az", Value: "az2"},
Expand Down Expand Up @@ -490,34 +490,33 @@ func TestBackupRestoreExecLocality(t *testing.T) {

// Disabled to run within tenant as certain MR features are not available to tenants.
args := base.TestClusterArgs{
ServerArgs: base.TestServerArgs{
DefaultTestTenant: base.TODOTestTenantDisabled,
},
ServerArgsPerNode: map[int]base.TestServerArgs{
0: {
ExternalIODir: "/west0",
DefaultTestTenant: base.TODOTestTenantDisabled,
ExternalIODir: "/west0",
Locality: roachpb.Locality{Tiers: []roachpb.Tier{
{Key: "tier", Value: "0"},
{Key: "region", Value: "west"},
}},
},
1: {
ExternalIODir: "/west1",
DefaultTestTenant: base.TODOTestTenantDisabled,
ExternalIODir: "/west1",
Locality: roachpb.Locality{Tiers: []roachpb.Tier{
{Key: "tier", Value: "1"},
{Key: "region", Value: "west"},
}},
},
2: {
ExternalIODir: "/east0",
DefaultTestTenant: base.TODOTestTenantDisabled,
ExternalIODir: "/east0",
Locality: roachpb.Locality{Tiers: []roachpb.Tier{
{Key: "tier", Value: "0"},
{Key: "region", Value: "east"},
}},
},
3: {
ExternalIODir: "/east1",
DefaultTestTenant: base.TODOTestTenantDisabled,
ExternalIODir: "/east1",
Locality: roachpb.Locality{Tiers: []roachpb.Tier{
{Key: "tier", Value: "1"},
{Key: "region", Value: "east"},
Expand Down
26 changes: 17 additions & 9 deletions pkg/ccl/changefeedccl/changefeed_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5689,10 +5689,7 @@ func TestChangefeedHandlesRollingRestart(t *testing.T) {
nodeDrainChannels[n].Store(make(chan struct{}))

return base.TestServerArgs{
// Test uses SPLIT AT, which isn't currently supported for
// secondary tenants. Tracked with #76378.
DefaultTestTenant: base.TODOTestTenantDisabled,
UseDatabase: "test",
UseDatabase: "test",
Knobs: base.TestingKnobs{
DistSQL: &execinfra.TestingKnobs{
DrainFast: true,
Expand Down Expand Up @@ -5770,6 +5767,11 @@ func TestChangefeedHandlesRollingRestart(t *testing.T) {
}
return perNode
}(),
ServerArgs: base.TestServerArgs{
// Test uses SPLIT AT, which isn't currently supported for
// secondary tenants. Tracked with #76378.
DefaultTestTenant: base.TODOTestTenantDisabled,
},
})
defer tc.Stopper().Stop(context.Background())

Expand Down Expand Up @@ -5882,9 +5884,6 @@ func TestChangefeedPropagatesTerminalError(t *testing.T) {
perServerKnobs := make(map[int]base.TestServerArgs, numNodes)
for i := 0; i < numNodes; i++ {
perServerKnobs[i] = base.TestServerArgs{
// Test uses SPLIT AT, which isn't currently supported for
// secondary tenants. Tracked with #76378.
DefaultTestTenant: base.TODOTestTenantDisabled,
Knobs: base.TestingKnobs{
DistSQL: &execinfra.TestingKnobs{
DrainFast: true,
Expand All @@ -5901,6 +5900,11 @@ func TestChangefeedPropagatesTerminalError(t *testing.T) {
base.TestClusterArgs{
ServerArgsPerNode: perServerKnobs,
ReplicationMode: base.ReplicationManual,
ServerArgs: base.TestServerArgs{
// Test uses SPLIT AT, which isn't currently supported for
// secondary tenants. Tracked with #76378.
DefaultTestTenant: base.TODOTestTenantDisabled,
},
})
defer tc.Stopper().Stop(context.Background())

Expand Down Expand Up @@ -8241,13 +8245,17 @@ func TestChangefeedExecLocality(t *testing.T) {
str := strconv.Itoa

const nodes = 4
args := base.TestClusterArgs{ServerArgsPerNode: map[int]base.TestServerArgs{}}
args := base.TestClusterArgs{
ServerArgs: base.TestServerArgs{
DefaultTestTenant: base.TODOTestTenantDisabled, // need nodelocal and splits.
},
ServerArgsPerNode: map[int]base.TestServerArgs{},
}
for i := 0; i < nodes; i++ {
args.ServerArgsPerNode[i] = base.TestServerArgs{
ExternalIODir: path.Join(dir, str(i)),
Locality: roachpb.Locality{
Tiers: []roachpb.Tier{{Key: "x", Value: str(i / 2)}, {Key: "y", Value: str(i % 2)}}},
DefaultTestTenant: base.TODOTestTenantDisabled, // need nodelocal and splits.
}
}

Expand Down
1 change: 1 addition & 0 deletions pkg/ccl/cloudccl/amazon/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ go_test(
"//pkg/cloud",
"//pkg/cloud/amazon",
"//pkg/cloud/cloudpb",
"//pkg/cloud/cloudtestutils",
"//pkg/cloud/externalconn/providers",
"//pkg/security/securityassets",
"//pkg/security/securitytest",
Expand Down
21 changes: 13 additions & 8 deletions pkg/ccl/cloudccl/amazon/s3_connection_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/cloud"
"github.com/cockroachdb/cockroach/pkg/cloud/amazon"
"github.com/cockroachdb/cockroach/pkg/cloud/cloudpb"
"github.com/cockroachdb/cockroach/pkg/cloud/cloudtestutils"
_ "github.com/cockroachdb/cockroach/pkg/cloud/externalconn/providers" // import External Connection providers.
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/skip"
Expand Down Expand Up @@ -77,6 +78,8 @@ func TestS3ExternalConnection(t *testing.T) {
skip.IgnoreLint(t, "AWS_S3_BUCKET env var must be set")
}

testID := cloudtestutils.NewTestID()

t.Run("auth-implicit", func(t *testing.T) {
// You can create an IAM that can access S3
// in the AWS console, then set it up locally.
Expand All @@ -93,14 +96,14 @@ func TestS3ExternalConnection(t *testing.T) {
params := make(url.Values)
params.Add(cloud.AuthParam, cloud.AuthParamImplicit)

s3URI := fmt.Sprintf("s3://%s/backup-ec-test-default?%s", bucket, params.Encode())
s3URI := fmt.Sprintf("s3://%s/backup-ec-test-default-%d?%s", bucket, testID, params.Encode())
ecName := "auth-implicit-s3"
createExternalConnection(ecName, s3URI)
backupAndRestoreFromExternalConnection(ecName)
})

t.Run("auth-specified", func(t *testing.T) {
s3URI := amazon.S3URI(bucket, "backup-ec-test-default",
s3URI := amazon.S3URI(bucket, fmt.Sprintf("backup-ec-test-default-%d", testID),
&cloudpb.ExternalStorage_S3{
AccessKey: creds.AccessKeyID,
Secret: creds.SecretAccessKey,
Expand All @@ -126,7 +129,7 @@ func TestS3ExternalConnection(t *testing.T) {
"refer to https://docs.aws.com/cli/latest/userguide/cli-configure-role.html: %s", err)
}

s3URI := amazon.S3URI(bucket, "backup-ec-test-sse-256", &cloudpb.ExternalStorage_S3{
s3URI := amazon.S3URI(bucket, fmt.Sprintf("backup-ec-test-sse-256-%d", testID), &cloudpb.ExternalStorage_S3{
Region: "us-east-1",
Auth: cloud.AuthParamImplicit,
ServerEncMode: "AES256",
Expand All @@ -139,7 +142,7 @@ func TestS3ExternalConnection(t *testing.T) {
if v == "" {
skip.IgnoreLint(t, "AWS_KMS_KEY_ARN env var must be set")
}
s3KMSURI := amazon.S3URI(bucket, "backup-ec-test-sse-kms", &cloudpb.ExternalStorage_S3{
s3KMSURI := amazon.S3URI(bucket, fmt.Sprintf("backup-ec-test-sse-kms-%d", testID), &cloudpb.ExternalStorage_S3{
Region: "us-east-1",
Auth: cloud.AuthParamImplicit,
ServerEncMode: "aws:kms",
Expand All @@ -163,7 +166,7 @@ func TestS3ExternalConnection(t *testing.T) {
}

// Unsupported server side encryption option.
invalidS3URI := amazon.S3URI(bucket, "backup-ec-test-sse-256", &cloudpb.ExternalStorage_S3{
invalidS3URI := amazon.S3URI(bucket, fmt.Sprintf("backup-ec-test-sse-256-%d", testID), &cloudpb.ExternalStorage_S3{
Region: "us-east-1",
Auth: cloud.AuthParamImplicit,
ServerEncMode: "unsupported-algorithm",
Expand All @@ -172,7 +175,7 @@ func TestS3ExternalConnection(t *testing.T) {
"unsupported server encryption mode unsupported-algorithm. Supported values are `aws:kms` and `AES256",
fmt.Sprintf(`BACKUP DATABASE foo INTO '%s'`, invalidS3URI))

invalidS3URI = amazon.S3URI(bucket, "backup-ec-test-sse-256", &cloudpb.ExternalStorage_S3{
invalidS3URI = amazon.S3URI(bucket, fmt.Sprintf("backup-ec-test-sse-256-%d", testID), &cloudpb.ExternalStorage_S3{
Region: "us-east-1",
Auth: cloud.AuthParamImplicit,
ServerEncMode: "aws:kms",
Expand Down Expand Up @@ -257,8 +260,9 @@ func TestAWSKMSExternalConnection(t *testing.T) {
skip.IgnoreLint(t, "AWS_S3_BUCKET env var must be set")
}

testID := cloudtestutils.NewTestID()
// Create an external connection where we will write the backup.
backupURI := fmt.Sprintf("s3://%s/backup?%s=%s", bucket,
backupURI := fmt.Sprintf("s3://%s/backup-%d?%s=%s", bucket, testID,
cloud.AuthParam, cloud.AuthParamImplicit)
backupExternalConnectionName := "backup"
createExternalConnection(backupExternalConnectionName, backupURI)
Expand Down Expand Up @@ -372,8 +376,9 @@ func TestAWSKMSExternalConnectionAssumeRole(t *testing.T) {
skip.IgnoreLint(t, "AWS_S3_BUCKET env var must be set")
}

testID := cloudtestutils.NewTestID()
// Create an external connection where we will write the backup.
backupURI := fmt.Sprintf("s3://%s/backup?%s=%s", bucket,
backupURI := fmt.Sprintf("s3://%s/backup-%d?%s=%s", bucket, testID,
cloud.AuthParam, cloud.AuthParamImplicit)
backupExternalConnectionName := "backup"
createExternalConnection(backupExternalConnectionName, backupURI)
Expand Down
1 change: 1 addition & 0 deletions pkg/ccl/cloudccl/azure/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ go_test(
"//pkg/ccl",
"//pkg/ccl/kvccl/kvtenantccl",
"//pkg/cloud/azure",
"//pkg/cloud/cloudtestutils",
"//pkg/cloud/externalconn/providers",
"//pkg/security/securityassets",
"//pkg/security/securitytest",
Expand Down
10 changes: 6 additions & 4 deletions pkg/ccl/cloudccl/azure/azure_connection_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/base"
_ "github.com/cockroachdb/cockroach/pkg/ccl"
"github.com/cockroachdb/cockroach/pkg/cloud/azure"
"github.com/cockroachdb/cockroach/pkg/cloud/cloudtestutils"
_ "github.com/cockroachdb/cockroach/pkg/cloud/externalconn/providers" // import External Connection providers.
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/skip"
Expand All @@ -29,9 +30,9 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/log"
)

func (a azureConfig) URI(file string) string {
return fmt.Sprintf("azure-storage://%s/%s?%s=%s&%s=%s&%s=%s",
a.bucket, file,
func (a azureConfig) URI(file string, testID uint64) string {
return fmt.Sprintf("azure-storage://%s/%s-%d?%s=%s&%s=%s&%s=%s",
a.bucket, file, testID,
azure.AzureAccountKeyParam, url.QueryEscape(a.key),
azure.AzureAccountNameParam, url.QueryEscape(a.account),
azure.AzureEnvironmentKeyParam, url.QueryEscape(a.environment))
Expand Down Expand Up @@ -97,7 +98,8 @@ func TestExternalConnections(t *testing.T) {
return
}

testID := cloudtestutils.NewTestID()
ecName := "azure-ec"
createExternalConnection(ecName, cfg.URI("backup-ec"))
createExternalConnection(ecName, cfg.URI("backup-ec", testID))
backupAndRestoreFromExternalConnection(ecName)
}
Loading

0 comments on commit 1f8fa96

Please sign in to comment.