Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
99430: roachprod: ensure scp uses supported flags r=AlexTalks a=AlexTalks

In #99283 the flags `-R -A` were added to the `scp` command invocation, in order to ensure that copies between two remote hosts are done without transferring to the localhost. These flags are only supported on MacOS (`"darwin"`) and not on Linux, as this behavior is already the default on Linux. This change fixes that issue by only using these flags on MacOS, allowing the `scp` invocation (and by extension, roachprod) to work on Linux.

Epic: none

Release note: None

99982: clusterversion: clarify where to add new version gates during stability r=rail a=celiala

The previous direction said to always put append version gates to the end
of the list. However, during the stability period, before we mint the
release branch, there are two places to add new gates:

- new gates should go _before_ the "mint" version, if it is being backported
  to the release branch and we have not yet merged the "mint" version to
  the release branch.
- new gates for the upcoming development, should be appended to the end
  of the list.

This direction should be reverted/simplified (to direct to always append) once the commit to "mint" 23.1 has been backported and merged to release-23.1. Tracking issue for this is: https://cockroachlabs.atlassian.net/browse/REL-311

Release note: None
Epic: REL-283

100378: kvserver: fix merge queue test failure due to race r=AlexTalks a=AlexTalks

Previously, we saw `TestMergeQueueDoesNotInterruptReplicationChange` fail on arm64 machines (#99349), which, after testing, was determined to be due to timing issues in the test. As such, this change modifies the test so as to not assume that the snapshot will start within the 100ms time frame used as a buffer, and will instead wait for the snapshot to start before attempting the merge.

Fixes: #99349

Release note: None

Co-authored-by: Alex Sarkesian <[email protected]>
Co-authored-by: Celia La <[email protected]>
  • Loading branch information
3 people committed Apr 3, 2023
4 parents 6cd1f1b + 2785011 + bc36d7f + 43b9a39 commit 8e9273c
Show file tree
Hide file tree
Showing 10 changed files with 77 additions and 34 deletions.
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
Original file line number Diff line number Diff line change
Expand Up @@ -291,4 +291,4 @@ trace.opentelemetry.collector string address of an OpenTelemetry trace collecto
trace.snapshot.rate duration 0s if non-zero, interval at which background trace snapshots are captured tenant-rw
trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez tenant-rw
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used. tenant-rw
version version 1000023.1-4 set the active cluster version in the format '<major>.<minor>' tenant-rw
version version 1000023.1-2 set the active cluster version in the format '<major>.<minor>' tenant-rw
2 changes: 1 addition & 1 deletion docs/generated/settings/settings.html
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,6 @@
<tr><td><div id="setting-trace-snapshot-rate" class="anchored"><code>trace.snapshot.rate</code></div></td><td>duration</td><td><code>0s</code></td><td>if non-zero, interval at which background trace snapshots are captured</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-trace-span-registry-enabled" class="anchored"><code>trace.span_registry.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://&lt;ui&gt;/#/debug/tracez</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-trace-zipkin-collector" class="anchored"><code>trace.zipkin.collector</code></div></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 9411 will be used.</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000023.1-4</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000023.1-2</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
</tbody>
</table>
2 changes: 1 addition & 1 deletion pkg/cli/testdata/declarative-rules/deprules
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
dep
----
debug declarative-print-rules 1000023.1-4 dep
debug declarative-print-rules 1000023.1-2 dep
deprules
----
- name: 'CheckConstraint transitions to ABSENT uphold 2-version invariant: PUBLIC->VALIDATED'
Expand Down
2 changes: 1 addition & 1 deletion pkg/cli/testdata/declarative-rules/oprules
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
op
----
debug declarative-print-rules 1000023.1-4 op
debug declarative-print-rules 1000023.1-2 op
rules
----
[]
49 changes: 37 additions & 12 deletions pkg/clusterversion/cockroach_versions.go
Original file line number Diff line number Diff line change
Expand Up @@ -508,23 +508,35 @@ const (
// that are optimized for the console.
V23_1AddSystemActivityTables

// V23_1StopWritingPayloadAndProgressToSystemJobs is the version where the
// payload and progress columns are no longer written to system.jobs.
V23_1StopWritingPayloadAndProgressToSystemJobs

// V23_1ChangeSQLStatsTTL is the version where the gc TTL was updated to all
// SQL Stats tables.
V23_1ChangeSQLStatsTTL

// **********************************************************
// ** If we haven't yet selected a final 23.1 RC candidate **
// Step 1a: Add new versions for release-23.1 branch above here.
// **********************************************************
// Where to add new versions?
// - If the version gate is being backported to release-23.1, add the new version above this comment.
// This can be done during 23.1 Stability until we select a final RC.
// - If the version gate is for 23.2 development (not being backported to release-23.1), add the
// new version above "Step 1b"
// - Do not add new versions to a patch release.
// *************************************************

// V23_1 is CockroachDB v23.1. It's used for all v23.1.x patch releases.
V23_1

// V23_2_Start demarcates the start of cluster versions stepped through during
// V23_2Start demarcates the start of cluster versions stepped through during
// the process of upgrading from previous supported releases to 23.2.
V23_2Start

// V23_2StopWritingPayloadAndProgressToSystemJobs is the version where the
// payload and progress columns are no longer written to system.jobs.
V23_2StopWritingPayloadAndProgressToSystemJobs

// *************************************************
// Step (1): Add new versions here.
// Step 1b: Add new version for 23.2 development here.
// Do not add new versions to a patch release.
// *************************************************
)
Expand Down Expand Up @@ -893,9 +905,26 @@ var rawVersionsSingleton = keyedVersions{
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 94},
},
{
Key: V23_1ChangeSQLStatsTTL,
Key: V23_1StopWritingPayloadAndProgressToSystemJobs,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 96},
},
{
Key: V23_1ChangeSQLStatsTTL,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 98},
},

// **********************************************************
// ** If we haven't yet selected a final 23.1 RC candidate **
// Step 2a: Add new versions for release-23.1 branch above here.
// **********************************************************
// Where to add new versions?
// - If the version gate is being backported to release-23.1, add the new version above this comment.
// This can be done during 23.1 Stability until we select a final RC.
// - If the version gate is for 23.2 development (not being backported to release-23.1), add the
// new version above "Step 2b"
// - Do not add new versions to a patch release.
// *************************************************

{
Key: V23_1,
Version: roachpb.Version{Major: 23, Minor: 1, Internal: 0},
Expand All @@ -904,13 +933,9 @@ var rawVersionsSingleton = keyedVersions{
Key: V23_2Start,
Version: roachpb.Version{Major: 23, Minor: 1, Internal: 2},
},
{
Key: V23_2StopWritingPayloadAndProgressToSystemJobs,
Version: roachpb.Version{Major: 23, Minor: 1, Internal: 4},
},

// *************************************************
// Step (2): Add new versions here.
// Step 2b: Add new version gates for 23.2 development here.
// Do not add new versions to a patch release.
// *************************************************
}
Expand Down
6 changes: 3 additions & 3 deletions pkg/jobs/registry.go
Original file line number Diff line number Diff line change
Expand Up @@ -501,7 +501,7 @@ func batchJobInsertStmt(

// TODO(adityamaru: Remove this once we are outside the compatability
// window for 22.2.
if r.settings.Version.IsActive(ctx, clusterversion.V23_2StopWritingPayloadAndProgressToSystemJobs) {
if r.settings.Version.IsActive(ctx, clusterversion.V23_1StopWritingPayloadAndProgressToSystemJobs) {
columns = []string{`id`, `created`, `status`, `claim_session_id`, `claim_instance_id`, `job_type`}
valueFns = map[string]func(*Job) (interface{}, error){
`id`: func(job *Job) (interface{}, error) { return job.ID(), nil },
Expand Down Expand Up @@ -615,7 +615,7 @@ func (r *Registry) CreateJobWithTxn(
cols := []string{"id", "created", "status", "payload", "progress", "claim_session_id", "claim_instance_id", "job_type"}
vals := []interface{}{jobID, created, StatusRunning, payloadBytes, progressBytes, s.ID().UnsafeBytes(), r.ID(), jobType.String()}
log.Infof(ctx, "active version is %s", r.settings.Version.ActiveVersion(ctx))
if r.settings.Version.IsActive(ctx, clusterversion.V23_2StopWritingPayloadAndProgressToSystemJobs) {
if r.settings.Version.IsActive(ctx, clusterversion.V23_1StopWritingPayloadAndProgressToSystemJobs) {
cols = []string{"id", "created", "status", "claim_session_id", "claim_instance_id", "job_type"}
vals = []interface{}{jobID, created, StatusRunning, s.ID().UnsafeBytes(), r.ID(), jobType.String()}
}
Expand Down Expand Up @@ -739,7 +739,7 @@ func (r *Registry) CreateAdoptableJobWithTxn(
if !r.settings.Version.IsActive(ctx, clusterversion.V23_1AddTypeColumnToJobsTable) {
nCols -= 1
}
if r.settings.Version.IsActive(ctx, clusterversion.V23_2StopWritingPayloadAndProgressToSystemJobs) {
if r.settings.Version.IsActive(ctx, clusterversion.V23_1StopWritingPayloadAndProgressToSystemJobs) {
cols = []string{"id", "status", "created_by_type", "created_by_id", "job_type"}
placeholders = []string{"$1", "$2", "$3", "$4", "$5"}
values = []interface{}{jobID, StatusRunning, createdByType, createdByID, typ}
Expand Down
2 changes: 1 addition & 1 deletion pkg/jobs/update.go
Original file line number Diff line number Diff line change
Expand Up @@ -220,7 +220,7 @@ func (u Updater) update(ctx context.Context, useReadLock bool, updateFn UpdateFn
}
}

if !u.j.registry.settings.Version.IsActive(ctx, clusterversion.V23_2StopWritingPayloadAndProgressToSystemJobs) {
if !u.j.registry.settings.Version.IsActive(ctx, clusterversion.V23_1StopWritingPayloadAndProgressToSystemJobs) {
if payloadBytes != nil {
addSetter("payload", payloadBytes)
}
Expand Down
34 changes: 21 additions & 13 deletions pkg/kv/kvserver/replica_learner_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1732,7 +1732,9 @@ func TestMergeQueueDoesNotInterruptReplicationChange(t *testing.T) {
defer log.Scope(t).Close(t)
ctx := context.Background()
var activateSnapshotTestingKnob int64
var snapshotStarted int64
blockSnapshot := make(chan struct{})
snapshotInProgress := make(chan struct{})
tc := testcluster.StartTestCluster(
t, 2, base.TestClusterArgs{
ReplicationMode: base.ReplicationManual,
Expand All @@ -1744,6 +1746,14 @@ func TestMergeQueueDoesNotInterruptReplicationChange(t *testing.T) {
DisableLoadBasedSplitting: true,
ReceiveSnapshot: func(_ *kvserverpb.SnapshotRequest_Header) error {
if atomic.LoadInt64(&activateSnapshotTestingKnob) == 1 {
// While the snapshot RPC should only happen once given
// that the cluster is running under manual replication,
// retries or other mechanisms can cause this to be called
// multiple times, so let's ensure we only close the channel
// snapshotInProgress once by using the snapshotStarted flag.
if atomic.CompareAndSwapInt64(&snapshotStarted, 0, 1) {
close(snapshotInProgress)
}
<-blockSnapshot
}
return nil
Expand Down Expand Up @@ -1772,7 +1782,7 @@ func TestMergeQueueDoesNotInterruptReplicationChange(t *testing.T) {
require.NoError(t, err)

select {
case <-time.After(100 * time.Millisecond):
case <-snapshotInProgress:
// Continue.
case <-replicationChange:
t.Fatal("did not expect the replication change to complete")
Expand All @@ -1787,18 +1797,16 @@ func TestMergeQueueDoesNotInterruptReplicationChange(t *testing.T) {
// TestCluster currently overrides this when used with ReplicationManual.
db.Exec(t, `SET CLUSTER SETTING kv.range_merge.queue_enabled = true`)

testutils.SucceedsSoon(t, func() error {
// While this replication change is stalled, we'll trigger a merge and
// ensure that the merge correctly notices that there is a snapshot in
// flight and ignores the range.
store, repl := getFirstStoreReplica(t, tc.Server(0), scratchKey)
_, processErr, enqueueErr := store.Enqueue(
ctx, "merge", repl, true /* skipShouldQueue */, false, /* async */
)
require.NoError(t, enqueueErr)
require.True(t, kvserver.IsReplicationChangeInProgressError(processErr))
return nil
})
// While this replication change is stalled, we'll trigger a merge and
// ensure that the merge correctly notices that there is a snapshot in
// flight and ignores the range.
store, repl := getFirstStoreReplica(t, tc.Server(0), scratchKey)
_, processErr, enqueueErr := store.Enqueue(
ctx, "merge", repl, true /* skipShouldQueue */, false, /* async */
)
require.NoError(t, enqueueErr)
require.Truef(t, kvserver.IsReplicationChangeInProgressError(processErr),
"expected replication change in progress error, got %+v", processErr)
}

func TestMergeQueueSeesLearnerOrJointConfig(t *testing.T) {
Expand Down
10 changes: 10 additions & 0 deletions pkg/roachprod/install/cluster_synced.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"os/exec"
"os/signal"
"path/filepath"
"runtime"
"sort"
"strings"
"sync"
Expand Down Expand Up @@ -2382,9 +2383,18 @@ func (c *SyncedCluster) SSH(ctx context.Context, l *logger.Logger, sshArgs, args
// which we do want to be able to retry.
func scp(src, dest string) (*RunResultDetails, error) {
args := []string{
// Enable recursive copies, compression.
"scp", "-r", "-C",
"-o", "StrictHostKeyChecking=no",
}
if runtime.GOOS == "darwin" {
// SSH to src node and excute SCP there using agent-forwarding,
// as these are not the defaults on MacOS.
// TODO(sarkesian): Rather than checking Darwin, it would be preferable
// to check the output of `ssh-V` and check the version to see what flags
// are supported.
args = append(args, "-R", "-A")
}
args = append(args, sshAuthArgs()...)
args = append(args, src, dest)
cmd := exec.Command(args[0], args[1:]...)
Expand Down
2 changes: 1 addition & 1 deletion pkg/upgrade/upgrades/upgrades.go
Original file line number Diff line number Diff line change
Expand Up @@ -306,7 +306,7 @@ var upgrades = []upgradebase.Upgrade{
),
upgrade.NewTenantUpgrade(
"stop writing payload and progress to system.jobs",
toCV(clusterversion.V23_2StopWritingPayloadAndProgressToSystemJobs),
toCV(clusterversion.V23_1StopWritingPayloadAndProgressToSystemJobs),
upgrade.NoPrecondition,
alterPayloadColumnToNullable,
),
Expand Down

0 comments on commit 8e9273c

Please sign in to comment.