…achdb#103343

102793: backupccl,kvserver: log failed ExportRequest trace on client and server r=irfansharif a=adityamaru

This change strives to improve observability around
backups that fail because of timed-out ExportRequests.
Currently, there is very little indication of what the request
was doing when the client cancelled the context after
the predetermined timeout window. With this change, we
now log the trace of the ExportRequest that failed. If
the ExportRequest was served locally, then the trace will be
part of the sender's tracing span. However, if the request
was served on a remote node, then the sender does not wait
for the server-side evaluation to notice the context cancellation.
To work around this, we also print the trace on the server side
if the request encountered a context cancellation and the
associated tracing span is not a no-op.

This change also adds a private cluster setting
`bulkio.backup.export_request_verbose_tracing` that, if set to true,
will send all backup ExportRequests with verbose tracing
enabled.

To debug a backup failing with a timed-out export request, we
can now:
- SET CLUSTER SETTING bulkio.backup.export_request_verbose_tracing = true;
- SET CLUSTER SETTING trace.snapshot.rate = '1m';

Once the backup times out, we can look at the logs
for the server-side and client-side ExportRequest traces
to understand where the request was stuck executing for so long.
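For orientation, here is a minimal, self-contained Go sketch of the client-side pattern the backup_processor.go diff below implements: run the request under a per-attempt timeout and, if the timeout fires, log whatever trace was collected. The types and helpers here (`fakeSpan`, `sendExport`) are invented stand-ins, not CockroachDB APIs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"strings"
	"time"
)

// fakeSpan stands in for a tracing span that accumulates a recording.
type fakeSpan struct{ events []string }

func (s *fakeSpan) record(ev string)  { s.events = append(s.events, ev) }
func (s *fakeSpan) recording() string { return strings.Join(s.events, "\n") }

// sendExport stands in for sending an ExportRequest; it just sleeps to
// simulate a slow evaluation.
func sendExport(ctx context.Context, sp *fakeSpan) error {
	sp.record("evaluating ExportRequest")
	select {
	case <-time.After(2 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	verbose := true // stands in for bulkio.backup.export_request_verbose_tracing
	sp := &fakeSpan{}
	if verbose {
		sp.record("verbose tracing enabled for this request")
	}
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	if err := sendExport(ctx, sp); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			// On timeout, surface the trace collected so far, as the real
			// change does with the span's recording.
			log.Printf("failed export request\n trace:\n%s", sp.recording())
		}
		fmt.Println("export request timeout:", err)
	}
}
```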

Fixes: cockroachdb#86047
Release note: None

103241: kv: decouple parallel commit auto-retries from refresh auto-retries r=arulajmani a=nvanbenschoten

Informs cockroachdb#100131.

This commit decouples the concept of a parallel commit auto-retry from that of a refresh auto-retry. This is important for weak isolation transactions, which can tolerate write skew and for which a failed parallel commit does not necessarily imply that the transaction needs to refresh its read spans in order to commit.

Parallel commit auto-retries are performed when a parallel committing batch is successful in its execution but fails to qualify for the implicit commit condition. In such cases, the EndTxn request (and just the EndTxn request) is re-issued to move the transaction directly to the COMMITTED state. This only implies the need for a refresh for serializable transactions. For these transactions, a preemptive refresh will be performed by the txnSpanRefresher before the EndTxn is re-issued.

Conversely, refresh auto-retries are performed when a batch fails to execute fully and returns an error. In such cases, the txnSpanRefresher will intercept the error, attempt to refresh, and then re-issue the entire original batch. Because we no longer make an attempt to determine which parts of a previously issued batch were successful (this was subtle and error-prone when it existed), we instead re-issue the entire batch and rely on idempotency.
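As a rough illustration of the distinction between the two retry paths, here is a small Go sketch. The `batchResult` type and `retryDecision` helper are invented for the sketch and do not correspond to actual kvcoord APIs.

```go
package main

import "fmt"

// batchResult is an invented summary of how a parallel-commit batch fared.
type batchResult struct {
	err              error // non-nil if the batch failed to execute fully
	staging          bool  // the transaction record was moved to STAGING
	implicitCommitOK bool  // all writes succeeded at the commit timestamp
	serializable     bool  // isolation level cannot tolerate write skew
}

// retryDecision names which auto-retry, if any, the client performs.
func retryDecision(res batchResult) string {
	if res.err != nil {
		// Refresh auto-retry: refresh the read spans, then re-issue the
		// entire original batch and rely on idempotency.
		return "refresh, then re-issue the full batch"
	}
	if res.staging && !res.implicitCommitOK {
		// Parallel commit auto-retry: only the EndTxn is re-issued to move
		// the transaction directly to COMMITTED. Serializable transactions
		// refresh preemptively before the EndTxn is re-issued.
		if res.serializable {
			return "preemptive refresh, then re-issue only the EndTxn"
		}
		return "re-issue only the EndTxn (no refresh under weak isolation)"
	}
	return "no auto-retry needed"
}

func main() {
	fmt.Println(retryDecision(batchResult{staging: true}))
	fmt.Println(retryDecision(batchResult{err: fmt.Errorf("write too old"), serializable: true}))
}
```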

To make this change, this commit moves the txnCommitter interceptor above the txnSpanRefresher interceptor in the TxnCoordSender interceptor stack. This means that the txnCommitter no longer needs to guard against re-issued batches, because all retries are handled below it. Instead, the txnSpanRefresher now needs to learn about the STAGING transaction status. The txnSpanRefresher was already taught about the STAGING status on successful batches in ef77322, and is taught about the STAGING status on errors here.

While the primary motivation of this change is to permit weak isolation transactions to perform parallel commit auto-retries without needing to refresh, it also has another benefit. Failed parallel commits no longer re-issue their entire batch on serialization failures. Instead, they just re-issue the EndTxn. As a result, this replaces and closes cockroachdb#93099. It also motivates the release note below.

Release note (performance improvement): Some large, long-running INSERT statements now perform less work during their commit phase and can run faster.

103278: kvserver: add expiration lease escape hatch r=erikgrinaker a=erikgrinaker

This patch adds `COCKROACH_DISABLE_EXPIRATION_LEASES_ONLY`, which can be used to hard-disable expiration-based leases, e.g. in cases where the lease extensions overload the cluster and prevent it from working, which would in turn prevent operators from disabling the setting.
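For illustration only, a minimal Go sketch of how an environment-variable escape hatch can override a cluster setting; this is not the actual kvserver implementation, and the setting name in the comment is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// expirationLeasesOnly stands in for the cluster setting controlling
// expiration-based leases (assumed to be kv.expiration_leases_only.enabled).
var expirationLeasesOnly = true

// expirationLeasesEnabled applies the env-var escape hatch: when the variable
// parses as true, expiration-based leases are hard-disabled regardless of the
// cluster setting.
func expirationLeasesEnabled() bool {
	if v, err := strconv.ParseBool(os.Getenv("COCKROACH_DISABLE_EXPIRATION_LEASES_ONLY")); err == nil && v {
		return false
	}
	return expirationLeasesOnly
}

func main() {
	fmt.Println("use expiration-based leases:", expirationLeasesEnabled())
}
```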

Epic: none
Release note: None

103343: logictest: increase max-sql-memory from 192MiB to 256MiB r=yuzefovich a=yuzefovich

We have seen some rare flakes where the validation (after all the test queries are done) hits "memory budget exceeded" on the root monitor. To avoid this, let's just bump the root pool a little.

Fixes: cockroachdb#103244.

Release note: None

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
5 people committed May 15, 2023
5 parents 19e5755 + d59a111 + 019d157 + ec5ac9f + 13d23f0 commit 7b86c7c
Showing 16 changed files with 400 additions and 142 deletions.
1 change: 1 addition & 0 deletions pkg/ccl/backupccl/BUILD.bazel
@@ -150,6 +150,7 @@ go_library(
"//pkg/util/syncutil",
"//pkg/util/timeutil",
"//pkg/util/tracing",
"//pkg/util/tracing/tracingpb",
"//pkg/util/uuid",
"@com_github_cockroachdb_errors//:errors",
"@com_github_cockroachdb_logtags//:logtags",
29 changes: 27 additions & 2 deletions pkg/ccl/backupccl/backup_processor.go
@@ -30,6 +30,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/rowenc"
"github.com/cockroachdb/cockroach/pkg/sql/rowexec"
"github.com/cockroachdb/cockroach/pkg/sql/types"
"github.com/cockroachdb/cockroach/pkg/util"
"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
"github.com/cockroachdb/cockroach/pkg/util/bulk"
"github.com/cockroachdb/cockroach/pkg/util/contextutil"
@@ -40,6 +41,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/stop"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/cockroachdb/cockroach/pkg/util/tracing"
"github.com/cockroachdb/cockroach/pkg/util/tracing/tracingpb"
"github.com/cockroachdb/errors"
"github.com/cockroachdb/logtags"
"github.com/cockroachdb/redact"
@@ -55,7 +57,7 @@ var (
time.Minute,
settings.NonNegativeDuration,
).WithPublic()
delayPerAttmpt = settings.RegisterDurationSetting(
delayPerAttempt = settings.RegisterDurationSetting(
settings.TenantWritable,
"bulkio.backup.read_retry_delay",
"amount of time since the read-as-of time, per-prior attempt, to wait before making another attempt",
@@ -82,6 +84,13 @@ var (
"split backup data on timestamps when writing revision history",
true,
)

sendExportRequestWithVerboseTracing = settings.RegisterBoolSetting(
settings.TenantWritable,
"bulkio.backup.export_request_verbose_tracing",
"send each export request with a verbose tracing span",
util.ConstantWithMetamorphicTestBool("export_request_verbose_tracing", false),
)
)

const (
@@ -378,7 +387,7 @@ func runBackupProcessor(
// We're okay with delaying this worker until then since we assume any
// other work it could pull off the queue will likely want to delay to
// a similar or later time anyway.
if delay := delayPerAttmpt.Get(&clusterSettings.SV) - timeutil.Since(span.lastTried); delay > 0 {
if delay := delayPerAttempt.Get(&clusterSettings.SV) - timeutil.Since(span.lastTried); delay > 0 {
timer.Reset(delay)
log.Infof(ctx, "waiting %s to start attempt %d of remaining spans", delay, span.attempts+1)
select {
Expand Down Expand Up @@ -427,13 +436,22 @@ func runBackupProcessor(
log.VEventf(ctx, 1, "sending ExportRequest for span %s (attempt %d, priority %s)",
span.span, span.attempts+1, header.UserPriority.String())
var rawResp kvpb.Response
var recording tracingpb.Recording
var pErr *kvpb.Error
requestSentAt := timeutil.Now()
exportRequestErr := contextutil.RunWithTimeout(ctx,
fmt.Sprintf("ExportRequest for span %s", span.span),
timeoutPerAttempt.Get(&clusterSettings.SV), func(ctx context.Context) error {
sp := tracing.SpanFromContext(ctx)
opts := make([]tracing.SpanOption, 0)
opts = append(opts, tracing.WithParent(sp))
if sendExportRequestWithVerboseTracing.Get(&clusterSettings.SV) {
opts = append(opts, tracing.WithRecording(tracingpb.RecordingVerbose))
}
ctx, exportSpan := sp.Tracer().StartSpanCtx(ctx, "backupccl.ExportRequest", opts...)
rawResp, pErr = kv.SendWrappedWithAdmission(
ctx, flowCtx.Cfg.DB.KV().NonTransactionalSender(), header, admissionHeader, req)
recording = exportSpan.FinishAndGetConfiguredRecording()
if pErr != nil {
return pErr.GoError()
}
@@ -453,6 +471,9 @@ func runBackupProcessor(
// TimeoutError improves the opaque `context deadline exceeded` error
// message so use that instead.
if errors.HasType(exportRequestErr, (*contextutil.TimeoutError)(nil)) {
if recording != nil {
log.Errorf(ctx, "failed export request for span %s\n trace:\n%s", span.span, recording)
}
return errors.Wrap(exportRequestErr, "export request timeout")
}
// BatchTimestampBeforeGCError is returned if the ExportRequest
@@ -467,6 +488,10 @@ func runBackupProcessor(
continue
}
}

if recording != nil {
log.Errorf(ctx, "failed export request %s\n trace:\n%s", span.span, recording)
}
return errors.Wrapf(exportRequestErr, "exporting %s", span.span)
}

41 changes: 26 additions & 15 deletions pkg/kv/kvclient/kvcoord/dist_sender_server_test.go
@@ -2060,6 +2060,7 @@ func TestTxnCoordSenderRetries(t *testing.T) {
expClientRestart bool // client-side txn restart
expServerRefresh bool // server-side refresh
expOnePhaseCommit bool // 1PC commits
expParallelCommitAutoRetry bool // parallel commit auto-retries
expFailure string // regexp pattern to match on error, if not empty
}
type testCase struct {
@@ -3045,6 +3046,7 @@ func TestTxnCoordSenderRetries(t *testing.T) {
expServerRefresh: true,
expClientRefreshSuccess: false,
expClientAutoRetryAfterRefresh: false,
expParallelCommitAutoRetry: false,
},
},
{
@@ -3062,10 +3064,12 @@ func TestTxnCoordSenderRetries(t *testing.T) {
// The Put to "c" will succeed with a forwarded timestamp. However, the
// txn has already staged on the other range at an earlier timestamp. As a
// result, it does not qualify for the implicit commit condition and
// requires a client-side refresh.
// requires a parallel commit auto-retry and preemptive client-side
// refresh.
allIsoLevels: &expect{
expClientRefreshSuccess: true,
expClientAutoRetryAfterRefresh: true,
expClientAutoRetryAfterRefresh: false,
expParallelCommitAutoRetry: true,
},
},
{
@@ -3149,6 +3153,7 @@ func TestTxnCoordSenderRetries(t *testing.T) {
expServerRefresh: true,
expClientRefreshSuccess: false,
expClientAutoRetryAfterRefresh: false,
expParallelCommitAutoRetry: false,
},
},
{
@@ -3165,11 +3170,12 @@ func TestTxnCoordSenderRetries(t *testing.T) {
// The Put to "c" will succeed after a server-side refresh. However, the
// txn has already staged on the other range at the pre-refresh timestamp.
// As a result, it does not qualify for the implicit commit condition and
// requires a client-side refresh.
// requires a parallel commit auto-retry.
allIsoLevels: &expect{
expServerRefresh: true,
expClientRefreshSuccess: true,
expClientAutoRetryAfterRefresh: true,
expClientRefreshSuccess: false,
expClientAutoRetryAfterRefresh: false,
expParallelCommitAutoRetry: true,
},
},
{
@@ -3184,12 +3190,13 @@ func TestTxnCoordSenderRetries(t *testing.T) {
return txn.CommitInBatch(ctx, b)
},
priorReads: true,
// The Put to "a" will fail, failing the parallel commit and forcing a
// client-side refresh.
// The Put to "a" will fail, failing the parallel commit with an error and
// forcing a client-side refresh and auto-retry of the full batch.
allIsoLevels: &expect{
expServerRefresh: false,
expClientRefreshSuccess: true,
expClientAutoRetryAfterRefresh: true,
expParallelCommitAutoRetry: false,
},
},
{
@@ -3205,11 +3212,12 @@ func TestTxnCoordSenderRetries(t *testing.T) {
},
priorReads: true,
// The Put to "c" will fail, failing the parallel commit and forcing a
// client-side refresh.
// parallel commit auto-retry and preemptive client-side refresh.
allIsoLevels: &expect{
expServerRefresh: false,
expClientRefreshSuccess: true,
expClientAutoRetryAfterRefresh: true,
expClientAutoRetryAfterRefresh: false,
expParallelCommitAutoRetry: true,
},
},
{
@@ -3387,11 +3395,12 @@ func TestTxnCoordSenderRetries(t *testing.T) {
// The cput to "c" will succeed after a server-side refresh. However, the
// txn has already staged on the other range at the pre-refresh timestamp.
// As a result, it does not qualify for the implicit commit condition and
// requires a client-side refresh.
// requires a parallel commit auto-retry.
allIsoLevels: &expect{
expServerRefresh: true,
expClientRefreshSuccess: true,
expClientAutoRetryAfterRefresh: true,
expClientRefreshSuccess: false,
expClientAutoRetryAfterRefresh: false,
expParallelCommitAutoRetry: true,
},
},
{
Expand Down Expand Up @@ -3435,11 +3444,12 @@ func TestTxnCoordSenderRetries(t *testing.T) {
// The cput to "c" will succeed after a server-side refresh. However, the
// txn has already staged on the other range at the pre-refresh timestamp.
// As a result, it does not qualify for the implicit commit condition and
// requires a client-side refresh.
// requires a parallel commit auto-retry.
allIsoLevels: &expect{
expServerRefresh: true,
expClientRefreshSuccess: true,
expClientAutoRetryAfterRefresh: true,
expClientRefreshSuccess: false,
expClientAutoRetryAfterRefresh: false,
expParallelCommitAutoRetry: true,
},
},
{
@@ -3615,6 +3625,7 @@ func TestTxnCoordSenderRetries(t *testing.T) {
require.Equal(t, exp.expServerRefresh, metrics.ServerRefreshSuccess.Count() != 0, "TxnMetrics.ServerRefreshSuccess")
require.Equal(t, exp.expClientRestart, metrics.Restarts.TotalSum() != 0, "TxnMetrics.Restarts")
require.Equal(t, exp.expOnePhaseCommit, metrics.Commits1PC.Count() != 0, "TxnMetrics.Commits1PC")
require.Equal(t, exp.expParallelCommitAutoRetry, metrics.ParallelCommitAutoRetries.Count() != 0, "TxnMetrics.ParallelCommitAutoRetries")
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
21 changes: 12 additions & 9 deletions pkg/kv/kvclient/kvcoord/txn_coord_sender.go
@@ -163,8 +163,8 @@ type TxnCoordSender struct {
txnHeartbeater
txnSeqNumAllocator
txnPipeliner
txnSpanRefresher
txnCommitter
txnSpanRefresher
txnMetricRecorder
txnLockGatekeeper // not in interceptorStack array.
}
@@ -254,6 +254,7 @@ func newRootTxnCoordSender(
tcs.interceptorAlloc.txnCommitter = txnCommitter{
st: tcf.st,
stopper: tcs.stopper,
metrics: &tcs.metrics,
mu: &tcs.mu.Mutex,
}
tcs.interceptorAlloc.txnMetricRecorder = txnMetricRecorder{
@@ -274,18 +275,20 @@ func newRootTxnCoordSender(
// never generate transaction retry errors that could be avoided
// with a refresh.
&tcs.interceptorAlloc.txnPipeliner,
// The committer sits above the span refresher because it will
// never generate transaction retry errors that could be avoided
// with a refresh. However, it may re-issue parts of "successful"
// batches that failed to qualify for the parallel commit condition.
// These re-issued batches benefit from pre-emptive refreshes in the
// span refresher.
&tcs.interceptorAlloc.txnCommitter,
// The span refresher may resend entire batches to avoid transaction
// retries. Because of that, we need to be careful which interceptors
// sit below it in the stack.
// The span refresher sits below the committer, so it must be prepared
// to see STAGING transaction protos in responses and errors. It should
// defer to the committer for how to handle those.
&tcs.interceptorAlloc.txnSpanRefresher,
// The committer sits beneath the span refresher so that any
// retryable errors that it generates have a chance of being
// "refreshed away" without the need for a txn restart. Because the
// span refresher can re-issue batches, it needs to be careful about
// what parts of the batch it mutates. Any mutation needs to be
// idempotent and should avoid writing to memory when not changing
// it to avoid looking like a data race.
&tcs.interceptorAlloc.txnCommitter,
// The metrics recorder sits at the bottom of the stack so that it
// can observe all transformations performed by other interceptors.
&tcs.interceptorAlloc.txnMetricRecorder,
