kv,rangefeed: integrate catchup scans with elastic cpu #89709

Merged
merged 4 commits into cockroachdb:master from 221010.elastic-cpu-rangefeed on Nov 4, 2022

Conversation

irfansharif
Contributor

@irfansharif irfansharif commented Oct 11, 2022

Part of #65957. First commit is from #89656.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In #89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU-heavy nature can elevate Go
scheduling latencies, which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scans with the elastic
CPU limiter we introduced in #86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism, but now setting a lower
priority bit (BulkNormalPri). Experimentally we observed that during
initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something #89589 can help with. We leave admission
integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup, so we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to use at most 100ms of CPU time (it makes
for an ineffective controller otherwise), and to that end we introduce
the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }
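
To make the intended calling pattern concrete, here is a minimal usage sketch (not code from this PR; the iterator interface and names below are hypothetical stand-ins):

```go
package example

import "context"

// pacer matches the Pacer surface described above.
type pacer interface {
	Pace(ctx context.Context) error
	Close()
}

// workIterator is a hypothetical stand-in for the catchup-scan iterator.
type workIterator interface {
	Valid() bool
	ProcessCurrent() error
	Next()
}

// scanLoop shows the intended pattern: Pace() once per iteration (possibly
// blocking until elastic CPU tokens are available), Close() when done.
func scanLoop(ctx context.Context, it workIterator, p pacer) error {
	defer p.Close()
	for it.Valid() {
		if err := p.Pace(ctx); err != nil {
			return err
		}
		if err := it.ProcessCurrent(); err != nil {
			return err
		}
		it.Next()
	}
	return nil
}
```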

Release note: None

@irfansharif irfansharif requested a review from a team October 11, 2022 00:46
@irfansharif irfansharif requested review from a team as code owners October 11, 2022 00:46
@irfansharif irfansharif requested review from renatolabs and miretskiy and removed request for a team and renatolabs October 11, 2022 00:46
@cockroach-teamcity
Member

This change is Reviewable

@irfansharif
Contributor Author

irfansharif commented Oct 11, 2022

Before (lack of throughput isolation, very poor latency isolation -- p99 latency hits 10+s):

[screenshot: before]

After (throughput isolation, ok latency isolation -- p99 latency is ~60ms):

[screenshot: after]

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 33a2630 to 1de437b on October 11, 2022 03:33
Contributor

@erikgrinaker erikgrinaker left a comment


Very promising!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/kv/kvserver/rangefeed/catchup_scan.go line 334 at r4 (raw file):

		}

		if err := i.pacer.Pace(ctx); err != nil {

This should be lifted to the start of the loop, to account for the continue statements. In particular, the !withDiff fast path.
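
Something like the following shape, for illustration (hypothetical helpers; only the placement of Pace() relative to the early `continue` is the point):

```go
package example

import (
	"context"

	"github.com/cockroachdb/errors"
)

// paceAtTopOfLoop sketches the suggested structure: Pace() runs before any
// per-iteration work, so iterations that `continue` early (like the
// !withDiff fast path) are still accounted for.
func paceAtTopOfLoop(
	ctx context.Context,
	pace func(context.Context) error,
	hasMore func() bool,
	withDiff bool,
	emit func() error, // fast path
	emitWithDiff func() error,
) error {
	for hasMore() {
		if err := pace(ctx); err != nil {
			return errors.Wrap(err, "automatic pacing")
		}
		if !withDiff {
			if err := emit(); err != nil {
				return err
			}
			continue
		}
		if err := emitWithDiff(); err != nil {
			return err
		}
	}
	return nil
}
```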

Collaborator

@sumeerbhola sumeerbhola left a comment


(flushing some comments -- haven't finished reading)

just curious: Is the total number of catchup scans the same in the before and after runs? The time duration for the changefeeds looks similar despite the lower CPU being given to them in the after case.

Reviewed 16 of 16 files at r2, 2 of 2 files at r3, 6 of 16 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


-- commits line 20 at r2:
how will this affect someone who starts using this in v22.2 and then upgrades to v23.1?


-- commits line 46 at r4:
Add a reference to #88733.
Also, this is something we should consider backporting to v22.2 so worth pulling out into a separate PR/commit.


pkg/ccl/changefeedccl/kvfeed/scanner.go line 128 at r4 (raw file):

	knobs TestingKnobs,
) error {
	txn := p.db.NewTxn(ctx, "changefeed backfill")

See #88733. That was suggesting overriding the priority at txn creation time. What this code is doing is overriding it for each batch. The latter may be nicer to other work in that the CreateTime is advanced, which means later batches will not out-compete other requests just because their txn started earlier.
This can use a code comment.
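
For reference, the per-batch shape under discussion looks roughly like this (a sketch based on the diff quoted further down; the CreateTime/Source values are illustrative of how the header advances per batch, not necessarily the exact fields the PR sets):

```go
package example

import (
	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

// newBackfillBatch sketches the per-batch override: each batch carries its
// own AdmissionHeader, so CreateTime advances with every batch instead of
// staying pinned to when the backfill txn was created.
func newBackfillBatch(targetBytes int64) *kv.Batch {
	var b kv.Batch
	b.Header.TargetBytes = targetBytes
	b.AdmissionHeader = roachpb.AdmissionHeader{
		Priority:   int32(admissionpb.BulkNormalPri),
		CreateTime: timeutil.Now().UnixNano(),
		Source:     roachpb.AdmissionHeader_FROM_SQL,
	}
	return &b
}
```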


pkg/ccl/changefeedccl/kvfeed/scanner.go line 137 at r4 (raw file):

	stopwatchStart := timeutil.Now()
	var scanDuration, bufferDuration time.Duration
	targetBytesPerScan := changefeedbase.ScanRequestSize.Get(&p.settings.SV)

changefeed.backfill.scan_request_size of 16MB is too large. Can you add a TODO for cdc folks to reduce the default?


pkg/clusterversion/cockroach_versions.go line 314 at r4 (raw file):

	// something that was not true in older versions.
	//
	// TODO(irfansharif): Depends on v22.2 being minted first.

why do we need a cluster version for this change? This doesn't seem to need cluster-wide coordination.

@@ -108,7 +108,7 @@ var FrontierCheckpointMaxBytes = settings.RegisterByteSizeSetting(
settings.TenantWritable,
"changefeed.frontier_checkpoint_max_bytes",
"controls the maximum size of the checkpoint as a total size of key bytes",
1<<20,
1<<20, // 1 MiB
Contributor


minor nit: maybe exclude this file from this PR
(but if you feel strongly, keep it)

@@ -140,6 +141,12 @@ func (p *scanRequestScanner) exportSpan(
r := roachpb.NewScan(remaining.Key, remaining.EndKey, false /* forUpdate */).(*roachpb.ScanRequest)
r.ScanFormat = roachpb.BATCH_RESPONSE
b.Header.TargetBytes = targetBytesPerScan
b.AdmissionHeader = roachpb.AdmissionHeader{
Priority: int32(admissionpb.BulkNormalPri),
Contributor


How would you feel about adding an option to the kvfeed library to control this priority? Do we ever envision that, e.g., some system table scanners may want to run at an elevated priority?
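
One possible shape for such an option, sketched with functional options (illustrative names only; the PR ultimately added rangefeed.WithSystemTablePriority, discussed further down):

```go
package example

import "github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"

// scanConfig holds hypothetical per-feed knobs.
type scanConfig struct {
	admissionPri admissionpb.WorkPriority
}

// ScanOption configures a scanConfig.
type ScanOption func(*scanConfig)

// WithAdmissionPriority overrides the default BulkNormalPri, e.g. so a
// system-table feed can run its scans at NormalPri.
func WithAdmissionPriority(pri admissionpb.WorkPriority) ScanOption {
	return func(c *scanConfig) { c.admissionPri = pri }
}
```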

Name: "admission-control/elastic-cdc",
Owner: registry.OwnerAdmissionControl,
// TODO(irfansharif): After two weeks of nightly baking time, reduce
// this to a weekly cadence. This is a long-running test and serves only
Contributor


for my own info: where can we access the results/perf numbers for this test?

@@ -29,6 +29,7 @@ func registerAdmission(r registry.Registry) {
// over some latency threshold. Will be Useful to track over time.

registerElasticControlForBackups(r)
registerElasticControlForCDC(r)
Contributor


nit: Should we use ForRangefeed as a name (and as a file name of the controller)? It's not specific just to CDC -- c2c will benefit for example.

Contributor


+1

// We've observed latency spikes during backfills because of its CPU/scan-heavy
// nature -- it can elevate CPU scheduling latencies which in turn translates to
// an increase in foreground latency.
func registerElasticControlForCDC(r registry.Registry) {
Contributor


nit: consider renaming ForRangefeed

stmtWithCursor := fmt.Sprintf(`
CREATE CHANGEFEED FOR tpcc.order_line, tpcc.stock, tpcc.customer
INTO 'null://' WITH cursor = '-%ds'
`, int64(float64(i+1)*padDuration.Seconds()))
Contributor


let's add a comment re cursor -- that part is important to trigger catchup scans.

}

t.Status(fmt.Sprintf("setting performance baseline (<%s)", padDuration))
time.Sleep(padDuration)
Contributor


I think this sleep is important -- small comment, please?

AdmissionHeader: roachpb.AdmissionHeader{
// NB: AdmissionHeader is used only at the start of the range feed
// stream since the initial catch-up scan is expensive.
Priority: int32(admissionpb.BulkNormalPri),
Contributor


Same comment as on rangefeed library -- consider making it an option
to support system rangefeeds running with higher priority if needed.

@@ -319,6 +330,10 @@ func (i *CatchUpIterator) CatchUpScan(outputFn outputEventFn, withDiff bool) err
i.Next()
}
}

if err := i.pacer.Pace(ctx); err != nil {
return errors.Wrap(err, "automatic pacing: %v")
Contributor


How is this error handled and propagated through the rangefeed stack?
Is this error ever bubbled up to the client (dist_sender_rangefeed)? If so, it needs to be handled - presumably in a retryable fashion.

@irfansharif
Contributor Author

Experiment data: rangefeed-catchup-scan.zip. Follow instructions here to visualize things locally: #89656 (comment). +cc @jayshrivastava.

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 10 of 16 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/kv/kvserver/store.go line 3106 at r4 (raw file):

	tenID, _ := repl.TenantID()
	if pacer := s.cfg.KVAdmissionController.AdmitRangefeedRequest(tenID, args); pacer != nil {
		ctx = kvadmission.ContextWithPacer(ctx, pacer)

nit: what is the motivation for this context dance instead of passing the pacer directly as a parameter? Unlike the normal BatchRequest execution path there is only 1 caller along this code path.

@irfansharif
Contributor Author

irfansharif commented Oct 17, 2022

I'm going to pull out the version-gate-related things and backport them to 22.2 since it's not out yet, to give ourselves the flexibility to use this integration in 22.2 clusters under supervision. Similar to what we're planning to do for backups. I'm still not planning to backport the actual integration itself, not until it's had a few more weeks to bake, but including the right AdmissionHeaders in older versions seems fine.

Edit: see #90093.

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 17, 2022
This is an opportunistic change and something to backport to v22.2. In
v22.2 we introduced a disabled-by-default elastic CPU limiter (cockroachdb#86638) to
dynamically grant CPU time for background work like backups -- something we
hope to enable by default in CC and keep under observation for select 22.2
clusters. Recently we found another use for this limiter -- rangefeed
catchup scans (cockroachdb#89709). It's unclear yet whether we want that
integration to make it back to v22.2 but this commit leaves that option
open by populating the right AC headers we'd need for the integration.
Populating these AC headers is safe -- the Rangefeed RPC is not hooked
into AC yet, so these headers are not looked at. For the initial scan
requests we're only setting a lower priority bit, something 22.1 nodes
already know to handle (they do so for all batch requests).

Release note: None
craig bot pushed a commit that referenced this pull request Oct 17, 2022
90017: sql: remove unnecessary rowsorts from distsql_stats logic tests r=mgartner a=michae2

**sql: remove unnecessary rowsorts from distsql_stats logic tests**

In distsql_stats and distsql_automatic_stats logic tests we frequently
use queries like the following:

```sql
SELECT ...
FROM [SHOW STATISTICS FOR TABLE foo]
ORDER BY column_names, created DESC
```

These queries have guaranteed order, and do not need the rowsort option.

Epic: CRDB-20535

Release note: None

**sql: bump testlogic --rewrite delay for queries with retry to 2 sec**

This prevents some spurious rewrites in the distsql_automatic_stats
logic test.

Epic: CRDB-20535

Release note: None

90070: ui: add apply recommendations to table details r=maryliag a=maryliag

On the Table Details page, we have a table with index
recommendations, which right now is focused on dropping unused indexes. This commit adds the button to apply the recommendations, when there is one.

Fixes #86949
[screenshots: apply-recommendation button on the Table Details page]



Release note (ui change): Add apply button on Table Details page (db console only), when there is a recommendation to drop an unused index.

90093: changefeed,kvcoord: populate AC headers for backfill work r=irfansharif a=irfansharif

This is an opportunistic change and something to backport to v22.2. In v22.2 we introduced a disabled-by-default elastic CPU limiter (#86638) to dynamically grant CPU time for background work like backups -- something we hope to enable by default in CC and keep under observation for select 22.2 clusters. Recently we found another use for this limiter -- rangefeed catchup scans (#89709). It's unclear yet whether we want that integration to make it back to v22.2 but this commit leaves that option open by populating the right AC headers we'd need for the integration. Populating these AC headers is safe -- the Rangefeed RPC is not hooked into AC yet, so these headers are not looked at. For the initial scan requests we're only setting a lower priority bit, something 22.1 nodes already know to handle (they do so for all batch requests).

Release note: None
Release justification: Low-risk change that opens the door to a future backport in a non-.0 release to address cases where rangefeed catchup scans affect foreground latencies.

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Oct 17, 2022
blathers-crl bot pushed a commit that referenced this pull request Oct 17, 2022
@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 83f7d67 to 9f11345 on October 26, 2022 18:58
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 29 of 33 files at r5, 5 of 7 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This is a good suggestion. I did something slightly different -- instead of treating all rangefeed catchup scans homogeneously, system-internal rangefeeds (we use this all over the place in span configs, settings watchers, etc.) now use a higher priority; PTAL at uses + plumbing for rangefeed.WithSystemTablePriority. As for using a configurable pri at this level -- we might want to if we introduce notions of "higher priority" changefeeds, perhaps over more important data or for more important apps. I'll leave that as a TODO.

Slightly confused here. The catchup scan for the system-internal rangefeeds is using a higher priority, but not the initial scan for the corresponding changefeeds. Is there something about these system-internal ones that makes the latter not very important?


pkg/clusterversion/cockroach_versions.go line 304 at r8 (raw file):

	// supported in cloud storage and KMS.
	SupportAssumeRoleAuth

nit: stray diff here due to the empty line being removed


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

					"admission.admitted.sql-sql-response.locking",
					"admission.admitted.sql-sql-response.normal",
					"admission.errored.elastic-cpu.normal",

I must have missed something here. Why is normal priority getting classified as elastic traffic?

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 9f11345 to f94d821 on October 28, 2022 14:24
Contributor Author

@irfansharif irfansharif left a comment


@sumeerbhola, you might've hit the wrong reviewable button. bors will no longer let me merge unless the green stamp goes through -- it doesn't recognize the LGTM sticker.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jayshrivastava, @miretskiy, and @sumeerbhola)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, sumeerbhola wrote…

Slightly confused here. The catchup scan for the system-internal rangefeeds is using a higher priority, but not the initial scan for the corresponding changefeeds. Is there something about these system-internal ones that makes the latter not very important?

We don't use changefeeds for system tables within CRDB; we use rangefeeds directly, which is why only the catch-up scan priority was bumped. Some of these system tables also use initial scans, using the following:

// runInitialScan will attempt to perform an initial data scan.
// It will retry in the face of errors and will only return upon
// success, context cancellation, or an error handling function which indicates
// that an error is unrecoverable. The return value will be true if the context
// was canceled or if the OnInitialScanError function indicated that the
// RangeFeed should stop.
func (f *RangeFeed) runInitialScan(

But they're just scan requests that don't use BulkNormalPri. The code for the scans above bottoms out here, where it doesn't look like we specify an AC header. I've been assuming that means it gets treated as NormalPri/doesn't bypass AC -- is that true?

return dbc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
if err := txn.SetFixedTimestamp(ctx, asOf); err != nil {
return err
}
sp := span
var b kv.Batch
for {
b.Header.TargetBytes = targetScanBytes
b.Scan(sp.Key, sp.EndKey)
if err := txn.Run(ctx, &b); err != nil {
return err
}


pkg/clusterversion/cockroach_versions.go line 304 at r8 (raw file):

Previously, sumeerbhola wrote…

nit: stray diff here due to the empty line being removed

Was on purpose, none of the named versions above have line breaks between them.


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, sumeerbhola wrote…

I must have missed something here. Why is normal priority getting classified as elastic traffic?

That's not what's happening, what's happening is that we can now have normal pri requests wait in the elastic-cpu wait queue. So there are two priority classes now, bulk-normal and normal. The normal pri elastic-cpu work is what originates from system table rangefeed catchup scans.

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 1 of 16 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, irfansharif (irfan sharif) wrote…

We don't use changefeeds for system tables within CRDB; we use rangefeeds directly, which is why only the catch-up scan priority was bumped. Some of these system tables also use initial scans, using the following:

// runInitialScan will attempt to perform an initial data scan.
// It will retry in the face of errors and will only return upon
// success, context cancellation, or an error handling function which indicates
// that an error is unrecoverable. The return value will be true if the context
// was canceled or if the OnInitialScanError function indicated that the
// RangeFeed should stop.
func (f *RangeFeed) runInitialScan(

But they're just scan requests that don't use BulkNormalPri. The code for the scans above bottoms out here, where it doesn't look like we specify an AC header. I've been assuming that means it gets treated as NormalPri/doesn't bypass AC -- is that true?

return dbc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
if err := txn.SetFixedTimestamp(ctx, asOf); err != nil {
return err
}
sp := span
var b kv.Batch
for {
b.Header.TargetBytes = targetScanBytes
b.Scan(sp.Key, sp.EndKey)
if err := txn.Run(ctx, &b); err != nil {
return err
}

That dbc.db.Txn() method is doing

	return db.TxnWithAdmissionControl(
		ctx, roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri, retryable)

so it will bypass admission control because of OTHER.


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, irfansharif (irfan sharif) wrote…

That's not what's happening, what's happening is that we can now have normal pri requests wait in the elastic-cpu wait queue. So there are two priority classes now, bulk-normal and normal. The normal pri elastic-cpu work is what originates from system table rangefeed catchup scans.

Oh yes, it is because we are explicitly integrating with the elastic-cpu in AdmitRangefeedRequest by using elasticCPUWorkQueue.
This may be ok-ish to merge, but I should have picked up on this oddity earlier (or maybe I didn't because until the latest revision all rangefeeds were using bulk priority? I have forgotten part of the PR history). One could argue that normal-pri rangefeed requests should not be using this elastic work queue and instead they should be using a "pacer" that looks the same from the outside but tells them to stop after they have consumed 100ms, and then returns the cpu slot (since normal and higher priorities use the slot mechanism) and queues again for a cpu slot. That is, the part of the pacing we still want to use is to submit to admission again after doing 100ms of work.
Thoughts?

Contributor Author

@irfansharif irfansharif left a comment


PTAL. Want to merge this soon-ish after recent discussions with @jayshrivastava, who wants to make use of this Pacer library up in changefeed code where we decode rows one-by-one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jayshrivastava, @miretskiy, and @sumeerbhola)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, sumeerbhola wrote…

That dbc.db.Txn() method is doing

	return db.TxnWithAdmissionControl(
		ctx, roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri, retryable)

so it will bypass admission control because of OTHER.

Brilliant, shouldn't have just assumed (a frequent occurrence, as you've seen). Made these rangefeed catch-up scans opt into AC -- I think they should -- and for the ones declared over system tables specify NormalPri (everything else uses BulkNormalPri). Amended the last commit -- PTAL.


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, sumeerbhola wrote…

Oh yes, it is because we are explicitly integrating with the elastic-cpu in AdmitRangefeedRequest by using elasticCPUWorkQueue.
This may be ok-ish to merge, but I should have picked up on this oddity earlier (or maybe I didn't because until the latest revision all rangefeeds were using bulk priority? I have forgotten part of the PR history). One could argue that normal-pri rangefeed requests should not be using this elastic work queue and instead they should be using a "pacer" that looks the same from the outside but tells them to stop after they have consumed 100ms, and then returns the cpu slot (since normal and higher priorities use the slot mechanism) and queues again for a cpu slot. That is, the part of the pacing we still want to use is to submit to admission again after doing 100ms of work.
Thoughts?

This current form I think is fine for now, especially considering, in light of the thread above, that these catch-up scans (both for user-defined rangefeeds and system-defined ones) were previously bypassing AC altogether, which I think was a gap. I like the idea of introducing a Pacer that instead makes use of the slot mechanism for the reasons you mention, but this case here specifically is not compelling, I think. I could be wrong. I want to re-visit if this becomes a problem. My reasoning is mostly to do with experimental intuition that we might not be decreasing the slot count well enough given it's built atop the runnable-g's-per-p measure, which can be insensitive relative to more direct forms of measured scheduling latency.

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from f94d821 to f960408 on November 2, 2022 18:16
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 4 of 16 files at r9, 13 of 17 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/kv/db.go line 895 at r10 (raw file):

// the other variant (TxnWithAdmissionControl) the default, or maybe rename this
// to be more explicit (TxnWithoutAdmissionControl) so new callers have to be
// conscious about what they want.

+1 to TxnWithoutAdmissionControl


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This current form I think is fine for now, especially considering, in light of the thread above, that these catch-up scans (both for user-defined rangefeeds and system-defined ones) were previously bypassing AC altogether, which I think was a gap. I like the idea of introducing a Pacer that instead makes use of the slot mechanism for the reasons you mention, but this case here specifically is not compelling, I think. I could be wrong. I want to re-visit if this becomes a problem. My reasoning is mostly to do with experimental intuition that we might not be decreasing the slot count well enough given it's built atop the runnable-g's-per-p measure, which can be insensitive relative to more direct forms of measured scheduling latency.

I agree that it is likely that the total slots will stay too large, and then this control will become ineffective.

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from f960408 to 0cf54c1 on November 3, 2022 22:10
Pure code movement/renaming. We also renamed a cluster setting
kv.store.admission.provisioned_bandwidth to
kvadmission.store.provisioned_bandwidth.

Release note
There's only ever one URL.

Release note: None
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU-heavy nature can elevate Go
scheduling latencies, which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scans with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
.. catch-up scans, introduce a private cluster setting
(kvadmission.rangefeed_catchup_scan_elastic_control.enabled) to
selectively switch off catch-up scan integration if needed, and plumb
kvadmission.Pacer in explicitly to the rangefeed catchup scan loop instead
of opaquely through the surrounding context.

Release note: None
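
For context, registering a private boolean setting with that name generally looks like the following (a sketch; the description string and default shown here are illustrative, not necessarily what the PR ships):

```go
package example

import "github.com/cockroachdb/cockroach/pkg/settings"

// rangefeedCatchupScanElasticControlEnabled selectively switches the
// catch-up scan integration with elastic CPU control on or off.
var rangefeedCatchupScanElasticControlEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kvadmission.rangefeed_catchup_scan_elastic_control.enabled",
	"determines whether rangefeed catch-up scans integrate with elastic CPU control",
	true, // illustrative default
)
```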
@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 0cf54c1 to a042b60 on November 3, 2022 22:25
@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Nov 4, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Nov 4, 2022

Build succeeded:

@craig craig bot merged commit 2238d70 into cockroachdb:master Nov 4, 2022
craig bot pushed a commit that referenced this pull request Nov 4, 2022
90579: roachtest: reduce frequency of benchmark-only AC tests r=irfansharif a=irfansharif

First four commits are from #89709 and should be ignored here. These tests will only serve as coarse-grained benchmarks for things AC cares about -- we don't need to run them nightly. They've spent ~2 weeks through our nightly CI suite without flaking so reduce frequency to a weekly cadence. We'll do this same thing for most tests added as part of #89208.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
@irfansharif irfansharif deleted the 221010.elastic-cpu-rangefeed branch November 4, 2022 20:58
@irfansharif
Contributor Author

I'm just going to backport it, but not to the .0 point. In 22.2 this machinery is all switched off by default anyway. If the next few weeks of baking time surface anything specific to this catchup scan integration, we can quickly fix and backport those too.

blathers backport 22.2

@blathers-crl

blathers-crl bot commented Nov 4, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 3414b03 to blathers/backport-release-22.2-89709: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
