kv,rangefeed: integrate catchup scans with elastic cpu #89709
Conversation
Force-pushed from 33a2630 to 1de437b.
Very promising!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)
pkg/kv/kvserver/rangefeed/catchup_scan.go
line 334 at r4 (raw file):
}
if err := i.pacer.Pace(ctx); err != nil {
This should be lifted to the start of the loop, to account for the continue statements. In particular, the !withDiff fast path.
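A rough sketch of the suggested placement (the loop shape and iterator details here are illustrative, not the PR's actual code):

```go
for i.Valid() {
	// Pace at the top of the iteration so the `continue` branches,
	// including the !withDiff fast path, are also paced.
	if err := i.pacer.Pace(ctx); err != nil {
		return errors.Wrap(err, "automatic pacing")
	}
	// ... per-key processing, which may `continue` early ...
}
```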
(flushing some comments -- haven't finished reading)
just curious: Is the total number of catchup scans the same in the before and after runs? The time duration for the changefeeds looks similar despite the lower cpu being given to them in the after case.
Reviewed 16 of 16 files at r2, 2 of 2 files at r3, 6 of 16 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)
-- commits
line 20 at r2:
how will this affect someone who starts using this in v22.2 and then upgrades to v23.1?
-- commits
line 46 at r4:
Add a reference to #88733.
Also, this is something we should consider backporting to v22.2, so it's worth pulling out into a separate PR/commit.
pkg/ccl/changefeedccl/kvfeed/scanner.go
line 128 at r4 (raw file):
	knobs TestingKnobs,
) error {
	txn := p.db.NewTxn(ctx, "changefeed backfill")
See #88733. That was suggesting overriding the priority at txn creation time. What this code is doing is overriding for each batch. The latter may be nicer to others in that the CreateTime is advanced, which means later batches will not out-compete others because their txn started earlier.
This can use a code comment.
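A hedged sketch of the distinction being made (the CreateTime stamping is an assumption for illustration; the diff above only shows the Priority field):

```go
// Setting the admission header per batch (rather than once at txn creation)
// means each batch is admitted with a fresh CreateTime, so a long-running
// backfill txn doesn't out-compete newer work simply because it started
// earlier.
var b kv.Batch
b.Header.TargetBytes = targetBytesPerScan
b.AdmissionHeader = roachpb.AdmissionHeader{
	Priority:   int32(admissionpb.BulkNormalPri),
	CreateTime: timeutil.Now().UnixNano(),
}
```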
pkg/ccl/changefeedccl/kvfeed/scanner.go
line 137 at r4 (raw file):
stopwatchStart := timeutil.Now()
var scanDuration, bufferDuration time.Duration
targetBytesPerScan := changefeedbase.ScanRequestSize.Get(&p.settings.SV)
changefeed.backfill.scan_request_size of 16MB is too large. Can you add a TODO for cdc folks to reduce the default?
pkg/clusterversion/cockroach_versions.go
line 314 at r4 (raw file):
// something that was not true in older versions.
//
// TODO(irfansharif): Depends on v22.2 being minted first.
why do we need a cluster version for this change? This doesn't seem to need cluster-wide coordination.
@@ -108,7 +108,7 @@ var FrontierCheckpointMaxBytes = settings.RegisterByteSizeSetting(
	settings.TenantWritable,
	"changefeed.frontier_checkpoint_max_bytes",
	"controls the maximum size of the checkpoint as a total size of key bytes",
	1<<20,
	1<<20, // 1 MiB
minor nit: maybe exclude this file from this PR (but if you feel strongly, keep it)
@@ -140,6 +141,12 @@ func (p *scanRequestScanner) exportSpan(
	r := roachpb.NewScan(remaining.Key, remaining.EndKey, false /* forUpdate */).(*roachpb.ScanRequest)
	r.ScanFormat = roachpb.BATCH_RESPONSE
	b.Header.TargetBytes = targetBytesPerScan
	b.AdmissionHeader = roachpb.AdmissionHeader{
		Priority: int32(admissionpb.BulkNormalPri),
How would you feel about adding an option to the kvfeed library to control this priority? Do we envision that, e.g., some system table scanners may want to run at an elevated priority?
Name: "admission-control/elastic-cdc", | ||
Owner: registry.OwnerAdmissionControl, | ||
// TODO(irfansharif): After two weeks of nightly baking time, reduce | ||
// this to a weekly cadence. This is a long-running test and serves only |
for my own info: where can we access the results/perf numbers for this test?
@@ -29,6 +29,7 @@ func registerAdmission(r registry.Registry) {
	// over some latency threshold. Will be Useful to track over time.

	registerElasticControlForBackups(r)
	registerElasticControlForCDC(r)
nit: Should we use ForRangefeed as a name (and as a file name of the controller)? It's not specific just to CDC -- c2c will benefit for example.
+1
// We've observed latency spikes during backfills because of its CPU/scan-heavy
// nature -- it can elevate CPU scheduling latencies which in turn translates to
// an increase in foreground latency.
func registerElasticControlForCDC(r registry.Registry) {
nit: consider renaming ForRangefeed
stmtWithCursor := fmt.Sprintf(`
  CREATE CHANGEFEED FOR tpcc.order_line, tpcc.stock, tpcc.customer
  INTO 'null://' WITH cursor = '-%ds'
`, int64(float64(i+1)*padDuration.Seconds()))
let's add a comment re cursor -- that part is important to trigger catchup scans.
}

t.Status(fmt.Sprintf("setting performance baseline (<%s)", padDuration))
time.Sleep(padDuration)
I think this sleep is important -- small comment, please?
AdmissionHeader: roachpb.AdmissionHeader{
	// NB: AdmissionHeader is used only at the start of the range feed
	// stream since the initial catch-up scan is expensive.
	Priority: int32(admissionpb.BulkNormalPri),
Same comment as on the rangefeed library -- consider making it an option to support system rangefeeds running with higher priority if needed.
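A hypothetical sketch of the kind of option being suggested (names and plumbing are assumptions, not the PR's API):

```go
// Option configures catch-up scan behavior for a rangefeed/kvfeed consumer.
type Option func(*config)

type config struct {
	catchUpScanPriority admissionpb.WorkPriority
}

// WithCatchUpScanPriority lets callers (e.g. system-internal rangefeeds)
// override the default BulkNormalPri used when admitting catch-up scan work.
func WithCatchUpScanPriority(pri admissionpb.WorkPriority) Option {
	return func(c *config) { c.catchUpScanPriority = pri }
}
```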
@@ -319,6 +330,10 @@ func (i *CatchUpIterator) CatchUpScan(outputFn outputEventFn, withDiff bool) err
		i.Next()
	}
}

if err := i.pacer.Pace(ctx); err != nil {
	return errors.Wrap(err, "automatic pacing: %v")
How is this error handled and propagated through the rangefeed stack?
Is this error ever bubbled up to the client (dist_sender_rangefeed)? If so, it needs to be handled -- presumably in a retryable fashion.
Experiment data: rangefeed-catchup-scan.zip. Follow instructions here to visualize things locally: #89656 (comment). +cc @jayshrivastava.
Reviewed 10 of 16 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)
pkg/kv/kvserver/store.go
line 3106 at r4 (raw file):
tenID, _ := repl.TenantID()
if pacer := s.cfg.KVAdmissionController.AdmitRangefeedRequest(tenID, args); pacer != nil {
	ctx = kvadmission.ContextWithPacer(ctx, pacer)
nit: what is the motivation for this context dance instead of passing the pacer directly as a parameter? Unlike the normal BatchRequest execution path there is only 1 caller along this code path.
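For context, a minimal sketch of what the "context dance" refers to (assumed shape; the real helpers live in the kvadmission package):

```go
type pacerKey struct{}

// ContextWithPacer attaches the pacer to the context so the (single)
// downstream caller can pull it back out, instead of receiving it as an
// explicit parameter.
func ContextWithPacer(ctx context.Context, p *Pacer) context.Context {
	return context.WithValue(ctx, pacerKey{}, p)
}

func PacerFromContext(ctx context.Context) *Pacer {
	p, _ := ctx.Value(pacerKey{}).(*Pacer)
	return p
}
```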
I'm going to pull out the version-gate-related things and backport them to 22.2 since it's not out yet, to give ourselves the flexibility to use this integration in 22.2 clusters under supervision. Similar to what we're planning to do for backups. I'm still not planning to backport the actual integration itself, not until it's had a few more weeks to bake, but including the right AdmissionHeaders in older versions seems fine. Edit: see #90093.
This is an opportunistic change and something to backport to v22.2. In v22.2 we introduced a disabled-by-default elastic CPU limiter (cockroachdb#86638) to dynamically grant CPU time for background work like backups -- something we hope to enable by default in CC and under observation for select 22.2 clusters. Recently we found another use for this limiter -- rangefeed catchup scans (cockroachdb#89709). It's unclear yet whether we want that integration to make it back to v22.2, but this commit leaves that option open by populating the right AC headers we'd need for the integration. Populating these AC headers is safe -- the Rangefeed RPC is not hooked into AC yet, so these headers are not looked at. For the initial scan requests we're only setting a lower priority bit, something 22.1 nodes already know to handle (they do so for all batch requests).

Release note: None
90017: sql: remove unnecessary rowsorts from distsql_stats logic tests r=mgartner a=michae2

**sql: remove unnecessary rowsorts from distsql_stats logic tests**

In distsql_stats and distsql_automatic_stats logic tests we frequently use queries like the following:

```sql
SELECT ... FROM [SHOW STATISTICS FOR TABLE foo] ORDER BY column_names, created DESC
```

These queries have guaranteed order, and do not need the rowsort option. Epic: CRDB-20535 Release note: None

**sql: bump testlogic --rewrite delay for queries with retry to 2 sec**

This prevents some spurious rewrites in the distsql_automatic_stats logic test. Epic: CRDB-20535 Release note: None

90070: ui: add apply recommendations to table details r=maryliag a=maryliag

On the Table Details page, we have a table with index recommendations, which right now is focused on dropping unused indexes. This commit adds the button to apply the recommendations, when there is one. Fixes #86949

Release note (ui change): Add apply button on Table Details page (db console only), when there is a recommendation to drop an unused index.

90093: changefeed,kvcoord: populate AC headers for backfill work r=irfansharif a=irfansharif

This is an opportunistic change and something to backport to v22.2. In v22.2 we introduced a disabled-by-default elastic CPU limiter (#86638) to dynamically grant CPU time for background work like backups -- something we hope to enable by default in CC and under observation for select 22.2 clusters. Recently we found another use for this limiter -- rangefeed catchup scans (#89709). It's unclear yet whether we want that integration to make it back to v22.2, but this commit leaves that option open by populating the right AC headers we'd need for the integration. Populating these AC headers is safe -- the Rangefeed RPC is not hooked into AC yet, so these headers are not looked at. For the initial scan requests we're only setting a lower priority bit, something 22.1 nodes already know to handle (they do so for all batch requests). Release note: None

Release justification: Low-risk change that opens the door to a future backport in a non-.0 release to address cases where rangefeed catchup scans affect foreground latencies.

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Force-pushed from 83f7d67 to 9f11345.
Reviewed 29 of 33 files at r5, 5 of 7 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)
pkg/ccl/changefeedccl/kvfeed/scanner.go
line 145 at r4 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This is a good suggestion. I did something slightly different -- instead of treating all rangefeed catchup scans homogeneously, system-internal rangefeeds (we use these all over the place in span configs, settings watchers, etc.) now use a higher priority; PTAL at uses + plumbing for rangefeed.WithSystemTablePriority. As for using a configurable pri at this level -- we might want to if we introduce notions of "higher priority" changefeeds, perhaps over more important data or for more important apps. I'll leave that as a TODO.
Slightly confused here. The catchup scan for the system-internal rangefeeds is using a higher priority, but not the initial scan for the corresponding changefeeds. Is there something about these system-internal ones that makes the latter not very important?
pkg/clusterversion/cockroach_versions.go
line 304 at r8 (raw file):
// supported in cloud storage and KMS.
SupportAssumeRoleAuth
nit: stray diff here due to the empty line being removed
pkg/ts/catalog/chart_catalog.go
line 3551 at r8 (raw file):
"admission.admitted.sql-sql-response.locking", "admission.admitted.sql-sql-response.normal", "admission.errored.elastic-cpu.normal",
I must have missed something here. Why is normal priority getting classified as elastic traffic?
Force-pushed from 9f11345 to f94d821.
@sumeerbhola, you might've hit the wrong reviewable button. bors will no longer let me merge unless the green stamp goes through -- it doesn't recognize the LGTM sticker.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jayshrivastava, @miretskiy, and @sumeerbhola)
pkg/ccl/changefeedccl/kvfeed/scanner.go
line 145 at r4 (raw file):
Previously, sumeerbhola wrote…
Slightly confused here. The catchup scan for the system-internal rangefeeds is using a higher priority, but not the initial scan for the corresponding changefeeds. Is there something about these system-internal ones that makes the latter not very important?
We don't use changefeeds for system tables within CRDB, we use rangefeeds directly. Which is why only the catch up scan priority was bumped. Some of these system tables also use initial scans, using the following:
cockroach/pkg/kv/kvclient/rangefeed/scanner.go
Lines 24 to 30 in c1e4c93
// runInitialScan will attempt to perform an initial data scan.
// It will retry in the face of errors and will only return upon
// success, context cancellation, or an error handling function which indicates
// that an error is unrecoverable. The return value will be true if the context
// was canceled or if the OnInitialScanError function indicated that the
// RangeFeed should stop.
func (f *RangeFeed) runInitialScan(
But they're just scan requests that don't use BulkNormalPri. The code for the scans above bottoms out here, where it doesn't look like we specify an AC header. I've been assuming that means it gets treated as NormalPri/doesn't bypass AC -- is that true?
cockroach/pkg/kv/kvclient/rangefeed/db_adapter.go
Lines 181 to 192 in 9f11345
return dbc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	if err := txn.SetFixedTimestamp(ctx, asOf); err != nil {
		return err
	}
	sp := span
	var b kv.Batch
	for {
		b.Header.TargetBytes = targetScanBytes
		b.Scan(sp.Key, sp.EndKey)
		if err := txn.Run(ctx, &b); err != nil {
			return err
		}
pkg/clusterversion/cockroach_versions.go
line 304 at r8 (raw file):
Previously, sumeerbhola wrote…
nit: stray diff here due to the empty line being removed
Was on purpose, none of the named versions above have line breaks between them.
pkg/ts/catalog/chart_catalog.go
line 3551 at r8 (raw file):
Previously, sumeerbhola wrote…
I must have missed something here. Why is normal priority getting classified as elastic traffic?
That's not what's happening, what's happening is that we can now have normal pri requests wait in the elastic-cpu wait queue. So there are two priority classes now, bulk-normal and normal. The normal pri elastic-cpu work is what originates from system table rangefeed catchup scans.
Reviewed 1 of 16 files at r9.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @jayshrivastava, and @miretskiy)
pkg/ccl/changefeedccl/kvfeed/scanner.go
line 145 at r4 (raw file):
Previously, irfansharif (irfan sharif) wrote…
We don't use changefeeds for system tables within CRDB, we use rangefeeds directly. Which is why only the catch up scan priority was bumped. Some of these system tables also use initial scans, using the following:
cockroach/pkg/kv/kvclient/rangefeed/scanner.go
Lines 24 to 30 in c1e4c93
// runInitialScan will attempt to perform an initial data scan.
// It will retry in the face of errors and will only return upon
// success, context cancellation, or an error handling function which indicates
// that an error is unrecoverable. The return value will be true if the context
// was canceled or if the OnInitialScanError function indicated that the
// RangeFeed should stop.
func (f *RangeFeed) runInitialScan(

But they're just scan requests that don't use BulkNormalPri. The code for the scans above bottoms out here, where it doesn't look like we specify an AC header. I've been assuming that means it gets treated as NormalPri/doesn't bypass AC -- is that true?
cockroach/pkg/kv/kvclient/rangefeed/db_adapter.go
Lines 181 to 192 in 9f11345
return dbc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	if err := txn.SetFixedTimestamp(ctx, asOf); err != nil {
		return err
	}
	sp := span
	var b kv.Batch
	for {
		b.Header.TargetBytes = targetScanBytes
		b.Scan(sp.Key, sp.EndKey)
		if err := txn.Run(ctx, &b); err != nil {
			return err
		}
That dbc.db.Txn() method is doing
return db.TxnWithAdmissionControl(
	ctx, roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri, retryable)
so it will bypass admission control because of OTHER.
pkg/ts/catalog/chart_catalog.go
line 3551 at r8 (raw file):
Previously, irfansharif (irfan sharif) wrote…
That's not what's happening, what's happening is that we can now have normal pri requests wait in the elastic-cpu wait queue. So there are two priority classes now, bulk-normal and normal. The normal pri elastic-cpu work is what originates from system table rangefeed catchup scans.
Oh yes, it is because we are explicitly integrating with the elastic-cpu in AdmitRangefeedRequest by using elasticCPUWorkQueue.
This may be ok-ish to merge, but I should have picked up on this oddity earlier (or maybe I didn't because until the latest revision all rangefeeds were using bulk priority? I have forgotten part of the PR history). One could argue that normal-pri rangefeed requests should not be using this elastic work queue and instead they should be using a "pacer" that looks the same from the outside but tells them to stop after they have consumed 100ms, and then returns the cpu slot (since normal and higher priorities use the slot mechanism) and queues again for a cpu slot. That is, the part of the pacing we still want to use is to submit to admission again after doing 100ms of work.
Thoughts?
PTAL. Want to merge this soon-ish after recent discussions with @jayshrivastava, who wants to make use of this Pacer library up in changefeed code where we decode rows one by one.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jayshrivastava, @miretskiy, and @sumeerbhola)
pkg/ccl/changefeedccl/kvfeed/scanner.go
line 145 at r4 (raw file):
Previously, sumeerbhola wrote…
That dbc.db.Txn() method is doing
return db.TxnWithAdmissionControl(
	ctx, roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri, retryable)
so it will bypass admission control because of OTHER.
Brilliant, shouldn't have just assumed (a frequent occurrence, as you've seen). Made these rangefeed catch-up scans opt into AC, I think they should, and for the ones declared over system tables specify NormalPri (for everything else use BulkNormalPri). Amended the last commit -- PTAL.
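A hedged sketch of what opting into AC looks like at a call site like this (the FROM_SQL source and the systemTable flag are assumptions for illustration; per the comment above, it's the OTHER source that bypasses admission control):

```go
pri := admissionpb.BulkNormalPri
if systemTable { // e.g. span configs, settings watchers
	pri = admissionpb.NormalPri
}
err := db.TxnWithAdmissionControl(
	ctx, roachpb.AdmissionHeader_FROM_SQL, pri, retryable,
)
```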
pkg/ts/catalog/chart_catalog.go
line 3551 at r8 (raw file):
Previously, sumeerbhola wrote…
Oh yes, it is because we are explicitly integrating with the elastic-cpu in AdmitRangefeedRequest by using elasticCPUWorkQueue.
This may be ok-ish to merge, but I should have picked up on this oddity earlier (or maybe I didn't because until the latest revision all rangefeeds were using bulk priority? I have forgotten part of the PR history). One could argue that normal-pri rangefeed requests should not be using this elastic work queue and instead they should be using a "pacer" that looks the same from the outside but tells them to stop after they have consumed 100ms, and then returns the cpu slot (since normal and higher priorities use the slot mechanism) and queues again for a cpu slot. That is, the part of the pacing we still want to use is to submit to admission again after doing 100ms of work.
Thoughts?
This current form I think is fine for now, especially in light of the thread above: these catch-up scans (both for user-defined rangefeeds and system-defined ones) were previously bypassing AC altogether, which I think was a gap. I like the idea of introducing a Pacer that instead makes use of the slot mechanism for the reasons you mention, but this case here specifically is not compelling, I think. I could be wrong. I want to re-visit if this becomes a problem. My reasoning is mostly to do with experimental intuition that we might not be decreasing slot count well enough given it's built atop the runnable-g's-per-p measure, which can be insensitive relative to more direct forms of measured scheduling latency.
Force-pushed from f94d821 to f960408.
Reviewed 4 of 16 files at r9, 13 of 17 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)
pkg/kv/db.go
line 895 at r10 (raw file):
// the other variant (TxnWithAdmissionControl) the default, or maybe rename this
// to be more explicit (TxnWithoutAdmissionControl) so new callers have to be
// conscious about what they want.
+1 to TxnWithoutAdmissionControl
pkg/ts/catalog/chart_catalog.go
line 3551 at r8 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This current form I think is fine for now, especially in light of the thread above: these catch-up scans (both for user-defined rangefeeds and system-defined ones) were previously bypassing AC altogether, which I think was a gap. I like the idea of introducing a Pacer that instead makes use of the slot mechanism for the reasons you mention, but this case here specifically is not compelling, I think. I could be wrong. I want to re-visit if this becomes a problem. My reasoning is mostly to do with experimental intuition that we might not be decreasing slot count well enough given it's built atop the runnable-g's-per-p measure, which can be insensitive relative to more direct forms of measured scheduling latency.
I agree that it is likely that the total slots will stay too large, and then this control will become ineffective.
Force-pushed from f960408 to 0cf54c1.
Pure code movement/renaming. We also renamed a cluster setting kv.store.admission.provisioned_bandwidth to kvadmission.store.provisioned_bandwidth. Release note
There's only ever one URL. Release note: None
Part of cockroachdb#65957. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies which in turn translates to foreground latency impact. This commit integrates rangefeed catchup scans with the elastic CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism but now set a lower priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup. So we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to only use 100ms of CPU time, to exercise better control over scheduling latencies. To that end, we introduce the following component used within the rangefeed catchup iterator:

// Pacer is used in tight loops (CPU-bound) for non-preemptible
// elastic work. Callers are expected to invoke Pace() every loop
// iteration and Close() once done. Internally this type integrates
// with the elastic CPU work queue, acquiring tokens for the CPU work
// being done, and blocking if tokens are unavailable. This allows
// for a form of cooperative scheduling with elastic CPU granters.
type Pacer struct

func (p *Pacer) Pace(ctx context.Context) error { ... }
func (p *Pacer) Close() { ... }

Release note: None
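A minimal usage sketch of the Pacer described above (the surrounding loop is illustrative, not the actual catch-up scan code):

```go
func runPacedWork(ctx context.Context, pacer *kvadmission.Pacer, next func() (done bool, _ error)) error {
	defer pacer.Close()
	for {
		// Pace acquires elastic CPU tokens for the work being done and blocks
		// if none are available, yielding to latency-sensitive foreground work.
		if err := pacer.Pace(ctx); err != nil {
			return errors.Wrap(err, "automatic pacing")
		}
		done, err := next()
		if done || err != nil {
			return err
		}
	}
}
```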
.. catch-up scans, introduce a private cluster setting (kvadmission.rangefeed_catchup_scan_elastic_control.enabled) to selectively switch off the catch-up scan integration if needed, and plumb kvadmission.Pacer explicitly into the rangefeed catchup scan loop instead of opaquely through the surrounding context. Release note: None
Force-pushed from 0cf54c1 to a042b60.
bors r+
Build failed (retrying...)
Build succeeded
90579: roachtest: reduce frequency of benchmark-only AC tests r=irfansharif a=irfansharif First four commits are from #89709 and should be ignored here. These tests will only serve as coarse-grained benchmarks for things AC cares about -- we don't need to run them nightly. They've spent ~2 weeks through our nightly CI suite without flaking so reduce frequency to a weekly cadence. We'll do this same thing for most tests added as part of #89208. Release note: None Co-authored-by: irfan sharif <[email protected]>
I'm just going to backport it, but not to the .0 point. In 22.2 this machinery is all switched off by default anyway. If the next few weeks of baking time surface anything specific to this catchup scan integration, we can quickly fix and backport those too.

blathers backport 22.2
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from 3414b03 to blathers/backport-release-22.2-89709: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 22.2 failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Part of #65957. First commit is from #89656.
Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In #89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scan with the elastic
CPU limiter we introduced in #86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri). Experimentally we observed that during
initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something #89589 can help with. We leave admission
integration of parallel worker goroutines to future work.
Unlike export requests, rangefeed catchup scans are non-preemptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to use at most 100ms of CPU time (it makes
for an ineffective controller otherwise), and to that end we introduce
the following component used within the rangefeed catchup iterator:
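// Pacer is used in tight loops (CPU-bound) for non-preemptible
// elastic work. Callers are expected to invoke Pace() every loop
// iteration and Close() once done. Internally this type integrates
// with the elastic CPU work queue, acquiring tokens for the CPU work
// being done, and blocking if tokens are unavailable. This allows
// for a form of cooperative scheduling with elastic CPU granters.
type Pacer struct

func (p *Pacer) Pace(ctx context.Context) error { ... }
func (p *Pacer) Close() { ... }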
Release note: None