kv,rangefeed: integrate catchup scans with elastic cpu #89709

Merged
merged 4 commits into cockroachdb:master from 221010.elastic-cpu-rangefeed on Nov 4, 2022

Conversation

irfansharif
Contributor

@irfansharif irfansharif commented Oct 11, 2022

Part of #65957. First commit is from #89656.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In #89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU-heavy nature can elevate Go
scheduling latencies, which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scans with the elastic
CPU limiter we introduced in #86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism, but now setting a lower
priority bit (BulkNormalPri). Experimentally we observed that during
initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something #89589 can help with. We leave admission
integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup, so we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to use at most 100ms of CPU time (it makes
for an ineffective controller otherwise), and to that end we introduce
the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }
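
To make the intended calling pattern concrete, here is a minimal usage sketch (not code from this PR; the iterator interface and names below are hypothetical stand-ins):

```go
package example

import "context"

// pacer matches the Pacer surface described above.
type pacer interface {
	Pace(ctx context.Context) error
	Close()
}

// workIterator is a hypothetical stand-in for the catchup-scan iterator.
type workIterator interface {
	Valid() bool
	ProcessCurrent() error
	Next()
}

// scanLoop shows the intended pattern: Pace() once per iteration (possibly
// blocking until elastic CPU tokens are available), Close() when done.
func scanLoop(ctx context.Context, it workIterator, p pacer) error {
	defer p.Close()
	for it.Valid() {
		if err := p.Pace(ctx); err != nil {
			return err
		}
		if err := it.ProcessCurrent(); err != nil {
			return err
		}
		it.Next()
	}
	return nil
}
```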

Release note: None

@irfansharif irfansharif requested a review from a team October 11, 2022 00:46
@irfansharif irfansharif requested review from a team as code owners October 11, 2022 00:46
@irfansharif irfansharif requested review from renatolabs and miretskiy and removed request for a team and renatolabs October 11, 2022 00:46
@cockroach-teamcity
Member

This change is Reviewable

@irfansharif
Contributor Author

irfansharif commented Oct 11, 2022

Before (lack of throughput isolation, very poor latency isolation -- p99 latency hits 10+s):

[screenshot: before]

After (throughput isolation, ok latency isolation -- p99 latency is ~60ms):

[screenshot: after]

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 33a2630 to 1de437b on October 11, 2022 03:33
Contributor

@erikgrinaker erikgrinaker left a comment


Very promising!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/kv/kvserver/rangefeed/catchup_scan.go line 334 at r4 (raw file):

		}

		if err := i.pacer.Pace(ctx); err != nil {

This should be lifted to the start of the loop, to account for the continue statements. In particular, the !withDiff fast path.
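
Something like the following shape, for illustration (hypothetical helpers; only the placement of Pace() relative to the early `continue` is the point):

```go
package example

import (
	"context"

	"github.com/cockroachdb/errors"
)

// paceAtTopOfLoop sketches the suggested structure: Pace() runs before any
// per-iteration work, so iterations that `continue` early (like the
// !withDiff fast path) are still accounted for.
func paceAtTopOfLoop(
	ctx context.Context,
	pace func(context.Context) error,
	hasMore func() bool,
	withDiff bool,
	emit func() error, // fast path
	emitWithDiff func() error,
) error {
	for hasMore() {
		if err := pace(ctx); err != nil {
			return errors.Wrap(err, "automatic pacing")
		}
		if !withDiff {
			if err := emit(); err != nil {
				return err
			}
			continue
		}
		if err := emitWithDiff(); err != nil {
			return err
		}
	}
	return nil
}
```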

Collaborator

@sumeerbhola sumeerbhola left a comment


(flushing some comments -- haven't finished reading)

just curious: Is the total number of catchup scans the same in the before and after runs? The time duration for the changefeeds looks similar despite the lower CPU being given to them in the after case.

Reviewed 16 of 16 files at r2, 2 of 2 files at r3, 6 of 16 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


-- commits line 20 at r2:
how will this affect someone who starts using this in v22.2 and then upgrades to v23.1?


-- commits line 46 at r4:
Add a reference to #88733.
Also, this is something we should consider backporting to v22.2 so worth pulling out into a separate PR/commit.


pkg/ccl/changefeedccl/kvfeed/scanner.go line 128 at r4 (raw file):

	knobs TestingKnobs,
) error {
	txn := p.db.NewTxn(ctx, "changefeed backfill")

See #88733. That was suggesting overriding the priority at txn creation time. What this code is doing is overriding it for each batch. The latter may be nicer to other work in that the CreateTime is advanced, which means later batches will not out-compete other requests just because their txn started earlier.
This can use a code comment.
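
For reference, the per-batch shape under discussion looks roughly like this (a sketch based on the diff quoted further down; the CreateTime/Source values are illustrative of how the header advances per batch, not necessarily the exact fields the PR sets):

```go
package example

import (
	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

// newBackfillBatch sketches the per-batch override: each batch carries its
// own AdmissionHeader, so CreateTime advances with every batch instead of
// staying pinned to when the backfill txn was created.
func newBackfillBatch(targetBytes int64) *kv.Batch {
	var b kv.Batch
	b.Header.TargetBytes = targetBytes
	b.AdmissionHeader = roachpb.AdmissionHeader{
		Priority:   int32(admissionpb.BulkNormalPri),
		CreateTime: timeutil.Now().UnixNano(),
		Source:     roachpb.AdmissionHeader_FROM_SQL,
	}
	return &b
}
```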


pkg/ccl/changefeedccl/kvfeed/scanner.go line 137 at r4 (raw file):

	stopwatchStart := timeutil.Now()
	var scanDuration, bufferDuration time.Duration
	targetBytesPerScan := changefeedbase.ScanRequestSize.Get(&p.settings.SV)

changefeed.backfill.scan_request_size of 16MB is too large. Can you add a TODO for cdc folks to reduce the default?


pkg/clusterversion/cockroach_versions.go line 314 at r4 (raw file):

	// something that was not true in older versions.
	//
	// TODO(irfansharif): Depends on v22.2 being minted first.

why do we need a cluster version for this change? This doesn't seem to need cluster-wide coordination.

@@ -108,7 +108,7 @@ var FrontierCheckpointMaxBytes = settings.RegisterByteSizeSetting(
settings.TenantWritable,
"changefeed.frontier_checkpoint_max_bytes",
"controls the maximum size of the checkpoint as a total size of key bytes",
1<<20,
1<<20, // 1 MiB
Contributor


minor nit: maybe exclude this file from this PR
(but if you feel strongly, keep it)

@@ -140,6 +141,12 @@ func (p *scanRequestScanner) exportSpan(
r := roachpb.NewScan(remaining.Key, remaining.EndKey, false /* forUpdate */).(*roachpb.ScanRequest)
r.ScanFormat = roachpb.BATCH_RESPONSE
b.Header.TargetBytes = targetBytesPerScan
b.AdmissionHeader = roachpb.AdmissionHeader{
Priority: int32(admissionpb.BulkNormalPri),
Contributor


How would you feel about adding an option to the kvfeed library to control this priority? Do we ever envision that, e.g., some system table scanners may want to run at an elevated priority?
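
One possible shape for such an option, sketched with functional options (illustrative names only; the PR ultimately added rangefeed.WithSystemTablePriority, discussed further down):

```go
package example

import "github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"

// scanConfig holds hypothetical per-feed knobs.
type scanConfig struct {
	admissionPri admissionpb.WorkPriority
}

// ScanOption configures a scanConfig.
type ScanOption func(*scanConfig)

// WithAdmissionPriority overrides the default BulkNormalPri, e.g. so a
// system-table feed can run its scans at NormalPri.
func WithAdmissionPriority(pri admissionpb.WorkPriority) ScanOption {
	return func(c *scanConfig) { c.admissionPri = pri }
}
```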

Name: "admission-control/elastic-cdc",
Owner: registry.OwnerAdmissionControl,
// TODO(irfansharif): After two weeks of nightly baking time, reduce
// this to a weekly cadence. This is a long-running test and serves only
Contributor


for my own info: where can we access the results/perf numbers for this test?

@@ -29,6 +29,7 @@ func registerAdmission(r registry.Registry) {
// over some latency threshold. Will be Useful to track over time.

registerElasticControlForBackups(r)
registerElasticControlForCDC(r)
Contributor


nit: Should we use ForRangefeed as a name (and as a file name of the controller)? It's not specific just to CDC -- c2c will benefit for example.

Contributor


+1

// We've observed latency spikes during backfills because of its CPU/scan-heavy
// nature -- it can elevate CPU scheduling latencies which in turn translates to
// an increase in foreground latency.
func registerElasticControlForCDC(r registry.Registry) {
Contributor


nit: consider renaming ForRangefeed

stmtWithCursor := fmt.Sprintf(`
CREATE CHANGEFEED FOR tpcc.order_line, tpcc.stock, tpcc.customer
INTO 'null://' WITH cursor = '-%ds'
`, int64(float64(i+1)*padDuration.Seconds()))
Contributor


let's add a comment re cursor -- that part is important to trigger catchup scans.

}

t.Status(fmt.Sprintf("setting performance baseline (<%s)", padDuration))
time.Sleep(padDuration)
Contributor


I think this sleep is important -- small comment, please?

AdmissionHeader: roachpb.AdmissionHeader{
// NB: AdmissionHeader is used only at the start of the range feed
// stream since the initial catch-up scan is expensive.
Priority: int32(admissionpb.BulkNormalPri),
Contributor


Same comment as on rangefeed library -- consider making it an option
to support system rangefeeds running with higher priority if needed.

@@ -319,6 +330,10 @@ func (i *CatchUpIterator) CatchUpScan(outputFn outputEventFn, withDiff bool) err
i.Next()
}
}

if err := i.pacer.Pace(ctx); err != nil {
return errors.Wrap(err, "automatic pacing: %v")
Contributor


How is this error handled and propagated through the rangefeed stack?
Is this error ever bubbled up to the client (dist_sender_rangefeed)? If so, it needs to be handled - presumably in a retryable fashion.

@irfansharif
Contributor Author

Experiment data: rangefeed-catchup-scan.zip. Follow instructions here to visualize things locally: #89656 (comment). +cc @jayshrivastava.

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 10 of 16 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/kv/kvserver/store.go line 3106 at r4 (raw file):

	tenID, _ := repl.TenantID()
	if pacer := s.cfg.KVAdmissionController.AdmitRangefeedRequest(tenID, args); pacer != nil {
		ctx = kvadmission.ContextWithPacer(ctx, pacer)

nit: what is the motivation for this context dance instead of passing the pacer directly as a parameter? Unlike the normal BatchRequest execution path there is only 1 caller along this code path.

@irfansharif
Contributor Author

irfansharif commented Oct 17, 2022

I'm going to pull out the version-gate-related things and backport them to 22.2 since it's not out yet, to give ourselves the flexibility to use this integration in 22.2 clusters under supervision. Similar to what we're planning to do for backups. I'm still not planning to backport the actual integration itself, not until it's had a few more weeks to bake, but including the right AdmissionHeaders in older versions seems fine.

Edit: see #90093.

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 17, 2022
This is an opportunistic change and something to backport to v22.2. In
v22.2 we introduced a disabled-by-default elastic CPU limiter (cockroachdb#86638) to
dynamically grant CPU time for background work like backups -- something we
hope to enable by default in CC and keep under observation for select 22.2
clusters. Recently we found another use for this limiter -- rangefeed
catchup scans (cockroachdb#89709). It's unclear yet whether we want that
integration to make it back to v22.2 but this commit leaves that option
open by populating the right AC headers we'd need for the integration.
Populating these AC headers is safe -- the Rangefeed RPC is not hooked
into AC yet, so these headers are not looked at. For the initial scan
requests we're only setting a lower priority bit, something 22.1 nodes
already know to handle (they do so for all batch requests).

Release note: None
craig bot pushed a commit that referenced this pull request Oct 17, 2022
90017: sql: remove unnecessary rowsorts from distsql_stats logic tests r=mgartner a=michae2

**sql: remove unnecessary rowsorts from distsql_stats logic tests**

In distsql_stats and distsql_automatic_stats logic tests we frequently
use queries like the following:

```sql
SELECT ...
FROM [SHOW STATISTICS FOR TABLE foo]
ORDER BY column_names, created DESC
```

These queries have guaranteed order, and do not need the rowsort option.

Epic: CRDB-20535

Release note: None

**sql: bump testlogic --rewrite delay for queries with retry to 2 sec**

This prevents some spurious rewrites in the distsql_automatic_stats
logic test.

Epic: CRDB-20535

Release note: None

90070: ui: add apply recommendations to table details r=maryliag a=maryliag

On the Table Details page, we have a table with index
recommendations, which right now is focused on dropping unused indexes. This commit adds the button to apply the recommendations, when there is one.

Fixes #86949
[screenshots: apply-recommendation button on the Table Details page]



Release note (ui change): Add apply button on Table Details page (db console only), when there is a recommendation to drop an unused index.

90093: changefeed,kvcoord: populate AC headers for backfill work r=irfansharif a=irfansharif

This is an opportunistic change and something to backport to v22.2. In v22.2 we introduced a disabled-by-default elastic CPU limiter (#86638) to dynamically grant CPU time for background work like backups -- something we hope to enable by default in CC and keep under observation for select 22.2 clusters. Recently we found another use for this limiter -- rangefeed catchup scans (#89709). It's unclear yet whether we want that integration to make it back to v22.2 but this commit leaves that option open by populating the right AC headers we'd need for the integration. Populating these AC headers is safe -- the Rangefeed RPC is not hooked into AC yet, so these headers are not looked at. For the initial scan requests we're only setting a lower priority bit, something 22.1 nodes already know to handle (they do so for all batch requests).

Release note: None
Release justification: Low-risk change that opens the door to a future backport in a non-.0 release to address cases where rangefeed catchup scans affect foreground latencies.

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Oct 17, 2022
blathers-crl bot pushed a commit that referenced this pull request Oct 17, 2022
@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 83f7d67 to 9f11345 on October 26, 2022 18:58
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 29 of 33 files at r5, 5 of 7 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This is a good suggestion. I did something slightly different -- instead of treating all rangefeed catchup scans homogeneously, system-internal rangefeeds (we use this all over the place in span configs, settings watchers, etc.) now use a higher priority; PTAL at uses + plumbing for rangefeed.WithSystemTablePriority. As for using a configurable pri at this level -- we might want to if we introduce notions of "higher priority" changefeeds, perhaps over more important data or for more important apps. I'll leave that as a TODO.

Slightly confused here. The catchup scan for the system-internal rangefeeds is using a higher priority, but not the initial scan for the corresponding changefeeds. Is there something about these system-internal ones that makes the latter not very important?


pkg/clusterversion/cockroach_versions.go line 304 at r8 (raw file):

	// supported in cloud storage and KMS.
	SupportAssumeRoleAuth

nit: stray diff here due to the empty line being removed


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

					"admission.admitted.sql-sql-response.locking",
					"admission.admitted.sql-sql-response.normal",
					"admission.errored.elastic-cpu.normal",

I must have missed something here. Why is normal priority getting classified as elastic traffic?

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 9f11345 to f94d821 on October 28, 2022 14:24
Contributor Author

@irfansharif irfansharif left a comment


@sumeerbhola, you might've hit the wrong reviewable button. bors will no longer let me merge unless the green stamp goes through -- it doesn't recognize the LGTM sticker.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jayshrivastava, @miretskiy, and @sumeerbhola)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, sumeerbhola wrote…

Slightly confused here. The catchup scan for the system-internal rangefeeds is using a higher priority, but not the initial scan for the corresponding changefeeds. Is there something about these system-internal ones that makes the latter not very important?

We don't use changefeeds for system tables within CRDB; we use rangefeeds directly, which is why only the catch-up scan priority was bumped. Some of these system tables also use initial scans, using the following:

// runInitialScan will attempt to perform an initial data scan.
// It will retry in the face of errors and will only return upon
// success, context cancellation, or an error handling function which indicates
// that an error is unrecoverable. The return value will be true if the context
// was canceled or if the OnInitialScanError function indicated that the
// RangeFeed should stop.
func (f *RangeFeed) runInitialScan(

But they're just scan requests that don't use BulkNormalPri. The code for the scans above bottoms out here, where it doesn't look like we specify an AC header. I've been assuming that means it gets treated as NormalPri/doesn't bypass AC -- is that true?

return dbc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
if err := txn.SetFixedTimestamp(ctx, asOf); err != nil {
return err
}
sp := span
var b kv.Batch
for {
b.Header.TargetBytes = targetScanBytes
b.Scan(sp.Key, sp.EndKey)
if err := txn.Run(ctx, &b); err != nil {
return err
}


pkg/clusterversion/cockroach_versions.go line 304 at r8 (raw file):

Previously, sumeerbhola wrote…

nit: stray diff here due to the empty line being removed

Was on purpose, none of the named versions above have line breaks between them.


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, sumeerbhola wrote…

I must have missed something here. Why is normal priority getting classified as elastic traffic?

That's not what's happening, what's happening is that we can now have normal pri requests wait in the elastic-cpu wait queue. So there are two priority classes now, bulk-normal and normal. The normal pri elastic-cpu work is what originates from system table rangefeed catchup scans.

Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 1 of 16 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, irfansharif (irfan sharif) wrote…

We don't use changefeeds for system tables within CRDB; we use rangefeeds directly, which is why only the catch-up scan priority was bumped. Some of these system tables also use initial scans, using the following:

// runInitialScan will attempt to perform an initial data scan.
// It will retry in the face of errors and will only return upon
// success, context cancellation, or an error handling function which indicates
// that an error is unrecoverable. The return value will be true if the context
// was canceled or if the OnInitialScanError function indicated that the
// RangeFeed should stop.
func (f *RangeFeed) runInitialScan(

But they're just scan requests that don't use BulkNormalPri. The code for the scans above bottoms out here, where it doesn't look like we specify an AC header. I've been assuming that means it gets treated as NormalPri/doesn't bypass AC -- is that true?

return dbc.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
if err := txn.SetFixedTimestamp(ctx, asOf); err != nil {
return err
}
sp := span
var b kv.Batch
for {
b.Header.TargetBytes = targetScanBytes
b.Scan(sp.Key, sp.EndKey)
if err := txn.Run(ctx, &b); err != nil {
return err
}

That dbc.db.Txn() method is doing

	return db.TxnWithAdmissionControl(
		ctx, roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri, retryable)

so it will bypass admission control because of OTHER.


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, irfansharif (irfan sharif) wrote…

That's not what's happening, what's happening is that we can now have normal pri requests wait in the elastic-cpu wait queue. So there are two priority classes now, bulk-normal and normal. The normal pri elastic-cpu work is what originates from system table rangefeed catchup scans.

Oh yes, it is because we are explicitly integrating with the elastic-cpu in AdmitRangefeedRequest by using elasticCPUWorkQueue.
This may be ok-ish to merge, but I should have picked up on this oddity earlier (or maybe I didn't because until the latest revision all rangefeeds were using bulk priority? I have forgotten part of the PR history). One could argue that normal-pri rangefeed requests should not be using this elastic work queue and instead they should be using a "pacer" that looks the same from the outside but tells them to stop after they have consumed 100ms, and then returns the cpu slot (since normal and higher priorities use the slot mechanism) and queues again for a cpu slot. That is, the part of the pacing we still want to use is to submit to admission again after doing 100ms of work.
Thoughts?

Contributor Author

@irfansharif irfansharif left a comment


PTAL. Want to merge this soon-ish after recent discussions with @jayshrivastava, who wants to make use of this Pacer library up in changefeed code where we decode rows one-by-one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jayshrivastava, @miretskiy, and @sumeerbhola)


pkg/ccl/changefeedccl/kvfeed/scanner.go line 145 at r4 (raw file):

Previously, sumeerbhola wrote…

That dbc.db.Txn() method is doing

	return db.TxnWithAdmissionControl(
		ctx, roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri, retryable)

so it will bypass admission control because of OTHER.

Brilliant, shouldn't have just assumed (a frequent occurrence, as you've seen). Made these rangefeed catch-up scans opt into AC -- I think they should -- and for the ones declared over system tables specify NormalPri (everything else uses BulkNormalPri). Amended the last commit -- PTAL.


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, sumeerbhola wrote…

Oh yes, it is because we are explicitly integrating with the elastic-cpu in AdmitRangefeedRequest by using elasticCPUWorkQueue.
This may be ok-ish to merge, but I should have picked up on this oddity earlier (or maybe I didn't because until the latest revision all rangefeeds were using bulk priority? I have forgotten part of the PR history). One could argue that normal-pri rangefeed requests should not be using this elastic work queue and instead they should be using a "pacer" that looks the same from the outside but tells them to stop after they have consumed 100ms, and then returns the cpu slot (since normal and higher priorities use the slot mechanism) and queues again for a cpu slot. That is, the part of the pacing we still want to use is to submit to admission again after doing 100ms of work.
Thoughts?

This current form I think is fine for now, especially considering, in light of the thread above, that these catch-up scans (both for user-defined rangefeeds and system-defined ones) were previously bypassing AC altogether, which I think was a gap. I like the idea of introducing a Pacer that instead makes use of the slot mechanism for the reasons you mention, but this case here specifically is not compelling, I think. I could be wrong. I want to re-visit if this becomes a problem. My reasoning is mostly to do with experimental intuition that we might not be decreasing the slot count well enough given it's built atop the runnable-g's-per-p measure, which can be insensitive relative to more direct forms of measured scheduling latency.

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from f94d821 to f960408 on November 2, 2022 18:16
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 4 of 16 files at r9, 13 of 17 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @jayshrivastava, and @miretskiy)


pkg/kv/db.go line 895 at r10 (raw file):

// the other variant (TxnWithAdmissionControl) the default, or maybe rename this
// to be more explicit (TxnWithoutAdmissionControl) so new callers have to be
// conscious about what they want.

+1 to TxnWithoutAdmissionControl


pkg/ts/catalog/chart_catalog.go line 3551 at r8 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This current form I think is fine for now, especially considering, in light of the thread above, that these catch-up scans (both for user-defined rangefeeds and system-defined ones) were previously bypassing AC altogether, which I think was a gap. I like the idea of introducing a Pacer that instead makes use of the slot mechanism for the reasons you mention, but this case here specifically is not compelling, I think. I could be wrong. I want to re-visit if this becomes a problem. My reasoning is mostly to do with experimental intuition that we might not be decreasing the slot count well enough given it's built atop the runnable-g's-per-p measure, which can be insensitive relative to more direct forms of measured scheduling latency.

I agree that it is likely that the total slots will stay too large, and then this control will become ineffective.

@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from f960408 to 0cf54c1 on November 3, 2022 22:10
Pure code movement/renaming. We also renamed a cluster setting
kv.store.admission.provisioned_bandwidth to
kvadmission.store.provisioned_bandwidth.

Release note
There's only ever one URL.

Release note: None
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU-heavy nature can elevate Go
scheduling latencies, which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scans with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
.. catch-up scans, introduce a private cluster setting
(kvadmission.rangefeed_catchup_scan_elastic_control.enabled) to
selectively switch off catch-up scan integration if needed, and plumb
kvadmission.Pacer in explicitly to the rangefeed catchup scan loop instead
of opaquely through the surrounding context.

Release note: None
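
For context, registering a private boolean setting with that name generally looks like the following (a sketch; the description string and default shown here are illustrative, not necessarily what the PR ships):

```go
package example

import "github.com/cockroachdb/cockroach/pkg/settings"

// rangefeedCatchupScanElasticControlEnabled selectively switches the
// catch-up scan integration with elastic CPU control on or off.
var rangefeedCatchupScanElasticControlEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kvadmission.rangefeed_catchup_scan_elastic_control.enabled",
	"determines whether rangefeed catch-up scans integrate with elastic CPU control",
	true, // illustrative default
)
```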
@irfansharif irfansharif force-pushed the 221010.elastic-cpu-rangefeed branch from 0cf54c1 to a042b60 on November 3, 2022 22:25
@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Nov 4, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Nov 4, 2022

Build succeeded:

@craig craig bot merged commit 2238d70 into cockroachdb:master Nov 4, 2022
craig bot pushed a commit that referenced this pull request Nov 4, 2022
90579: roachtest: reduce frequency of benchmark-only AC tests r=irfansharif a=irfansharif

First four commits are from #89709 and should be ignored here. These tests will only serve as coarse-grained benchmarks for things AC cares about -- we don't need to run them nightly. They've spent ~2 weeks through our nightly CI suite without flaking so reduce frequency to a weekly cadence. We'll do this same thing for most tests added as part of #89208.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
@irfansharif irfansharif deleted the 221010.elastic-cpu-rangefeed branch November 4, 2022 20:58
@irfansharif
Contributor Author

I'm just going to backport it, but not to the .0 point. In 22.2 this machinery is all switched off by default anyway. If the next few weeks of baking time surface anything specific to this catchup scan integration, we can quickly fix and backport those too.

blathers backport 22.2

@blathers-crl

blathers-crl bot commented Nov 4, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 3414b03 to blathers/backport-release-22.2-89709: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
