kvfeed: fix scanRequestScanner.exportScan integration with admission control #88733
Comments
There's a partial fix in #90093. Integration of rangefeed catch-up scans will be done through #89709. For the initial scans I did try integrating the ScanRequests with the elastic CPU limiter, but the benefits were negligible. What will be important to do is #90089. Finally, we now have a roachtest that can serve as a testbed for performance work on the aforementioned issues/PRs and for future work (#89656), so closing this issue.
Part of cockroachdb#65957. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translates to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in cockroachdb#86638; this is one of the two halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism, but now set a lower priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to only use 100ms of CPU time, to exercise better control over scheduling latencies. To that end, we introduce the following component, used within the rangefeed catchup iterator:

```go
// Pacer is used in tight loops (CPU-bound) for non-preemptible
// elastic work. Callers are expected to invoke Pace() every loop
// iteration and Close() once done. Internally this type integrates
// with the elastic CPU work queue, acquiring tokens for the CPU work
// being done, and blocking if tokens are unavailable. This allows
// for a form of cooperative scheduling with elastic CPU granters.
type Pacer struct { ... }

func (p *Pacer) Pace(ctx context.Context) error { ... }
func (p *Pacer) Close() { ... }
```

Release note: None
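To make the calling pattern concrete, here is a minimal, self-contained sketch of how a tight CPU-bound loop cooperates with such a pacer. The `pacer` stub, its wall-clock budget, and the loop body are illustrative assumptions only; the real Pacer acquires tokens from the elastic CPU work queue and blocks when none are available, rather than tracking time itself.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pacer is a stand-in for the Pacer described in the commit message above.
// This stub only mimics the calling pattern (Pace every iteration, Close when
// done) and the ~100ms-of-CPU-time budget mentioned there.
type pacer struct {
	budget  time.Duration // how long we may run before yielding
	started time.Time
}

// Pace checks for cancellation and, once the budget is spent, "yields" by
// resetting the budget. The real implementation would instead block until the
// elastic CPU granter hands out more tokens.
func (p *pacer) Pace(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	if time.Since(p.started) > p.budget {
		p.started = time.Now() // real code: block on the elastic CPU work queue
	}
	return nil
}

// Close would return any unused tokens to the granter; a no-op in this stub.
func (p *pacer) Close() {}

func main() {
	ctx := context.Background()
	p := &pacer{budget: 100 * time.Millisecond, started: time.Now()}
	defer p.Close()

	// Tight, CPU-bound loop standing in for the rangefeed catch-up iteration.
	sum := 0
	for i := 0; i < 1_000_000; i++ {
		if err := p.Pace(ctx); err != nil {
			fmt.Println("pacer error:", err)
			return
		}
		sum += i // per-iteration work
	}
	fmt.Println("done:", sum)
}
```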
scanRequestScanner.exportScan does `txn := p.db.NewTxn(ctx, "changefeed backfill")`, which creates a txn with (roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri). OTHER bypasses admission control. It should use a variant of `NewTxnRootKV` to create a txn with (roachpb.AdmissionHeader_ROOT_KV, admissionpb.BulkNormalPri).

This will likely not fix the performance isolation problems with backfills, as discussed in (a) https://cockroachlabs.slack.com/archives/C9TGBJB44/p1663864200211769?thread_ts=1663855656.539239&cid=C9TGBJB44, and (b) because the 16MB value of `changefeed.backfill.scan_request_size` is likely too large; fixing that will require integrating low-priority `ScanRequest`s with the elastic CPU behavior. But it may help a bit, and this change is tiny enough to backport.

@cockroachdb/admission-control
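For concreteness, a rough sketch of the requested change follows. The import paths, the `newTxnWithBulkPri` helper, the gateway node ID of 0, and the `NewTxnRootKV` signature shown here are assumptions for illustration; the issue only asks for "a variant of NewTxnRootKV" that stamps the (ROOT_KV, BulkNormalPri) admission header.

```go
// Sketch of the change asked for in scanRequestScanner.exportScan; not the
// actual kvfeed code, and not a real CockroachDB API.
package kvfeedsketch

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
)

// newBackfillTxn contrasts the current and desired txn construction.
func newBackfillTxn(ctx context.Context, db *kv.DB) *kv.Txn {
	// Before: AdmissionHeader_OTHER + NormalPri, which bypasses admission
	// control entirely.
	//   txn := db.NewTxn(ctx, "changefeed backfill")

	// After (hypothetical variant of NewTxnRootKV): AdmissionHeader_ROOT_KV +
	// BulkNormalPri, so backfill ScanRequests go through admission control at
	// a lower-than-foreground priority.
	return newTxnWithBulkPri(ctx, db)
}

// newTxnWithBulkPri is the hypothetical constructor this issue asks for; the
// body is a guess at the shape, not real API.
func newTxnWithBulkPri(ctx context.Context, db *kv.DB) *kv.Txn {
	_ = roachpb.AdmissionHeader_ROOT_KV // desired admission header source
	_ = admissionpb.BulkNormalPri       // desired admission priority
	// ... construct the txn as NewTxnRootKV does, but stamp the admission
	// header with (ROOT_KV, BulkNormalPri) instead of (OTHER, NormalPri).
	return kv.NewTxnRootKV(ctx, db, 0 /* gatewayNodeID */)
}
```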
Jira issue: CRDB-19940