changefeedccl: Parallel workers default #89589

Current default is set to 8. This is bad when running on small machines; perhaps the default should be set to something like max(1, NumCPU() >> 3).

Jira issue: CRDB-20321
Epic: CRDB-24463
miretskiy added the C-enhancement (solution expected to add code/behavior + preserve backward-compat; pg compat issues are the exception), A-cdc (Change Data Capture), T-cdc, and GA-blocker labels on Oct 7, 2022.
cc @cockroachdb/cdc
Hi @miretskiy, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
jayshrivastava added a commit to jayshrivastava/cockroach that referenced this issue on Oct 10, 2022:
Previously, a default value of 8 was used for the kvevent parallel consumer. The reason for this was that we observed performance improvements in a 15 node, 32 vCPU cluster when we increased this parameter to 8; after 8, the improvements were much smaller. The issue with a default of 8 is that on smaller machines, 8 workers can be too much overhead, especially since the work is CPU intensive. This change updates the default to be runtime.NumCPU() >> 2 workers, which aligns with using 8 workers on 32 vCPU machines.

Fixes cockroachdb#89589
Epic: none
Release note: None
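A minimal sketch of how a CPU-derived default like this could be computed; the helper name and the floor of one worker are illustrative assumptions, not the actual CockroachDB code:

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultEventConsumerWorkers is a hypothetical helper illustrating the
// commit's approach: derive the worker count from the number of CPUs
// (NumCPU() >> 2 yields 8 workers on a 32 vCPU machine), while keeping
// at least one worker on very small machines.
func defaultEventConsumerWorkers() int {
	n := runtime.NumCPU() >> 2
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Printf("parallel consumer workers: %d\n", defaultEventConsumerWorkers())
}
```

With NumCPU() >> 2, a 32 vCPU machine still gets the 8 workers observed to perform well, while a 4 vCPU machine drops to a single worker instead of the old fixed default of 8.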
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Oct 11, 2022:
Part of cockroachdb#65957. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translates to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism, but now setting a lower priority bit (BulkNormalPri). Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to only use 100ms of CPU time, to exercise better control over scheduling latencies. To that end, we introduce the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct

    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
Removing GA blocker as NPROX is not backported.
jayshrivastava added a commit to jayshrivastava/cockroach that referenced this issue on Oct 12, 2022.
craig bot pushed a commit that referenced this issue on Oct 14, 2022:
89659: changefeedccl: use numcpu >> 2 workers for event consumers r=jayshrivastava a=jayshrivastava

Previously, a default value of 8 was used for the kvevent parallel consumer. The reason for this was that we observed performance improvements in a 15 node, 32 vCPU cluster when we increased this parameter to 8; after 8, the improvements were much smaller. The issue with a default of 8 is that on smaller machines, 8 workers can be too much overhead, especially since the work is CPU intensive. This change updates the default to be runtime.NumCPU() >> 2 workers, which aligns with using 8 workers on 32 vCPU machines.

Fixes #89589
Epic: none
Release note: None

Co-authored-by: Jayant Shrivastava <[email protected]>
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Oct 24, 2022:
Part of cockroachdb#65957. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translates to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism, but now setting a lower priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to only use 100ms of CPU time, to exercise better control over scheduling latencies. To that end, we introduce the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct

    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Oct 28, 2022.
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue on Oct 31, 2022.
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Nov 2, 2022.
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Nov 3, 2022.
craig bot pushed a commit that referenced this issue on Nov 4, 2022:
89709: kv,rangefeed: integrate catchup scans with elastic cpu r=irfansharif a=irfansharif

Part of #65957. First commit is from #89656. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In #89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translates to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in #86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism, but now setting a lower priority bit (BulkNormalPri). Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something #89589 can help with. We leave admission integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to use at most 100ms of CPU time (it makes for an ineffective controller otherwise), and to that end we introduce the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct

    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None

Co-authored-by: irfan sharif <[email protected]>
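A self-contained toy sketch of the cooperative pacing pattern described above; the time-based token accounting below is a stand-in, whereas the real Pacer acquires tokens from the elastic CPU work queue in admission control:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Pacer is a simplified stand-in for the component described above: a
// tight CPU-bound loop calls Pace() every iteration, and Pace() blocks
// once the loop has used up its allotment, yielding to other work.
type Pacer struct {
	allotment time.Duration // CPU time granted per acquisition (e.g. 100ms)
	deadline  time.Time     // when the current allotment runs out
}

func (p *Pacer) Pace(ctx context.Context) error {
	if time.Now().Before(p.deadline) {
		return nil // still within the current allotment
	}
	// Stand-in for blocking on the elastic CPU work queue until more
	// tokens are available.
	select {
	case <-time.After(10 * time.Millisecond):
		p.deadline = time.Now().Add(p.allotment)
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (p *Pacer) Close() {
	// The real implementation would return any unused tokens here.
}

func main() {
	ctx := context.Background()
	p := &Pacer{allotment: 100 * time.Millisecond, deadline: time.Now()}
	defer p.Close()
	for i := 0; i < 5; i++ { // stand-in for the catchup scan's tight loop
		if err := p.Pace(ctx); err != nil {
			fmt.Println("pacing interrupted:", err)
			return
		}
		// ... CPU-bound work for this iteration ...
	}
	fmt.Println("done")
}
```

The point of the pattern is that the tight loop itself decides when to yield, which is what makes non-preemptible catchup scans controllable without resumption tokens.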
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Nov 24, 2022.