changefeedccl: Parallel workers default #89589

Current default is set to 8. This is bad when running on small machines; perhaps the default should be set to something like max(1, NumCPU() >> 3).

Jira issue: CRDB-20321
Epic: CRDB-24463
miretskiy added the C-enhancement (solution expected to add code/behavior + preserve backward-compat; pg compat issues are the exception), A-cdc (Change Data Capture), T-cdc, and GA-blocker labels on Oct 7, 2022.
cc @cockroachdb/cdc
Hi @miretskiy, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
jayshrivastava added a commit to jayshrivastava/cockroach that referenced this issue on Oct 10, 2022:
Previously, a default value of 8 was used for the kvevent parallel consumer. The reason for this was that we observed performance improvements in a 15 node, 32 vCPU cluster when we increased this parameter to 8; after 8, the improvements were much smaller. The issue with a default of 8 is that on smaller machines, 8 workers can be too much overhead, especially since the work is CPU intensive. This change updates the default to be runtime.NumCPU() >> 2 workers, which aligns with using 8 workers on 32 vCPU machines.

Fixes cockroachdb#89589
Epic: none
Release note: None
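A minimal sketch of how a CPU-derived default like this could be computed; the helper name and the floor of one worker are illustrative assumptions, not the actual CockroachDB code:

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultEventConsumerWorkers is a hypothetical helper illustrating the
// commit's approach: derive the worker count from the number of CPUs
// (NumCPU() >> 2 yields 8 workers on a 32 vCPU machine), while keeping
// at least one worker on very small machines.
func defaultEventConsumerWorkers() int {
	n := runtime.NumCPU() >> 2
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Printf("parallel consumer workers: %d\n", defaultEventConsumerWorkers())
}
```

With NumCPU() >> 2, a 32 vCPU machine still gets the 8 workers observed to perform well, while a 4 vCPU machine drops to a single worker instead of the old fixed default of 8.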
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Oct 11, 2022:
Part of cockroachdb#65957. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translates to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism, but now setting a lower priority bit (BulkNormalPri). Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to only use 100ms of CPU time, to exercise better control over scheduling latencies. To that end, we introduce the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct

    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
Removing GA blocker as NPROX is not backported.
jayshrivastava added a commit to jayshrivastava/cockroach that referenced this issue on Oct 12, 2022.
craig bot pushed a commit that referenced this issue on Oct 14, 2022:
89659: changefeedccl: use numcpu >> 2 workers for event consumers r=jayshrivastava a=jayshrivastava

Previously, a default value of 8 was used for the kvevent parallel consumer. The reason for this was that we observed performance improvements in a 15 node, 32 vCPU cluster when we increased this parameter to 8; after 8, the improvements were much smaller. The issue with a default of 8 is that on smaller machines, 8 workers can be too much overhead, especially since the work is CPU intensive. This change updates the default to be runtime.NumCPU() >> 2 workers, which aligns with using 8 workers on 32 vCPU machines.

Fixes #89589
Epic: none
Release note: None

Co-authored-by: Jayant Shrivastava <[email protected]>
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Oct 24, 2022:
Part of cockroachdb#65957. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translates to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism, but now setting a lower priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to only use 100ms of CPU time, to exercise better control over scheduling latencies. To that end, we introduce the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct

    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Oct 28, 2022.
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue on Oct 31, 2022.
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Nov 2, 2022.
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Nov 3, 2022.
craig bot pushed a commit that referenced this issue on Nov 4, 2022:
89709: kv,rangefeed: integrate catchup scans with elastic cpu r=irfansharif a=irfansharif

Part of #65957. First commit is from #89656. Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In #89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translates to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in #86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism, but now setting a lower priority bit (BulkNormalPri). Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something #89589 can help with. We leave admission integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to use at most 100ms of CPU time (it makes for an ineffective controller otherwise), and to that end we introduce the following component, used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with the elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct

    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None

Co-authored-by: irfan sharif <[email protected]>
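A self-contained toy sketch of the cooperative pacing pattern described above; the time-based token accounting below is a stand-in, whereas the real Pacer acquires tokens from the elastic CPU work queue in admission control:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Pacer is a simplified stand-in for the component described above: a
// tight CPU-bound loop calls Pace() every iteration, and Pace() blocks
// once the loop has used up its allotment, yielding to other work.
type Pacer struct {
	allotment time.Duration // CPU time granted per acquisition (e.g. 100ms)
	deadline  time.Time     // when the current allotment runs out
}

func (p *Pacer) Pace(ctx context.Context) error {
	if time.Now().Before(p.deadline) {
		return nil // still within the current allotment
	}
	// Stand-in for blocking on the elastic CPU work queue until more
	// tokens are available.
	select {
	case <-time.After(10 * time.Millisecond):
		p.deadline = time.Now().Add(p.allotment)
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (p *Pacer) Close() {
	// The real implementation would return any unused tokens here.
}

func main() {
	ctx := context.Background()
	p := &Pacer{allotment: 100 * time.Millisecond, deadline: time.Now()}
	defer p.Close()
	for i := 0; i < 5; i++ { // stand-in for the catchup scan's tight loop
		if err := p.Pace(ctx); err != nil {
			fmt.Println("pacing interrupted:", err)
			return
		}
		// ... CPU-bound work for this iteration ...
	}
	fmt.Println("done")
}
```

The point of the pattern is that the tight loop itself decides when to yield, which is what makes non-preemptible catchup scans controllable without resumption tokens.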
irfansharif added a commit to irfansharif/cockroach that referenced this issue on Nov 24, 2022.