-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvcoord: pace DistSender rangefeed goroutines #98842
Labels
A-kv-rangefeed
Rangefeed infrastructure, server+client
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
Comments
erikgrinaker
added
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-kv-replication
labels
Mar 17, 2023
cc @cockroachdb/replication |
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Aug 24, 2023
Acquire catchup scan quota prior to goroutine creation in order to pace the goroutine creation rate. This change results in nice and smooth growth in goroutine count, thus reducing the pressure on goroutine scheduler, which in turn reduces the impact on SQL latency during changefeed startup. This change also improves observability in rangefeed client by introducing new counters: * `distsender.rangefeed.post_catchup_ranges`: gauge keeping track of the number of ranges which have completed their catchup scan. * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Observability also enhanced by adding a column to `crdb_internal.active_rangefeed` virtual table augment to indicate if the range is currently in catchup scan mode. Fixes cockroachdb#98842 Release note: None
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Aug 24, 2023
Acquire catchup scan quota prior to goroutine creation in order to pace the goroutine creation rate. This change results in nice and smooth growth in goroutine count, thus reducing the pressure on goroutine scheduler, which in turn reduces the impact on SQL latency during changefeed startup. This change also improves observability in rangefeed client by introducing new counters: * `distsender.rangefeed.post_catchup_ranges`: gauge keeping track of the number of ranges which have completed their catchup scan. * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Observability also enhanced by adding a column to `crdb_internal.active_rangefeed` virtual table augment to indicate if the range is currently in catchup scan mode. Fixes cockroachdb#98842 Release note: None
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Aug 28, 2023
Acquire catchup scan quota prior to goroutine creation in order to pace the goroutine creation rate. This change results in nice and smooth growth in goroutine count, thus reducing the pressure on goroutine scheduler, which in turn reduces the impact on SQL latency during changefeed startup. This change also improves observability in rangefeed client by introducing new counters: * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Observability also enhanced by adding a column to `crdb_internal.active_rangefeed` virtual table augment to indicate if the range is currently in catchup scan mode. Fixes cockroachdb#98842 Release note (enterprise change): Pace rangefeed goroutine creation rate to improve scheduler latency. Improve observability by adding additional metrics indicating the reason for rangefeed restart as well as additional column in the `crdb_internal.active_rangefeed` table to indicate if the range is currently in catchup scan mode.
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Sep 7, 2023
Acquire catchup scan quota prior to goroutine creation in order to pace the goroutine creation rate. This change results in nice and smooth growth in goroutine count, thus reducing the pressure on goroutine scheduler, which in turn reduces the impact on SQL latency during changefeed startup. This change also improves observability in rangefeed client by introducing new counters: distsender.rangefeed.retry.<reason>: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Observability also enhanced by adding a column to crdb_internal.active_rangefeed virtual table augment to indicate if the range is currently in catchup scan mode. Fixes cockroachdb#98842 Release note (enterprise change): Pace rangefeed goroutine creation rate to improve scheduler latency. Improve observability by adding additional metrics indicating the reason for rangefeed restart as well as additional column in the crdb_internal.active_rangefeed table to indicate if the range is currently in catchup scan mode.
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Sep 7, 2023
Acquire catchup scan quota prior to goroutine creation in order to pace the goroutine creation rate. This change results in nice and smooth growth in goroutine count, thus reducing the pressure on goroutine scheduler, which in turn reduces the impact on SQL latency during changefeed startup. This change also improves observability in rangefeed client by introducing new counters: distsender.rangefeed.retry.<reason>: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Observability also enhanced by adding a column to crdb_internal.active_rangefeed virtual table augment to indicate if the range is currently in catchup scan mode. Fixes cockroachdb#98842 Release note (enterprise change): Pace rangefeed goroutine creation rate to improve scheduler latency. Improve observability by adding additional metrics indicating the reason for rangefeed restart as well as additional column in the crdb_internal.active_rangefeed table to indicate if the range is currently in catchup scan mode.
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Sep 8, 2023
Improve rangefeed observability by introducing new counters: * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Fixes cockroachdb#98842 Release note (ops change): Improve rangefeed observability by adding additional metrics indicating the reason for rangefeed restart.
craig bot
pushed a commit
that referenced
this issue
Sep 8, 2023
108824: sql: implement datetime builtins r=rafiss,chrisseto a=annrpom Previously, `make_date`, `make_timestamp`, and `make_timestamptz` built-ins were not implemented. In addition, a new signature for `date_trunc` was not implemented. This was inadequate because it caused an incompatibility issue for tools that needed these datetime functions. To address this, this patch adds said datetime built-ins. Note that behaviors do not align with postgres when negative years are inpits, as we adhere to ISO 8601 standard. Epic: none Fixes: #108448 Release note (sql change): Datetime built-ins (make_date, make_timestamp, and make_timestamptz) are implemented - allowing for the creation of timestamps, timestamps with time zones, and dates. In addition, date_trunc now allows for a timestamp to be truncated in a provided timezone (to a provided precision). 109346: kvcoord: Pace rangefeed client goroutine creation r=miretskiy a=miretskiy Acquire catchup scan quota prior to goroutine creation in order to pace the goroutine creation rate. This change results in nice and smooth growth in goroutine count, thus reducing the pressure on goroutine scheduler, which in turn reduces the impact on SQL latency during changefeed startup. This change also improves observability in rangefeed client by introducing new counters: * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Observability also enhanced by adding a column to `crdb_internal.active_rangefeed` virtual table augment to indicate if the range is currently in catchup scan mode. Fixes #98842 Release note (enterprise change): Pace rangefeed goroutine creation rate to improve scheduler latency. Improve observability by adding additional metrics indicating the reason for rangefeed restart as well as additional column in the `crdb_internal.active_rangefeed` table to indicate if the range is currently in catchup scan mode. 109694: ui: all xhr paths from db console are now relative r=santamaura,zachlite,j82w a=dhartunian This commit removes all prefix `/` characters from request paths in the DB Console and Cluster UI codebases. This ensures that if the DB Console is proxied at a subpath the requests continue to work as expected and are relative to the correct base path. Resolves: cockroachdb/helm-charts#228 Resolves: #91429 Epic CRDB-21265 Release note (ops change, ui change): The DB Console now constructs client-side requests using relative URLs instead of absolute ones. This enables proxying of the DB Console at arbitrary subpaths. 109774: sql,keys: disregard checks in *ZoneConfig*Batch r=rafiss a=annrpom Previously, a bug occured where a transactional schema change, `ALTER RANGE default CONFIGURE ZONE...`, statement would produce a CPut failure, even though both statements do take effect. This was due to a guard in the code that blocked us from adding the first zone config to the uncommitted cache, causing the expValues being updated to the KV batch to be the same for both statements. This patch addresses the issue by removing the check and allowing for `default` and psuedotables (like system ranges) to be added to the uncommitted cache. Epic: none Release note (bug fix): two `ALTER RANGE default CONFIGURE ZONE` statements on the same line no longer displays an error. Fixes: #108253 110250: changefeedccl: Fix data race in lagging spans metric r=miretskiy a=miretskiy Fix a race bug in lagging spans metric. Fixes: #110235 Release note: None Co-authored-by: Annie Pompa <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: David Hartunian <[email protected]>
wenyihu6
pushed a commit
to wenyihu6/cockroach
that referenced
this issue
Jul 21, 2024
Improve rangefeed observability by introducing new counters: * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Fixes cockroachdb#98842 Release note (ops change): Improve rangefeed observability by adding additional metrics indicating the reason for rangefeed restart. # Conflicts: # pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
wenyihu6
pushed a commit
to wenyihu6/cockroach
that referenced
this issue
Jul 23, 2024
Improve rangefeed observability by introducing new counters: * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc). Fixes cockroachdb#98842 Release note (ops change): Improve rangefeed observability by adding additional metrics indicating the reason for rangefeed restart. # Conflicts: # pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-rangefeed
Rangefeed infrastructure, server+client
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
When the DistSender processes a RangeFeed request across a key span, it spawns goroutines to process each individual range:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
Lines 206 to 210 in 736a67e
For large spans, this can put a lot of pressure on the Go scheduler. Furthermore, most of these goroutines will immediately block on this semaphore:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
Lines 565 to 566 in 736a67e
We should instead use this semaphore to pace the spawning of goroutines, significantly reducing the scheduler pressure.
Jira issue: CRDB-25544
Epic CRDB-26372
The text was updated successfully, but these errors were encountered: