Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvcoord: pace DistSender rangefeed goroutines #98842

Closed
erikgrinaker opened this issue Mar 17, 2023 · 1 comment · Fixed by #109346
Closed

kvcoord: pace DistSender rangefeed goroutines #98842

erikgrinaker opened this issue Mar 17, 2023 · 1 comment · Fixed by #109346
Assignees
Labels
A-kv-rangefeed Rangefeed infrastructure, server+client C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Mar 17, 2023

When the DistSender processes a RangeFeed request across a key span, it spawns goroutines to process each individual range:

// Spawn a child goroutine to process this feed.
g.GoCtx(func(ctx context.Context) error {
return ds.partialRangeFeed(ctx, rr, eventProducer, sri.rs, sri.startAfter,
sri.token, withDiff, &catchupSem, rangeCh, eventCh, cfg)
})

For large spans, this can put a lot of pressure on the Go scheduler. Furthermore, most of these goroutines will immediately block on this semaphore:

catchupSem.SetLimit(maxConcurrentCatchupScans(&ds.st.SV))
catchupRes, err := catchupSem.Begin(ctx)

We should instead use this semaphore to pace the spawning of goroutines, significantly reducing the scheduler pressure.

Jira issue: CRDB-25544

Epic CRDB-26372

@erikgrinaker erikgrinaker added C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv-replication labels Mar 17, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 17, 2023

cc @cockroachdb/replication

@exalate-issue-sync exalate-issue-sync bot assigned aliher1911 and unassigned pav-kv May 16, 2023
@erikgrinaker erikgrinaker added the A-kv-rangefeed Rangefeed infrastructure, server+client label Jul 18, 2023
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Aug 24, 2023
Acquire catchup scan quota prior to goroutine creation
in order to pace the goroutine creation rate.

This change results in nice and smooth growth in
goroutine count, thus reducing the pressure on goroutine
scheduler, which in turn reduces the impact on SQL
latency during changefeed startup.

This change also improves observability in rangefeed
client by introducing new counters:
 * `distsender.rangefeed.post_catchup_ranges`: gauge keeping track
    of the number of ranges which have completed their catchup scan.
 * `distsender.rangefeed.retry.<reason>`: counter keeping track
    of the number of ranges that ecountered a retryable error of
    particular type (e.g. slow counsumer, range split, etc).

Observability also enhanced by adding a column to
 `crdb_internal.active_rangefeed` virtual table augment to indicate
if the range is currently in catchup scan mode.

Fixes cockroachdb#98842

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Aug 24, 2023
Acquire catchup scan quota prior to goroutine creation
in order to pace the goroutine creation rate.

This change results in nice and smooth growth in
goroutine count, thus reducing the pressure on goroutine
scheduler, which in turn reduces the impact on SQL
latency during changefeed startup.

This change also improves observability in rangefeed
client by introducing new counters:
 * `distsender.rangefeed.post_catchup_ranges`: gauge keeping track
    of the number of ranges which have completed their catchup scan.
 * `distsender.rangefeed.retry.<reason>`: counter keeping track
    of the number of ranges that ecountered a retryable error of
    particular type (e.g. slow counsumer, range split, etc).

Observability also enhanced by adding a column to
 `crdb_internal.active_rangefeed` virtual table augment to indicate
if the range is currently in catchup scan mode.

Fixes cockroachdb#98842

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Aug 28, 2023
Acquire catchup scan quota prior to goroutine creation
in order to pace the goroutine creation rate.

This change results in nice and smooth growth in
goroutine count, thus reducing the pressure on goroutine
scheduler, which in turn reduces the impact on SQL
latency during changefeed startup.

This change also improves observability in rangefeed
client by introducing new counters:
 * `distsender.rangefeed.retry.<reason>`: counter keeping track
    of the number of ranges that ecountered a retryable error of
    particular type (e.g. slow counsumer, range split, etc).

Observability also enhanced by adding a column to
 `crdb_internal.active_rangefeed` virtual table augment to indicate
if the range is currently in catchup scan mode.

Fixes cockroachdb#98842

Release note (enterprise change): Pace rangefeed goroutine creation
rate to improve scheduler latency.  Improve observability by adding
additional metrics indicating the reason for rangefeed restart
as well as additional column in the `crdb_internal.active_rangefeed`
table to indicate if the range is currently in catchup scan mode.
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Sep 7, 2023
Acquire catchup scan quota prior to goroutine creation in order to pace
the goroutine creation rate.

This change results in nice and smooth growth in
goroutine count, thus reducing the pressure on goroutine scheduler,
which in turn reduces the impact on SQL latency during changefeed
startup.

This change also improves observability in rangefeed client by
introducing new counters:

distsender.rangefeed.retry.<reason>: counter keeping track of the number
of ranges that ecountered a retryable error of particular type (e.g.
slow counsumer, range split, etc).
Observability also enhanced by adding a column to
crdb_internal.active_rangefeed virtual table augment to indicate
if the range is currently in catchup scan mode.

Fixes cockroachdb#98842

Release note (enterprise change): Pace rangefeed goroutine creation
rate to improve scheduler latency. Improve observability by adding
additional metrics indicating the reason for rangefeed restart
as well as additional column in the crdb_internal.active_rangefeed
table to indicate if the range is currently in catchup scan mode.
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Sep 7, 2023
Acquire catchup scan quota prior to goroutine creation in order to pace
the goroutine creation rate.

This change results in nice and smooth growth in
goroutine count, thus reducing the pressure on goroutine scheduler,
which in turn reduces the impact on SQL latency during changefeed
startup.

This change also improves observability in rangefeed client by
introducing new counters:

distsender.rangefeed.retry.<reason>: counter keeping track of the number
of ranges that ecountered a retryable error of particular type (e.g.
slow counsumer, range split, etc).
Observability also enhanced by adding a column to
crdb_internal.active_rangefeed virtual table augment to indicate
if the range is currently in catchup scan mode.

Fixes cockroachdb#98842

Release note (enterprise change): Pace rangefeed goroutine creation
rate to improve scheduler latency. Improve observability by adding
additional metrics indicating the reason for rangefeed restart
as well as additional column in the crdb_internal.active_rangefeed
table to indicate if the range is currently in catchup scan mode.
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Sep 8, 2023
Improve rangefeed observability by introducing new counters:
 * `distsender.rangefeed.retry.<reason>`: counter keeping track of the
   number of ranges that ecountered a retryable error of particular type
(e.g. slow counsumer, range split, etc).

Fixes cockroachdb#98842

Release note (ops change): Improve rangefeed observability by adding
additional metrics indicating the reason for rangefeed restart.
craig bot pushed a commit that referenced this issue Sep 8, 2023
108824: sql: implement datetime builtins r=rafiss,chrisseto a=annrpom

Previously, `make_date`, `make_timestamp`, and `make_timestamptz`
built-ins were not implemented. In addition, a new signature for
`date_trunc` was not implemented. This was inadequate because it caused
an incompatibility issue for tools that needed these datetime functions.
To address this, this patch adds said datetime built-ins. Note that behaviors
do not align with postgres when negative years are inpits, as we adhere
to ISO 8601 standard.

Epic: none
Fixes: #108448

Release note (sql change): Datetime built-ins (make_date,
make_timestamp, and make_timestamptz) are implemented - allowing for
the creation of timestamps, timestamps with time zones, and dates. In
addition, date_trunc now allows for a timestamp to be truncated in a
provided timezone (to a provided precision).

109346: kvcoord: Pace rangefeed client goroutine creation r=miretskiy a=miretskiy

Acquire catchup scan quota prior to goroutine creation in order to pace the goroutine creation rate.

This change results in nice and smooth growth in
goroutine count, thus reducing the pressure on goroutine scheduler, which in turn reduces the impact on SQL latency during changefeed startup.

This change also improves observability in rangefeed client by introducing new counters:
 * `distsender.rangefeed.retry.<reason>`: counter keeping track of the number of ranges that ecountered a retryable error of particular type (e.g. slow counsumer, range split, etc).

Observability also enhanced by adding a column to
 `crdb_internal.active_rangefeed` virtual table augment to indicate
if the range is currently in catchup scan mode.

Fixes #98842

Release note (enterprise change): Pace rangefeed goroutine creation
rate to improve scheduler latency.  Improve observability by adding
additional metrics indicating the reason for rangefeed restart
as well as additional column in the `crdb_internal.active_rangefeed`
table to indicate if the range is currently in catchup scan mode.

109694: ui: all xhr paths from db console are now relative r=santamaura,zachlite,j82w a=dhartunian

This commit removes all prefix `/` characters from request paths in the DB Console and Cluster UI codebases. This ensures that if the DB Console is proxied at a subpath the requests continue to work as expected and are relative to the correct base path.

Resolves: cockroachdb/helm-charts#228
Resolves: #91429

Epic CRDB-21265

Release note (ops change, ui change): The DB Console now constructs client-side requests using relative URLs instead of absolute ones. This enables proxying of the DB Console at arbitrary subpaths.

109774: sql,keys: disregard checks in *ZoneConfig*Batch r=rafiss a=annrpom

Previously, a bug occured where a transactional schema change,
 `ALTER RANGE default CONFIGURE ZONE...`, statement would
produce a CPut failure, even though both statements do take effect.
This was due to a guard in the code that blocked us from adding the
first zone config to the uncommitted cache, causing the expValues
being updated to the KV batch to be the same for both statements.
This patch addresses the issue by removing the check and allowing
for `default` and psuedotables (like system ranges) to be added to
the uncommitted cache.

Epic: none
Release note (bug fix): two `ALTER RANGE default CONFIGURE ZONE` 
statements on the same line no longer displays an error. 
Fixes: #108253

110250: changefeedccl: Fix data race in lagging spans metric r=miretskiy a=miretskiy

Fix a race bug in lagging spans metric.

Fixes: #110235

Release note: None

Co-authored-by: Annie Pompa <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
@craig craig bot closed this as completed in 8d321d4 Sep 8, 2023
wenyihu6 pushed a commit to wenyihu6/cockroach that referenced this issue Jul 21, 2024
Improve rangefeed observability by introducing new counters:
 * `distsender.rangefeed.retry.<reason>`: counter keeping track of the
   number of ranges that ecountered a retryable error of particular type
(e.g. slow counsumer, range split, etc).

Fixes cockroachdb#98842

Release note (ops change): Improve rangefeed observability by adding
additional metrics indicating the reason for rangefeed restart.

# Conflicts:
#	pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
wenyihu6 pushed a commit to wenyihu6/cockroach that referenced this issue Jul 23, 2024
Improve rangefeed observability by introducing new counters:
 * `distsender.rangefeed.retry.<reason>`: counter keeping track of the
   number of ranges that ecountered a retryable error of particular type
(e.g. slow counsumer, range split, etc).

Fixes cockroachdb#98842

Release note (ops change): Improve rangefeed observability by adding
additional metrics indicating the reason for rangefeed restart.

# Conflicts:
#	pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-rangefeed Rangefeed infrastructure, server+client C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants