rangefeed: registrations metric is not always drained on processor termination #106126

aliher1911 · 2023-07-04T18:44:37Z

Rangefeeds maintain a metric, kv.rangefeed.registrations, which shows how many rangefeed registrations are active.
This metric is a gauge that is incremented when a registration is successfully created and must be decremented when the registration is removed.

In practice, a registration can be terminated by the client (when the stream is closed from the kv client side) or by the server (when the replica is removed due to rebalancing or split/merge operations).
In the first case, the registration terminates its output loop, which triggers an unregistration request to the processor. The processor performs the cleanup as part of its work loop and then winds itself down if that was the last registration.
However, if the replica decides to terminate rangefeeds, it sends a stop request to the processor, which in turn terminates its registrations. The registrations update their state and close their output loops, which triggers unregistration requests to the processor, but these are never processed because the processor's work loop has already terminated. As a result, the gauge is never decremented for those registrations.
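The two termination paths can be illustrated with a minimal, single-threaded model. This is only a sketch: `gauge`, `processor`, `loopRunning`, and the `drainMetric` flag are hypothetical names invented for illustration, not the CockroachDB implementation, and draining the gauge inside `stop` is one assumed remedy, not necessarily what any actual fix does.

```go
package main

import "fmt"

// gauge stands in for the kv.rangefeed.registrations metric.
type gauge struct{ value int64 }

func (g *gauge) Inc() { g.value++ }
func (g *gauge) Dec() { g.value-- }

// processor is a hypothetical, single-threaded model of a rangefeed
// processor. It is not the CockroachDB implementation.
type processor struct {
	metric      *gauge
	regs        map[int]struct{}
	loopRunning bool // false once the work loop has exited
}

func newProcessor(m *gauge) *processor {
	return &processor{metric: m, regs: map[int]struct{}{}, loopRunning: true}
}

// register adds a registration and increments the gauge.
func (p *processor) register(id int) {
	p.regs[id] = struct{}{}
	p.metric.Inc()
}

// unregister models an unregistration request. It is only handled while
// the work loop is running; afterwards the request is silently dropped.
func (p *processor) unregister(id int) {
	if !p.loopRunning {
		return // nobody processes the request, so the gauge is untouched
	}
	if _, ok := p.regs[id]; ok {
		delete(p.regs, id)
		p.metric.Dec()
	}
}

// stop models the replica-initiated path: the work loop exits first, then
// every registration is disconnected and sends its unregistration request.
// drainMetric (an assumption) decrements the gauge directly on stop.
func (p *processor) stop(drainMetric bool) {
	p.loopRunning = false
	for id := range p.regs {
		p.unregister(id) // dropped: the work loop is gone
	}
	if drainMetric {
		for range p.regs {
			p.metric.Dec()
		}
	}
	p.regs = map[int]struct{}{}
}

// gaugeAfterClientClose registers n feeds and closes each from the client.
func gaugeAfterClientClose(n int) int64 {
	m := &gauge{}
	p := newProcessor(m)
	for i := 0; i < n; i++ {
		p.register(i)
	}
	for i := 0; i < n; i++ {
		p.unregister(i)
	}
	return m.value
}

// gaugeAfterReplicaStop registers n feeds and then stops the processor.
func gaugeAfterReplicaStop(n int, drainMetric bool) int64 {
	m := &gauge{}
	p := newProcessor(m)
	for i := 0; i < n; i++ {
		p.register(i)
	}
	p.stop(drainMetric)
	return m.value
}

func main() {
	fmt.Println("client close:", gaugeAfterClientClose(3))
	fmt.Println("replica stop, no drain:", gaugeAfterReplicaStop(3, false))
	fmt.Println("replica stop, drain on stop:", gaugeAfterReplicaStop(3, true))
}
```

In this model the client-side path leaves the gauge at zero, while the replica-initiated stop without draining leaves it at the number of registrations that were active, which matches the leak described above.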

Environment:

CockroachDB version 23.1, but likely earlier versions as well.

Additional context
An inaccurate registrations metric makes rangefeed investigations harder.

Jira issue: CRDB-29415

Epic CRDB-39959


blathers-crl · 2023-07-04T18:44:40Z

cc @cockroachdb/replication

aliher1911 · 2023-07-04T18:59:36Z

There's a similar issue with the memory budget, but the processor releases all budget unconditionally on termination without waiting for registrations to drain.

110602: roachtest: add c2c/weekly/kv50 roachtest r=adityamaru a=msbutler

This patch adds a new weekly c2c roachtest that tests our 23.2 performance requirements under a given cluster configuration.

The workload:
- kv0, 2TB initial scan, 50% writes, 2000 max qps, insert row size uniformly between 1 and 4096 bytes.

The cluster specs:
- 8 nodes in each src and dest clusters, 8 vcpus, pdSize 1 TB

Release note: none
Epic: none

110959: rangefeed: fix kv.rangefeed.registrations metric r=erikgrinaker a=aliher1911

Previously, when the rangefeed processor was stopped by the replica, the registrations metric was not correctly decreased, leading to metric creep on changefeed restarts or range splits. This commit fixes the metric decrease for such cases.

Epic: none
Fixes: #106126
Release note: None

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>

erikgrinaker · 2023-10-24T09:57:42Z

#110959 did not appear to fully fix this, since restarting rangefeeds on 23.2 (e.g. by flipping kv.rangefeed.scheduler.enabled) leaks kv.rangefeed.registrations (shows 125, while there are only 12 outputLoop goroutines running).
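A comparison like the one above (gauge value versus live output loops) can be reproduced in a plain Go process by counting goroutines whose stack mentions a given function. The sketch below uses the standard `runtime.Stack` call. `outputLoop`, `startLoops`, and `countGoroutines` are hypothetical helpers written for this illustration, and it is an assumption that a stack-dump count is a reasonable proxy for live registrations.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// outputLoop is a hypothetical stand-in for a registration's output loop.
// It signals that it is running and then blocks until stop is closed.
func outputLoop(ready chan<- struct{}, stop <-chan struct{}) {
	ready <- struct{}{}
	<-stop
}

// startLoops starts n output loops, waits until all of them are running,
// and returns a function that stops them.
func startLoops(n int) (stopAll func()) {
	ready := make(chan struct{})
	stop := make(chan struct{})
	for i := 0; i < n; i++ {
		go outputLoop(ready, stop)
	}
	for i := 0; i < n; i++ {
		<-ready
	}
	return func() { close(stop) }
}

// countGoroutines returns how many goroutines have needle in their stack.
func countGoroutines(needle string) int {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true) // true: dump all goroutines
		if n < len(buf) {
			buf = buf[:n]
			break
		}
		buf = make([]byte, 2*len(buf)) // dump was truncated, retry larger
	}
	count := 0
	// Goroutine records in the dump are separated by a blank line.
	for _, record := range strings.Split(string(buf), "\n\n") {
		if strings.Contains(record, needle) {
			count++
		}
	}
	return count
}

func main() {
	stopAll := startLoops(12)
	defer stopAll()
	fmt.Println("outputLoop goroutines:", countGoroutines("outputLoop("))
}
```

If the gauge reports a larger number than this count, the difference is a rough estimate of the leaked registrations.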

aliher1911 added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) and A-kv-replication (Relating to Raft, consensus, and coordination.) labels Jul 4, 2023

blathers-crl bot added the T-kv-replication label Jul 4, 2023

exalate-issue-sync bot assigned aliher1911 Jul 5, 2023

aliher1911 mentioned this issue Sep 20, 2023

rangefeed: fix kv.rangefeed.registrations metric #110959

Merged

craig bot closed this as completed in cdafd14 Sep 22, 2023

erikgrinaker reopened this Oct 24, 2023

erikgrinaker assigned erikgrinaker and unassigned aliher1911 Oct 24, 2023

exalate-issue-sync bot unassigned erikgrinaker Jan 2, 2024

exalate-issue-sync bot added T-kv KV Team and removed T-kv-replication labels Jun 28, 2024

github-project-automation bot added this to KV Aug 28, 2024

github-project-automation bot moved this to Incoming in KV Aug 28, 2024
