rangefeed: registrations metric is not always drained on processor termination #106126
Comments
cc @cockroachdb/replication
There's a similar issue with the memory budget, but the processor releases all budget unconditionally on termination, without waiting for registrations to drain.
110602: roachtest: add c2c/weekly/kv50 roachtest r=adityamaru a=msbutler
This patch adds a new weekly c2c roachtest that tests our 23.2 performance requirements under a given cluster configuration.
The workload:
- kv0, 2TB initial scan, 50% writes, 2000 max qps, insert row size uniformly between 1 and 4096 bytes.
The cluster specs:
- 8 nodes in each src and dest clusters, 8 vcpus, pdSize 1 TB
Release note: none
Epic: none
110959: rangefeed: fix kv.rangefeed.registrations metric r=erikgrinaker a=aliher1911
Previously when rangefeed processor was stopped by replica, registrations metric was not correctly decreased leading to metric creep on changefeed restarts or range splits. This commit fixes metric decrease for such cases.
Epic: none
Fixes: #106126
Release note: None
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
#110959 did not appear to fully fix this, since restarting rangefeeds on 23.2 (e.g. by flipping
Rangefeeds maintain a metric, kv.rangefeed.registrations, which shows how many rangefeeds are active. This metric is a gauge that is incremented when a registration is successfully created and must be decremented when the registration is removed.
In practice, a registration can be terminated by the client (when the stream is closed from the kv client side) or by the server (when the replica is removed due to rebalancing or split/merge operations).
In the first case, the registration terminates its output loop, which sends an unregistration request to the processor; the processor performs the cleanup as part of its work loop. The processor then winds itself down if that was the last registration.
However, if the replica decides to terminate rangefeeds, it sends a stop request to the processor, which in turn terminates its registrations. The registrations update their state and close their output loops, which sends unregistration requests to the processor, but these are never processed because the processor's work loop has already terminated. As a result, the gauge is never decremented for those registrations.
Additional context
The inaccurate metric makes investigations difficult.
Jira issue: CRDB-29415
Epic CRDB-39959