Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changefeedccl: db console commit latency graph is incorrect #119246

Closed
jayshrivastava opened this issue Feb 15, 2024 · 1 comment · Fixed by #120787
Closed

changefeedccl: db console commit latency graph is incorrect #119246

jayshrivastava opened this issue Feb 15, 2024 · 1 comment · Fixed by #120787
Assignees
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc

Comments

@jayshrivastava
Copy link
Contributor

jayshrivastava commented Feb 15, 2024

In a multi node cluster, it looks like the commit latency graph aggregates by SUM instead of MAX

See https://cockroachlabs.slack.com/archives/C9TGBJB44/p1707994822096629

This is what it should look like
image

This is what shows up in the changefeeds page
image

It seems that there is no aggregator supplied to the graph in the db console, so it may be aggregating by SUM by default when it should be aggregating by max.

Jira issue: CRDB-36112

@jayshrivastava jayshrivastava added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cdc Change Data Capture T-cdc labels Feb 15, 2024
Copy link

blathers-crl bot commented Feb 15, 2024

cc @cockroachdb/cdc

rharding6373 added a commit to rharding6373/cockroach that referenced this issue Mar 21, 2024
Previously, the commit latency in the changefeed dashboard in the
db console would be aggregated by sum across all nodes. This was
confusing for users who might see unexpectedly high commit latency.

In this change, we use max aggregation for the commit latency so
that users see the max commit latency from all the nodes instead
of the sum. This provides more useful observability into changefeed
behavior.

Fixes: cockroachdb#119246
Fixes: cockroachdb#112947
Epic: None

Release note (ui change): The "Commit Latency" chart in the changefeed
dashboard now aggregates by max instead of by sum for multi-node
changefeeds. This more accurately reflects the amount of time for
events to be acknowledged by the downstream sink.
craig bot pushed a commit that referenced this issue Mar 27, 2024
119933: kv: add ability to verify pipelined replicated shared/exclusive locks  r=nvanbenschoten a=arulajmani

Previously, QueryIntent requests were only used to verify whether an
intent was successfully evaluated and replicated. This patch extends
QueryIntent request to also be able to verify whether a pipelined
shared or exclusive lock was successfully replicated or not.

Informs #117978

Release note: None

120787: ui: use max aggregator for commit latency on changefeed dashboard r=rharding6373 a=rharding6373

Previously, the commit latency in the changefeed dashboard in the db console would be aggregated by sum across all nodes. This was confusing for users who might see unexpectedly high commit latency.

In this change, we use max aggregation for the commit latency so that users see the max commit latency from all the nodes instead of the sum. This provides more useful observability into changefeed behavior.

Fixes: #119246
Fixes: #112947
Epic: None

Release note (ui change): The "Commit Latency" chart in the changefeed dashboard now aggregates by max instead of by sum for multi-node changefeeds. This more accurately reflects the amount of time for events to be acknowledged by the downstream sink.

121217: backupccl: fix data race with admission pacer r=msbutler a=aadityasondhi

We now use one pacer per fileSSTSink.

Fixes #121199.
Fixes #121202.
Fixes #121201.
Fixes #121200.
Fixes #121198.
Fixes #121197.
Fixes #121196.
Fixes #121195.
Fixes #121194.
Fixes #121193.
Fixes #121192.
Fixes #121191.
Fixes #121190.
Fixes #121189.
Fixes #121188.
Fixes #121187.

Release note: None

121222: optbuilder: fix recently introduced nil pointer in error case r=yuzefovich a=yuzefovich

This commit fixes a recently introduced nil pointer internal error when attempting to CALL not a procedure that is specified not by its name. `ResolvableFunctionReference` might not have `ReferenceByName`, so this commit switches to using `FunctionReference` that is always set.

Fixes: #121095.

Release note: None

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig craig bot closed this as completed in 072fd4e Mar 27, 2024
blathers-crl bot pushed a commit that referenced this issue Mar 27, 2024
Previously, the commit latency in the changefeed dashboard in the
db console would be aggregated by sum across all nodes. This was
confusing for users who might see unexpectedly high commit latency.

In this change, we use max aggregation for the commit latency so
that users see the max commit latency from all the nodes instead
of the sum. This provides more useful observability into changefeed
behavior.

Fixes: #119246
Fixes: #112947
Epic: None

Release note (ui change): The "Commit Latency" chart in the changefeed
dashboard now aggregates by max instead of by sum for multi-node
changefeeds. This more accurately reflects the amount of time for
events to be acknowledged by the downstream sink.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants