ui: add snapshots dashboard to metrics page #86599
Labels
A-kv-observability
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
Santamaura added the C-enhancement and T-kv-observability labels on Aug 22, 2022.
Santamaura added a commit to Santamaura/cockroach that referenced this issue on Aug 24, 2022:
This change adds new graphs to the metrics replication dashboard. New metrics visualized on the dashboard can be used to help triage decommissioning issues. Metrics visualized include:
- queue.replicate.addreplica.(success|error)
- queue.replicate.removereplica.(success|error)
- queue.replicate.replacedeadreplica.(success|error)
- queue.replicate.removedeadreplica.(success|error)
- queue.replicate.replacedecommissioningreplica.(success|error)
- queue.replicate.removedecommissioningreplica.(success|error)
- range.snapshots.recv-queue
- queue.replicate.purgatory
- range.snapshots.unknown.rcvd-bytes
- range.snapshots.rebalancing.rcvd-bytes
- range.snapshots.recovery.rcvd-bytes
Release justification: low risk, high benefit changes to existing functionality.
Resolves cockroachdb#86599
Release note (ui change): introduce new graphs on metrics replication dashboard to improve decommissioning observability
Santamaura added a commit to Santamaura/cockroach that referenced this issue on Aug 25, 2022:
This change adds new graphs to the metrics replication dashboard. New metrics visualized on the dashboard can be used to help triage decommissioning issues. Metrics visualized include:
- queue.replicate.addreplica.(success|error)
- queue.replicate.removereplica.(success|error)
- queue.replicate.replacedeadreplica.(success|error)
- queue.replicate.removedeadreplica.(success|error)
- queue.replicate.replacedecommissioningreplica.(success|error)
- queue.replicate.removedecommissioningreplica.(success|error)
- range.snapshots.recv-queue
- range.snapshots.unknown.rcvd-bytes
- range.snapshots.rebalancing.rcvd-bytes
- range.snapshots.recovery.rcvd-bytes
Release justification: low risk, high benefit changes to existing functionality.
Resolves cockroachdb#86599
Release note (ui change): introduce new graphs on metrics replication dashboard to improve decommissioning observability
AlexTalks pushed a commit to AlexTalks/cockroach that referenced this issue on Aug 26, 2022.
Santamaura added a commit to Santamaura/cockroach that referenced this issue on Aug 29, 2022.
Santamaura added a commit to Santamaura/cockroach that referenced this issue on Sep 1, 2022.
Santamaura added a commit to Santamaura/cockroach that referenced this issue on Sep 1, 2022.
craig bot pushed a commit that referenced this issue on Sep 6, 2022:
86702: ui: add decommissioning relevant graphs to metrics replication dashboard r=Santamaura a=Santamaura
This change adds new graphs to the metrics replication dashboard. New metrics visualized on the dashboard can be used to help triage decommissioning issues. Metrics visualized include:
- queue.replicate.addreplica.(success|error)
- queue.replicate.removereplica.(success|error)
- queue.replicate.replacedeadreplica.(success|error)
- queue.replicate.removedeadreplica.(success|error)
- queue.replicate.replacedecommissioningreplica.(success|error)
- queue.replicate.removedecommissioningreplica.(success|error)
- range.snapshots.recv-queue
- queue.replicate.purgatory
- range.snapshots.rebalancing.rcvd-bytes
- range.snapshots.recovery.rcvd-bytes
Release justification: low risk, high benefit changes to existing functionality.
Resolves #86599
Release note (ui change): introduce new graphs on metrics replication dashboard to improve decommissioning observability

86988: kvserver: lazily translate Spans to LockUpdates instead of pre-allocating r=shralex a=shralex
Previously, we called LocksAsLockUpdates before calling ResolveIntents, which pre-allocated memory for all LockUpdates. In this PR we change the interface of ResolveIntents to avoid this memory allocation and perform the translation of Span to LockUpdate lazily, as we iterate over them in ResolveIntents.
Release justification: stability change that may help avoid OOM.
Release note: None
Resolves: #77219
Jira issue: https://cockroachlabs.atlassian.net/browse/CRDB-13478

Co-authored-by: Santamaura
Co-authored-by: shralex
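The 86988 description contrasts eager and lazy translation of spans into lock updates. Below is a minimal Python sketch of that general idea only; the real change is Go code in kvserver, and `Span`, `LockUpdate`, their fields, and the helper functions here are simplified hypothetical stand-ins, not CockroachDB's types or signatures. The eager helper builds every update in a list before any resolution starts, while the lazy helper is a generator that creates each update only when the consumer iterates to it, so the full list is never materialized up front.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Span:
    # Hypothetical, simplified key span.
    key: bytes
    end_key: bytes


@dataclass(frozen=True)
class LockUpdate:
    # Hypothetical, simplified lock update; real LockUpdates carry more fields.
    span: Span
    txn_id: str
    status: str


def lock_updates_eager(spans: Iterable[Span], txn_id: str, status: str) -> List[LockUpdate]:
    # Eager: allocates one LockUpdate per span before any of them is used.
    return [LockUpdate(s, txn_id, status) for s in spans]


def lock_updates_lazy(spans: Iterable[Span], txn_id: str, status: str) -> Iterator[LockUpdate]:
    # Lazy: yields each LockUpdate only when the consumer asks for the next one.
    for s in spans:
        yield LockUpdate(s, txn_id, status)


def resolve_intents(updates: Iterable[LockUpdate], resolve: Callable[[LockUpdate], None]) -> int:
    # Hypothetical consumer: handles one update at a time and returns the count.
    count = 0
    for u in updates:
        resolve(u)
        count += 1
    return count
```

Because `resolve_intents` accepts any iterable, the same consumer works with either helper; only the lazy one avoids holding every update in memory at once.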
Is your feature request related to a problem? Please describe.
Part of #85445.
Describe the solution you'd like
As part of the effort to improve decommissioning observability, it would help to add a new dashboard that surfaces the following metrics:
Success/error counts by allocator action
queue.replicate.addreplica.(success|error)
queue.replicate.removereplica.(success|error)
queue.replicate.replacedeadreplica.(success|error)
queue.replicate.removedeadreplica.(success|error)
queue.replicate.replacedecommissioningreplica.(success|error)
queue.replicate.removedecommissioningreplica.(success|error)
Snapshots queued and in-progress
range.snapshots.send-queue
range.snapshots.recv-queue
range.snapshots.send-in-progress
range.snapshots.recv-in-progress
range.snapshots.send-total-in-progress
range.snapshots.recv-total-in-progress
Queue metrics
queue.replicate.process.(success|failure)
queue.replicate.purgatory
queue.replicate.processingnanos
Transferred bytes
Note: it may make sense to visualize these as rates (bytes per second) rather than as cumulative totals.
range.snapshots.unknown.rcvd-bytes
range.snapshots.unknown.sent-bytes
range.snapshots.rebalancing.rcvd-bytes
range.snapshots.rebalancing.sent-bytes
range.snapshots.recovery.rcvd-bytes
range.snapshots.recovery.sent-bytes
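The transferred-bytes metrics are cumulative counters, so a rate is the change in the counter divided by the time between two samples. The following is a minimal Python sketch of that conversion under stated assumptions: samples are hypothetical `(timestamp_seconds, value)` pairs, and `counter_to_rate` is a hypothetical helper, not part of the CockroachDB UI or time-series API. How the DB Console actually derives rates may differ.

```python
from typing import List, Tuple

# Hypothetical sample format: (timestamp in seconds, cumulative counter value).
Sample = Tuple[float, float]


def counter_to_rate(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert cumulative counter samples into per-second rates.

    Each output point is (timestamp, rate) for the interval ending at that
    timestamp. If the counter decreases (assumed here to mean it was reset),
    the interval is treated as starting from zero so no negative rate appears.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        delta = v1 - v0
        if delta < 0:
            delta = v1  # assumed counter reset: count from zero
        rates.append((t1, delta / dt))
    return rates


# Example: bytes received sampled every 10 seconds, with a reset before t=30.
print(counter_to_rate([(0, 0), (10, 1000), (20, 3000), (30, 500)]))
```

Clamping on a decrease keeps the graph non-negative, which matches how byte throughput should read on a dashboard; the trade-off is that bytes transferred between the last pre-reset sample and the reset are not counted.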
Jira issue: CRDB-18834
Epic CRDB-10792