storage: Improved metrics #8645
Comments
We already have metrics for normal and preemptive snapshots applied, as well as a metric for the number of snapshots generated.
Peter and I were just talking about recording the number of ticks per second (should be a constant, but will tell us when we're skipping ticks because
We do have metrics that are being tracked for normal and preemptive snapshots, but they currently aren't graphed or logged (as far as I can tell). I think they should be exposed in the "Advanced Internals" section of the node graphs. Ticks is a good idea as well.
FYI: all metrics should be raw counts of things, not rates. Preemptive snapshots are on Grafana as of earlier this morning: http://monitoring.gce.cockroachdb.com:3000/dashboard/db/cockroachinternals
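To illustrate the "raw counts, not rates" point: a minimal sketch in Go, assuming a hypothetical `Counter` type and `RatePerSecond` helper (neither is taken from the CockroachDB codebase). The store only ever increments a count; whatever graphs the metric derives a rate from two samples at query time.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Counter is a hypothetical monotonically increasing metric.
// It records only the raw count and knows nothing about rates.
type Counter struct {
	n int64
}

// Inc adds delta to the raw count.
func (c *Counter) Inc(delta int64) { atomic.AddInt64(&c.n, delta) }

// Count returns the current raw count.
func (c *Counter) Count() int64 { return atomic.LoadInt64(&c.n) }

// RatePerSecond derives a rate from two raw samples taken elapsed apart,
// the way a dashboard would at query time.
func RatePerSecond(prevCount, curCount int64, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(curCount-prevCount) / elapsed.Seconds()
}

func main() {
	var snapshots Counter
	prev := snapshots.Count()
	snapshots.Inc(30) // pretend 30 snapshots were applied
	cur := snapshots.Count()
	// Assume the two samples were taken 10 seconds apart.
	fmt.Printf("raw count=%d, derived rate=%.1f/s\n",
		cur, RatePerSecond(prev, cur, 10*time.Second))
}
```

Keeping the exported value as a raw count means the sampling window can be chosen later, and a missed scrape loses resolution rather than data.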
Number of ticks per second (or the inverse, seconds per tick) would be my vote for top-priority metric. It's good to have snapshot metrics in the admin UI for transient test clusters. Number of elections (or term changes) is a good one to track. It's a partial proxy for other issues, such as dropped Raft messages and lease holder instability.
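As a rough sketch of how a tick count could reveal skipped ticks: compare the ticks observed over a window against the number expected from the configured tick interval. The `SkippedTicks` function and the 200ms interval used below are assumptions for illustration only, not values from the project.

```go
package main

import (
	"fmt"
	"time"
)

// SkippedTicks estimates how many ticks were missed during a window, given
// the raw tick counts at the start and end of the window and the configured
// tick interval. It returns 0 if at least the expected number were seen.
func SkippedTicks(startCount, endCount int64, window, tickInterval time.Duration) int64 {
	if window <= 0 || tickInterval <= 0 {
		return 0
	}
	expected := int64(window / tickInterval)
	observed := endCount - startCount
	if observed >= expected {
		return 0
	}
	return expected - observed
}

func main() {
	// Hypothetical numbers: a 200ms tick interval sampled over 10 seconds
	// should yield 50 ticks; here only 45 were counted.
	skipped := SkippedTicks(1000, 1045, 10*time.Second, 200*time.Millisecond)
	fmt.Printf("skipped ticks in window: %d\n", skipped)
}
```

The same comparison works for seconds per tick: divide the window by the observed tick count and compare it with the configured interval.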
All messages?
#8257 already introduced this. Is there something missing that you would like added?
Is there some underlying ideology behind why different things are graphed in different places? Some things are in the cockroach admin UI, some things are in grafana (which is currently not publicly available), and some things are in both. My working belief is that all these metrics should be graphs in the admin UI, since grafana needs to be independently deployed, and these metrics are most useful for debugging changes locally. Admin UI is the lowest overhead way to achieve that. In either case, anyone who wants to read the raw metrics log can also do so from
We've been focusing on grafana recently because it's quicker to iterate on, can show more history, and works even when cockroachdb is having problems. Once we've settled on the metrics that are useful to graph we can sync those back to the admin UI.
@arjunravinarayan yes, I'd like to see the total volume of traffic which raft is processing. Byte counts too, unless those are available somewhere else.
Add additional metrics for Raft messages to help in debugging: total messages sent and received, transport queue length, dropped messages, and ticks. Expose these metrics in the Admin UI, under the "Advanced Internals" section. Closes cockroachdb#8645.
The storage layer (specifically, each individual store) suffers from a lack of metrics. To kick this off, #8257 introduced metrics for tracking time spent by each store in processRaft. We also decided to add in metrics on heartbeats sent and received by each store ahead of addressing #6107. This is an issue to collect ideas and suggestions about what other metrics folks might find helpful in debugging (cc @cockroachdb/stability, @cuongdo, @mberhault). Here are some initial metrics that I am planning to add in advance of #6107:
If there are other metrics that would be useful to you, or if there is a specific format you would prefer, let's track that in this issue.