kvserver: add metric and log when raft.Storage returns an error #113245
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
LGTM, thanks for adding this!
pkg/kv/kvserver/metrics.go
@@ -1418,6 +1418,12 @@ cache will already have moved on to newer entries.
		Measurement: "Bytes",
		Unit:        metric.Unit_BYTES,
	}
	metaRaftStorageError = metric.Metadata{
		Name: "raft.storage.error",
		Help: "Number of calls to the raft.Storage API that returned an error",
nit: the "raft.Storage API" part seems a bit too low-level for users, consider perhaps "Number of Raft storage errors".
Force-pushed from 0618e95 to ffba1b6
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/kv/kvserver/metrics.go
line 1423 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: the "raft.Storage API" part seems a bit too low-level for users, consider perhaps "Number of Raft storage errors".
Done
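After the rename, the metric's metadata would presumably carry the reviewer's suggested help text. The following is a self-contained sketch, not the merged code: `Metadata` is a hypothetical, trimmed-down stand-in for `metric.Metadata` that keeps only the fields visible in the diff, `Unit` is a plain string rather than the real unit enum, and the `Measurement` and `Unit` values are assumptions, since the hunk above does not show them.

```go
package main

import "fmt"

// Metadata is a hypothetical stand-in for CockroachDB's metric.Metadata,
// keeping only the fields that appear in the reviewed diff.
type Metadata struct {
	Name        string
	Help        string
	Measurement string // assumption: not shown for this metric in the diff
	Unit        string // assumption: the real field is an enum, not a string
}

// metaRaftStorageError uses the user-facing help text suggested in review
// instead of the lower-level "raft.Storage API" wording.
var metaRaftStorageError = Metadata{
	Name:        "raft.storage.error",
	Help:        "Number of Raft storage errors",
	Measurement: "Errors",
	Unit:        "COUNT",
}

func main() {
	fmt.Printf("%s: %s\n", metaRaftStorageError.Name, metaRaftStorageError.Help)
}
```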
bors r=erikgrinaker
Build failed.
Force-pushed from ffba1b6 to 063ebce
bors r=erikgrinaker
Build succeeded.
The raft.storage.error metric is incremented whenever a raft.Storage call returns an error, and the error is logged at most once every 30s (the rate limit is shared across all replicas).
This was motivated by a test cluster that slowed to a crawl because of deliberate data loss; the slowdown was hard to diagnose. The metric could be used for alerting, since we don't expect raft.Storage to return transient errors.
Informs #113053
Epic: none
Release note: None