-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: add IOThreshold to metrics #87424
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-starter
Might be suitable for a starter project for new employees or team members.
Comments
tbg
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv-replication
labels
Sep 6, 2022
cc @cockroachdb/replication |
kvoli
added
the
E-starter
Might be suitable for a starter project for new employees or team members.
label
Sep 6, 2022
KaiSun314
added a commit
to KaiSun314/cockroach
that referenced
this issue
Sep 8, 2022
Resolves cockroachdb#87424. Previously, only the unnormalized values of the LSM L0 sub-level and file counts is exposed externally, not the store's IOThreshold. This was inadequate because it is tedious to normalize and compare the LSM L0 sub-level and file counts (as they require dividing by different numbers). To address this, this patch adds a metric `admission.io.overload` tracking the store's IOThreshold. Release note (ops change): Added a metric `admission.io.overload` which tracks the store's IOThreshold.
cc @joshimhoff want to track this as it closes since I think you'll want to add alerting on this new metric as soon as it becomes available. @kvoli planning backport to 22.2 or not? |
I think a backport to 22.2 would be a good idea. I'll work with @KaiSun314 after it's merged to backport. |
KaiSun314
added a commit
to KaiSun314/cockroach
that referenced
this issue
Sep 12, 2022
Resolves cockroachdb#87424. Previously, only the unnormalized values of the LSM L0 sub-level and file counts is exposed externally, not the store's IOThreshold. This was inadequate because it is tedious to normalize and compare the LSM L0 sub-level and file counts (as they require dividing by different numbers). To address this, this patch adds a metric `admission.io.overload` tracking the store's IOThreshold. Release note (ops change): Added a metric `admission.io.overload` which tracks the store's IOThreshold.
craig bot
pushed a commit
that referenced
this issue
Sep 19, 2022
87656: kvserver: add `admission.io.overload` metric r=kvoli a=KaiSun314 Resolves #87424. Previously, only the unnormalized values of the LSM L0 sub-level and file counts is exposed externally, not the store's IOThreshold. This was inadequate because it is tedious to normalize and compare the LSM L0 sub-level and file counts (as they require dividing by different numbers). To address this, this patch adds a metric `admission.io.overload` tracking the store's IOThreshold. Release note (ops change): Added a metric `admission.io.overload` which tracks the store's IOThreshold. Note: I am not certain on the placement of the `admission.io.overload` metric, but it seems reasonable. Justification: 1. Pretty much all updates to store metrics outside of ComputeMetrics in pkg/kv/kvstore/store.go seem to be increments (e.g. Inc(1) ) 2. ComputeMetrics (there is a comment in the code: ComputeMetrics immediately computes the current value of store metrics which cannot be computed incrementally) updates the store metrics via updateCapacityGauges , updateReplicationGauges , updateEngineMetrics , and updateEnvStats. Looking at all these functions, updateReplicationGauges (where we are currently updating the IOOverload metric) seems to make the most sense to update the IOOverload metric 3. There seem to be two other admission metrics: - RaftPausedFollowerCount - updated in updateReplicationGauges - RaftPausedFollowerDroppedMsgs.Inc - updated in pkg/kv/kvserver/replica_raft.go via r.store.Metrics().RaftPausedFollowerDroppedMsgs.Inc(1) Co-authored-by: Kai Sun <[email protected]>
KaiSun314
added a commit
to KaiSun314/cockroach
that referenced
this issue
Sep 19, 2022
Resolves cockroachdb#87424. Previously, only the unnormalized values of the LSM L0 sub-level and file counts is exposed externally, not the store's IOThreshold. This was inadequate because it is tedious to normalize and compare the LSM L0 sub-level and file counts (as they require dividing by different numbers). To address this, this patch adds a metric `admission.io.overload` tracking the store's IOThreshold. Release note (ops change): Added a metric `admission.io.overload` which tracks the store's IOThreshold.
22 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-starter
Might be suitable for a starter project for new employees or team members.
We should expose a store's IOThreshold as a store metric.
This is straightforward.
Noticed during #86775 that SRE is alerting on a naked FileCount.
Jira issue: CRDB-19342
The text was updated successfully, but these errors were encountered: