metrics: expose pebble flush utilization #89459
Conversation
Force-pushed 347c019 to e88da5f
Force-pushed dd28d67 to a06dc63
Force-pushed a06dc63 to fe20b47
cc @jbowens
Looks great
Reviewed 7 of 10 files at r1, 2 of 6 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @coolcom200)
pkg/kv/kvserver/metrics.go
line 1668 at r3 (raw file):
`metaPebbleFlushUtilization = metric.Metadata{ Name: "pebble.flush.utilization",`
let's name this `storage.flush.utilization` to match other metrics added post-transition to Pebble
pkg/kv/kvserver/metrics.go
line 1669 at r3 (raw file):
`metaPebbleFlushUtilization = metric.Metadata{ Name: "pebble.flush.utilization", Help: "The percentage of time spent flushing in the pebble flush loop",`
how about "The percentage of time the storage engine is actively flushing memtables to disk."
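Taken together, the suggested rename and help text could look like the following sketch. The `Metadata` struct here is a local stand-in for CockroachDB's `metric.Metadata` type (which lives in `pkg/util/metric`), defined inline so the example is self-contained; field names beyond `Name` and `Help` are omitted.

```go
package main

import "fmt"

// Metadata mirrors the shape of CockroachDB's metric.Metadata for
// illustration; the real type lives in pkg/util/metric.
type Metadata struct {
	Name string
	Help string
}

// The metric as suggested in the review: a storage.-prefixed name to
// match other post-Pebble-transition metrics, and the clearer help text.
var metaStorageFlushUtilization = Metadata{
	Name: "storage.flush.utilization",
	Help: "The percentage of time the storage engine is actively flushing memtables to disk.",
}

func main() {
	fmt.Println(metaStorageFlushUtilization.Name)
}
```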
Force-pushed fe20b47 to a2be410
Create a new `GaugeFloat64` metric for pebble's flush utilization. This metric is not cumulative; rather, it is the metric over an interval. This interval is determined by the `interval` parameter of the `Node.startComputePeriodicMetrics` method.

In order to compute the metric over an interval, the previous value of the metric must be stored. As a result, a map is constructed that takes a pointer to a store and maps it to a pointer to storage metrics: `make(map[*kvserver.Store]*storage.Metrics)`. This map is passed to `node.computeMetricsPeriodically`, which gets the store to calculate its metrics and then updates the previous metrics in the map.

Refactor `store.go`'s metric calculation by separating `ComputeMetrics(ctx context.Context, tick int) error` into two methods:

* `ComputeMetrics(ctx context.Context) error`
* `ComputeMetricsPeriodically(ctx context.Context, prevMetrics *storage.Metrics, tick int) (m storage.Metrics, err error)`

Both methods call `computeMetrics`, which contains the code common to the two calls. Previously, retrieving instantaneous metrics required passing a tick value such as `-1` or `0` to `ComputeMetrics(ctx context.Context, tick int)`; now it can be done with a call to `ComputeMetrics(ctx context.Context)`.

The `store.ComputeMetricsPeriodically` method also returns the latest storage metrics. These metrics are used to update the mapping between stores and metrics used for computing the metric delta over an interval.

Release note: None
Force-pushed a2be410 to 8f1d48f
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)
pkg/kv/kvserver/metrics.go
line 1668 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
let's name this `storage.flush.utilization` to match other metrics added post-transition to Pebble

Done. Also changed `metaPebbleFlushUtilization` to `metaStorageFlushUtilization`.
pkg/kv/kvserver/metrics.go
line 1669 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
how about "The percentage of time the storage engine is actively flushing memtables to disk."
Updated!
Reviewable status: complete! 1 of 0 LGTMs obtained
TFTR!
bors r=jbowens
Build succeeded:
90082: metrics: expose pebble fsync latency r=jbowens,tbg a=coolcom200

Given that pebble produces fsync latency as a prometheus histogram, create a new histogram `ManualWindowHistogram` that implements windowing to ensure that the data can be exported in CockroachDB. This histogram does not collect values over time and expose them; instead, it allows the cumulative and windowed metrics to be replaced. This means that the client is responsible for managing the replacement of the data to ensure that it is exported properly.

Additionally, introduce a new struct `MetricsForInterval` to store the previous metrics. In order to perform subtraction between histograms, they need to be converted to `prometheusgo.Metric`s, which means that they cannot be stored on the `pebble.Metrics`. Hence, either these metrics need to be added to `storage.Metrics`, which is confusing since there would then be two metrics representing the same data in different formats:

```go
storage.pebble.Metrics.FsyncLatency prometheus.Histogram
storage.FsyncLatency                prometheusgo.Metric
```

Or a new struct is created that contains the metrics needed to compute the metrics over an interval. The new struct was chosen since it was easier to understand.

Release note: None

Depends on: #89459 cockroachdb/pebble#2014

Epic: CRDB-17515

Co-authored-by: Leon Fattakhov <[email protected]>
Resolves part of #85755
Depends on #88972, cockroachdb/pebble#2001
Epic: CRDB-17515