metrics: expose pebble flush utilization #89459

coolcom200 · 2022-10-06T01:00:56Z

Create a new GaugeFloat64 metric for pebble’s flush utilization. This
metric is not cumulative, rather, it is the metric over an interval.
This interval is determined by the interval parameter of the
Node.startComputePeriodicMetrics method.

In order to compute the metric over an interval the previous value of
the metric must be stored. As a result, a map is constructed that takes
a pointer to a store and maps it to a pointer to storage metrics:
make(map[*kvserver.Store]*storage.Metrics). This map is passed to
node.computeMetricsPeriodically which gets the store to calculate its
metrics and then updates the previous metrics in the map.

Refactor store.go's metric calculation by separating
ComputeMetrics(ctx context.Context, tick int) error into two methods:

ComputeMetrics(ctx context.Context) error
ComputeMetricsPeriodically(ctx context.Context, prevMetrics *storage.Metrics, tick int) (m storage.Metrics, err error)

Both methods call the computeMetrics which contains the common code
between the two calls. Before this, the process for retrieving metrics
instantaneous was to pass a tick value such as -1 or 0 to the
ComputeMetrics(ctx context.Context, tick int) however it can be
done with a call to ComputeMetrics(ctx context.Context)

The store.ComputeMetricsPeriodically method will also return the
latest storage metrics. These metrics are used to update the mapping
between stores and metrics used for computing the metric delta over an
interval.

Release Note: None

Resolves part of #85755
Depends on #88972, cockroachdb/pebble#2001
Epic: CRDB-17515

cockroach-teamcity · 2022-10-06T01:01:05Z

This change is

knz · 2022-10-14T08:06:39Z

cc @jbowens

jbowens

Looks great

Reviewed 7 of 10 files at r1, 2 of 6 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @coolcom200)

pkg/kv/kvserver/metrics.go line 1668 at r3 (raw file):

	metaPebbleFlushUtilization = metric.Metadata{
		Name:        "pebble.flush.utilization",

let's name this storage.flush.utilization to match other metrics added post-transition to Pebble

pkg/kv/kvserver/metrics.go line 1669 at r3 (raw file):

	metaPebbleFlushUtilization = metric.Metadata{
		Name:        "pebble.flush.utilization",
		Help:        "The percentage of time spent flushing in the pebble flush loop",

how about "The percentage of time the storage engine is actively flushing memtables to disk."

Create a new `GaugeFloat64` metric for pebble’s flush utilization. This metric is not cumulative, rather, it is the metric over an interval. This interval is determined by the `interval` parameter of the `Node.startComputePeriodicMetrics` method. In order to compute the metric over an interval the previous value of the metric must be stored. As a result, a map is constructed that takes a pointer to a store and maps it to a pointer to storage metrics: `make(map[*kvserver.Store]*storage.Metrics)`. This map is passed to `node.computeMetricsPeriodically` which gets the store to calculate its metrics and then updates the previous metrics in the map. Refactor `store.go`'s metric calculation by separating `ComputeMetrics(ctx context.Context, tick int) error` into two methods: * `ComputeMetrics(ctx context.Context) error` * `ComputeMetricsPeriodically(ctx context.Context, prevMetrics *storage.Metrics, tick int) (m storage.Metrics, err error)` Both methods call the `computeMetrics` which contains the common code between the two calls. Before this, the process for retrieving metrics instantaneous was to pass a tick value such as `-1` or `0` to the `ComputeMetrics(ctx context.Context, tick int)` however it can be done with a call to `ComputeMetrics(ctx context.Context)` The `store.ComputeMetricsPeriodically` method will also return the latest storage metrics. These metrics are used to update the mapping between stores and metrics used for computing the metric delta over an interval. Release note: None

coolcom200

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)

pkg/kv/kvserver/metrics.go line 1668 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

let's name this storage.flush.utilization to match other metrics added post-transition to Pebble

Done. Also changed the metaPebbleFlushUtilization to metaStorageFlushUtilization.

pkg/kv/kvserver/metrics.go line 1669 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

how about "The percentage of time the storage engine is actively flushing memtables to disk."

Updated!

jbowens

Reviewable status: complete! 1 of 0 LGTMs obtained

coolcom200 · 2022-10-17T20:04:23Z

TFTR!

coolcom200 · 2022-10-17T20:07:18Z

bors r=jbowens

craig · 2022-10-17T21:23:19Z

Build succeeded:

Bazel Essential CI (Cockroach)

90082: metrics: expose pebble fsync latency r=jbowens,tbg a=coolcom200 Given that pebble produces fsync latency as a prometheus histogram, create a new histogram `ManualWindowHistogram` that implements windowing to ensure that the data can be exported in Cockroach DB. This histogram does not collect values over time and expose them, instead, it allows for the cumulative and windowed metrics to be replaced. This means that the client is responsible for managing the replacement of the data to ensure that it is exported properly. Additionally, introduce a new struct `MetricsForInterval` to store the previous metrics. In order to perform subtraction between histograms they need to be converted to `prometheusgo.Metric`s which means that they cannot be stored on the `pebble.Metrics`. Hence this means that either these metrics need to be added to `storage.Metrics` which is confusing since that means there are now two metrics that represent the same data in different formats: ```go storage.pebble.Metrics.FsyncLatency prometheus.Histogram storage.FsyncLatency prometheusgo.Metric ``` Or a new struct is created that will contain the metrics needed to compute the metrics over an interval. The new struct was chosen since it was easier to understand. Release note: None Depends on: #89459 cockroachdb/pebble#2014 Epic: CRDB-17515 Co-authored-by: Leon Fattakhov <[email protected]>

coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch 4 times, most recently from 347c019 to e88da5f Compare October 12, 2022 19:32

coolcom200 changed the title ~~metrics: expose pebble flush utilization~~ metrics: expose pebble flush utilization and fsync latency Oct 13, 2022

coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch 2 times, most recently from dd28d67 to a06dc63 Compare October 13, 2022 19:15

coolcom200 changed the title ~~metrics: expose pebble flush utilization and fsync latency~~ metrics: expose pebble flush utilization Oct 13, 2022

coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch from a06dc63 to fe20b47 Compare October 13, 2022 20:48

coolcom200 marked this pull request as ready for review October 14, 2022 02:54

coolcom200 requested a review from a team October 14, 2022 02:54

coolcom200 requested review from a team as code owners October 14, 2022 02:54

knz requested a review from a team October 14, 2022 08:05

jbowens reviewed Oct 14, 2022

View reviewed changes

coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch from fe20b47 to a2be410 Compare October 17, 2022 14:28

coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch from a2be410 to 8f1d48f Compare October 17, 2022 14:31

coolcom200 commented Oct 17, 2022

View reviewed changes

coolcom200 requested a review from jbowens October 17, 2022 17:31

coolcom200 mentioned this pull request Oct 17, 2022

metrics: expose pebble fsync latency #90082

Merged

jbowens approved these changes Oct 17, 2022

View reviewed changes

craig bot merged commit 55aca65 into cockroachdb:master Oct 17, 2022

coolcom200 mentioned this pull request Oct 24, 2022

storage: export info in pebble.InternalIntervalMetrics as CockroachDB metrics #85755

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

metrics: expose pebble flush utilization #89459

metrics: expose pebble flush utilization #89459

coolcom200 commented Oct 6, 2022 •

edited

Loading

cockroach-teamcity commented Oct 6, 2022

knz commented Oct 14, 2022

jbowens left a comment

coolcom200 left a comment

jbowens left a comment

coolcom200 commented Oct 17, 2022 •

edited

Loading

coolcom200 commented Oct 17, 2022

craig bot commented Oct 17, 2022

metrics: expose pebble flush utilization #89459

metrics: expose pebble flush utilization #89459

Conversation

coolcom200 commented Oct 6, 2022 • edited Loading

cockroach-teamcity commented Oct 6, 2022

knz commented Oct 14, 2022

jbowens left a comment

Choose a reason for hiding this comment

coolcom200 left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

coolcom200 commented Oct 17, 2022 • edited Loading

coolcom200 commented Oct 17, 2022

craig bot commented Oct 17, 2022

coolcom200 commented Oct 6, 2022 •

edited

Loading

coolcom200 commented Oct 17, 2022 •

edited

Loading