Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: expose pebble flush utilization #89459

Merged

Conversation

coolcom200
Copy link
Contributor

@coolcom200 coolcom200 commented Oct 6, 2022

Create a new GaugeFloat64 metric for pebble’s flush utilization. This
metric is not cumulative, rather, it is the metric over an interval.
This interval is determined by the interval parameter of the
Node.startComputePeriodicMetrics method.

In order to compute the metric over an interval the previous value of
the metric must be stored. As a result, a map is constructed that takes
a pointer to a store and maps it to a pointer to storage metrics:
make(map[*kvserver.Store]*storage.Metrics). This map is passed to
node.computeMetricsPeriodically which gets the store to calculate its
metrics and then updates the previous metrics in the map.

Refactor store.go's metric calculation by separating
ComputeMetrics(ctx context.Context, tick int) error into two methods:

  • ComputeMetrics(ctx context.Context) error
  • ComputeMetricsPeriodically(ctx context.Context, prevMetrics *storage.Metrics, tick int) (m storage.Metrics, err error)

Both methods call the computeMetrics which contains the common code
between the two calls. Before this, the process for retrieving metrics
instantaneous was to pass a tick value such as -1 or 0 to the
ComputeMetrics(ctx context.Context, tick int) however it can be
done with a call to ComputeMetrics(ctx context.Context)

The store.ComputeMetricsPeriodically method will also return the
latest storage metrics. These metrics are used to update the mapping
between stores and metrics used for computing the metric delta over an
interval.

Release Note: None

Resolves part of #85755
Depends on #88972, cockroachdb/pebble#2001
Epic: CRDB-17515

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@coolcom200 coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch 4 times, most recently from 347c019 to e88da5f Compare October 12, 2022 19:32
@coolcom200 coolcom200 changed the title metrics: expose pebble flush utilization metrics: expose pebble flush utilization and fsync latency Oct 13, 2022
@coolcom200 coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch 2 times, most recently from dd28d67 to a06dc63 Compare October 13, 2022 19:15
@coolcom200 coolcom200 changed the title metrics: expose pebble flush utilization and fsync latency metrics: expose pebble flush utilization Oct 13, 2022
@coolcom200 coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch from a06dc63 to fe20b47 Compare October 13, 2022 20:48
@coolcom200 coolcom200 marked this pull request as ready for review October 14, 2022 02:54
@coolcom200 coolcom200 requested a review from a team October 14, 2022 02:54
@coolcom200 coolcom200 requested review from a team as code owners October 14, 2022 02:54
@knz knz requested a review from a team October 14, 2022 08:05
@knz
Copy link
Contributor

knz commented Oct 14, 2022

cc @jbowens

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great

Reviewed 7 of 10 files at r1, 2 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @coolcom200)


pkg/kv/kvserver/metrics.go line 1668 at r3 (raw file):

	metaPebbleFlushUtilization = metric.Metadata{
		Name:        "pebble.flush.utilization",

let's name this storage.flush.utilization to match other metrics added post-transition to Pebble


pkg/kv/kvserver/metrics.go line 1669 at r3 (raw file):

	metaPebbleFlushUtilization = metric.Metadata{
		Name:        "pebble.flush.utilization",
		Help:        "The percentage of time spent flushing in the pebble flush loop",

how about "The percentage of time the storage engine is actively flushing memtables to disk."

@coolcom200 coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch from fe20b47 to a2be410 Compare October 17, 2022 14:28
Create a new `GaugeFloat64` metric for pebble’s flush utilization. This
metric is not cumulative, rather, it is the metric over an interval.
This interval is determined by the `interval` parameter of the
`Node.startComputePeriodicMetrics` method.

In order to compute the metric over an interval the previous value of
the metric must be stored. As a result, a map is constructed that takes
a pointer to a store and maps it to a pointer to storage metrics:
`make(map[*kvserver.Store]*storage.Metrics)`. This map is passed to
`node.computeMetricsPeriodically` which gets the store to calculate its
metrics and then updates the previous metrics in the map.

Refactor `store.go`'s metric calculation by separating
`ComputeMetrics(ctx context.Context, tick int) error` into two methods:

* `ComputeMetrics(ctx context.Context) error`
* `ComputeMetricsPeriodically(ctx context.Context, prevMetrics
  *storage.Metrics, tick int) (m storage.Metrics, err error)`

Both methods call the `computeMetrics` which contains the common code
between the two calls. Before this, the process for retrieving metrics
instantaneous was to pass a tick value such as `-1` or `0` to the
`ComputeMetrics(ctx context.Context, tick int)` however it can be
done with a call to `ComputeMetrics(ctx context.Context)`

The `store.ComputeMetricsPeriodically` method will also return the
latest storage metrics. These metrics are used to update the mapping
between stores and metrics used for computing the metric delta over an
interval.

Release note: None
@coolcom200 coolcom200 force-pushed the 85755-pebble-iim-as-crdb-metrics branch from a2be410 to 8f1d48f Compare October 17, 2022 14:31
Copy link
Contributor Author

@coolcom200 coolcom200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)


pkg/kv/kvserver/metrics.go line 1668 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

let's name this storage.flush.utilization to match other metrics added post-transition to Pebble

Done. Also changed the metaPebbleFlushUtilization to metaStorageFlushUtilization.


pkg/kv/kvserver/metrics.go line 1669 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

how about "The percentage of time the storage engine is actively flushing memtables to disk."

Updated!

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@coolcom200
Copy link
Contributor Author

coolcom200 commented Oct 17, 2022

TFTR!

@coolcom200
Copy link
Contributor Author

bors r=jbowens

@craig
Copy link
Contributor

craig bot commented Oct 17, 2022

Build succeeded:

@craig craig bot merged commit 55aca65 into cockroachdb:master Oct 17, 2022
craig bot pushed a commit that referenced this pull request Oct 24, 2022
90082: metrics: expose pebble fsync latency r=jbowens,tbg a=coolcom200

Given that pebble produces fsync latency as a prometheus histogram,
create a new histogram `ManualWindowHistogram` that implements windowing
to ensure that the data can be exported in Cockroach DB. This histogram
does not collect values over time and expose them, instead, it allows
for the cumulative and windowed metrics to be replaced. This means that
the client is responsible for managing the replacement of the data to
ensure that it is exported properly.

Additionally, introduce a new struct `MetricsForInterval` to store the
previous metrics. In order to perform subtraction between histograms
they need to be converted to `prometheusgo.Metric`s which means that
they cannot be stored on the `pebble.Metrics`. Hence this means that
either these metrics need to be added to `storage.Metrics` which is
confusing since that means there are now two metrics that represent the
same data in different formats:

```go
storage.pebble.Metrics.FsyncLatency prometheus.Histogram
storage.FsyncLatency prometheusgo.Metric
```
Or a new struct is created that will contain the metrics needed to
compute the metrics over an interval. The new struct was chosen since it
was easier to understand.


Release note: None
Depends on: #89459 cockroachdb/pebble#2014
Epic: CRDB-17515


Co-authored-by: Leon Fattakhov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants