
metrics: expose fsync latency as prometheus histogram #2014

Merged

Conversation

@coolcom200 (Contributor) commented Oct 17, 2022

Previously, the fsync latency was computed using an HDR histogram; however,
that histogram generated too many buckets of varying widths. As a result, a
prometheus histogram was chosen instead. In addition, the metrics used to be
accumulated and stored on each LogWriter and then merged into the cumulative
metrics on each WAL rotation. With the new prometheus histogram, all writes go
to a single histogram whose lifetime is independent of the WAL.

Additionally, remove all the callback listeners and configurations for
the callback-based approach introduced in: 5fdb3ea
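For orientation, here is a minimal, hypothetical sketch of the approach described above: one long-lived prometheus histogram that every LogWriter observes fsync durations into. The metric name and help text are made up, and the buckets and nanosecond units simply mirror the values quoted later in this review; this is not the PR's actual code.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Hypothetical metric; name and help text are illustrative only.
	// Buckets and observations are in nanoseconds, mirroring the
	// float64(time.Duration) values used in the quoted diff.
	fsyncLatency := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "example_wal_fsync_latency",
		Help:    "Illustrative fsync latency histogram (nanoseconds).",
		Buckets: prometheus.LinearBuckets(0, float64(100*time.Microsecond), 50),
	})
	prometheus.MustRegister(fsyncLatency)

	start := time.Now()
	// ... fsync the WAL file here ...
	fsyncLatency.Observe(float64(time.Since(start)))
}
```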

@cockroach-teamcity (Member) commented

This change is Reviewable

@coolcom200 force-pushed the 85755-fsync-latency-prometheus branch from 86f3e92 to ab2d6a8 on October 17, 2022 17:40
@coolcom200 (Contributor, Author) left a comment

Reviewed 8 of 10 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions


db.go line 346 at r2 (raw file):

			// Can be nil.
			metrics struct {
				fsyncLatency prometheus.Histogram

Moved this outside of record.LogWriterMetrics, since LogWriterMetrics values can be added and subtracted, and doing that with this histogram doesn't make sense.


record/log_writer_test.go line 389 at r2 (raw file):

		Name:        "",
		Help:        "",
		ConstLabels: nil,

This needs to be deleted.

Code quote:

		Namespace:   "",
		Subsystem:   "",
		Name:        "",
		Help:        "",
		ConstLabels: nil,

record/log_writer_test.go line 431 at r2 (raw file):

}

func ValueAtQuantileWindowed(histogram *prometheusgo.Histogram, q float64) float64 {

ValueAtQuantileWindowed is from cockroachdb, so I'm not sure it makes sense to keep a test here that checks for valid quantiles, given that prometheus histograms don't support pulling these details by default.

Code quote:

	require.LessOrEqual(t, float64(syncLatency/(2*time.Microsecond)),
		ValueAtQuantileWindowed(writeTo.Histogram, 90))
	require.LessOrEqual(t, int64(syncLatency/2), int64(m.WriteThroughput.WorkDuration))
}

func ValueAtQuantileWindowed(histogram *prometheusgo.Histogram, q float64) float64 {
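For readers unfamiliar with the helper: prometheus histograms only expose cumulative bucket counts, so a quantile has to be approximated by interpolating within the bucket that contains the target rank. The sketch below shows that idea over the prometheusgo data model; quantileFromBuckets is a hypothetical name, and this is not CockroachDB's ValueAtQuantileWindowed implementation.

```go
package example

import prometheusgo "github.com/prometheus/client_model/go"

// quantileFromBuckets approximates the value at quantile q (0-100) by linearly
// interpolating within the histogram bucket containing the target rank.
// Illustrative only.
func quantileFromBuckets(h *prometheusgo.Histogram, q float64) float64 {
	total := float64(h.GetSampleCount())
	if total == 0 {
		return 0
	}
	rank := q / 100 * total
	var prevCount, prevBound float64
	for _, b := range h.GetBucket() {
		count := float64(b.GetCumulativeCount())
		bound := b.GetUpperBound()
		if count >= rank {
			if count == prevCount {
				return bound
			}
			// Linear interpolation between the previous and current bucket bounds.
			return prevBound + (bound-prevBound)*(rank-prevCount)/(count-prevCount)
		}
		prevCount, prevBound = count, bound
	}
	// Rank falls in the implicit +Inf bucket; fall back to the largest finite bound.
	return prevBound
}
```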

@coolcom200 coolcom200 marked this pull request as ready for review October 17, 2022 19:07
@coolcom200 coolcom200 requested review from a team and jbowens October 17, 2022 19:08
@jbowens (Collaborator) left a comment

Reviewed 2 of 10 files at r1, 1 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @coolcom200)


db.go line 346 at r2 (raw file):

Previously, coolcom200 (Leon Fattakhov) wrote…

Moved this outside of record.LogWriterMetrics, since LogWriterMetrics values can be added and subtracted, and doing that with this histogram doesn't make sense.

Ack, makes sense


metrics.go line 247 at r2 (raw file):

	// FsyncLatencyBuckets are prometheus histogram buckets suitable for a histogram
	// that records latencies for fsyncs.
	FsyncLatencyBuckets = prometheus.LinearBuckets(0, float64(100*time.Microsecond), 50)

Is there an exponential bucket division that maintains at most 100-200µs-wide divisions between [0, 5ms], with a max of, say, ~10s?


open.go line 428 at r2 (raw file):

		logWriterConfig := record.LogWriterConfig{
			WALMinSyncInterval: d.opts.WALMinSyncInterval,
			WALFsync:           d.mu.log.metrics.fsyncLatency,

nit: how about WALFsyncLatency


record/log_writer.go line 455 at r2 (raw file):

		f.Lock()
		if synced {
			f.fsyncLatency.Observe(float64(syncLatency))

can we keep the nil check? eg, if synced && f.fsyncLatency != nil

it's convenient for tests that don't need the histogram, and for other uses of the LogWriter outside of Pebble that also don't need the histogram. Cockroach uses the LogWriter for its encryption-at-rest file registry
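A tiny sketch of the guarded observation being suggested, assuming the field and variable names from the quoted diff; the type and method names here stand in for Pebble's actual flusher code and are not its real API.

```go
package example

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// flusherSketch stands in for the LogWriter's flusher state; names are illustrative.
type flusherSketch struct {
	sync.Mutex
	fsyncLatency prometheus.Histogram // may be nil
}

func (f *flusherSketch) recordSync(synced bool, syncLatency time.Duration) {
	f.Lock()
	defer f.Unlock()
	// Guarding on nil keeps the histogram optional, which is convenient for
	// tests and for LogWriter users outside Pebble.
	if synced && f.fsyncLatency != nil {
		f.fsyncLatency.Observe(float64(syncLatency))
	}
}
```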

@coolcom200 force-pushed the 85755-fsync-latency-prometheus branch 3 times, most recently from 6bfa06a to a6d9a65 on October 17, 2022 22:56
@coolcom200 (Contributor, Author) left a comment

Reviewed 5 of 7 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens)


metrics.go line 247 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is there an exponential bucket division that maintains at most 100-200µs-wide divisions between [0, 5ms], with a max of, say, ~10s?

This is possible as there are the following helpful functions:

  1. ExponentialBucketsRange(min, max float64, count int)
  2. ExponentialBuckets(start, factor float64, count int)

With a bit of fiddling we can fine-tune the number of buckets and their sizes. The main question here is what the start/min value should be. 100µs? (as in, the first bucket is 0 - 100µs and the rest follow the distribution)
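To make those two helpers concrete, a small sketch of what they produce; the 100µs start, ~10s max, 1.25 factor, and 50-bucket counts are placeholders for the values still being discussed, not a decision.

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// 50 bucket boundaries spaced exponentially between 100µs and 10s
	// (values in nanoseconds, matching the durations observed in the diff).
	byRange := prometheus.ExponentialBucketsRange(
		float64(100*time.Microsecond), float64(10*time.Second), 50)

	// 50 bucket boundaries starting at 100µs, each 1.25x the previous one.
	byFactor := prometheus.ExponentialBuckets(float64(100*time.Microsecond), 1.25, 50)

	fmt.Println(byRange[:3])  // first few boundaries from the range-based helper
	fmt.Println(byFactor[:3]) // first few boundaries from the factor-based helper
}
```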


record/log_writer.go line 455 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

can we keep the nil check? eg, if synced && f.fsyncLatency != nil

it's convenient for tests that don't need the histogram, and for other uses of the LogWriter outside of Pebble that also don't need the histogram. Cockroach uses the LogWriter for its encryption-at-rest file registry

Done.


record/log_writer_test.go line 389 at r2 (raw file):

Previously, coolcom200 (Leon Fattakhov) wrote…

This needs to be deleted.

Done.


record/log_writer_test.go line 431 at r2 (raw file):

Previously, coolcom200 (Leon Fattakhov) wrote…

ValueAtQuantileWindowed is from cockroachdb, so I'm not sure it makes sense to keep a test here that checks for valid quantiles, given that prometheus histograms don't support pulling these details by default.

@jbowens Do you have any thoughts on this?

@jbowens (Collaborator) left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @coolcom200)


metrics.go line 247 at r2 (raw file):

Previously, coolcom200 (Leon Fattakhov) wrote…

This is possible as there are the following helpful functions:

  1. ExponentialBucketsRange(min, max float64, count int)
  2. ExponentialBuckets(start, factor float64, count int)

With a bit of fiddling we can fine-tune the number of buckets and their sizes. The main question here is what the start/min value should be. 100µs? (as in, the first bucket is 0 - 100µs and the rest follow the distribution)

100µs is a good starting bucket, but I imagine we'll need to experiment with adjusting start/factor values to get the right distribution.

@coolcom200 force-pushed the 85755-fsync-latency-prometheus branch from a6d9a65 to 058a32c on October 18, 2022 19:14
@coolcom200 (Contributor, Author) left a comment

Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens)


metrics.go line 247 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

100µs is a good starting bucket, but I imagine we'll need to experiment with adjusting start/factor values to get the right distribution.

Decided to go with linear buckets for the first 0 - 5ms and then exponential buckets from 5ms onwards.
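A sketch of what that combined layout could look like using the client_golang helpers; the 50/50 bucket counts and the ~10s upper bound are assumptions for illustration, not necessarily what was merged.

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Linear 100µs-wide buckets covering [0, 5ms), then exponential buckets
	// from 5ms up to ~10s. All values are in nanoseconds.
	buckets := append(
		prometheus.LinearBuckets(0, float64(100*time.Microsecond), 50),
		prometheus.ExponentialBucketsRange(float64(5*time.Millisecond), float64(10*time.Second), 50)...,
	)
	fmt.Println(len(buckets)) // 100 bucket boundaries
}
```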

@coolcom200 force-pushed the 85755-fsync-latency-prometheus branch from 058a32c to fbe8309 on October 18, 2022 19:17
@jbowens (Collaborator) left a comment

:lgtm_strong:

Reviewed 2 of 10 files at r1, 4 of 7 files at r3, 1 of 2 files at r4, 1 of 2 files at r5, all commit messages.
Reviewable status: 9 of 10 files reviewed, 1 unresolved discussion (waiting on @coolcom200)

@coolcom200 (Contributor, Author) commented

TFTR!

@coolcom200 coolcom200 merged commit 0090519 into cockroachdb:master Oct 19, 2022
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Oct 24, 2022
90082: metrics: expose pebble fsync latency r=jbowens,tbg a=coolcom200

Given that pebble produces fsync latency as a prometheus histogram,
create a new histogram `ManualWindowHistogram` that implements windowing
to ensure that the data can be exported in CockroachDB. This histogram
does not collect values over time and expose them; instead, it allows
the cumulative and windowed metrics to be replaced. This means that
the client is responsible for managing the replacement of the data to
ensure that it is exported properly.

Additionally, introduce a new struct `MetricsForInterval` to store the
previous metrics. In order to perform subtraction between histograms,
they need to be converted to `prometheusgo.Metric`s, which means they
cannot be stored on `pebble.Metrics`. Hence either these metrics need
to be added to `storage.Metrics`, which is confusing since that means
there are now two metrics that represent the same data in different
formats:

```go
storage.pebble.Metrics.FsyncLatency prometheus.Histogram
storage.FsyncLatency prometheusgo.Metric
```
Or a new struct is created that will contain the metrics needed to
compute the metrics over an interval. The new struct was chosen since it
was easier to understand.


Release note: None
Depends on: #89459 cockroachdb/pebble#2014
Epic: CRDB-17515


Co-authored-by: Leon Fattakhov <[email protected]>
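As a rough illustration of the windowing idea described in that commit message (not the actual ManualWindowHistogram code), a windowed view can be produced by snapshotting the cumulative histogram at interval boundaries and subtracting the previous snapshot's bucket counts; windowedCounts below is a hypothetical helper.

```go
package example

import prometheusgo "github.com/prometheus/client_model/go"

// windowedCounts returns the per-bucket counts observed during the interval
// between the previous snapshot and the current one. Both snapshots must use
// the same bucket layout. Illustrative only.
func windowedCounts(cur, prev *prometheusgo.Histogram) []uint64 {
	out := make([]uint64, len(cur.GetBucket()))
	for i, b := range cur.GetBucket() {
		out[i] = b.GetCumulativeCount()
		if prev != nil && i < len(prev.GetBucket()) {
			out[i] -= prev.GetBucket()[i].GetCumulativeCount()
		}
	}
	return out
}
```

A prometheus.Histogram can be dumped into this prometheusgo form via its Write method, which is how the cumulative snapshot would be captured at each interval.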