-
Notifications
You must be signed in to change notification settings - Fork 470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metric: expose pebble fsync latency with a callback #2001
metric: expose pebble fsync latency with a callback #2001
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r1, 7 of 7 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions
options.go
line 762 at r2 (raw file):
WALMinSyncInterval func() time.Duration OnMetrics MetricCallbacks
This needs a comment
record/log_writer.go
line 307 at r2 (raw file):
// NewLogWriter returns a new LogWriter. func NewLogWriter(w io.Writer, logNum base.FileNum, logWriterConfig LogWriterConfig) *LogWriter {
Should we move logNum
into the logWriterConfig
struct?
record/log_writer.go
line 334 at r2 (raw file):
f := &r.flusher f.minSyncInterval = logWriterConfig.WALMinSyncIntervalCallback f.onFsyncLatencyMetric = logWriterConfig.LogWriterFsyncLatencyCallback
This assignment used to be gaurded with a lock however it doesn't seem like this is needed in the NewLogWriter
func
Code quote:
f := &r.flusher
f.minSyncInterval = logWriterConfig.WALMinSyncIntervalCallback
f.onFsyncLatencyMetric = logWriterConfig.LogWriterFsyncLatencyCallback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking great! some nits here and there, mostly around naming
Reviewed 4 of 4 files at r1, 7 of 7 files at r2, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @coolcom200)
options.go
line 762 at r2 (raw file):
Previously, coolcom200 (Leon Fattakhov) wrote…
This needs a comment
And how about MetricEventListener
for the field name as well. The fields of the struct are sorted alphabetically, so we should also slot it in appropriately
options.go
line 801 at r2 (raw file):
// MetricCallbacks is a struct that contains the callbacks that pebble will call // in order to update clients. type MetricCallbacks struct {
how about MetricEventListener
to mirror EventListener
?
options.go
line 802 at r2 (raw file):
// in order to update clients. type MetricCallbacks struct { LogWriterFsyncLatency func(duration int64)
let's call this WALFysncLatency
, and let's add a comment explaining that since this event is called on every WAL fsync, implementations must have very low overhead to avoid impacting performance.
record/log_writer.go
line 230 at r2 (raw file):
type durationFunc func() time.Duration type recordValueFunc func(value int64)
let's make this take a time.Duration
now. then if the user subscribing to fsync events wants to truncate precision to the microsecond, they can call .Microsecond()
themselves.
record/log_writer.go
line 299 at r2 (raw file):
type LogWriterConfig struct { WALMinSyncIntervalCallback durationFunc LogWriterFsyncLatencyCallback recordValueFunc
nit: can we drop the Callback
suffix from these fields? and rename LogWriterFsyncLatencyCallback
to OnFsync
?
record/log_writer.go
line 307 at r2 (raw file):
Previously, coolcom200 (Leon Fattakhov) wrote…
Should we move
logNum
into thelogWriterConfig
struct?
Let's keep the required parameters like logNum
here, and LogWriterConfig
can hold the optional parameters.
record/log_writer.go
line 334 at r2 (raw file):
Previously, coolcom200 (Leon Fattakhov) wrote…
This assignment used to be gaurded with a lock however it doesn't seem like this is needed in the
NewLogWriter
func
Yeah, that's right 👍
record/log_writer_test.go
line 299 at r2 (raw file):
return minSyncInterval }, LogWriterFsyncLatencyCallback: nil,
nit: you can omit this line (unspecified fields in a struct value default to their zero value, in this case nil
)
record/log_writer_test.go
line 384 at r2 (raw file):
err := syncLatencyMicros.RecordValue(duration) if err != nil { t.Fatal("Ran into error when recording values to histogram")
nit: no requirement to use it, but just an fyi, there's a require.NoError(t, err)
func we use frequently in tests. both forms are totally fine, but you may find you prefer the shorthand of require.NoError
e88f67e
to
11c50de
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 5 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jbowens)
options.go
line 801 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
how about
MetricEventListener
to mirrorEventListener
?
Done.
options.go
line 802 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
let's call this
WALFysncLatency
, and let's add a comment explaining that since this event is called on every WAL fsync, implementations must have very low overhead to avoid impacting performance.
Done.
record/log_writer.go
line 230 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
let's make this take a
time.Duration
now. then if the user subscribing to fsync events wants to truncate precision to the microsecond, they can call.Microsecond()
themselves.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @coolcom200)
options.go
line 802 at r4 (raw file):
// MetricEventListener is a struct that contains the callbacks that pebble will // call when metric related events occur
nitiest nit: missing a period
options.go
line 804 at r4 (raw file):
// call when metric related events occur type MetricEventListener struct { // WALFysncLatency is invoked after each WAL Fsync. As a result, this function
How about
// WALFysncLatency is invoked with the duration of every WAL Fsync. It is invoked
// synchronously within the WAL's flush loop, so this function should not run for an
// excessive amount of time to avoid stalling writes.
The fsync latency was computed using an HDR histogram however this histogram generates too many buckets of various widths. In order to allow clients to store the fsync latency in their preferred format, a callback was introduced. This callback is defined on the `pebble.Options` which enables clients to provide a callback function which will be triggered each time pebble produces the fsync latency metrics. Additionally the `record.NewLogWriter` method was updated to take in a configuration struct which configures the `LogWriter` with the specified callbacks.
11c50de
to
f011973
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!!!
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens)
options.go
line 802 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nitiest nit: missing a period
Done.
options.go
line 804 at r4 (raw file):
Previously, jbowens (Jackson Owens) wrote…
How about
// WALFysncLatency is invoked with the duration of every WAL Fsync. It is invoked // synchronously within the WAL's flush loop, so this function should not run for an // excessive amount of time to avoid stalling writes.
Done.
88974: sql: add support for `DELETE FROM ... USING` r=faizaanmadhani a=faizaanmadhani See commit messages for details. Resolves: #40963 89459: metrics: expose pebble flush utilization r=jbowens a=coolcom200 Create a new `GaugeFloat64` metric for pebble’s flush utilization. This metric is not cumulative, rather, it is the metric over an interval. This interval is determined by the `interval` parameter of the `Node.startComputePeriodicMetrics` method. In order to compute the metric over an interval the previous value of the metric must be stored. As a result, a map is constructed that takes a pointer to a store and maps it to a pointer to storage metrics: `make(map[*kvserver.Store]*storage.Metrics)`. This map is passed to `node.computeMetricsPeriodically` which gets the store to calculate its metrics and then updates the previous metrics in the map. Refactor `store.go`'s metric calculation by separating `ComputeMetrics(ctx context.Context, tick int) error` into two methods: * `ComputeMetrics(ctx context.Context) error` * `ComputeMetricsPeriodically(ctx context.Context, prevMetrics *storage.Metrics, tick int) (m storage.Metrics, err error)` Both methods call the `computeMetrics` which contains the common code between the two calls. Before this, the process for retrieving metrics instantaneous was to pass a tick value such as `-1` or `0` to the `ComputeMetrics(ctx context.Context, tick int)` however it can be done with a call to `ComputeMetrics(ctx context.Context)` The `store.ComputeMetricsPeriodically` method will also return the latest storage metrics. These metrics are used to update the mapping between stores and metrics used for computing the metric delta over an interval. Release Note: None Resolves part of #85755 Depends on #88972, cockroachdb/pebble#2001 Epic: CRDB-17515 89656: roachtest: introduce admission-control/elastic-cdc r=irfansharif a=irfansharif Part of #89208. This test sets up a 3-node CRDB cluster on 8vCPU machines running 1000-warehouse TPC-C, and kicks off a few changefeed backfills concurrently. We've observed latency spikes during backfills because of its CPU/scan-heavy nature -- it can elevate CPU scheduling latencies which in turn translates to an increase in foreground latency. Also in this commit: routing std{err,out} from prometheus/grafana setup that roachtests do to the logger in scope. Release note: None Co-authored-by: Faizaan Madhani <[email protected]> Co-authored-by: Leon Fattakhov <[email protected]> Co-authored-by: irfan sharif <[email protected]>
The fsync latency was computed using an HDR histogram however this
histogram generates too many buckets of various widths. In order to
allow clients to store the fsync latency in their preferred format, a
callback was introduced. This callback is defined on the
pebble.Options
which enables clients to provide a callback functionwhich will be triggered each time pebble produces the fsync latency
metrics.
Additionally the
record.NewLogWriter
method was updated to take in aconfiguration struct which configures the
LogWriter
with the specifiedcallbacks.