metrics: compute InternalIntervalMetrics without making calls to GetInternalIntervalMetrics #88972
Conversation
Force-pushed a9d9a7a to c72f41d
Reviewed 1 of 6 files at r1, 6 of 8 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @coolcom200 and @irfansharif)
-- commits
line 18 at r2:
nit: necessary
-- commits
line 23 at r2:
nit: goroutine
pkg/util/admission/granter.go
line 651 at r2 (raw file):
}

// IntervalMetrics exposes metrics about internal subsystems, that can
Let's change the name to storeInternalIntervalMetrics
(also making it package private).
Also simplify the comment to
// storeInternalIntervalMetrics exposes metrics about internal subsystems in the store,
// that are useful for admission control. These represent the metrics over ...
pkg/util/admission/granter.go
line 659 at r2 (raw file):
type IntervalMetrics struct {
	// LogWriter metrics.
	LogWriter struct {
make all of these package private
pkg/util/admission/granter.go
line 686 at r2 (raw file):
// StoreAndIntervalMetrics is a wrapper around StoreMetrics that also contains
// the IntervalMetrics
type StoreAndIntervalMetrics struct {
storeAndIntervalMetrics since it should be package private.
pkg/util/admission/granter.go
line 687 at r2 (raw file):
// the IntervalMetrics
type StoreAndIntervalMetrics struct {
	*StoreMetrics
StoreMetrics is being passed by value to pebbleMetricsTick
. We should keep it as value and not a pointer.
But let's step back for a minute. The diff computation in StoreGrantCoordinators
isn't a good place since it has to keep state about each store and all the slice index complexity (and not quite correct, since the index is not required to be stable every time we fetch the metrics).
We could move it to GrantCoordinator.pebbleMetricsTick
since there is a GrantCoordinator per store. But this isn't really the role of GrantCoordinator
, which isn't supposed to know about the details about store metrics. The ioLoadListener
already converts cumulative to deltas for all the stuff it computes over, so that is the right place. Currently it does it in a somewhat adhoc manner and I would suggest something simpler for your change.
Define the following in io_load_listener.go (only the flush throughput for now, since that's all we use in admission control's adjustTokensInner; we can add more later if needed):
// storeInternalMetrics can be used to represent both cumulative and interval state for internal store subsystems.
type storeInternalMetrics struct {
// Flush loop metrics.
Flush struct {
// WriteThroughput is the flushing throughput.
WriteThroughput pebble.ThroughputMetric
}
}
type ioLoadListenerState struct {
...
// Cumulative
cumStoreInternalMetrics storeInternalMetrics
}
Put this early in ioLoadListenerState with all the other cumulative fields.
In ioLoadListener.pebbleMetricsTick, in the initialization if-block, copy this out. In the non-initialization path, do the subtraction and pass the delta as an explicit parameter to ioLoadListener.adjustTokens.
There are various tests that are currently calling pebbleMetricsTick, and those tests assume they should pass a delta for this flush throughput, but now the callee assumes it is being given a cumulative value. Luckily, most of these tests were passing an empty delta, so they are not affected. The only non-superficial test change will be in TestIOLoadListener. See where that test has the following code:
var flushBytes, flushWorkSec, flushIdleSec int
if d.HasArg("flush-bytes") {
	d.ScanArgs(t, "flush-bytes", &flushBytes)
	d.ScanArgs(t, "flush-work-sec", &flushWorkSec)
	d.ScanArgs(t, "flush-idle-sec", &flushIdleSec)
}
flushMetric := pebble.ThroughputMetric{
	Bytes:        int64(flushBytes),
	WorkDuration: time.Duration(flushWorkSec) * time.Second,
	IdleDuration: time.Duration(flushIdleSec) * time.Second,
}
im := &pebble.InternalIntervalMetrics{}
im.Flush.WriteThroughput = flushMetric
Let's keep the testdata files unchanged, so that the testdata is still providing us interval values. So the test code needs to convert these interval values to cumulative ones. It can do so easily by keeping 3 additional variables, cumFlushBytes, cumFlushWorkSec, and cumFlushIdleSec, declared above the call to datadriven.RunTest. Then add what the testcase provided to the current cumulative values and pass those in. Whenever the testcase does "init" to create a new ioLoadListener, set these cumulative values back to 0.
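The suggested test-side conversion from interval to cumulative values might look like the following sketch. The datadriven plumbing is elided, and the `interval`/`accumulate` names are illustrative, not from the actual test.

```go
package main

import "fmt"

// interval holds the flush values a single simulated testcase provides.
type interval struct{ bytes, workSec, idleSec int }

// accumulate adds a testcase's interval values to the running cumulative
// totals, mirroring the suggested cumFlushBytes / cumFlushWorkSec /
// cumFlushIdleSec variables in TestIOLoadListener.
func accumulate(cumBytes, cumWork, cumIdle int, iv interval) (int, int, int) {
	return cumBytes + iv.bytes, cumWork + iv.workSec, cumIdle + iv.idleSec
}

func main() {
	var cumFlushBytes, cumFlushWorkSec, cumFlushIdleSec int
	// Each testcase provides interval values; accumulate them and pass the
	// cumulative totals into pebbleMetricsTick.
	for _, iv := range []interval{{1000, 2, 8}, {500, 1, 9}} {
		cumFlushBytes, cumFlushWorkSec, cumFlushIdleSec =
			accumulate(cumFlushBytes, cumFlushWorkSec, cumFlushIdleSec, iv)
		fmt.Println(cumFlushBytes, cumFlushWorkSec, cumFlushIdleSec)
	}
	// On "init" (a fresh ioLoadListener), reset the cumulative values to 0.
	cumFlushBytes, cumFlushWorkSec, cumFlushIdleSec = 0, 0, 0
	fmt.Println(cumFlushBytes, cumFlushWorkSec, cumFlushIdleSec)
}
```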
Force-pushed c72f41d to ae1cd0f
Reviewed 2 of 6 files at r1, 8 of 8 files at r2, 7 of 7 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @sumeerbhola)
pkg/util/admission/granter.go
line 651 at r2 (raw file):
Previously, sumeerbhola wrote…
Let's change the name to
storeInternalIntervalMetrics
(also making it package private).
Also simplify the comment to
// storeInternalIntervalMetrics exposes metrics about internal subsystems in the store,
// that are useful for admission control. These represent the metrics over ...
Done.
pkg/util/admission/granter.go
line 659 at r2 (raw file):
Previously, sumeerbhola wrote…
make all of these package private
Done.
pkg/util/admission/granter.go
line 686 at r2 (raw file):
Previously, sumeerbhola wrote…
storeAndIntervalMetrics since it should be package private.
Done.
pkg/util/admission/granter.go
line 687 at r2 (raw file):
Previously, sumeerbhola wrote…
StoreMetrics is being passed by value to pebbleMetricsTick. We should keep it as value and not a pointer. […]
Oh, this is a much better way to organize the code! Thank you for bringing this up; I have updated it!
Force-pushed ae1cd0f to a6b9572
Reviewed 2 of 7 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @coolcom200, @irfansharif, and @sumeerbhola)
-- commits
line 23 at r4:
Reminder to update the PR description to be the same as this commit.
pkg/util/admission/io_load_listener.go
line 175 at r4 (raw file):
// store, that are useful for admission control. These represent the metrics
// over the interval of time from the last call to retrieve these metrics. These
// are not cumulative, unlike Metrics.
s/storeInternalIntervalMetrics/storeInternalMetrics
And change the comment to say something like:
These can represent either cumulative metrics, or over an interval of time. The latter is possible since the data representation here allows for subtraction between two cumulative storeInternalMetrics.
Actually, since you are calling adjustTokensInner with pebble.ThroughputMetric, just remove this wrapper struct. It is not adding any value, and we are not preserving the wrapping through the method calls. Then we simply have cumFlushWriteThroughput as a field in ioLoadListenerState.
Force-pushed 800310e to 1e07034
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @coolcom200, @irfansharif, and @sumeerbhola)
pkg/util/admission/io_load_listener.go
line 313 at r5 (raw file):
io.diskBW.bytesWritten = metrics.DiskStats.BytesWritten
io.diskBW.incomingLSMBytes = cumLSMIncomingBytes
io.setCumulativeStoreInternalIntervalMetrics(metrics)
Isn't this simply io.cumFlushWriteThroughput = metrics.Flush.WriteThroughput?
Remove dependence on `GetInternalIntervalMetrics`, as this method resets the interval following each call. This results in issues if two clients interact with `GetInternalIntervalMetrics`, as they will reset the other client's interval. Ex: Client A calls every 5 seconds and Client B calls every 10 seconds. Client B will end up getting metrics at a 5-second interval, since `GetInternalIntervalMetrics` resets the interval on each call.

The `admission.StoreMetrics` are updated to remove the `InternalIntervalField`, as these metrics need to be computed based on the current metrics `admission.StoreMetrics.Metrics` and the previous metrics. The storage of the previous metrics is left up to the client. Removing the field was chosen instead of setting it to a `nil` value, as it would be confusing to return the `StoreMetrics` and require the client to populate the field when they compute the necessary metrics.

For the admission control use case, the flush metrics are computed as part of `ioLoadListener.adjustTokens`, with the previous metrics being initialized in `ioLoadListener.pebbleMetricsTick` and stored on the `ioLoadListenerState` as `cumFlushWriteThroughput`.

Additionally, make a copy of the `pebble.InternalIntervalMetrics` in order to move away from using the `pebble` version of the interval metrics, as these metrics are now the responsibility of the caller to compute.

Release note: None
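The interference problem described above can be illustrated with a toy sketch. The `source` type and its methods are purely illustrative; the counter stands in for pebble's interval metrics.

```go
package main

import "fmt"

type source struct{ cum, lastRead int64 }

func (s *source) add(n int64) { s.cum += n }

// getAndReset mimics a self-resetting getter like GetInternalIntervalMetrics:
// it returns the delta since the last call, from ANY caller, and resets the
// interval, so concurrent readers steal each other's deltas.
func (s *source) getAndReset() int64 {
	d := s.cum - s.lastRead
	s.lastRead = s.cum
	return d
}

func main() {
	s := &source{}
	s.add(10)
	a := s.getAndReset() // client A reads, resetting the interval
	s.add(10)
	b := s.getAndReset() // client B sees only 10, not the 20 it expected
	fmt.Println(a, b)

	// With cumulative metrics, each client keeps its own previous value and
	// subtracts, so readers no longer interfere.
	s2 := &source{}
	var prevB int64 // client B's own stored previous value
	s2.add(10)
	_ = s2.cum // client A reads the cumulative value; nothing is reset
	s2.add(10)
	deltaB := s2.cum - prevB // client B correctly sees 20
	fmt.Println(deltaB)
}
```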
Force-pushed 1e07034 to ce3aff4
Reviewed 1 of 2 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)
Previously, sumeerbhola wrote…
Reminder to update the PR description to be the same as this commit.
Done.
pkg/util/admission/io_load_listener.go
line 175 at r4 (raw file):
Previously, sumeerbhola wrote…
s/storeInternalIntervalMetrics/storeInternalMetrics […]
Done.
pkg/util/admission/io_load_listener.go
line 313 at r5 (raw file):
Previously, sumeerbhola wrote…
Isn't this simply io.cumFlushWriteThroughput = metrics.Flush.WriteThroughput?
Yes, it is; thank you for pointing this out! Updated!
Reviewed 1 of 2 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
bors r+
Build succeeded:
88974: sql: add support for `DELETE FROM ... USING` r=faizaanmadhani a=faizaanmadhani

See commit messages for details.

Resolves: #40963

89459: metrics: expose pebble flush utilization r=jbowens a=coolcom200

Create a new `GaugeFloat64` metric for pebble's flush utilization. This metric is not cumulative; rather, it is the metric over an interval. This interval is determined by the `interval` parameter of the `Node.startComputePeriodicMetrics` method.

In order to compute the metric over an interval, the previous value of the metric must be stored. As a result, a map is constructed that takes a pointer to a store and maps it to a pointer to storage metrics: `make(map[*kvserver.Store]*storage.Metrics)`. This map is passed to `node.computeMetricsPeriodically`, which gets the store to calculate its metrics and then updates the previous metrics in the map.

Refactor `store.go`'s metric calculation by separating `ComputeMetrics(ctx context.Context, tick int) error` into two methods:

* `ComputeMetrics(ctx context.Context) error`
* `ComputeMetricsPeriodically(ctx context.Context, prevMetrics *storage.Metrics, tick int) (m storage.Metrics, err error)`

Both methods call `computeMetrics`, which contains the code common to the two calls. Before this, the process for retrieving metrics instantaneously was to pass a tick value such as `-1` or `0` to `ComputeMetrics(ctx context.Context, tick int)`; now it can be done with a call to `ComputeMetrics(ctx context.Context)`.

The `store.ComputeMetricsPeriodically` method will also return the latest storage metrics. These metrics are used to update the mapping between stores and metrics used for computing the metric delta over an interval.

Release Note: None

Resolves part of #85755
Depends on #88972, cockroachdb/pebble#2001
Epic: CRDB-17515

89656: roachtest: introduce admission-control/elastic-cdc r=irfansharif a=irfansharif

Part of #89208. This test sets up a 3-node CRDB cluster on 8vCPU machines running 1000-warehouse TPC-C, and kicks off a few changefeed backfills concurrently. We've observed latency spikes during backfills because of their CPU/scan-heavy nature: they can elevate CPU scheduling latencies, which in turn translates to an increase in foreground latency. Also in this commit: routing std{err,out} from the prometheus/grafana setup that roachtests do to the logger in scope.

Release note: None

Co-authored-by: Faizaan Madhani <[email protected]>
Co-authored-by: Leon Fattakhov <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
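The per-store previous-metrics map described in #89459 can be sketched as below. `Store`, `Metrics`, and `flushUtilization` are minimal stand-ins for `kvserver.Store`, `storage.Metrics`, and the real utilization computation, which operates on pebble's flush throughput fields.

```go
package main

import "fmt"

// Store is a minimal stand-in for kvserver.Store.
type Store struct{ id int }

// Metrics is a minimal stand-in for storage.Metrics; only the cumulative
// flush work/idle durations (in seconds) are modeled here.
type Metrics struct {
	FlushWorkSec float64
	FlushIdleSec float64
}

// flushUtilization computes the fraction of the interval spent flushing,
// given the current and previous cumulative metrics.
func flushUtilization(cur, prev Metrics) float64 {
	work := cur.FlushWorkSec - prev.FlushWorkSec
	idle := cur.FlushIdleSec - prev.FlushIdleSec
	if work+idle == 0 {
		return 0
	}
	return work / (work + idle)
}

func main() {
	// The map from stores to their previously observed metrics, mirroring
	// make(map[*kvserver.Store]*storage.Metrics) in the commit message.
	prevMetrics := make(map[*Store]*Metrics)
	s := &Store{id: 1}
	prevMetrics[s] = &Metrics{}

	cur := Metrics{FlushWorkSec: 2, FlushIdleSec: 8}
	u := flushUtilization(cur, *prevMetrics[s])
	prevMetrics[s] = &cur // store the latest metrics for the next interval
	fmt.Println(u)
}
```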
Epic: CRDB-17515