Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schedulerlatency: improve percentile computation #89829

Closed
irfansharif opened this issue Oct 12, 2022 · 1 comment · Fixed by #94193
Closed

schedulerlatency: improve percentile computation #89829

irfansharif opened this issue Oct 12, 2022 · 1 comment · Fixed by #94193
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Oct 12, 2022

Is your feature request related to a problem? Please describe.

The way we're computing scheduling latencies at specific percentiles (typically p99) is naive thanks to yours truly. It just picks the mid point of the bucket we've found for the given percentile, instead of linearly interpolating. Experimentally we've noticed that this makes us slightly insensitive to changes in admission.elastic_cpu.scheduler_latency_target -- a target of 1ms behaves no differently than, say, 950us.

// TODO(irfansharif): Implement proper linear interpolation instead and then
// write test comparing against it against metric.ValueAtQuantileWindowed.
// If pmax, we'll return the bucket max. If pmin, we'll return the bucket
// min. If the 90th and 100th percentile values lie within some bucket, and
// we're looking for 99.9th percentile, we'll interpolate to the 99% point
// between bucket start and end. Because we're doing this naive mid-point
// thing, it makes for a confusing difference when comparing the p99
// computed off of go.scheduler_latency vs.
// admission.scheduler_latency_listener.p99_nanos.
}

This is partly due to how the bucket boundaries are configured in the Go runtime, something we have no say over (yet):

// bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
// bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
// bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
// bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
// bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)

+cc @cockroachdb/admission-control.

Jira issue: CRDB-20459

@irfansharif irfansharif added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Oct 12, 2022
@irfansharif
Copy link
Contributor Author

@aadityasondhi expressed interest in picking this up, assigning.

aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Dec 22, 2022
This patch improves the percentile calculation by using linear
interpolation between bucket boundaries instead of using the mid-point.
The patch also adds tests to compare values of the scheduler histograms
to those found in util/metric.go which are based on the prometheus
implementation to maintain consistency among metrics.

Fixes cockroachdb#89829

Release note: None
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Dec 22, 2022
This patch improves the percentile calculation by using linear
interpolation between bucket boundaries instead of using the mid-point.
The patch also adds tests to compare values of the scheduler histograms
to those found in util/metric.go which are based on the prometheus
implementation to maintain consistency among metrics.

Fixes cockroachdb#89829

Release note: None
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Jan 13, 2023
This patch improves the percentile calculation by using linear
interpolation between bucket boundaries instead of using the mid-point.
The patch also adds tests to compare values of the scheduler histograms
to those found in util/metric.go which are based on the prometheus
implementation to maintain consistency among metrics.

Fixes cockroachdb#89829.

Release note: None
craig bot pushed a commit that referenced this issue Jan 19, 2023
94193: schedulerlatency: implement linear interpolation for percentiles r=aadityasondhi a=aadityasondhi

This patch improves the percentile calculation by using linear interpolation between bucket boundaries instead of using the mid-point. The patch also adds tests to compare values of the scheduler histograms to those found in util/metric.go which are based on the prometheus implementation to maintain consistency among metrics.

Fixes #89829

Release note: None

95231: upgrades: modify InjectLegacyTable to handle dynamically assigned IDs r=postamar a=andyyang890

This patch modifies the `InjectLegacyTable` function, which is used by
upgrade tests to inject an old version of a table descriptor, to handle
the case where the table has a dynamically assigned ID.

Epic: None

Release note: None

Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
@craig craig bot closed this as completed in d82e4ac Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants