Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: refactor histogram bucket generation and testing #107388

Merged
merged 1 commit into from
Aug 10, 2023

Conversation

ericharmeling
Copy link
Contributor

This commit refactors histogram bucketing for legibility and composibility. It also introduces a data-driven test for histogram bucket generation.

This refactor should make it easier to add additional metric categories, distributions, and bucket types.

Part of #97144.

Release note: None

@ericharmeling ericharmeling requested review from jmcarp, abarganier and a team July 21, 2023 20:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ericharmeling ericharmeling force-pushed the refactor-histogram-buckets branch from bf1cb97 to 81440e9 Compare July 24, 2023 14:04
Copy link
Collaborator

@jmcarp jmcarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me! I have a non-blocking question about how to integrate this with #104302. If we want to implement a rule like "optionally enable native histograms for all histograms following an exponential distribution" or "optionally enable native histograms for all histograms that represent a size", would we want to make distribution or histogram type public? We might also want to pass a StaticBucketConfig to NewHistogram rather than a []float64 of buckets. Or should those details not be exposed elsewhere?

@ericharmeling ericharmeling force-pushed the refactor-histogram-buckets branch from 81440e9 to f920b0c Compare July 24, 2023 21:10
@ericharmeling
Copy link
Contributor Author

If we want to implement a rule like "optionally enable native histograms for all histograms following an exponential distribution" or "optionally enable native histograms for all histograms that represent a size", would we want to make distribution or histogram type public?

I like those rules! We'd be defining said rules within the metrics package, right? In general, I don't think we want to export types outside their package until we have a reason to do so. So I've just made these types "private" for now - is that what you were suggesting?

We might also want to pass a StaticBucketConfig to NewHistogram rather than a []float64 of buckets. Or should those details not be exposed elsewhere?

I like the idea of keeping bucket generation in its own file (histogram_buckets.go), and then just using the []float64 bucket vars exported from that file when calling NewHistogram. But again, we can do that without exporting StaticBucketConfig.

I imagine the configs (distribution, min, max, count) for these histograms will continue to be per-category of metric (e.g., IO Latency metrics). And the fit of that category will be tested by (forthcoming) integration tests, which ideally will be able to suggest a better fit, or a new category. Does that sound sensible?

@jmcarp
Copy link
Collaborator

jmcarp commented Jul 26, 2023

I like the idea of keeping bucket generation in its own file (histogram_buckets.go), and then just using the []float64 bucket vars exported from that file when calling NewHistogram.

What I'm wondering is: how should we tell NewHistogram whether or not to expose native histograms alongside conventional histograms? If we want to use a rule like "expose native histograms if distribution is exponential", then NewHistogram needs to know the distribution type of the buckets, not just the []float of bucket bounds. In that case I could imagine passing a StaticBucketConfig to HistogramOptions instead of a []float.

Or we could do something more explicit, like adding SupportsNativeHistogram to HistogramOptions and setting to true as appropriate.

Yet another option: we could expose native histograms for many or even all histograms, then decide whether we want to use conventional or native buckets on the prometheus side.

What's your preference? I also don't want to hold up this PR, so we can also discuss in #104302.

@ericharmeling ericharmeling requested review from a team as code owners July 26, 2023 18:06
@ericharmeling ericharmeling requested a review from a team July 26, 2023 18:06
@ericharmeling ericharmeling requested review from a team as code owners July 26, 2023 18:06
@ericharmeling ericharmeling requested a review from a team July 26, 2023 18:06
@ericharmeling ericharmeling requested a review from a team as a code owner July 26, 2023 18:06
@ericharmeling ericharmeling requested review from rhu713, miretskiy and msirek and removed request for a team July 26, 2023 18:06
@ericharmeling
Copy link
Contributor Author

If we want to use a rule like "expose native histograms if distribution is exponential", then NewHistogram needs to know the distribution type of the buckets, not just the []float of bucket bounds. In that case I could imagine passing a StaticBucketConfig to HistogramOptions instead of a []float.

That makes sense. The only issue I can see is that we might want to pass non-generated, static buckets (for tests mainly) to HistogramOptions. But I suppose we can have both Buckets []float and BucketConfig StaticBucketConfig in there? And just use the generator when Buckets is nil and we have a non-zero StaticBucketConfig. Does that make sense? I've pushed a commit with that refactor.

@ericharmeling
Copy link
Contributor Author

Or we could do something more explicit, like adding SupportsNativeHistogram to HistogramOptions and setting to true as appropriate.

I don't really have a preference on where the SupportsNativeHistogram is set/evaluated, but here looks good to me: https://github.com/cockroachdb/cockroach/pull/104302/files#diff-7d9143ffc52f8bb21b9a9f1dfdcef120129a14453e54e126e9ba6e3da4b54941R298. With the distribution exposed via HistogramOptions.BucketConfig, it'd be nice to just have the nativeHistogramsEnabled env var check + the distribution check.

Yet another option: we could expose native histograms for many or even all histograms, then decide whether we want to use conventional or native buckets on the prometheus side.

This wouldn't be a terrible idea since native histograms are so space-efficient... We'd just want the bucket sizes to be big enough to make up for any precision lost for the metrics that don't follow exponential distributions. Any other cons to using this approach?

@shermanCRL shermanCRL requested review from samiskin and removed request for miretskiy July 26, 2023 18:20
@ericharmeling ericharmeling force-pushed the refactor-histogram-buckets branch from 82680c0 to 203ec16 Compare July 26, 2023 18:53
Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR @abarganier!

I think I got to all your comments. Reworked the StaticBucketConfigs map. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @msirek, @rhu713, and @samiskin)


pkg/ccl/sqlproxyccl/metrics.go line 240 at r3 (raw file):

don't we have these static buckets already assigned to public vars in the metric package?

On master, we have exported vars for the static buckets (of type []float64), to pass to NewHistogram here. But we are now passing bucket configs to NewHistogram.

If the concern is with the performance of map lookups, we can use ints instead of strings as the keys for O(1) lookups? I went ahead and made that change, but happy to rethink that.


pkg/util/metric/histogram_buckets.go line 26 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: comments for this and other newly types created in this file.

Done.


pkg/util/metric/histogram_buckets.go line 32 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: if we're gonna leave these commented out, maybe add a bit more explanation about what we plan to do with these in the future.

I removed them, and instead added a to-do.


pkg/util/metric/histogram_buckets.go line 44 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: a bit more explanation about what that means/its affects would be helpful.

I actually think we might as well just take this precision testing stuff out and save it for the actual roachtest PR (especially if that work takes longer than expected to get merged).


pkg/util/metric/histogram_buckets.go line 50 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: for these and the other bucket configs, some comments converting these min/max values into units more.easily parsed (E.g. ms, s, kb, etc.) would be helpful for inspection/readability later on.

Done.


pkg/util/metric/histogram_buckets.go line 116 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: I think generally the pattern we take is assigning the results of this call to a package-level variable at initialization. Do we need it to be called each time getBuckets is called? It could lead to some inconsistent state if the env var value is changed while the bucket vars are still being initialized.

I actually think we might as well just take this precision testing stuff out and save it for the actual roachtest PR (especially if that work takes longer than expected to get merged).

But I'll follow the recommended pattern in that forthcoming PR.


pkg/util/metric/histogram_buckets.go line 114 at r3 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: can this function be defined on the staticBucketConfig type itself? Then we could call config.GetBuckets()?

Done.

@ericharmeling ericharmeling force-pushed the refactor-histogram-buckets branch from 95bc8d9 to 0d24080 Compare August 1, 2023 19:37
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments! Just one last thing re: the map usage, lmk what you think.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @jmcarp, @msirek, @rhu713, and @samiskin)


pkg/ccl/sqlproxyccl/metrics.go line 240 at r3 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

don't we have these static buckets already assigned to public vars in the metric package?

On master, we have exported vars for the static buckets (of type []float64), to pass to NewHistogram here. But we are now passing bucket configs to NewHistogram.

If the concern is with the performance of map lookups, we can use ints instead of strings as the keys for O(1) lookups? I went ahead and made that change, but happy to rethink that.

It wasn't so much about about the performance of the map lookups, but more the convenience/intuitiveness the API. Maybe I'm missing something, but what purpose does the map serve if its elements are essentially vars defined at initialization anyway?

Could we not just, for example, have BucketConfig: metric.IOLatencyBuckets instead of BucketConfig: metric.StaticBucketConfigs[metric.IOLatencyBuckets] here and elsewhere, removing the need for all these diffs?

I guess I'm not seeing the histType type as necessary. Could each type not just be an exported package-level variable instead?


pkg/util/metric/histogram_buckets.go line 44 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

I actually think we might as well just take this precision testing stuff out and save it for the actual roachtest PR (especially if that work takes longer than expected to get merged).

SGTM!


pkg/util/metric/histogram_buckets.go line 50 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Done.

Thanks!


pkg/util/metric/histogram_buckets.go line 116 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

I actually think we might as well just take this precision testing stuff out and save it for the actual roachtest PR (especially if that work takes longer than expected to get merged).

But I'll follow the recommended pattern in that forthcoming PR.

Sounds good, thanks.

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @jmcarp, @msirek, @rhu713, and @samiskin)


pkg/ccl/sqlproxyccl/metrics.go line 240 at r3 (raw file):

Previously, abarganier (Alex Barganier) wrote…

It wasn't so much about about the performance of the map lookups, but more the convenience/intuitiveness the API. Maybe I'm missing something, but what purpose does the map serve if its elements are essentially vars defined at initialization anyway?

Could we not just, for example, have BucketConfig: metric.IOLatencyBuckets instead of BucketConfig: metric.StaticBucketConfigs[metric.IOLatencyBuckets] here and elsewhere, removing the need for all these diffs?

I guess I'm not seeing the histType type as necessary. Could each type not just be an exported package-level variable instead?

Also, we could leave the Buckets member of metric.HistogramOptions as the same name, instead of changing it to BucketConfig, and completely remove the need for any of these diffs altogether. The type of Buckets in the options should be enough to convey to the user that they are configs and not []float64

@ericharmeling ericharmeling force-pushed the refactor-histogram-buckets branch from 0d24080 to 166901c Compare August 9, 2023 15:23
Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I've updated the PR following your latest feedback.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @jmcarp, @msirek, @rhu713, and @samiskin)


pkg/ccl/sqlproxyccl/metrics.go line 240 at r3 (raw file):

It wasn't so much about about the performance of the map lookups, but more the convenience/intuitiveness the API. Maybe I'm missing something, but what purpose does the map serve if its elements are essentially vars defined at initialization anyway?

Ah okay. I think my main motivation for using a map for StaticBucketConfigs here is to simplify the addition of new bucket types. When we introduce a new bucket type, we'd just add it to the map, and then it will automatically be part of the test loop in pkg/util/metric/histogram_buckets_test.go. Declaring an explicit var for each would require us to declare an exported var for each, and then append each one individually to a slice that replaces the StaticBucketConfigs map for the test loop. If you forget to append the exported var to the slice, then its generation could go untested.

On reflection though, this motivation is a bit weak, and I'm pretty ambivalent about it. I've updated the PR as suggested. See the latest commit.

I guess I'm not seeing the histType type as necessary. Could each type not just be an exported package-level variable instead?

Removed the type and exported each as a package-level var as suggested, in the latest commit.

Also, we could leave the Buckets member of metric.HistogramOptions as the same name, instead of changing it to BucketConfig, and completely remove the need for any of these diffs altogether. The type of Buckets in the options should be enough to convey to the user that they are configs and not []float64

This PR doesn't remove the original Buckets []float64 from metric.HistogramOptions, since we have some tests that use a custom (non-config-generated) bucket (see #107388 (comment)).

@ericharmeling ericharmeling force-pushed the refactor-histogram-buckets branch from 166901c to f9358b8 Compare August 9, 2023 19:55
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: nice work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ericharmeling, @jmcarp, @msirek, @rhu713, and @samiskin)


pkg/ccl/sqlproxyccl/metrics.go line 240 at r3 (raw file):

This PR doesn't remove the original Buckets []float64 from metric.HistogramOptions, since we have some tests that use a custom (non-config-generated) bucket (see #107388 (comment)).

Ack, nevermind on that one then.

This commit refactors histogram bucketing for legibility
and composibility. It also introduces a data-driven test
for histogram bucket generation.

This refactor should make it easier to add additional
metric categories, distributions, and bucket types.

Part of cockroachdb#97144.

Release note: None
@ericharmeling ericharmeling force-pushed the refactor-histogram-buckets branch from f9358b8 to c748c25 Compare August 10, 2023 17:34
@ericharmeling
Copy link
Contributor Author

bors r+

@ericharmeling ericharmeling removed request for a team, rhu713, msirek and samiskin August 10, 2023 19:47
@craig
Copy link
Contributor

craig bot commented Aug 10, 2023

Build succeeded:

@craig craig bot merged commit 618c73a into cockroachdb:master Aug 10, 2023
@jmcarp
Copy link
Collaborator

jmcarp commented Aug 31, 2023

blathers backport 23.1

@blathers-crl
Copy link

blathers-crl bot commented Aug 31, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from c748c25 to blathers/backport-release-23.1-107388: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants