Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Micrometer metrics bridge] Default bucket boundaries assume milliseconds base unit #8841

Closed
zeitlinger opened this issue Jun 30, 2023 · 5 comments · Fixed by #8856
Closed
Labels
bug Something isn't working

Comments

@zeitlinger
Copy link
Member

Describe the bug

The default bucket boundaries assume that the base time unit is milliseconds.

The problematic lines:

double time = TimeUtils.nanosToUnit(nanos, baseTimeUnit);
otelHistogram.record(time, attributes);

First, the value is converted to the base unit (e.g. seconds) and then this value is recorded into the histogram, which assumes that it receives milliseconds.

As a workaround, you can override the default bucket boundaries.

Steps to reproduce

  1. Set the base time unit to seconds in Micrometer bridge.
  2. Use a histogram.

How to fix

I can give it a try - but I'm unsure about the approach.

  • The histogram bucket API uses doubles, which can be regarded as "milliseconds" or "the base unit that's in use"
  • I'm unsure if the bug is limited to the micrometer bridge
@mateuszrzeszutek
Copy link
Member

First, the value is converted to the base unit (e.g. seconds) and then this value is recorded into the histogram, which assumes that it receives milliseconds.

  • The histogram bucket API uses doubles, which can be regarded as "milliseconds" or "the base unit that's in use"

In the OTel histogram API it's "the base unit that's in use".
OTel histograms are generic histogram instruments -- they are not timers, they can record more than just durations. That's why it is the responsibility of the user to ensure that the values are converted to the correct unit that matches the instrument definition.

Honestly I'm not completely sure what my preferred solution for this problem would be. By default Micrometer Timers have no buckets at all, unless you configure serviceLevelObjectives via the builder. Perhaps what we should do is preserve that behavior in the bridge as well -- and pass an empty bucket list if the user has not explicitly configured them via Micrometer API. At least it would be consistent with how "plain" Micrometer behaves.

@zeitlinger
Copy link
Member Author

By default Micrometer Timers have no buckets at all

Non-SLO buckets are added here: https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/distribution/DistributionStatisticConfig.java#L113

This line is used with the default configuration.
But maybe I misread how it's working.

At least it would be consistent with how "plain" Micrometer behaves.

Being consistent with plain micrometer by default is a good idea.

This leads to other questions

  • If you want histograms, should you configure them just for micrometer, or use the otel default bucket boundaries - (scaled by base unit)
  • Should we support configuring micrometer histograms for library use case - (this is simple to do, I've already tried that locally)?
  • Should we support configuring micrometer histograms for agent - (this is hard to do, unless we use a config file like for metric views)?

@mateuszrzeszutek
Copy link
Member

Non-SLO buckets are added here: https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/distribution/DistributionStatisticConfig.java#L113

Yes, but only when supportsAggregablePercentiles is true -- and we set it to false everywhere in the OTel bridge. Effectively it's just SLOs.

  • If you want histograms, should you configure them just for micrometer, or use the otel default bucket boundaries - (scaled by base unit)

I'm thinking it should work exactly as if it was "just Micrometer": if you have a Timer/DistributionSummary, and you want them to emit some buckets, you have to set them yourself when building the instrument.
(or alternatively you could use the SDK views, because that's always an option with OTel; but I think that's not really relevant here)

  • Should we support configuring micrometer histograms for library use case - (this is simple to do, I've already tried that locally)?
  • Should we support configuring micrometer histograms for agent - (this is hard to do, unless we use a config file like for metric views)?

I don't quite get what you mean by "configuring micrometer histograms". There should be no meaningful difference between the library/javaagent mode, it's the same code after all. I.e. if you want buckets you use the instrument builder to set the SLOs.

@zeitlinger
Copy link
Member Author

supportsAggregablePercentiles

I think I was mistakes about what is done in micrometer vs. the otel micrometer registry.
After some debugging, I saw that the otel buckets are actually used (at least for http.server.requests from micrometer).

Once I reached that conclusion, I adjusted the otel bucket boundaries instead - leading me to a different question: Should the otel bucket boundaries accommodate for seconds as the base unit?

I think this is only relevant in combination with micrometer, because the agent uses explicit buckets when using seconds. Micrometer, however, doesn't use explicit bucket boundaries for http.server.requests.

I'm thinking it should work exactly as if it was "just Micrometer"

micrometer is not setting the bucket boundaries for http.server.requests - yet it works out of the box.

@mateuszrzeszutek
Copy link
Member

mateuszrzeszutek commented Jul 4, 2023

micrometer is not setting the bucket boundaries for http.server.requests - yet it works out of the box.

Yeah -- by default micrometer histograms (Timer and DistributionSummary) don't have buckets at all. This is exactly what I implemented in #8856.

Once I reached that conclusion, I adjusted the otel bucket boundaries instead - leading me to a different question: Should the otel bucket boundaries accommodate for seconds as the base unit?

If you're talking about the default explicit histogram buckets, there was a discussion about that in open-telemetry/opentelemetry-specification#3509 -- in short, we're not doing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants