Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust Prometheus histogram buckets calculation #26903

Closed
exekias opened this issue Jul 15, 2021 · 10 comments
Closed

Adjust Prometheus histogram buckets calculation #26903

exekias opened this issue Jul 15, 2021 · 10 comments
Labels
enhancement Team:Integrations Label for the Integrations team

Comments

@exekias
Copy link
Contributor

exekias commented Jul 15, 2021

We want to use the same algorithm to translate Prometheus histograms into Elasticsearch histograms. We have agreed on the approach to something like this:

  • for +Inf "le", use the preceding bucket's value
  • for the first bucket only: if it has a negative "le", use the value as-is; otherwise use half its value (midpoint to zero)
  • for all other buckets, use the midpoint from that bucket's value to the preceding bucket's

I few changes are need to comply with this implementation in Beats, mainly:

  • Account for negative initial buckets
  • Use the preceding bucket's value for +Inf "le"
@exekias exekias added enhancement Team:Integrations Label for the Integrations team labels Jul 15, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic
Copy link

botelastic bot commented Jul 15, 2022

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jul 15, 2022
@axw
Copy link
Member

axw commented Jul 17, 2022

Still relevant.

@botelastic botelastic bot removed the Stalled label Jul 17, 2022
@gizas
Copy link
Contributor

gizas commented Jan 17, 2023

Hello @axw , @exekias ,

The Prometheus Beat histogram implementation is solely focusing on centroids: https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/prometheus/collector/histogram.go#L48

From what I udnerstand, both types of Histograms we support (either TDigest or HDR), both support only count and values and the only difference is how we calculate the values.

So Is there any assumption why we solely focus on centroids?
(elastic/apm-agent-python#1165 (comment))

cc @tetianakravchenko , @jsoriano

@axw
Copy link
Member

axw commented Jan 18, 2023

@gizas in my reply under the comment you linked:

Is that storage format optimizing for tdigest or hdrhistogram-based percentile aggregations?

t-digest. Prometheus histograms are general and may have negative values, which won't work with HDRHistogram.

i.e. the primary reason for calculating centroids is that we assume the use of T-Digest because HDRHistogram does not support negative values.

@gizas
Copy link
Contributor

gizas commented Jan 18, 2023

Ohh thank you @axw ! Really helpful.

So checking the reference Building Histogram: While this means the field can technically be aggregated with either algorithm, in practice the user should chose one algorithm and index data in that manner

So this is a choice that users make on the tool of that produces the metrics. So if customer sends HDRHistogram metrics, will this mean that we can not support it? Can we catch this case with relevant error handling?

@axw
Copy link
Member

axw commented Jan 18, 2023

So this is a choice that users make on the tool of that produces the metrics. So if customer sends HDRHistogram metrics, will this mean that we can not support it?

Not sure if I understand your question. We support it by converting the histogram to T-Digest. There will be some loss of precision, but it'll still be supported.

@gizas
Copy link
Contributor

gizas commented Jan 18, 2023

Ok we support it because the format of data are the same in both types, but yes I am worried about this loss of precision. (this is actually what the can not support it was referring to). Thank you again

@tetianakravchenko
Copy link
Contributor

@gizas should we close this issue? in favor of elastic/integrations#5042 ?

@gizas
Copy link
Contributor

gizas commented Feb 6, 2023

Closing as per comment above, duplicate of elastic/integrations#5042

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Team:Integrations Label for the Integrations team
Projects
None yet
Development

No branches or pull requests

5 participants