Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update histogram protocol #152

Closed

Conversation

yzhuge
Copy link

@yzhuge yzhuge commented Oct 22, 2020

This is a PR for open-telemetry/opentelemetry-specification#982. Main changes

  • Allow both integer (uint64) and double as bucket counts. Integer is preferred as it can be efficiently encoded by protobuf using varint, where leading zero bits are optimized out.
  • Added min and max fields (all fields are optional in proto3), per discussion "Exact min and max values are still in demand for histogram data points"
  • Added negative number support for exponential histograms.
  • Added optional num_of_linear_subbuckets fields in exponential histogram to efficiently represent "log linear" family histograms such as HdrHistogram, Circllhisto, or DDSketch BitwiseLinearlyInterpolatedMapping.
  • Added optional HistogramProducer enum. A consumer may use it as a hint to generate a histogram in the producer's original format.

Terminology note: For exponential histograms, "base" is used for log base, and exponent base, consistent with standard math terminology. "Reference" is used for the multiplier on exponential scale, consistent with common usage in log scale unit such as deci bell.

Compared to custom protocol for DDSketch (https://github.com/DataDog/sketches-java/blob/master/src/main/proto/DDSketch.proto), a DDsketch created using the logarithm method can be represented as reference=1, base=gamma, index_offset=contiguousBinIndexOffset. DDSketch created using log approximation methods such as quadratic or cubic methods has to use the explicit bound encoding.

To encode quadratic or cubic methods is simple. We can just add an "approximation method" field. But this would require all backend consumers of this protocol to properly decode bucket bounds encoded this way. While linear subbuckets is easy to understand and implement, quadratic or cubic methods are not. There is no doc on mathematical description of the exact formula used. In fact, there are many quadratic and cubic approximation methods for log. None could be considered "canonical". Simply say "quadratic" or "cubic" does not tell the backend how to process the data. Thus I hesitate to include such methods into standard.

The proposed protocol gives users two options for efficient exponential histogram encoding:

I consider this to be a good balance on choice and complexity in a standard.

}

repeated BucketCount bucket_counts = 5;
message BucketCounts {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see if it says anywhere, I assume that these counts are not cumulative in all cases?

@brian-brazil
Copy link
Contributor

I think you've opened this against the wrong proto and repository. This is a draft OpenMetrics proto, not an OpenTelemetry proto.

If you'd like to see what the final OpenMetrics proto will look like see #151, no changes are envisioned beyond comment text at this very late stage of defining OpenMetrics.

@yzhuge
Copy link
Author

yzhuge commented Oct 23, 2020

sorry getting to the wrong repo. Closing

@yzhuge yzhuge closed this Oct 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants