Update histogram protocol #152

yzhuge · 2020-10-22T18:46:57Z

This is a PR for open-telemetry/opentelemetry-specification#982. Main changes:

Allow both integer (uint64) and double as bucket counts. Integer is preferred as it can be efficiently encoded by protobuf using varint, where leading zero bits are optimized out.
Added min and max fields (all fields are optional in proto3), per discussion "Exact min and max values are still in demand for histogram data points"
Added negative number support for exponential histograms.
Added an optional num_of_linear_subbuckets field to the exponential histogram to efficiently represent "log-linear" family histograms such as HdrHistogram, Circllhist, or DDSketch's BitwiseLinearlyInterpolatedMapping.
Added optional HistogramProducer enum. A consumer may use it as a hint to generate a histogram in the producer's original format.
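To illustrate the varint point above, here is a minimal sketch that computes how many bytes protobuf needs for an unsigned integer count, compared with the fixed 8 bytes of a double. The helper name varint_size and the sample counts are hypothetical and chosen only for illustration; they are not part of this PR.

```python
def varint_size(n: int) -> int:
    """Number of bytes protobuf uses to encode the unsigned integer n as a varint."""
    if n < 0:
        raise ValueError("varint_size expects a non-negative integer")
    size = 1
    while n >= 0x80:  # each varint byte carries 7 payload bits
        n >>= 7
        size += 1
    return size


DOUBLE_SIZE = 8  # a protobuf double always occupies 8 bytes on the wire

# Hypothetical bucket counts, for illustration only.
counts = [0, 3, 127, 128, 20_000, 5_000_000]
sizes = [varint_size(c) for c in counts]

for count, size in zip(counts, sizes):
    print(f"count={count}: varint {size} byte(s) vs double {DOUBLE_SIZE} bytes")
```

Small counts, which are typical for sparse histograms, take one or two bytes each, while the largest uint64 value takes ten.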

Terminology note: For exponential histograms, "base" is used for both the log base and the exponent base, consistent with standard math terminology. "Reference" is used for the multiplier on the exponential scale, consistent with common usage in log-scale units such as the decibel.

Compared to the custom protocol for DDSketch (https://github.com/DataDog/sketches-java/blob/master/src/main/proto/DDSketch.proto), a DDSketch created using the logarithm method can be represented as reference=1, base=gamma, index_offset=contiguousBinIndexOffset. A DDSketch created using a log approximation method, such as the quadratic or cubic method, has to use the explicit bound encoding.
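As a rough sketch of how a consumer might expand such an encoding into bucket bounds: the formula below (bound i = reference * base ** (index_offset + i)) is an assumption made for illustration, not a quote from the proto, and the function name exponential_bounds and the accuracy value alpha are hypothetical.

```python
import math


def exponential_bounds(reference: float, base: float,
                       index_offset: int, num_buckets: int) -> list[float]:
    """Return num_buckets + 1 bucket boundaries.

    Assumed layout (illustrative only): boundary i sits at
    reference * base ** (index_offset + i).
    """
    return [reference * base ** (index_offset + i)
            for i in range(num_buckets + 1)]


# DDSketch-style parameters: gamma derived from a relative accuracy alpha.
alpha = 0.01  # hypothetical relative accuracy
gamma = (1 + alpha) / (1 - alpha)

ddsketch_like = exponential_bounds(reference=1.0, base=gamma,
                                   index_offset=0, num_buckets=4)
power_of_two = exponential_bounds(reference=1.0, base=2.0,
                                  index_offset=-1, num_buckets=3)
print(ddsketch_like)
print(power_of_two)
```

With this layout only three numbers plus the counts need to be transmitted; the consumer recomputes every bound.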

Encoding the quadratic or cubic methods would be simple: we could just add an "approximation method" field. But this would require every backend consumer of this protocol to properly decode bucket bounds encoded this way. While linear subbuckets are easy to understand and implement, the quadratic and cubic methods are not. There is no documentation giving a mathematical description of the exact formula used. In fact, there are many quadratic and cubic approximation methods for log, and none can be considered "canonical". Simply saying "quadratic" or "cubic" does not tell the backend how to process the data. Thus I hesitate to include such methods in the standard.

The proposed protocol gives users two options for efficient exponential histogram encoding:

CPU-optimized histogram, using the log-linear format. Multiple implementations can generate this format, including HdrHistogram, Circllhist, and DDSketch's "fast" option (https://github.com/DataDog/sketches-java/blob/master/src/main/java/com/datadoghq/sketch/ddsketch/mapping/BitwiseLinearlyInterpolatedMapping.java).
Memory-optimized histogram, using the standard math logarithm function. DDSketch's "memoryOptimized" option produces this. Note that this option only reduces histogram size by about 30%, but is often many times more expensive in CPU cost.
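The difference between the two options can be sketched as follows. This is a simplified illustration assuming base 2 for the log-linear case; it is not the BitwiseLinearlyInterpolatedMapping code, and the function names are hypothetical.

```python
import math


def log_index(value: float, base: float) -> int:
    """Bucket index computed with the standard math logarithm."""
    return math.floor(math.log(value) / math.log(base))


def log_linear_index(value: float, num_of_linear_subbuckets: int) -> tuple[int, int]:
    """Return (exponent, subbucket) for a positive value, assuming base 2.

    The power-of-two exponent comes straight from the float representation,
    and the range [2**e, 2**(e+1)) is split into equal-width subbuckets,
    so no logarithm call is needed.
    """
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent, 0.5 <= mantissa < 1
    fraction = mantissa * 2 - 1             # position within [2**e, 2**(e+1)), in [0, 1)
    return exponent - 1, int(fraction * num_of_linear_subbuckets)


for v in (0.3, 1.5, 7.0, 10.0):
    print(v, log_index(v, 2.0), log_linear_index(v, 4))
```

The log-linear path uses only the exponent and mantissa of the float, which is why it is cheap on CPU; the logarithm path yields buckets of uniform relative width at the cost of a log call per value.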

I consider this to be a good balance between choice and complexity in a standard.

jbarciauskas · 2020-10-22T20:58:11Z

proto/openmetrics_data_model.proto

  }

-  repeated BucketCount bucket_counts = 5;
+  message BucketCounts {


I can't see whether it says so anywhere; I assume these counts are not cumulative in all cases?

brian-brazil · 2020-10-22T22:21:47Z

I think you've opened this against the wrong proto and repository. This is a draft OpenMetrics proto, not an OpenTelemetry proto.

If you'd like to see what the final OpenMetrics proto will look like, see #151. No changes are envisioned beyond comment text at this very late stage of defining OpenMetrics.

yzhuge · 2020-10-23T00:26:11Z

Sorry for opening this in the wrong repo. Closing.

yzhuge added 4 commits October 21, 2020 18:39

update histogram protocol

217b415

minor touch up

c33e783

update comments

5414ba5

update comments

9b5366a

yzhuge mentioned this pull request Oct 22, 2020

Metrics Histogram instrument default: any histogram sketch open-telemetry/opentelemetry-specification#982

Closed

jbarciauskas reviewed Oct 22, 2020

yzhuge closed this Oct 23, 2020
