Initial draft of Prometheus <-> OTLP datamodel specification. #2017

119 changes: 119 additions & 0 deletions specification/metrics/datamodel.md
@@ -988,6 +988,125 @@ For comparison, see the simple logic used in
[statsd sums](https://github.com/statsd/statsd/blob/master/stats.js#L281)
where all points are added, and lost points are ignored.

## Prometheus Compatibility

**Status**: [Experimental](../document-status.md)

This section denotes how to convert from prometheus scraped metrics to the
OpenTelemtery metric data model and how to create prometheus metrics from

Suggested change
OpenTelemtery metric data model and how to create prometheus metrics from
OpenTelemetry metric data model and how to create Prometheus metrics from

In general, shouldn't Prometheus be uppercase throughout?

Contributor Author

Yeah, this was hastily written, will go through and fix.

OpenTelemetry metric data.

### Label Mapping

Prometheus metric labels are split in OpenTelemetry across Resource attributes
and Metric data stream attributes. Some labels are used within metric
families to denote semantics which open-telemetry captures within the structure

@hdost, Oct 29, 2021

Intentional open-telemetry?

Contributor Author

no, good catch!

of a data point. When mapping from prometheus to OpenTelemetry, any label
that is not explicitly called out as being handled specially will be included
in the set of attributes for a metric data stream.

Here is a table of the sett of prometheus labels that are lifted into Resource

Member

Suggested change
Here is a table of the sett of prometheus labels that are lifted into Resource
Here is a table of the set of prometheus labels that are lifted into Resource

attributes when converting into OpenTelemetry.

| Prometheus Label | OTLP Resource Attribute | Description |
| -------------------- | ----------------------- | ----------- |
| `job` | `service.name` | [Semantic convention](../resource/semantic_conventions/README.md#Service) |
| `host` | `host.name` | [Semantic convention](../resource/semantic_conventions/host.md) |
| `instance` | `instance` | ... |
| `port` | `port` | ... |
| `__scheme__` | `scheme` | ... |
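
As a rough illustration of the table above, the split between Resource attributes and metric data stream attributes could look like the following. This is a minimal sketch, not the collector's implementation: `RESOURCE_LABELS` and `split_labels` are hypothetical names, and labels that need other special handling are not treated here.

```python
# Hypothetical mapping that mirrors the draft table above.
RESOURCE_LABELS = {
    "job": "service.name",
    "host": "host.name",
    "instance": "instance",
    "port": "port",
    "__scheme__": "scheme",
}


def split_labels(labels):
    """Split Prometheus labels into (resource_attributes, stream_attributes)."""
    resource, stream = {}, {}
    for key, value in labels.items():
        if key in RESOURCE_LABELS:
            # Lifted into the Resource under the mapped attribute name.
            resource[RESOURCE_LABELS[key]] = value
        else:
            # Everything else stays on the metric data stream.
            stream[key] = value
    return resource, stream


resource, stream = split_labels({"job": "api", "host": "h1", "code": "200"})
# resource == {"service.name": "api", "host.name": "h1"}
# stream == {"code": "200"}
```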

Next, the following labels are "special" and are used when converting from a
metric family to a specific OTLP metric data point:

- `le` label is used to identify histogram bucket boundaries and counts.
- `quantile` label is used to identify quantile points in summary metrics.
- `__name__` is used to identify the metric name of the data point.
- `__metrics_path__` is ignored in OTLP.

Contributor

Except for __name__ (which is not part of the OpenMetrics protocol, has built-in meaning in the relabeling machinery, and is special in the Prometheus Remote Write protocol), I believe all the other __-prefixed labels are meant for the relabeling machinery to have access to, they're just "not exported".

Contributor Author

The collector seems to use __schema__, but not sure if that's a PRW detail or not.


Additionally, Prometheus metric labels must match the following regex: `[a-zA-Z_:]([a-zA-Z0-9_:])*`. When exporting metrics
from OpenTelemetry, invalid characters in unsupported Attribute names should be replaced with the `_` character. This

Member @reyang, Nov 9, 2021

I would suggest that we take a special case here:

If the first character of the label key is a digit (`[0-9]`), prepend a `_` instead of replacing it with `_`.

For example:

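| OpenTelemetry attribute name | Prometheus label key |
| ---------------------------- | -------------------- |
| `"9123"` | `"_9123"` |

Rather than:

| OpenTelemetry attribute name | Prometheus label key |
| ---------------------------- | -------------------- |
| `"9123"` | `"_123"` |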

may cause ambiguity in scenarios where multiple similarly named attributes share invalid characters at the same
location. This is considered an unsupported case, and is highly unlikely.

Member

Suggested change
location. This is considered an unsupported case, and is highly unlikely.
location. This is considered as an unsupported case, and is highly unlikely.
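
The replacement rule, together with the reviewer's leading-digit suggestion, can be sketched as follows. This is only an illustration under stated assumptions: `sanitize_label_key` and its `prepend_on_digit` flag are hypothetical names, and the character class is taken from the regex quoted in the draft text. Prometheus' own documentation has traditionally not allowed colons in label names, which this sketch does not enforce.

```python
import re

# Characters outside the set allowed by the regex quoted in the draft.
_INVALID_CHAR = re.compile(r"[^a-zA-Z0-9_:]")


def sanitize_label_key(attribute_name, prepend_on_digit=True):
    """Turn an OpenTelemetry attribute name into a Prometheus label key."""
    key = _INVALID_CHAR.sub("_", attribute_name)
    if key and key[0].isdigit():
        if prepend_on_digit:
            # Reviewer's suggestion: keep the digit and prepend "_".
            key = "_" + key
        else:
            # Plain replacement: the leading digit is lost.
            key = "_" + key[1:]
    return key


print(sanitize_label_key("9123"))                          # _9123
print(sanitize_label_key("9123", prepend_on_digit=False))  # _123
print(sanitize_label_key("http.method"))                   # http_method
```

The ambiguity mentioned in the text is visible here: `"a.b"` and `"a-b"` both sanitize to `a_b`.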


### Prometheus Metric points to OTLP

Prometheus allows metrics to be reported in "metric family" groups. While
OpenTelemetry can assume some shape/structure to metric family groups, any
metric belonging to a family that is not treated specially should be exported
as its own metric stream. For example, while histograms are expected to
live in a metric family with metrics `{name}_sum`, `{name}_count` and `{name}`,

Member

According to the Prometheus best practices for Histogram and Summaries, we should include unit:

  1. If there is no unit (empty string, null string), histogram should use {name}, {name}_count and {name}_sum.
  2. If there is a valid unit, histogram should use {name}_{unit}, {name}_{unit}_count and {name}_{unit}_sum.

any other metric also reported in the family should be exported as an
independent metric in OpenTelemetry.

TODO - A bit about detecting/using start_time.

Prometheus Counter becomes an OTLP Sum.

Prometheus Gauge becomes an OTLP Gauge.

Prometheus Unknown becomes an OTLP Gauge.

Prometheus Histogram becomes an OTLP Histogram.

Prometheus Summary becomes an OTLP Summary.

Prometheus Gauge Histogram is dropped (TBD).

Prometheus Stateset is dropped (TBD).

Prometheus Info is dropped (TBD).

Considering that Prometheus Info metrics are typically represented with a constant 1, I could see this easily mapping into an OTel Gauge. If I remember correctly, they're already exported as Gauges. (So maybe it doesn't need to be specified.)

@hdost, Oct 29, 2021
The reason will be displayed to describe this comment to others. Learn more.

Ok looking at Java client it used to be a part of Gauge metric family, but now it's been moved to Unknown. So based on our reference either way it should be a gauge.

Contributor Author

We should specify it in some way. I think coming in as a gauge works, we just need a way to do OpenMetrics -> OTLP -> OpenMetrics here.
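
For reference, the type mapping as it stands in this draft could be written as a simple lookup. This is a sketch only: `PROM_TO_OTLP` and `otlp_kind` are hypothetical names, the keys assume OpenMetrics-style lowercase type names, and `None` stands for the types the draft currently drops (marked TBD), not for a settled decision.

```python
# Draft mapping from Prometheus/OpenMetrics metric type to OTLP data kind.
# None means "dropped (TBD)" in the draft text.
PROM_TO_OTLP = {
    "counter": "Sum",
    "gauge": "Gauge",
    "unknown": "Gauge",
    "histogram": "Histogram",
    "summary": "Summary",
    "gaugehistogram": None,
    "stateset": None,
    "info": None,
}


def otlp_kind(prom_type):
    """Return the OTLP data kind for a Prometheus type, or None if dropped."""
    return PROM_TO_OTLP.get(prom_type.lower())


print(otlp_kind("counter"))  # Sum
print(otlp_kind("info"))     # None
```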


### OTLP Metric points to Prometheus

OpenTelemetry Gauge becomes a Prometheus Gauge.

TODO: Example Gauge Conversions

OpenTelemetry Sum follows this logic:

- If the aggregation temporality is cumulative and the sum is monotonic,

Should we specify this on the reverse that Prometheus Counter becomes a cumulative monotonic OTLP Sum ?

Contributor Author

Yep

then it becomes a Prometheus Counter.
- Otherwise the Sum becomes a Prometheus Gauge.
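
The decision for an OpenTelemetry Sum can be sketched as a small function. The names below are hypothetical, and the temporality is modelled as a plain string for brevity. The sketch assumes the cumulative monotonic case is emitted as a Prometheus counter, since Prometheus has no metric type named Sum.

```python
def prometheus_type_for_sum(temporality, is_monotonic):
    """Pick the Prometheus metric type for an OpenTelemetry Sum point."""
    if temporality == "cumulative" and is_monotonic:
        return "counter"
    # Delta sums and non-monotonic sums fall back to a gauge.
    return "gauge"


print(prometheus_type_for_sum("cumulative", True))   # counter
print(prometheus_type_for_sum("cumulative", False))  # gauge
print(prometheus_type_for_sum("delta", True))        # gauge
```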

TODO: Example Sum Conversions

OpenTelemetry Histogram becomes a metric family with the following:

- A single `{name}_count` metric denoting the count field of the histogram.

Member

Prometheus doesn't have a separate notion of "unit", based on the Prometheus Metric and Label naming convention, it seems the general guidance is to append unit as a suffix. For example:

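| OpenTelemetry metric name (from instrument/view) | OpenTelemetry metric unit | Prometheus metric name |
| ------------------------------------------------ | ------------------------- | ---------------------- |
| `http.server.duration` | milliseconds | `http_server_duration_milliseconds` |
| `http.server.active_requests` | N/A | `http_server_active_requests` |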

Given that Prometheus is moving in the direction of OpenMetrics (https://github.com/OpenObservability/OpenMetrics), which does support units as a concept, we may want to suggest that path: the unit would not change the metric name but would instead be metadata on the metric.

All attributes of the histogram point are converted to Prometheus labels.
- A `{name}_sum` metric denoting the sum field of the histogram, reported
only if the sum is positive and monotonic. All attributes of the histogram
point are converted to Prometheus labels.
- A series of `{name}` metric points that contain all attributes of the
histogram point recorded as labels. Additionally, an `le` label is added
denoting a bucket boundary, and having its value be the stringified
floating point value of the bucket boundary, starting from lowest to highest,
and all being non-negative. The value of each point is the sum of the counts
of all histogram buckets up to the boundary reported in the `le` label.
Each of these points will include a single exemplar that falls within its
`le` bucket and no other `le` labelled point.
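
To make the cumulative `le` computation concrete, here is a minimal sketch for a single cumulative histogram point. All names are hypothetical, and several things are assumptions rather than part of the draft: the trailing `+Inf` bucket (Prometheus expects one), the omission of exemplars, and checking only that the sum is positive (monotonicity cannot be judged from one point). The bucket series uses the bare `{name}` as written in the draft; Prometheus client libraries conventionally name it `{name}_bucket`. OTLP reports one more bucket count than explicit bounds, which the sketch validates.

```python
from itertools import accumulate


def histogram_to_prometheus(name, attributes, explicit_bounds, bucket_counts, total_sum):
    """Return (metric_name, labels, value) samples for one histogram point."""
    if len(bucket_counts) != len(explicit_bounds) + 1:
        raise ValueError("expected one more bucket count than bounds")
    labels = dict(attributes)
    samples = [(name + "_count", labels, sum(bucket_counts))]
    if total_sum > 0:
        samples.append((name + "_sum", labels, total_sum))
    # Stringified boundaries, lowest to highest, plus the assumed +Inf bucket.
    boundaries = [str(float(b)) for b in explicit_bounds] + ["+Inf"]
    # Each `le` sample carries the running total of all buckets up to it.
    for le, running in zip(boundaries, accumulate(bucket_counts)):
        samples.append((name, {**labels, "le": le}, running))
    return samples


for sample in histogram_to_prometheus(
    "latency", {"route": "/"}, [0.5, 1.0], [1, 2, 3], 4.2
):
    print(sample)
```

With bounds `[0.5, 1.0]` and per-bucket counts `[1, 2, 3]`, the `le` samples carry the running totals 1, 3 and 6.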

_Note: OpenTelemetry DELTA histograms are not exported to prometheus._

TODO: Example Histogram conversion

OpenTelemetry Summary becomes a metric family with the following:

- A single `{name}_count` metric denoting the count field of the summary.
All attributes of the summary point are converted to Prometheus labels.
- A `{name}_sum` metric denoting the sum field of the summary, reported
only if the sum is positive and monotonic. All attributes of the summary
point are converted to Prometheus labels.
- A series of `{name}` metric points that contain all attributes of the
summary point recorded as labels. Additionally, a `quantile` label is
added denoting a reported quantile point, and having its value
be the stringified floating point value of the quantile (between 0.0 and 1.0),
starting from lowest to highest, and all being non-negative. The value of
each point is the computed value of the quantile point.
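
The summary case follows the same shape and can be sketched as below. The function name and the `(quantile, value)` pair representation are hypothetical. As with the histogram, the sketch only checks that the sum is positive, because monotonicity cannot be judged from a single point.

```python
def summary_to_prometheus(name, attributes, count, total_sum, quantile_values):
    """Return (metric_name, labels, value) samples for one summary point.

    quantile_values is an iterable of (quantile, value) pairs, with each
    quantile between 0.0 and 1.0.
    """
    labels = dict(attributes)
    samples = [(name + "_count", labels, count)]
    if total_sum > 0:
        samples.append((name + "_sum", labels, total_sum))
    # Emit quantile points from lowest to highest quantile.
    for quantile, value in sorted(quantile_values):
        samples.append((name, {**labels, "quantile": str(float(quantile))}, value))
    return samples


for sample in summary_to_prometheus(
    "rpc_duration", {"service": "a"}, 10, 12.5, [(0.99, 3.0), (0.5, 1.0)]
):
    print(sample)
```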

TODO: Example Summary conversion

## Footnotes

<a name="otlpdatapointfn">[1]</a>: OTLP supports data point kinds that do not