diff --git a/content/en/blog/2024/prometheus-compatibility-survey/dots-vs-underscores.png b/content/en/blog/2024/prometheus-compatibility-survey/dots-vs-underscores.png new file mode 100644 index 000000000000..6cd61a2a8bb6 Binary files /dev/null and b/content/en/blog/2024/prometheus-compatibility-survey/dots-vs-underscores.png differ diff --git a/content/en/blog/2024/prometheus-compatibility-survey/index.md b/content/en/blog/2024/prometheus-compatibility-survey/index.md new file mode 100644 index 000000000000..83a6fa8e2cca --- /dev/null +++ b/content/en/blog/2024/prometheus-compatibility-survey/index.md @@ -0,0 +1,198 @@ +--- +title: Insights from the Prometheus Compatibility Survey +linkTitle: Prometheus Compatibility Survey +date: 2024-07-25 +author: '[David Ashpole](https://github.com/dashpole) (Google)' +issue: https://github.com/open-telemetry/sig-end-user/issues/24 +sig: End-User SIG +cSpell:ignore: Ashpole +--- + +[Prometheus](https://prometheus.io/) and OpenTelemetry are two of the most +active and popular projects in the +[CNCF observability landscape](https://landscape.cncf.io/guide#observability-and-analysis--observability). +The two communities have been working together since the early days of +OpenTelemetry to improve the compatibility between the two projects. The +OpenTelemetry Prometheus SIG has been leading this effort, with the active +participation of maintainers from both OpenTelemetry and Prometheus. + +At this point, there is a +[detailed, experimental specification](/docs/specs/otel/compatibility/prometheus_and_openmetrics/) +describing how to convert between the +[OpenTelemetry metrics data model](/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model) +and +[Prometheus metric formats](https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md). 
+It has been used to implement Prometheus +[(pull) exporters for OpenTelemetry SDKs](https://pkg.go.dev/go.opentelemetry.io/otel/exporters/prometheus), +[OTLP export from Prometheus libraries](https://prometheus.github.io/client_java/otel/otlp/), +[OTLP ingestion for the Prometheus server](https://prometheus.io/docs/prometheus/latest/feature_flags/#otlp-receiver), +and the OpenTelemetry Collector's +[Prometheus Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver), +[Prometheus Remote Write exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusremotewriteexporter), +and +[Prometheus (pull) exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter). + +One of the most challenging areas to reconcile is that OpenTelemetry metric +names are changed when exporting to Prometheus. Today, the OpenTelemetry +`http.server.request.duration` metric, with unit `s`, is translated to +`http_server_request_duration_seconds` in Prometheus. Some users are familiar +with the Prometheus naming conventions, and appreciate the consistency this +translation provides with existing metrics in the Prometheus ecosystem. Other +users are confused when querying for the original OpenTelemetry name does not +return any results. + +Prometheus is working on support for UTF-8 characters in metric names as part of +its +[2024 roadmap](https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/#support-utf-8-metric-and-label-names), +which potentially allows preserving dots in metric names. 
To better understand +what users want their Prometheus query experience to look like, +[the OTel x Prometheus Working Group](https://cloud-native.slack.com/archives/C01LSCJBXDZ) +[ran a survey](https://github.com/open-telemetry/sig-end-user/tree/main/end-user-surveys/otel-prom-interoperability) +with the help of the [OpenTelemetry End User SIG](/community/end-user/). +Deciding on the default translation approach is one of the last remaining +blockers for stabilizing the compatibility specification. + +The survey received 86 responses (and 5 spam), and contained many helpful pieces +of feedback. Thank you to everyone that participated! The questions and raw +results can be found +[here](https://github.com/open-telemetry/sig-end-user/blob/main/end-user-surveys/otel-prom-interoperability/otel-prom-interoperability-survey.csv). + +## Overall takeaways + +- A majority (60%) prefer leaving the dots in the metric name, rather than + translating to underscores. +- A slight majority (54%) prefer having the unit in the name, but only 37% think + it should be required. +- Respondents who prefer units in the metric name are likely to also prefer + translating dots to underscores. +- The best predictors of the "units and underscores" group are Prometheus server + experts and being an SRE. +- The best predictors of the "no units and dots" group are OpenTelemetry library + experts and being a developer. + +## Who took the survey + +Survey respondents were mostly from large (>1000 employees) companies (52%) in +the Technology industry (71%). Respondents were more likely to consider +themselves experts with Prometheus-related topics than with +OpenTelemetry-related topics, and were evenly distributed across roles. Nearly +all respondents (>90%) stored metrics in the Prometheus server or another open +source Prometheus backend, and nearly all use PromQL to query their metrics. 
+ +## Sentiment on the Current State + +Overall, respondents were neutral on the question of whether OpenTelemetry was +easy to use with Prometheus, and considered the current translation between +OpenTelemetry and Prometheus somewhat confusing. This was consistent regardless +of their opinions on units or delimiters. + +## Dots and Underscores + +OpenTelemetry [specifies](/docs/specs/semconv/general/attribute-naming/) that +conventions should use dots as the namespace delimiter, and underscores as the +delimiter between "multi-word-dot-delimited components" (for example, +`http.response.status_code`). On the other hand, Prometheus +[uses underscores](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels) +as its delimiter. + +Currently, when exporting in Prometheus format from an OpenTelemetry SDK, all +dots are changed to underscores to comply with the Prometheus requirements. We +wanted to learn whether OpenTelemetry users who used these exporters preferred +to keep the dots in the original metric name, or liked the consistency with +existing Prometheus metrics of translating to underscores. + +Of users who indicated they used OpenTelemetry for metrics, and PromQL as their +query language, 60% preferred keeping the original OpenTelemetry metric name +including dots, and 40% want metric names that match Prometheus conventions with +only underscores. + +![Dots vs underscores pie chart](dots-vs-underscores.png) + +When we asked about specific example PromQL queries or alerts, the results +roughly aligned with the results above. Around 42% of users only selected +queries with dots, and around 39% only selected queries that had underscores. +The final 19% selected a mix of queries that included dots or underscores, +indicating they are likely OK with either approach. + +## Units in Metric Names + +OpenTelemetry [specifies](/docs/specs/semconv/general/metrics/#units) that units +should not generally be included in the metric name. 
+Prometheus conventions +[recommend](https://prometheus.io/docs/practices/naming/#metric-names) that the +unit be included as a suffix of the metric name. OpenMetrics goes a step further +and +[requires this unit suffix](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#unit). +Currently, when exporting in Prometheus format from an OpenTelemetry SDK, the +unit is added as a suffix to the metric name. + +Of users who indicated they used OpenTelemetry for metrics, and PromQL as their +query language, 37% thought units should be a required suffix for metric names, +and 46% thought units should not be added to metric names. The final 17% +preferred the unit in the metric name, but didn't think it should be required. + +![Units in metric name pie chart](units-in-metric-name.png) + +When we asked about specific example PromQL queries or alerts, the results were +somewhat more favorable to including the unit in the metric name compared with +the question above. Around 45% of users only selected queries that included the +unit, and around 28% only selected queries that excluded the unit. The final 27% +selected a mix of queries that included or excluded the unit, indicating they +are likely OK with either approach. + +## Trends + +### Correlation between Unit and Delimiter Preferences + +Preferences generally split into two groups: those who want to preserve the +original OpenTelemetry metric names, including dots, and without a unit suffix, +and those who prefer changing the name to match Prometheus conventions. 57% of +respondents who want to require units in metric names also want to +change dots to underscores. 77% of respondents who don't want units in metric +names prefer dots in metric names. + +### Group Differences + +The best predictors of a preference for units required in the name and changing +dots to underscores were having a role of SRE, and being an expert with the +Prometheus server configuration. 
+For example, 88% of SRE respondents preferred +translating dots to underscores. + +The best predictors of a preference for preserving the OpenTelemetry name with +dots, and without units were having the role of developer, and being an expert +with OpenTelemetry libraries. For example, 88% of developers preferred not +translating dots to underscores. + +## Other feedback + +The most common challenges for all respondents were the instability of +OpenTelemetry instrumentation and confusion over the conversion logic. +Respondents who preferred OpenTelemetry's conventions listed Prometheus' current +lack of support for OpenTelemetry concepts (resource, scope, delta temporality, +and unit metadata) as their most significant challenge. Respondents who +preferred Prometheus' conventions listed OpenTelemetry's new concepts as +confusing, and were unhappy that OpenTelemetry had deviated from Prometheus' +existing conventions. + +For the most part, this feedback aligns with the future plans in the +OpenTelemetry and Prometheus communities. The OpenTelemetry semantic conventions +SIG is working on stabilizing conventions for a wide variety of +instrumentation. The OpenTelemetry Prometheus interoperability SIG is working on +incorporating the results of this survey into the compatibility specification. +The Prometheus community has +[ambitious plans](https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/) +to add support for OpenTelemetry concepts. + +## Keep in touch + +Thanks again to everyone who participated in the survey! We rely on your +feedback to help guide the future development of OpenTelemetry and to ensure it +continues to meet your evolving needs. We will post upcoming surveys in the +following avenues: + +- [#otel-sig-end-user Slack channel](https://cloud-native.slack.com/archives/C01RT3MSWGZ) + – you can also reach out to us here! 
+- [End user resources page](/community/end-user/) + +You can provide further feedback or participate in discussions concerning +OpenTelemetry and Prometheus interoperability in the +[#otel-prometheus-wg Slack channel](https://cloud-native.slack.com/archives/C01LSCJBXDZ). diff --git a/content/en/blog/2024/prometheus-compatibility-survey/units-in-metric-name.png b/content/en/blog/2024/prometheus-compatibility-survey/units-in-metric-name.png new file mode 100644 index 000000000000..5ddf85fb695f Binary files /dev/null and b/content/en/blog/2024/prometheus-compatibility-survey/units-in-metric-name.png differ diff --git a/static/refcache.json b/static/refcache.json index 8c7fd6012a92..3771cb508318 100644 --- a/static/refcache.json +++ b/static/refcache.json @@ -335,6 +335,10 @@ "StatusCode": 200, "LastSeen": "2024-01-30T16:15:05.306086-05:00" }, + "https://cloud-native.slack.com/archives/C01LSCJBXDZ": { + "StatusCode": 200, + "LastSeen": "2024-06-13T19:50:17.347862467Z" + }, "https://cloud-native.slack.com/archives/C01N5UCHTEH": { "StatusCode": 200, "LastSeen": "2024-01-30T05:18:56.992279-05:00" @@ -5563,6 +5567,10 @@ "StatusCode": 206, "LastSeen": "2024-01-30T16:06:15.993792-05:00" }, + "https://landscape.cncf.io/guide#observability-and-analysis--observability": { + "StatusCode": 200, + "LastSeen": "2024-06-12T14:28:14.584941196Z" + }, "https://laravel.com/docs/10.x/installation": { "StatusCode": 200, "LastSeen": "2024-01-30T05:18:34.641048-05:00" @@ -7879,6 +7887,10 @@ "StatusCode": 206, "LastSeen": "2024-06-18T13:27:46.505689-04:00" }, + "https://prometheus.github.io/client_java/otel/otlp/": { + "StatusCode": 206, + "LastSeen": "2024-06-13T19:50:16.163446592Z" + }, "https://prometheus.io": { "StatusCode": 206, "LastSeen": "2024-01-18T19:07:18.12399-05:00" @@ -7887,6 +7899,14 @@ "StatusCode": 206, "LastSeen": "2024-01-18T19:07:18.145976-05:00" }, + "https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/": { + "StatusCode": 206, + "LastSeen": 
"2024-06-18T14:24:38.978819371Z" + }, + "https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/#support-utf-8-metric-and-label-names": { + "StatusCode": 206, + "LastSeen": "2024-06-12T14:28:16.265327643Z" + }, "https://prometheus.io/docs/alerting/latest/alertmanager/": { "StatusCode": 206, "LastSeen": "2024-01-30T16:14:18.042312-05:00"