Skip to content

Commit

Permalink
Merge pull request #38 from prometheus/arve/otel-resource-attribute-p…
Browse files Browse the repository at this point in the history
…romotion

Add OTel resource attribute promotion proposal
  • Loading branch information
aknuds1 authored Jul 15, 2024
2 parents eabf062 + 9191fcb commit 3ab2580
Showing 1 changed file with 89 additions and 0 deletions.
89 changes: 89 additions & 0 deletions proposals/2024-07-15-otel-resource-attribute-promotion.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
# OTel resource attribute promotion

* **Owners:**
* Arve Knudsen [@aknuds1](https://github.com/aknuds1) [[email protected]](mailto:[email protected])

* **Implementation Status:** Partially implemented

* **Related Issues and PRs:**
* [WIP: OTLP Translator prometheusremotewrite: Support resource attribute promotion](https://github.com/prometheus/prometheus/pull/14200)

* **Other docs or links:**

> This proposal collects the requirements and implementation proposals for supporting OTel resource attribute promotion to labels.
## Why

Currently, Prometheus encodes OpenTelemetry (OTel for short) [resource attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md) as labels of the `target_info` metric.
OTel resource attributes model metadata about the environment producing metrics received by the backend (e.g. Prometheus).
Typically, OTel users want to include some of these attributes (as `target_info` labels) in their Prometheus query results, to correlate them with entities of theirs (e.g. K8s pods). This is similar to copying target labels from Service Discovery attributes. For example, users commonly copy `namespace`, `deployment`, etc. labels to make it easier to query the metrics. They should be able to copy over similar attributes when ingesting OTLP, i.e, `k8s.namespace.name`, `k8s.deployment.name`, etc.

Based on user demand, it would be preferable if Prometheus were to have better UX for including OTel resource attributes in query results.
The current solution is to join with `target_info in queries, to pick also the labels one is interested in (corresponding to OTel resource attributes).
This requires relatively advanced knowledge of PromQL though and is a barrier to many users.
Take as an example querying HTTP request rates per K8s cluster and status code, while having to join with the `target_info` metric to obtain the `k8s.cluster.name` resource attribute (encoded as `k8s_cluster_name`):

```promql
# Join with target_info on job and instance labels, to include k8s_cluster_name.
sum by (k8s_cluster_name, http_status_code) (
rate(http_server_request_duration_seconds_count[2m])
* on (job, instance) group_left (k8s_cluster_name)
target_info
)
```

### Pitfalls of the current solution

As already mentioned, the current solution of including OTel resource attributes in query results through join queries represents a technical barrier to users.
Also, it requires the user to know which `target_info` labels can be joined on (i.e., `job` and `instance`), plus which labels represent the various OTel resource attributes.
All in all, the UX for including OTel resource attributes in Prometheus query results is not very smooth.

## Goals

Goals and use cases for the solution as proposed in [How](#how):

* Support, in the OTLP endpoint, automatic promotion of a configurable set of OTel resource attributes to metric labels.

### Audience

Prometheus maintainers.

## How

* Make the OTLP endpoint support a configurable set of OTel resource attributes to promote to metric labels.
* Add a Prometheus configuration parameter for which OTel resource attributes to promote (default: none).

With OTel resource attribute promotion configured to `[k8s.cluster.name]`, we can simplify the previously given PromQL join example as follows:

```
sum by (k8s_cluster_name, http_status_code) (
rate(http_server_request_duration_seconds_count[2m])
)
```

## Alternatives

### Simplify joins with info metrics in PromQL

Instead of promoting selected OTel resource attributes to labels at ingest time, another [proposal](https://github.com/prometheus/proposals/pull/37) is to simplify the joining with `target_info` in queries.
These proposals are not necessarily competing though, as the respective proposed features can co-exist.

#### Pros

* Avoids having to add more labels to metrics than strictly required to identify them.
* Avoids series churn when one or more of the promoted OTel resource attributes change.
* More labels per metric increases CPU/memory usage.
* Avoids the user having to decide up front which OTel resource attributes to promote at ingestion time.
* Avoids series churn when the user changes which OTel resource attributes to promote.
* Simply improves the UX for the existing solution of encoding OTel resource attributes as `target_info` labels.

#### Cons

* Much more complicated to implement.
* Requires the user to call `info` in their queries.

## Action Plan

The tasks to do in order to migrate to the new idea.

* [ ] https://github.com/prometheus/prometheus/pull/14200

0 comments on commit 3ab2580

Please sign in to comment.