
Allow configuring which resource attributes should be kept at the metric #22914

Closed
jmichalek132 opened this issue May 29, 2023 · 11 comments

@jmichalek132 (Contributor)

Component(s)

pkg/translator/prometheusremotewrite

Is your feature request related to a problem? Please describe.

Right now the Prometheus remote write exporter allows you either to use the target_info metric to store resource attributes, or to configure it to attach all resource attributes to the existing metrics.
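
For reference, a rough sketch of the two existing modes, as I understand the current exporter configuration (treat the exact option names as assumptions on my part):

exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    # Mode 1: keep resource attributes on a separate target_info metric;
    # identifying attributes become the labels that join series to target_info.
    target_info:
      enabled: true

  prometheusremotewrite/flattened:
    endpoint: "https://my-cortex:7900/api/v1/push"
    # Mode 2: copy all resource attributes onto every exported series.
    resource_to_telemetry_conversion:
      enabled: true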

In our case we are currently using the target_info metric approach. However, the current translation expects service.instance.id to be present on all metrics (it is used to uniquely identify the target), which is not always the case.

In our company we unfortunately didn't set it beforehand, so it's missing on most metrics, and at least some of the receivers in otel contrib don't set it either (e.g. postgresqlreceiver).

This leads to metrics being rejected by Prometheus-compatible backends as non-unique series, because the remaining identifying attributes are moved to the target_info metric.

Describe the solution you'd like

We would like to solve this by updating pkg/translator/prometheusremotewrite to allow providing a list of resource attributes that would be kept on the original metrics even when the target_info metric is enabled.

Configuration could look something like this:

exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    resource_to_telemetry_conversion_allowed_list:
      - k8s_pod_name
      - k8s_node_name
      - k8s_namespace_name
      - hostname
      - ip
      - etc ....

To make this even more flexible, possibly as a second iteration, we could allow regular expressions inside the list. If the resource attribute keys we want to move share a common prefix, a single regex could match all of them instead of listing them one by one, which would also avoid updating the collector configuration every time such a resource attribute is added.

None of the existing processors supports iterating over resource attribute keys like this, so this cannot be achieved with a processor today.

An example of such a config:

exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    resource_to_telemetry_conversion_allowed_list:
      - k8s.*
      - hostname
      - ip
      - etc ....

Describe alternatives you've considered

For now we use the transform processor as a workaround to copy specific resource attributes into datapoint attributes, so that they end up on the metrics and the series are uniquely identified.

Example configuration:

transform:
  metric_statements:
    - context: datapoint
      statements:
        - set(attributes["k8s_replicaset_name"], resource.attributes["k8s.replicaset.name"])
        - set(attributes["k8s_deployment_name"], resource.attributes["k8s.deployment.name"])
        - set(attributes["k8s_statefulset_name"], resource.attributes["k8s.statefulset.name"])
        - set(attributes["k8s_hpa_name"], resource.attributes["k8s.hpa.name"])
        - set(attributes["k8s_volume_name"], resource.attributes["k8s.volume.name"])
        - set(attributes["k8s_job_name"], resource.attributes["k8s.job.name"])

Additional context

This was discussed during the OpenTelemetry SIG: Agent/Collector meeting on May 24th, 2023; the outcome was to open this issue and start a discussion about the implementation and how it should be configurable.

I would be willing to work on contributing this feature.

jmichalek132 added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on May 29, 2023
@github-actions (bot)

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions (bot) commented Jun 8, 2023

Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@echatman-ias

+1, I am very interested in this feature also.

bryan-aguilar added the priority:p2 (Medium) and help wanted (Extra attention is needed) labels and removed the needs triage (New item requiring triage) label on Aug 11, 2023
@bryan-aguilar (Contributor)

I'm not sure this is a change required in the translator package; it seems more like an enhancement to the Prometheus remote write exporter. The all-or-nothing behavior of the resource_to_telemetry_conversion configuration option may not be suitable for all users.

Would we need to plan for regex handling from the start? Other components may need to be looked at to see how they handle it. This issue may also be relevant to how the configuration needs to look for the exporter: #25161. I wonder if we should consider this behavior a filter-and-map operation?

@dmitryax (Member)

Thanks, @bryan-aguilar. This issue seems like a good fit for #25161.

@github-actions (bot)

Pinging code owners for pkg/resourcetotelemetry: @mx-psi. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bryan-aguilar (Contributor)

It was shared with me that the resource to telemetry conversion actually exists in its own exporter helper package (pkg/resourcetotelemetry). I think this would fall under an enhancement to that package.

I think regex should definitely be considered in a first pass, and possibly the inclusion of OTTL functionality. I'm not super familiar with OTTL, so someone with more experience would have to provide guidance on whether and where it could be useful here.
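
If this lands in the exporter helper, the allow-list could plausibly live under the existing resource_to_telemetry_conversion key rather than a new top-level option. A sketch of that shape, where the include field is purely hypothetical and not an existing setting:

exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    resource_to_telemetry_conversion:
      enabled: true
      # Hypothetical allow-list: only these resource attributes would be
      # copied onto the series instead of all of them.
      include:
        - k8s.pod.name
        - k8s.node.name
        - k8s.namespace.name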

@github-actions (bot)

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Oct 11, 2023
@jmichalek132 (Contributor, Author)

Not stale, I might pick it up after finishing #27565.

github-actions bot removed the Stale label on Nov 23, 2023
@github-actions (bot)

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Jan 23, 2024
@github-actions (bot)

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned (Won't fix, can't repro, duplicate, stale) on Mar 23, 2024
mx-psi removed the Stale label on Mar 25, 2024