Support Prometheus metric libraries with APM agents #355

Open
1 of 8 tasks
felixbarny opened this issue Oct 16, 2020 · 12 comments

@felixbarny
Member

felixbarny commented Oct 16, 2020

The background of this is that we want to support sending custom metrics with our agents. We don't want to create another metrics API, though, so we're seeking to integrate with existing ones. Each language has its own favorite metric libraries, but Prometheus has good cross-language coverage and is quite popular in all of them, so it seems like a good fit to align on across agents.

Prometheus doesn't have to be the one and only API we support. In fact, the Java agent already supports the Micrometer metric registry which is quite popular in that ecosystem. It's up to each agent team to decide on the priority and the choice of other metric APIs they want to support.

This issue tracks writing a cross-agent spec and building a reference implementation for supporting the Prometheus client library.

Instead of the pull-based model that is typical for Prometheus, the Elastic APM integration will send the metrics that are registered in the Prometheus metric registry directly from the application to the APM Server intake API.

Histograms will not be supported in the first iteration. After APM Server adds support for histograms (elastic/apm-server#3195), there will be a follow-up to support them as well.
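To make the push model concrete, here is a minimal Python sketch of one reporting cycle, under stated assumptions: the registry snapshot is modeled as a hypothetical `collect()` callable returning `(name, type, labels, value)` tuples, the output only loosely follows the intake metricset shape (a `samples` object plus `tags`), and `send` stands in for whatever transport the agent uses to reach the intake API. Histograms are skipped, as planned for the first iteration.

```python
import threading
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical shape of one sample from a registry snapshot:
# (metric name, metric type, labels, current value).
Sample = Tuple[str, str, Dict[str, str], float]

# Histograms are left out of the first iteration.
SKIPPED_TYPES = {"histogram"}


def to_metricsets(samples: Iterable[Sample]) -> List[dict]:
    """Group samples that share a label set into one metricset dict."""
    by_labels: Dict[Tuple[Tuple[str, str], ...], dict] = {}
    for name, mtype, labels, value in samples:
        if mtype in SKIPPED_TYPES:
            continue
        key = tuple(sorted(labels.items()))
        metricset = by_labels.setdefault(key, {"tags": dict(labels), "samples": {}})
        metricset["samples"][name] = {"value": value}
    return list(by_labels.values())


def report_once(collect: Callable[[], Iterable[Sample]],
                send: Callable[[List[dict]], None]) -> int:
    """Snapshot the registry, convert, and push; returns metricsets sent."""
    metricsets = to_metricsets(collect())
    if metricsets:
        send(metricsets)
    return len(metricsets)


def run_reporter(collect: Callable[[], Iterable[Sample]],
                 send: Callable[[List[dict]], None],
                 interval_s: float,
                 stop: threading.Event) -> None:
    """Push on a fixed interval; Event.wait returns True once stop is set."""
    while not stop.wait(interval_s):
        report_once(collect, send)
```

Grouping by label set means each distinct Prometheus label combination becomes its own metricset; whether that is the right granularity is part of the mapping question in the TODOs.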

Agents that support auto instrumentation should automatically plug into the Prometheus client so that users don't need to configure anything in order for the metrics to be sent.

TODOs

  • For counters and timers, decide on whether to report the difference/delta since the last report or whether to send up cumulative values
  • Define a mapping for different metric types, like counters, gauges, and timers to the intake API metrics spec
    • Reconcile that with the existing Go Prometheus implementation and the Java Micrometer implementation
  • Work with the PM and docs team on getting started documentation
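Prometheus counters are cumulative, so the delta option in the first TODO means the agent has to remember the last reported value per series. A minimal sketch of that bookkeeping with a hypothetical `DeltaTracker` class; treating a drop in the value as a reset, and reporting the full value on the first cycle, are assumptions rather than decisions from this issue:

```python
from typing import Dict, Hashable, Optional


class DeltaTracker:
    """Turn cumulative counter readings into per-report deltas.

    Assumption: a drop in the cumulative value means the counter was reset,
    in which case the new value itself is taken as the delta.
    """

    def __init__(self) -> None:
        self._last: Dict[Hashable, float] = {}

    def delta(self, key: Hashable, cumulative: float) -> float:
        previous: Optional[float] = self._last.get(key)
        self._last[key] = cumulative
        if previous is None:
            return cumulative  # first report: everything counted so far is new
        if cumulative < previous:
            return cumulative  # reset detected
        return cumulative - previous
```

Sending cumulative values instead would skip this per-series state entirely and leave rate computation to query time.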

Spec issue

Agent issues

felixbarny added this to the 7.11 milestone on Oct 16, 2020
@exekias

exekias commented Oct 16, 2020

> Instead of the pull-based model that is typical for Prometheus, the Elastic APM integration will send the metrics that are registered in the Prometheus metric registry directly from the application to the APM Server intake API.

Any thoughts on what the experience will be when users collect Prometheus metrics both from instrumented applications and from other services that just expose them? I understand that for the latter, users would use Elastic Agent with autodiscover for all Prometheus endpoints. I guess this would lead to duplicated data, which may be OK?

@felixbarny
Member Author

Yep, that would lead to duplicated metrics in different indices. It probably makes sense for agents to offer an option to disable Prometheus metric collection.
Not sure if we need to have a cross-agent consistent way of doing that. In the Java agent, the easiest way to implement that would be to make users set disable_instrumentations=prometheus.

But that makes me realize that we should make sure that metrics collected via APM Agents are consistent with the format of the Metricbeat Prometheus collector.

@exekias

exekias commented Oct 16, 2020

> But that makes me realize that we should make sure that metrics collected via APM Agents are consistent with the format of the Metricbeat Prometheus collector.

💯

In that sense, I'm wondering, would it make sense for APM agents to inject the APM related metadata into Prometheus labels? I guess you are not really storing that as a Prometheus label, but using some other ECS fields.

@felixbarny
Member Author

> inject the APM related metadata

Could you elaborate on what you mean by APM related metadata? Do you mean host/Docker/k8s/cloud/service metadata? Agents only send that once with each request to APM Server. The Server then folds the metadata into each event (such as a metricset) that's sent in the same request. I guess we'd just map the regular Prometheus labels to the ECS field labels.*.
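The metadata flow described above can be sketched in Python as an illustration only. `build_request_body` writes the metadata once, as the first NDJSON line of a request, followed by one line per metricset; `fold_metadata` is a rough, hypothetical imitation of the server copying that metadata onto every metricset and exposing the Prometheus labels as flattened `labels.<name>` fields. Carrying the labels under a metricset `tags` key and the flattened `service.name` key are assumptions made for the sketch.

```python
import json
from typing import Dict, List


def build_request_body(metadata: dict, metricsets: List[dict]) -> str:
    """Agent side: metadata is sent once, as the first NDJSON line."""
    lines = [json.dumps({"metadata": metadata})]
    lines.extend(json.dumps({"metricset": ms}) for ms in metricsets)
    return "\n".join(lines) + "\n"


def fold_metadata(body: str) -> List[dict]:
    """Hypothetical server side: copy metadata onto every metricset.

    Prometheus labels (carried as "tags" here) become labels.<name> fields.
    """
    events = [json.loads(line) for line in body.splitlines() if line]
    metadata = events[0]["metadata"]
    docs = []
    for event in events[1:]:
        ms = event["metricset"]
        doc: Dict[str, object] = {"service.name": metadata["service"]["name"]}
        doc.update({f"labels.{k}": v for k, v in ms.get("tags", {}).items()})
        doc.update({name: s["value"] for name, s in ms["samples"].items()})
        docs.append(doc)
    return docs
```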

@exekias

exekias commented Oct 16, 2020

Thanks for the explanation, I was thinking aloud about this part from https://github.com/elastic/observability-dev/issues/1178:

> Currently, in order to monitor Prometheus client metrics, customers have to either export Prometheus metrics using the prometheus module, which lacks deep correlation with APM via ECS fields.

I guess this refers to the service fields. It would be nice if we could still attach the right fields to the metrics in the scenario I described. Anyway, I agree that injecting these into Prometheus labels may be challenging or not worth it.

@alex-fedotyev

I am curious what happens with duplicate metrics when we move to data streams:
https://docs.google.com/document/d/1y56a9fjkLi6Zen5qGC_JKYM9ljYpBA5W0fgdWilcwYc/edit

Today, when I enable apm-* and metrics-* on the waffle map, I end up seeing duplicate instances.

@alex-fedotyev

> For counters and timers, decide on whether to report the difference/delta since the last report or whether to send up cumulative values

Regarding difference/delta vs. actual value, I think it makes sense to align with how integrations collect those metrics.
I am also wondering how to simplify visualization of custom metrics and make it easier than today (we already offer TSVB, Lens, Metrics Explorer, and the Inventory waffle map for working with custom metrics).

@cyrille-leclerc

👍 to @alex-fedotyev's point. Can we offer the same user experience via Elastic APM and via Metricbeat Prometheus?
In particular, I'd like us to be aligned on histogram support.

One difference I'd question is the idea of prefixing the metric name with prometheus., as Metricbeat does for the Prometheus integration.
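The prefixing question comes down to how field names are derived from a metricset. A small sketch with a hypothetical `to_fields` helper showing both options side by side; it does not reproduce Metricbeat's exact field layout, only the idea of an optional `prometheus.` prefix on metric names:

```python
from typing import Dict, Optional


def to_fields(samples: Dict[str, float], labels: Dict[str, str],
              prefix: Optional[str] = None) -> Dict[str, object]:
    """Flatten one metricset into field names, with an optional name prefix.

    prefix=None keeps the bare Prometheus metric names; prefix="prometheus"
    mimics the kind of prefixing discussed above (exact layout assumed).
    """
    def name(metric: str) -> str:
        return f"{prefix}.{metric}" if prefix else metric

    fields: Dict[str, object] = {f"labels.{k}": v for k, v in labels.items()}
    fields.update({name(m): v for m, v in samples.items()})
    return fields
```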

@nicholas-r-king

Any movement on this? This seems dead even though elastic/apm-agent-python#1005 was completed successfully for Python. Why was this stalled for all other agents?

@gregkalapos
Contributor

OTel metrics changed the priority of this: #691

As OTel became more popular for metrics as well, we focused on supporting OTel metrics instead of Prometheus. There is some overlap; e.g. in Java there is a Prometheus exporter for OTel, which is mentioned in our docs.

But to address the question directly:

> Why was this stalled for all other agents?

Because OTel metrics became more important, we focused on that.

@nicholas-r-king

That doesn't seem entirely true, as there doesn't seem to have been any movement on any of those tickets either: no milestones, no branches, and inactive since Nov 2022.

@gregkalapos
Copy link
Contributor

Is there anything we can help you with @nicholas-r-king? Any specific missing feature in any specific agent?

Projects
None yet
Development

No branches or pull requests

7 participants