
OpenTelemetry JVM metrics are not properly mapped #12

Closed · AlexanderWert opened this issue Dec 2, 2022 · 8 comments · Fixed by elastic/kibana#151826
Labels: 8.8-candidate, agent-java, bug

@AlexanderWert (Member) commented Dec 2, 2022

The JVM metrics reported by OpenTelemetry Java agents are not properly mapped. In elastic/apm-server#8777 we changed the mapping to comply with the change in the metrics semantic conventions; however, this mapping logic seems to be ignored.

The metric documents do appear in Discover, but with the wrong field names.

Example

This is how the metric document looks right now:
[screenshot: metric document with incorrectly mapped field names]

And this is how a valid JVM metrics document would look:
[screenshot: correctly mapped JVM metrics document]

So it seems that this mapping logic is not being applied.

Example data:

Here is some OTLP example data for the JVM metrics: https://gist.github.com/AlexanderWert/bf3b8a6cbbd02a345038bd8e8cac520f

Hypothesis:

Let's take a concrete metric: process.runtime.jvm.memory.usage

In this mapping logic it is assumed that this metric is reported with the Gauge metric type; in fact, however, this metric (process.runtime.jvm.memory.usage) has the type Sum (as we can see in the example data).
So very likely the root cause is a wrong metric type in the mapping logic.

Here is the OTel spec for the metrics: https://opentelemetry.io/docs/reference/specification/metrics/semantic_conventions/runtime-environment-metrics/#jvm-metrics

All counter types (Counter, UpDownCounter) are mapped to the Sum metric type in the OTLP protocol! So we need to move the mapping into the MetricTypeSum if-branch.
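
To illustrate why the metric arrives as a Sum, here is a minimal sketch (not the agent's actual code; the class name and meter scope are made up) of how such an instrument is registered with the OTel Java metrics API. An asynchronous UpDownCounter can go up and down, so it cannot be a monotonic Counter, but on the wire it is still serialized as a non-monotonic Sum data point, never as a Gauge:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.Meter;

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class JvmMemoryUsageSketch {
    public static void main(String[] args) {
        Meter meter = GlobalOpenTelemetry.getMeter("example-scope"); // hypothetical scope name

        // Memory usage rises and falls, so it is modeled as an UpDownCounter.
        // In OTLP this instrument is exported as a Sum data point with
        // is_monotonic=false -- not as a Gauge.
        meter.upDownCounterBuilder("process.runtime.jvm.memory.usage")
                .setUnit("By")
                .setDescription("Measure of memory used")
                .buildWithCallback(measurement -> {
                    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                    measurement.record(heap.getUsed(),
                            Attributes.of(AttributeKey.stringKey("type"), "heap"));
                });
    }
}
```

Without an SDK configured, GlobalOpenTelemetry returns a no-op meter; with the Java agent attached, an equivalent instrument is registered automatically, which is why the exported data point carries the Sum type the mapping logic needs to handle.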

@AlexanderWert added the bug label on Dec 2, 2022
@felixbarny (Member) commented

As an alternative to mapping/duplicating the metrics, we could also have a dedicated visualization for OTel. Duplicating all metrics doesn't seem scalable in the long term.
We could even think about aligning the metric names in our agents with the OTel metric names in the future.

@adamleantech commented

I'm also hitting this issue, which makes the OTel Java integration quite frustrating. I can potentially use the Elastic agent instead in the short term, but I'd like to have confidence that switching to OTel at a later date would be seamless. Can this please be considered when designing a solution?

@AlexanderWert added this to the 8.8 milestone on Feb 6, 2023
AlexanderWert added a commit to elastic/kibana that referenced this issue Mar 22, 2023
## Summary

Adds support for showing OpenTelemetry-based system and JVM process
metrics in APM's metrics view.

Resolves elastic/apm-data#12 in a cleaner way
without the need to transform metrics.
@rcooper85 commented

Has this been released in version 8.8? I can't see any mention of it in the release notes.

@SylvainJuge (Member) commented

@rcooper85 This was solved in the UI by making it rely on the captured OTel metrics when they are available, so with either OTel agents or Elastic agents you should get similar results in the UI. The data itself, however, is expected to remain in the captured format.

@mielastic commented

> @rcooper85 This was solved in the UI by making it rely on the captured OTel metrics when they are available, so with either OTel agents or Elastic agents you should get similar results in the UI. The data itself, however, is expected to remain in the captured format.

This does not seem to be resolved. I'm running 8.11.1 and OTel Java Agent 1.32.0, and in the APM app "System memory usage", "Garbage collection per minute", and "Garbage collection time spent per minute" are not showing any data. Is this still not fixed?

@herrBez commented Jan 10, 2024

Hi,
During an analysis I saw that with the logging exporter (i.e., stdout) the metric process.runtime.jvm.gc.duration is exported, while with the otlp exporter (i.e., when sending the data to APM Server) it is not collected. Is it possible that APM Server is doing some manipulation and/or discarding this metric for some reason?
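
One way to narrow this down is to attach a logging reader and an OTLP reader to the same meter provider, so both exporters see an identical metric stream and any difference must be introduced downstream. A minimal sketch using the plain opentelemetry-java SDK (rather than the auto-instrumentation agent); the endpoint is a placeholder:

```java
import io.opentelemetry.exporter.logging.LoggingMetricExporter;
import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;

public class DualExporterDebug {
    public static void main(String[] args) {
        SdkMeterProvider meterProvider = SdkMeterProvider.builder()
                // Logging exporter: prints every metric to stdout, proving the
                // SDK produces process.runtime.jvm.gc.duration at all.
                .registerMetricReader(
                        PeriodicMetricReader.builder(LoggingMetricExporter.create()).build())
                // OTLP exporter: ships the exact same stream to APM Server.
                .registerMetricReader(
                        PeriodicMetricReader.builder(
                                        OtlpGrpcMetricExporter.builder()
                                                .setEndpoint("https://apm.example.com:8200") // placeholder endpoint
                                                .build())
                                .build())
                .build();

        // ... register instruments and run the workload here ...

        meterProvider.close(); // flushes both readers on shutdown
    }
}
```

If the metric shows up on stdout but never in Elasticsearch, the drop happens in APM Server (or in the index mapping), not in the SDK.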

@SylvainJuge (Member) commented

Hi @herrBez, the process.runtime.jvm.* prefix is the one we are using for testing, so it should be properly supported.

Can you share which versions of APM Server and the OpenTelemetry agent you are using?

Please note that the JVM metrics are now part of the stable semantic conventions under different names, which also means the process.runtime.jvm.* names are deprecated in favor of the new jvm.* names.

@herrBez commented Jan 11, 2024

Hi,
Thank you for the answer. I am using APM Server on Elastic Agent 8.11.3 and the OpenTelemetry javaagent 1.32.0, without OTEL_SEMCONV_STABILITY_OPT_IN=jvm (TIL). With this configuration, process.runtime.jvm.gc.duration (which should in principle work with my version of Kibana) is sent to APM Server, but it is not indexed.

With the flag OTEL_SEMCONV_STABILITY_OPT_IN=jvm, I can see the GC metric jvm.gc.duration without the process.runtime prefix, which should work once elastic/kibana#174445 is fixed.
