Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify queueSize metric description and attribute #5836

Merged
merged 2 commits into from
Sep 28, 2023

Conversation

puckpuck
Copy link
Contributor

Fixes #4834

Also references #4382 and #5066

The Java SDK, when auto-configured, will emit 2 metrics with the same metric name (queueSize) but a different description. Combined with the very popular Prometheus exporter in the OpenTelemetry Collector, this leads to an error and 1 of the 2 metrics being dropped. The metric is emitted as part of the Batch*Processors for traces and logs.

This PR uses the same description for both metrics and updates an attribute key to be common across both. The Attribute name could be seen as a larger change outside the scope of this PR, and I'm happy to remove it. In my opinion, when I look at how this metric could be used, I see a benefit with using a common name of processorType for the attribute key.

I also understand there may be a larger issue at the specification level to help resolve this, but this issue has existed for well over a year. It can be resolved with this PR while waiting for the specification to be clearer on handling these use cases.

@puckpuck puckpuck requested a review from a team September 16, 2023 01:58
@codecov
Copy link

codecov bot commented Sep 16, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed Coverage
...metry/sdk/logs/export/BatchLogRecordProcessor.java 100.00%
...telemetry/sdk/trace/export/BatchSpanProcessor.java 100.00%

📢 Thoughts on this report? Let us know!.

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM...thanks for the contribution.

Copy link
Member

@jack-berg jack-berg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this has been a problem for too long. Thanks for the fix.

Opened #5856 to track some remaining issues for these metrics.

@jack-berg
Copy link
Member

Thought about this more. Ideally we would get all the problems of #5856 fixed in a single release to minimize churn, but that is dependent on progress in semantic-conventions which doesn't appear to be making quick progress. So we can either:

  1. wait an unknown amount of time for semantic-conventions to resolve the issue and limit the churn here, but continue having a bad prometheus experience until then
  2. make the simplest possible change to fix the prometheus experience now, with the expectation of churning again when the semantic-conventions issue resolves

Would be partial to option 1 if there was an end in sight for the semantic-conventions repo, but its not reasonable to have broken prometheus experience without a clear path and timetable to fix.

@jack-berg jack-berg merged commit 8d27e24 into open-telemetry:main Sep 28, 2023
@puckpuck puckpuck deleted the queueSize.unify-metric branch September 29, 2023 00:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider changing help text for span/logs batch exporter metric "queueSize"
3 participants