
Any way to reduce performance overhead in DefaultSynchronousMetricStorage? #6601

Closed
wgy035 opened this issue Jul 24, 2024 · 6 comments
Labels: Feature Request (Suggest an idea for this project)

Comments


wgy035 commented Jul 24, 2024

Is your feature request related to a problem? Please describe.
While running a performance test, I discovered through flame graph analysis that DefaultSynchronousMetricStorage.record() introduces extra performance overhead.
[flame graph screenshot]

Is there any way to reduce performance overhead in DefaultSynchronousMetricStorage?
Describe the solution you'd like
1. AdviceAttributesProcessor appears to do nothing more than remove some attributes from the input, yet it is costly. Could the removal step be skipped?
2. CollectionAggregatorHandles uses a ConcurrentHashMap, whose performance is suboptimal in high-concurrency scenarios. Consider a lock-free alternative instead, such as a ring buffer (e.g. the LMAX Disruptor).
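To make the second point concrete, here is a minimal sketch of the per-record hot path being described: each distinct attribute set maps to an aggregator handle looked up in a ConcurrentHashMap on every measurement. The class and method names are hypothetical and do not reproduce the actual SDK internals; attributes are modeled as a plain String key for simplicity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Simplified model of the record() hot path: every measurement pays for a
// ConcurrentHashMap lookup (and possibly an insert) to find its handle.
// Under heavy write contention, this lookup is the cost the issue asks about.
public class HandleLookupSketch {
    private final Map<String, DoubleAdder> handles = new ConcurrentHashMap<>();

    public void record(String attributes, double value) {
        handles.computeIfAbsent(attributes, k -> new DoubleAdder()).add(value);
    }

    public double sum(String attributes) {
        DoubleAdder adder = handles.get(attributes);
        return adder == null ? 0.0 : adder.sum();
    }
}
```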


@wgy035 wgy035 added the Feature Request Suggest an idea for this project label Jul 24, 2024
jack-berg (Member) commented Jul 24, 2024

The AdviceAttributesProcessor is involved when a measurement is recorded to an instrument that uses the (experimental) setAttributesAdvice(List<AttributeKey<?>> attributes) API (e.g. ExtendedDoubleHistogramBuilder#setAttributesAdvice).

If setAttributesAdvice is not set, there is no AdviceAttributesProcessor and that overhead disappears. I wrote a detailed blog post about the otel java metrics implementation; suffice it to say that a lot of performance engineering has already been done and there isn't much room for improvement.

So if setAttributesAdvice is causing overhead, why is it being called? It looks like you're using the otel java agent. The setAttributesAdvice API serves a key use case for the agent: it allows users to opt in to additional attributes beyond the default set that is recorded. For example, the http.server.request.duration metric defines a set of safe, low-cardinality attributes to include on every measurement, but some users want to opt in to additional higher-cardinality attributes which are normally reserved for http spans. The otel java agent instrumentation records measurements with all the attributes (both high and low cardinality) and uses the setAttributesAdvice API to instruct the otel metrics SDK that only a subset of them should be retained by default. Users can override this by configuring views to select additional attributes.

But the consequence is that the otel metrics SDK has to do additional work for every measurement. Measurements come in with all attributes, and the SDK needs to trim that attribute set down to the subset retained for aggregation. This costs CPU and memory. There may be ways to optimize AdviceAttributesProcessor, but I suspect they'll be minor improvements.
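The trimming described above can be illustrated with a short sketch (this is not the SDK's actual AdviceAttributesProcessor code; attributes are modeled as a plain String map and the key names are examples): measurements arrive with all attributes, and only the advised subset is retained for aggregation, which implies an allocation and a copy on every measurement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of advice-driven attribute trimming: keep only the
// keys named by the advice, dropping higher-cardinality extras.
public class AdviceTrimSketch {
    public static Map<String, String> trim(Map<String, String> recorded,
                                           Set<String> advisedKeys) {
        Map<String, String> retained = new HashMap<>();
        for (Map.Entry<String, String> e : recorded.entrySet()) {
            if (advisedKeys.contains(e.getKey())) {
                retained.put(e.getKey(), e.getValue());
            }
        }
        return retained; // this allocation + copy runs for every measurement
    }
}
```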

@jack-berg jack-berg added the needs author feedback Waiting for additional feedback from the author label Jul 24, 2024

trask (Member) commented Jul 24, 2024

@wgy035 how does this compare to the overhead of everything under io/opentelemetry/javaagent/?

github-actions (bot) commented Jul 31, 2024

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

@github-actions github-actions bot added the Stale label Jul 31, 2024

wgy035 commented Aug 6, 2024

@jack-berg Thanks for your detailed response! I now understand the purpose of AdviceAttributesProcessor, but in the java agent case I think the choice of additional attributes should be defined in the Instrumenter.

I've observed that the otel java agent already traverses all attributes in OperationListener.onEnd before calling the SDK (e.g. HttpServerMetrics#onEnd), so introducing AdviceAttributesProcessor may be a redundant extra pass; we could instead remove the attributes users don't need in OperationListener.onEnd, just as AdviceAttributesProcessor does today.

@trask In my performance test, it accounts for 9% of the total usage under io/opentelemetry/javaagent/.

@github-actions github-actions bot removed the needs author feedback Waiting for additional feedback from the author label Aug 6, 2024
@jack-berg jack-berg removed the Stale label Aug 7, 2024
jack-berg (Member) commented:

Opened #6629 as a minor optimization to exit early when advice doesn't end up removing any attributes. This still doesn't help the common case where advice does remove attributes.
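The early-exit idea can be sketched as follows. This is a hedged illustration under the same simplified String-map model, not the actual change in #6629: if the advice would not remove any attribute, return the original object untouched and skip the allocation and copy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the early-exit optimization: only trim when trimming would
// actually remove something; otherwise hand back the input unchanged.
public class AdviceEarlyExitSketch {
    public static Map<String, String> apply(Map<String, String> recorded,
                                            Set<String> advisedKeys) {
        if (advisedKeys.containsAll(recorded.keySet())) {
            return recorded; // nothing to remove: no allocation, no copy
        }
        Map<String, String> retained = new HashMap<>();
        for (Map.Entry<String, String> e : recorded.entrySet()) {
            if (advisedKeys.contains(e.getKey())) {
                retained.put(e.getKey(), e.getValue());
            }
        }
        return retained;
    }
}
```

As the comment notes, the fast path only helps when every recorded key is already advised, which is why it doesn't address the common case where attributes are removed.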

jack-berg (Member) commented:

Closing this after merging #6633. I know there is another idea in the description, using a lock-free alternative to ConcurrentHashMap, but that's a big enough topic that it deserves its own issue. Please open a new issue if anyone wishes to discuss it.
