Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpanAttributes modified to use Vec instead of OrderMap/EvictedHashMap #1293

Merged
merged 21 commits into from
Oct 20, 2023

Conversation

cijothomas
Copy link
Member

@cijothomas cijothomas commented Oct 11, 2023

Towards #1284 and #1283. Only focused on Span attributes (to be extended to Events, Links post this PR).

This PR modifes Span attributes to use Vec instead of OrderMap<Key, Value> in SpanBuilder( part of API), EvictedHashMap in Span (part of SDK). This is similar to what was done for Logs.

Performance Implications:
The performance gains are matching what is already demonstrated in #1121, but sharing the results from my run:

Stress test:
main branch:
Number of threads: 4
Throughput: 6,592,200 iterations/sec
Throughput: 6,504,400 iterations/sec
Throughput: 6,461,400 iterations/sec

This PR:
Number of threads: 4
Throughput: 10,274,200 iterations/sec
Throughput: 10,295,800 iterations/sec
Throughput: 10,329,000 iterations/sec

This is ~50% improvement in throughput!

Benchmarks:
start-end-span-4-attrs/always-sample
time: [586.89 ns 594.65 ns 603.28 ns]
change: [-38.279% -36.582% -34.871%] (p = 0.00 < 0.05)
Performance has improved.

This means we'll no longer be doing de-duplicating of span attributes. Opened an issue to track user feedback on this, which is also linked from changelog for easy visibility.

Additional Details:
The feature to enforce attributes limit is not done now. dropped_attribute_count will be 0 hence. It can be added back as a follow up, but I'll open a separate issue to discuss some issues in implementation of that, with some proposals.

Have just done the bare minimum to show the changes and its perf implications. I expect this to be 1st of a series of PRs doing similar for Events, Links, and also bringing back the attribute limit enforcement, along with new unit tests coverage. I want to get an okay that this direction is agreed upon, before working further on this. (Note: There were few other potential ideas discussed in todays SIG call, which can further help perf, but I'll open separate issues to track them., and they can be additive changes after this). Also, there might be interest in doing de-dup on a opt-in basis, either in SDK or a custom Processor, or Exporter - this is another candidate for follow up!

Thanks @shaun-cox and @lalitb for prior works, benchmarks and other consultation!

@shaun-cox
Copy link
Contributor

I love the direction and think it's necessary to achieve the project's mission.

@cijothomas
Copy link
Member Author

I made a quick attempt to fix this issue for Spans on top of this branch, and the throughput reaches 16m/sec. Overall that'd be more than doubling the perf (6M/sec to 16M/sec). Of course I haven't thought through the full implications, so will wait for green light on this PR's direction before further exploring that...

@jtescher
Copy link
Member

@cijothomas I think this makes sense, thanks for working on it. Retaining first n rather than last n seems fine so the perf win here is well worth it. We probably still want to keep track of the dropped counts where possible, could store vec and droppped count, or have a struct that stored both?

@cijothomas
Copy link
Member Author

We probably still want to keep track of the dropped counts where possible, could store vec and droppped count, or have a struct that stored both

I can bring it back. I left some comments in here about that feature overall : #1283 (comment). I'd pursue that separately, to avoid distracting this PR.

Given there is +1s to the direction of this PR from other maintainer/approvers, I'll continue working on it.
Will mark it draft now, to polish it further and to add back the feature of tracking dropped_count. Once done, will undraf the PR to request full review! Thanks.

@TommyCpp
Copy link
Contributor

I think it's the right direction. We should be explicit in docs that attributions can duplicate so we don't need to have deduplication or sort logic for them. Thanks for working on it

@cijothomas cijothomas marked this pull request as draft October 12, 2023 16:33
@jtescher
Copy link
Member

could also see where perf is with hashmap to keep de-duplication but change to last-write-wins, would be worth knowing the cost of each

@cijothomas
Copy link
Member Author

could also see where perf is with hashmap to keep de-duplication but change to last-write-wins, would be worth knowing the cost of each

Yes thats what I tried first!. From my stress tests:
6.2 M/sec main branch

7.5 M/sec with regular HashMap (so sdk does dedup, but no eviction).

10 M/Sec with vec, where we removed de-duplication as well.

(I do see a future where users will ask for de-duplication in the sdk level. We can do that as an opt-in SpanProcessor or even an Exporter feature, so as to no penalize every user.)

@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Files Coverage Δ
opentelemetry-datadog/src/exporter/model/mod.rs 85.2% <100.0%> (-0.6%) ⬇️
opentelemetry-jaeger/src/exporter/mod.rs 58.1% <100.0%> (ø)
opentelemetry-proto/src/transform/common.rs 23.4% <ø> (-2.3%) ⬇️
opentelemetry-sdk/src/export/trace.rs 40.0% <ø> (ø)
opentelemetry-sdk/src/testing/trace/mod.rs 80.8% <100.0%> (+0.2%) ⬆️
opentelemetry-sdk/src/trace/sampler.rs 92.4% <100.0%> (ø)
opentelemetry-sdk/src/trace/span_processor.rs 81.2% <100.0%> (+<0.1%) ⬆️
opentelemetry-stackdriver/src/lib.rs 30.8% <100.0%> (ø)
opentelemetry-zipkin/src/exporter/model/span.rs 96.1% <100.0%> (+<0.1%) ⬆️
opentelemetry/src/trace/span.rs 97.0% <ø> (ø)
... and 11 more

... and 3 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

@cijothomas cijothomas marked this pull request as ready for review October 17, 2023 00:01
@cijothomas
Copy link
Member Author

@open-telemetry/rust-approvers Please review!

@TommyCpp
Copy link
Contributor

Discussed with @cijothomas and we will follow up with the nits. Merging this for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants