-
Notifications
You must be signed in to change notification settings - Fork 779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify MetricPoint to avoid copy #2321
Merged
cijothomas
merged 8 commits into
open-telemetry:metrics
from
cijothomas:cijothomas/metricpoint_avoidcopy2
Sep 10, 2021
Merged
Changes from 1 commit
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
8a5ae51
Modify MetricPoint to avoid copy
cijothomas 8dbfaf0
Merge branch 'metrics' into cijothomas/metricpoint_avoidcopy2
cijothomas 29ce594
Merge branch 'metrics' into cijothomas/metricpoint_avoidcopy2
cijothomas bc352e6
add benchmark for with ref
cijothomas 1ce3f8e
fix format
cijothomas 55d9939
modify benchmark to do something with each point to avoid JIT opitmiz…
cijothomas 32b0908
output paste
cijothomas 0aa9837
Merge branch 'metrics' into cijothomas/metricpoint_avoidcopy2
cijothomas File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this because, if you think about it, ref MetricPoint is essentially a pointer (think &MetricPoint) which is what we were trying to do with that MetricPointPointer struct on the other PR. What I don't know is if using this enumerator in a foreach is the compiler smart enough to use Current as a ref and not make a local copy? Need to look at compiled MSIL, possibly JIT output in benchmark, to see. I can take that on if you would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I tried benchmark (https://github.com/open-telemetry/opentelemetry-dotnet/pull/2322/files) and the ref approach (this PR) reduced the mean time from ~100us to ~50us. (Must be the savings from copying, but yes we should confirm this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I threw together a quick benchmark over just the bits (more or less) of the pattern we're talking about.
Benchmarks
Some interesting results.
ForeachWithRef
andUnrolled
seem to be slower, which I can't explain.Foreach
does:ldobj
is making a copy of mp (MetricPoint) which should be much slower.Where
ForeachWithRef
andUnrolled
do:Which is using the address of mp on the stack.
My guess would be the ref Current in the enumerator throws off some rule in the JIT optimizing the foreach but we probably need some help from runtime team to confirm.
Here's the disassembled output. I used .NET 6 because it supposedly has optimizations for value types but .net 5.0 & 3.1 performed the same.
Disassembled .NET 6 code
.NET 6.0.0 (6.0.21.45706), X64 RyuJIT
.NET 6.0.0 (6.0.21.45706), X64 RyuJIT
.NET 6.0.0 (6.0.21.45706), X64 RyuJIT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GrabYourPitchforks @stephentoub Hey sorry for bothering you guys, looking for some expert guidance on the code above. Could either of you assist or tag in someone who has an extra cycle or two?
What we're trying to vet out: Good idea or bad idea to use
ref [struct] Current
in an enumerator?Seems to work but seeing some surprising benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JIT is sometimes detecting that the benchmark code posted at #2321 (comment) isn't actually doing anything other than looping, and it's optimizing away the dereference. For example, the
Benchmarks.RefEnumeratorBenchmark.Foreach
benchmark only ever loops from i = 0 to arr.Length - 1, never attempting to dereference any element from the array. InForeachWithRef
, it is dereferencing the value, but it's immediately throwing away the value once the read is complete. I suspect you're right in that it's simply a missing optimization in the JIT: in theory it could detect that the dereference is a no-op (as is done in the non-ref case) and elide it entirely.To defeat this optimization and get closer to what you're trying to measure for real, you should do something with the result. Something like:
This will force the JIT to dereference at least the DoubleValue field for all entries. But the JIT is still free to detect that you're not touching any other part of the entries, so it could still attempt to keep the number of dereferences as minimal as possible. (This is generally a good thing!)
I think the best benchmark would be one that performs a realistic unit of work within each loop. At least that would have the effect of forcing the JIT to generate code that matches what your users will observe in the real world.
Hope this helps!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cijothomas OK updated the benchmarks using @GrabYourPitchforks suggestions (thank you!).
Benchmarks
This is showing more what I was expecting to see.
I like this pattern with the
ref MetricPoint Current
enumerator but we'll need to take care that exporters properly loop (egforeach (ref MetricPoint metricPoint in metric.GetMetricPoints())
) otherwise the loop will make a copy of each struct and be slooooow. I don't think that's a deal-breaker?Disassembled .NET 5 code
.NET 5.0.9 (5.0.921.35908), X64 RyuJIT
.NET 5.0.9 (5.0.921.35908), X64 RyuJIT
.NET 5.0.9 (5.0.921.35908), X64 RyuJIT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the insights here! I added similar benchmark, part of the metricbenchmarks demo-ing this as well.
@CodeBlanch I think asking export authors to use ref is fine.
@alanwest @reyang Please share your thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merging to unblock releases. WIll address any comments via future PRs.