-
Notifications
You must be signed in to change notification settings - Fork 778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor aggregator and exporters #2295
Refactor aggregator and exporters #2295
Conversation
Codecov Report
@@ Coverage Diff @@
## metrics #2295 +/- ##
===========================================
+ Coverage 74.58% 78.90% +4.31%
===========================================
Files 218 235 +17
Lines 6957 7476 +519
===========================================
+ Hits 5189 5899 +710
+ Misses 1768 1577 -191
|
@@ -32,7 +32,7 @@ public ConsoleMetricExporter(ConsoleExporterOptions options) | |||
{ | |||
} | |||
|
|||
public override ExportResult Export(in Batch<MetricItem> batch) | |||
public override ExportResult Export(in Batch<Metric> batch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 1) metrics exporter will always run in a separate thread (unlike trace/log exporter which might run in the hot path - e.g. SimpleExportingProcessor) 2) metrics exporter will only be triggered at a much lower frequency (unlike trace/log exporter which might get triggered for every single event), so it might be actually easier if we define this as IEnumerable<Metric>
.
I understand that there might be a desire to have ConsoleExporter<T>
which covers logs/metrics/traces (for sake of consistency).
@CodeBlanch do you have strong opinion on this? (look at the change on the src/OpenTelemetry/Batch.cs
, do you think the extra if-condition could slow down traces/logs?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes agree to make this simpler with just IEnumerable
, so we won't affect the other signals's hot path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do IList<T>
instead of IEnumerable<T>
? With that we could at least do a for (int i = 0; i < list.Count; i++) { var item = list[i]; }
type of loop that doesn't allocate an enumerator. Or we could even pass the array directly. Array doesn't have a struct enumerator but AFAIK the JIT is smart enough to re-write a foreach loop on an array to be allocation-free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we will be able to tell the count of metrics or not, if we could, IList definitely works better.
I guess array won't work since it assumes certain memory layout. For example, if we pre-allocate memory and keep reusing them, we would need to compact the pre-allocated aggregation buffer before exporting (and that seems to be expensive).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment about this in the follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also pass a ReadOnlySpan over the array?
|
||
if (circularBuffer == null) | ||
if (circularBuffer == null && metrics == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly concerned about the extra perf cost although it might be just 2 nanoseconds, as this will contribute to every single export call. https://github.com/open-telemetry/opentelemetry-dotnet/pull/2295/files#r700611706
Probably need some perf numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the final decision here: https://github.com/open-telemetry/opentelemetry-dotnet/pull/2327/files#r704879293
/// <summary> | ||
/// Calculate SUM from incoming delta measurements. | ||
/// </summary> | ||
DoubleSumIncomingDelta = 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we consider making this Flags?
e.g.
- first four bits:
- 0001: Sum
- 0010: Gauge
- 0100: Histogram
- anything else: Invalid
- next two bits:
- 00: Invalid
- 01: Delta
- 10: Cumulative
- 11: Invalid
- next four bits:
- 0001: int64 (long)
- 0010: double
- anything else: Invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This in internal only enum, and this micro-optimization can be added in the future transparently.
I don't know how should we proceed on this big PR, want to get some input from @alanwest @CodeBlanch @utpilla:
I guess one possible way is that we merge this PR to the |
I agree its best to leave the Batch<> untouched in this PR, given its unlikely to be the final stage we want. |
src/OpenTelemetry.Exporter.OpenTelemetryProtocol/Implementation/MetricItemExtensions.cs
Show resolved
Hide resolved
src/OpenTelemetry.Exporter.OpenTelemetryProtocol/Implementation/MetricItemExtensions.cs
Show resolved
Hide resolved
I'll approve the PR (knowing that there are multiple things that we can / probably should refactor / improve) if you could update it to target |
foreach (var processor in this.metricProcessors) | ||
{ | ||
processor.SetGetMetricFunction(this.Collect); | ||
processor.SetParentProvider(this); | ||
temporality = processor.GetAggregationTemporality(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda outside the scope of this PR, but would it make sense to do away with the list of MetricProcessors here since there can only be one? My first glance of this code made me think the last processor would dictate the temporality, but then I remembered:
internal MeterProviderBuilderSdk AddMetricProcessor(MetricProcessor processor) | |
{ | |
if (this.MetricProcessors.Count >= 1) | |
{ | |
throw new InvalidOperationException("Only one MetricProcessor is allowed."); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its planned for the next milestone , where we match the spec. (There is no MetricProcessor anymore :))
https://github.com/open-telemetry/opentelemetry-dotnet/milestone/27
I support approving and targeting the metrics branch. @cijothomas, you mentioned doing another alpha release this Friday. Would we do the release from the metrics branch in this case? |
/// <summary> | ||
/// Calculate SUM from incoming delta measurements. | ||
/// </summary> | ||
LongSumIncomingDelta = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Incoming
mean in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incoming = Measurement. i.e the one reported by user using the counter API.
For sync counter, the measurement reported by the user is deltas.
For async counter, the measurement reported by the user is cumulative.
/// <summary> | ||
/// Histogram. | ||
/// </summary> | ||
Histogram = 6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be the need for HistogramDelta
and HistogramCumulative
?
My 🧠 is still churning to grok what's going on here... so looks like LongSumIncomingDelta
vs. LongSumIncomingCumulative
, for example, really just speaks to whether whether the instrument is sync (delta) vs async (cumulative)... not so much the resultant aggregation temporality that is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right. This class simply refers to the types of aggregation we need to do internally. And speaks nothing about the outputted temporality!
initValue = this.doubleVal; | ||
newValue = initValue + number; | ||
} | ||
while (initValue != Interlocked.CompareExchange(ref this.doubleVal, newValue, initValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! 😄 I learned something here... Interlocked.Add
equivalent for double
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has scope for improvement. I think we need to put a hard limit like this:
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Internal/CircularBuffer.cs#L149-L152
this.Description = instrument.Description; | ||
this.Unit = instrument.Unit; | ||
this.Meter = instrument.Meter; | ||
AggregationType aggType = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably for another PR, but I assume we might want some logging in the event none of the conditions match the instrument like Counter<Decimal>
, for example. Or is this already prevented by .NET?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check again, but i think .NET already restricts the types for
The next release is tied to the exporter/aggregator improvement. That milestone(alpha3) will not happen until these changes are brought to main. Hopefully I can quickly address issues and take it to main soon. I changed the PR to target metrics branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM based on this discussion.
merging to metrics branch. Will send follow ups to the metrics branch, and do a merge to main once all concerns are addressed. The plan is to have alpha3 release include fixes (Promtheus fix, OTLP fix - histogram) from this PR. |
this.circularBuffer = circularBuffer ?? throw new ArgumentNullException(nameof(circularBuffer)); | ||
this.targetCount = circularBuffer.RemovedCount + Math.Min(maxSize, circularBuffer.Count); | ||
} | ||
|
||
internal Batch(T[] metrics, int maxSize) | ||
{ | ||
Debug.Assert(maxSize > 0, $"{nameof(maxSize)} should be a positive number."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need some runtime check on maxSize
, as Debug.Assert
doesn't show up in release build?
|
Introduce Metric and MetricPoint, where Metric is a stream of aggregated metrics, containing upto N MetricPoints.
Each instrument result in one Metric.
(Once views are in, each instrument may result in more than on Metric. Also view config will be "hard-wired" at instrument creation time, so hot path has a simple Metric.Update() with no view lookups)
Each Metric pre-allocates MetricPoint[2000], to store time-series. The Dictionary is still used for lookup, to obtain the index within this array.
SDK caps max number of Metric (1000), and number of MetricPoints within each Metric to be 2000 - just hardcoded numbers
for now. This needs to be revisited to have a reasonable defaults, as we don't anticipate exposing this to user (unless spec asks so, but that'd likely be a separate discussion)
Exporters get Batch, reusing Batch from trace/log. This might be reconsidered for a simple IEnumerable.
Metric exposes GetDataPoints(), returning BatchMetricPoint. BatchMetricPoint is newly introduced as a way to iterate though the MetricPoint array, which is a struct and hence cannot re-use Batch. This need revisit based on 5 below.
MetricPoint is a struct now storing all type of Aggregator output - Sum,Gauge,Histogram. (Might save some cost by leveraging Union approach.)
There is cost involved in "copying" the struct when iterating. Will explore better ways to handle this.
All exporters modified with the new Metric/MetricPoint structure. This fixes the Prometheus exporter issue, and optimizes OTLP as well. The histogram is also modelled as per OTLP which is more efficient. (Also fixed the bound issue which was reported by Alan)
// Upcoming in separate PRs
Its a big PR, I can split to separate steps if needed. Or keep this as is, and address key issues as separate PRs.