Add Prometheus text format serializers. #4178
Conversation
Force-pushed d3ca6e1 to 7c68315
Force-pushed 7c68315 to 1f49ebb
Codecov Report
@@ Coverage Diff @@
## main #4178 +/- ##
============================================
- Coverage 90.28% 90.24% -0.05%
- Complexity 4611 4653 +42
============================================
Files 537 539 +2
Lines 14097 14326 +229
Branches 1348 1370 +22
============================================
+ Hits 12728 12929 +201
- Misses 926 943 +17
- Partials 443 454 +11
Continue to review full report at Codecov.
My main concern is not about serializing in prometheus format, but whether or not we keep up to date with Prometheus optimisations (e.g. filtering which metrics are returned via HTTP request headers, etc.). Given we already opt in to doing that work by maintaining our own HTTP server, I think it's only logical to do the serialization as well. Agree it should be minimal code and hopefully reduce memory usage.
void writeExemplar(
    Writer writer, Collection<ExemplarData> exemplars, double minExemplar, double maxExemplar)
    throws IOException {
  for (ExemplarData exemplar : exemplars) {
I believe this code should also be writing dropped attributes as exemplar labels (after your trace_id/span_id code, with a limit on total size of labels written to prometheus).
Don't think we do that right now.
I'd like to keep this PR to just reproducing our current behavior; we can add features in the future.
Yes, this is because I was waiting for the Attribute -> Label spec to finalize in prometheus. Fine to push to future PR, but I do wish I had added that in the original hook.
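For illustration, a minimal sketch of what writing filtered exemplar attributes as labels might look like once that hook exists. This is not the PR's code: the helper names are hypothetical, no value escaping is done, and the budget reflects the OpenMetrics limit of 128 UTF-8 code points for an exemplar's combined label names and values.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

/** Hypothetical sketch, not the PR's implementation. */
final class ExemplarLabelSketch {

  // OpenMetrics limits an exemplar's combined label names and values to
  // 128 UTF-8 code points, so stop writing attributes once the budget is spent.
  private static final int MAX_EXEMPLAR_CHARS = 128;

  static void writeExemplarLabels(
      Writer writer, String traceId, String spanId, Map<String, String> droppedAttributes)
      throws IOException {
    int used = 0;
    writer.write('{');
    // trace_id / span_id first, mirroring the existing code path.
    used += writeLabel(writer, "trace_id", traceId, /* first= */ true);
    used += writeLabel(writer, "span_id", spanId, /* first= */ false);
    for (Map.Entry<String, String> attribute : droppedAttributes.entrySet()) {
      int cost = attribute.getKey().length() + attribute.getValue().length();
      if (used + cost > MAX_EXEMPLAR_CHARS) {
        break; // drop remaining attributes rather than exceed the budget
      }
      used += writeLabel(writer, attribute.getKey(), attribute.getValue(), /* first= */ false);
    }
    writer.write('}');
  }

  private static int writeLabel(Writer writer, String name, String value, boolean first)
      throws IOException {
    if (!first) {
      writer.write(',');
    }
    writer.write(name);
    writer.write("=\"");
    writer.write(value);
    writer.write('"');
    return name.length() + value.length();
  }
}
```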
case EXPONENTIAL_HISTOGRAM:
  return HISTOGRAM;
}
throw new IllegalArgumentException(
Will this cause the entire prometheus export to crash, or is this handled in the exporter later?
It's not wired up to an exporter yet, so I believe that's a decision that is yet to be made.
This can never be thrown when the BOM is used; if it did happen to be thrown, the export would fail. I've seen various linkage exceptions related to unaligned dependencies and think it's mostly not possible to use OTel without alignment, so I leaned towards making this an exception instead of a log / fallback, but either approach would be fine here.
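To make the mapping being discussed concrete, here is a sketch using local stand-in enums so it compiles on its own; the constant names mirror the SDK's MetricDataType, but the method body is illustrative rather than the PR's exact code.

```java
/** Stand-in for the SDK's MetricDataType so this sketch compiles standalone. */
enum MetricDataType {
  LONG_GAUGE,
  DOUBLE_GAUGE,
  LONG_SUM,
  DOUBLE_SUM,
  SUMMARY,
  HISTOGRAM,
  EXPONENTIAL_HISTOGRAM
}

/** Prometheus metric families the serializer can emit. */
enum PrometheusType {
  GAUGE,
  COUNTER,
  SUMMARY,
  HISTOGRAM
}

final class TypeMapperSketch {

  // Every current MetricDataType maps to a Prometheus type, so the throw below is only
  // reachable if a newer data model is mixed with an older exporter (unaligned versions).
  static PrometheusType forType(MetricDataType type) {
    switch (type) {
      case LONG_GAUGE:
      case DOUBLE_GAUGE:
        return PrometheusType.GAUGE;
      case LONG_SUM:
      case DOUBLE_SUM:
        return PrometheusType.COUNTER; // the real code also inspects monotonicity/temporality
      case SUMMARY:
        return PrometheusType.SUMMARY;
      case HISTOGRAM:
      case EXPONENTIAL_HISTOGRAM:
        return PrometheusType.HISTOGRAM;
    }
    // Unreachable when the BOM keeps versions aligned; failing the export loudly was
    // judged preferable here to silently dropping data.
    throw new IllegalArgumentException("Unsupported metric type: " + type);
  }
}
```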
}

private void write(MetricData metric, Writer writer) throws IOException {
  // TODO: Implement
Can you open a bug for this one?
I suspect this would be an issue in the spec repo. I changed this comment a bit to reflect this proposed spec wording
valueAtPercentile.getValue(),
point.getAttributes(),
point.getEpochNanos(),
"quantile",
This is a note for me, doubting my own code here.
Percentile != Quantile. We had this issue in OpenCensus, where we needed to switch from [0.0, 1.0] to [0.0, 100.0].
However, OTLP specifies summaries in quantiles. Is this an instance where the Java SDK uses the wrong name? I never noticed before, as I didn't pay close attention until working on some OpenCensus-Go bridges.
Wikipedia tells me that a percentile is a particular type of quantile, so is this a problem?
I think @jsuereth is referring to the fact that the OTel quantile is not actually supposed to be that particular type of quantile. The spec says it must be within the interval [0.0, 1.0].
We would at least need to rename the class to ValueAtQuantile and fix the obviously wrong javadoc.
We would also need to make sure we don't have any logic, such as validation, that relies on the percentile expectation.
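For reference, the conversion being debated is just a factor of 100. A hedged sketch with illustrative names (not the PR's code):

```java
final class QuantileLabelSketch {

  // Prometheus/OpenMetrics summaries label each series with a quantile in [0.0, 1.0].
  // If a value were stored as a percentile in [0.0, 100.0], it would need converting
  // before being written; if it is already a quantile, it is written as-is.
  static String quantileLabelValue(double value, boolean storedAsPercentile) {
    double quantile = storedAsPercentile ? value / 100.0 : value;
    if (quantile < 0.0 || quantile > 1.0) {
      throw new IllegalArgumentException("quantile out of range: " + quantile);
    }
    return Double.toString(quantile); // e.g. the 99th percentile becomes "0.99"
  }
}
```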
public void accept(AttributeKey<?> key, Object value) {
  try {
    if (wroteOne) {
      writer.write(',');
Would be nice to get test coverage for multi-attributes.
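A hedged sketch of such a test, assuming JUnit 5. It exercises a local copy of the comma-separation logic shown in the hunk above, since the serializer's public entry point isn't visible in this diff; the real test would go through that entry point instead.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.io.IOException;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import org.junit.jupiter.api.Test;

class MultiAttributeSerializationTest {

  // Local copy of the comma-separation logic from the hunk above, for illustration only.
  private static String writeLabels(Map<String, String> attributes) throws IOException {
    StringWriter writer = new StringWriter();
    boolean wroteOne = false;
    for (Map.Entry<String, String> entry : attributes.entrySet()) {
      if (wroteOne) {
        writer.write(',');
      }
      writer.write(entry.getKey() + "=\"" + entry.getValue() + "\"");
      wroteOne = true;
    }
    return writer.toString();
  }

  @Test
  void multipleAttributesAreCommaSeparated() throws IOException {
    Map<String, String> attributes = new LinkedHashMap<>();
    attributes.put("service", "checkout");
    attributes.put("region", "eu-west-1");
    assertEquals("service=\"checkout\",region=\"eu-west-1\"", writeLabels(attributes));
  }
}
```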
    && longSumData.getAggregationTemporality() == AggregationTemporality.CUMULATIVE) {
  return COUNTER;
}
return GAUGE;
Is it possible for a prometheus collector which prefers cumulative temporality to get non-monotonic, delta, sum data?
I believe after we added preferred aggregation support it isn't possible when using our SDK. It could be possible if someone used the exporter without our SDK though (creating arbitrary MetricData).
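For clarity, the classification being discussed amounts to the following. This is a sketch with illustrative names, not the PR's exact code.

```java
/** Sketch of the sum classification discussed above; names are illustrative. */
final class SumClassificationSketch {

  enum Temporality { DELTA, CUMULATIVE }
  enum PrometheusType { COUNTER, GAUGE }

  // Prometheus counters must only ever go up, so only monotonic cumulative sums qualify;
  // non-monotonic or delta sums (e.g. arbitrary MetricData built outside the SDK) are
  // exposed as gauges instead.
  static PrometheusType classifySum(boolean isMonotonic, Temporality temporality) {
    if (isMonotonic && temporality == Temporality.CUMULATIVE) {
      return PrometheusType.COUNTER;
    }
    return PrometheusType.GAUGE;
  }
}
```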
With metrics nearing stability, I think coupling the exporter to the prometheus client library has some risk, as it may not make it in time. It's also nice to get some performance by directly serializing, given how huge the payloads can be (in particular, lots of arrays allocated for histograms in the adapter code are eliminated). Many Java apps have a strong correlation between full GC and Prometheus scrapes :) I will move the Collector integration into an alpha extension instead so it can be decoupled from a stable release.

This only adds the serialization logic; it doesn't use it yet or remove the few smaller usages of the client library that are left, which will come in follow-ups.
Maintenance-wise, given the above benefits, I think 414 LoC for the serializer vs 340 LoC for the MetricAdapter is not that much more. The code is not quite as simple, but the prometheus format is relatively straightforward, so it doesn't seem too bad.
The output matches the client library 100% for OpenMetrics. It has some small deviations for the legacy prometheus format, which are still compatible with prometheus; presumably the client library didn't touch that format to avoid breaking unit tests relying on the exact strings.
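As a rough illustration of the allocation difference described above (hypothetical names, not the PR's API), streaming histogram buckets directly to the response writer avoids building intermediate arrays per scrape:

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

/** Hypothetical sketch of streaming bucket serialization; not the PR's API. */
final class HistogramSerializationSketch {

  // Instead of materializing arrays of boundaries and counts for each scrape (the
  // adapter-style path), each line is written straight to the response writer, so a
  // scrape allocates little beyond the formatted numbers themselves.
  static void writeHistogramBuckets(
      Writer writer, String metricName, List<Double> boundaries, List<Long> counts)
      throws IOException {
    long cumulative = 0;
    for (int i = 0; i < counts.size(); i++) {
      cumulative += counts.get(i);
      // The last bucket has an implicit +Inf upper bound in the Prometheus text format.
      String upperBound = i < boundaries.size() ? Double.toString(boundaries.get(i)) : "+Inf";
      writer.write(metricName);
      writer.write("_bucket{le=\"");
      writer.write(upperBound);
      writer.write("\"} ");
      writer.write(Long.toString(cumulative));
      writer.write('\n');
    }
  }
}
```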