Add support for consuming OTLP/gRPC metrics #4722

Merged
merged 4 commits on Feb 16, 2021

Conversation

axw
Member

@axw axw commented Feb 11, 2021

Motivation/summary

Add support for consuming basic OTLP/gRPC metrics,
like we support in the OpenTelemetry Collector exporter.

We do not yet support distributions or summaries; these
will be added once the Elasticsearch enhancements we
depend on are available. If such metrics are received, the
server ignores them and increments a monitoring counter
to indicate that an unsupported metric was received and
dropped.

Checklist

How to test these changes

  1. Start apm-server with monitoring enabled
  2. Run an application that is instrumented with OpenTelemetry to collect metrics (including some distributions and summaries), and configured to export OTLP to APM Server's standard host:port.
  3. Repeat step 2, exporting to an OpenTelemetry Collector with the elastic exporter.
  4. Compare the data received directly by APM Server to the data sent by the exporter.
  5. Check the beats monitoring doc for apm-server, and check that the value of apm-server.otlp.grpc.metrics.consumer.unsupported_dropped corresponds to the number of distribution and summary metrics sent.

Related issues

Closes #4503

@axw axw marked this pull request as draft February 11, 2021 05:19
@apmmachine
Contributor

apmmachine commented Feb 11, 2021

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #4722 updated

  • Start Time: 2021-02-16T01:42:34.255+0000

  • Duration: 56 min 20 sec

  • Commit: 53a9a74

Test stats 🧪

Test Results
Failed 0
Passed 4733
Skipped 124
Total 4857

Steps errors 4

Run Window tests
  • Took 10 min 6 sec
Compress
  • Took 0 min 0 sec
  • Description: tar --exclude=coverage-files.tgz -czf coverage-files.tgz coverage
Compress
  • Took 0 min 0 sec
  • Description: tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
Test Sync
  • Took 3 min 26 sec
  • Description: ./.ci/scripts/sync.sh

Add support for consuming basic OTLP/gRPC metrics,
like we support in the OpenTelemetry Collector exporter.
This does not cover distributions or summaries; these
will be added once the Elasticsearch enhancements we
are dependent on are available.
@codecov-io

codecov-io commented Feb 11, 2021

Codecov Report

Merging #4722 (53a9a74) into master (a25b110) will increase coverage by 0.26%.
The diff coverage is 97.72%.

@@            Coverage Diff             @@
##           master    #4722      +/-   ##
==========================================
+ Coverage   76.43%   76.70%   +0.26%     
==========================================
  Files         165      166       +1     
  Lines       10083    10211     +128     
==========================================
+ Hits         7707     7832     +125     
- Misses       2376     2379       +3     
Impacted Files                Coverage Δ
beater/otlp/grpc.go           87.87% <91.30%> (+2.16%) ⬆️
processor/otel/metrics.go     99.06% <99.06%> (ø)
processor/otel/consumer.go    94.89% <100.00%> (+0.02%) ⬆️

@axw axw marked this pull request as ready for review February 11, 2021 08:50
@axw axw requested a review from a team February 11, 2021 08:50
UnsupportedMetricsDropped int64
}

type consumerStats struct {
Contributor

I am confused about this: why are both consumerStats and ConsumerStats needed?

Member Author

consumerStats might change at any time; ConsumerStats is a snapshot. I'll add a comment.
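For readers following the thread, here is a minimal sketch of the live-counter versus snapshot split described above. Only the ConsumerStats and consumerStats type names and the UnsupportedMetricsDropped field come from the diff; the unexported field name, the Stats and dropUnsupported methods, and the use of sync/atomic are assumptions made for illustration, not necessarily what the PR does.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ConsumerStats is a point-in-time copy of the counters; it never changes
// after it has been returned.
type ConsumerStats struct {
	UnsupportedMetricsDropped int64
}

// consumerStats holds the live counters, which may be updated concurrently.
// The field name is an assumption for this sketch.
type consumerStats struct {
	unsupportedMetricsDropped int64
}

// Consumer is a stand-in for the real consumer type.
type Consumer struct {
	stats consumerStats
}

// Stats returns a snapshot of the live counters (hypothetical method).
func (c *Consumer) Stats() ConsumerStats {
	return ConsumerStats{
		UnsupportedMetricsDropped: atomic.LoadInt64(&c.stats.unsupportedMetricsDropped),
	}
}

// dropUnsupported records n dropped metrics (hypothetical helper).
func (c *Consumer) dropUnsupported(n int64) {
	atomic.AddInt64(&c.stats.unsupportedMetricsDropped, n)
}

func main() {
	var c Consumer
	c.dropUnsupported(2)
	snap := c.Stats()
	c.dropUnsupported(1)
	// The snapshot keeps its value while the live counter moves on.
	fmt.Println(snap.UnsupportedMetricsDropped, c.Stats().UnsupportedMetricsDropped)
}
```

The caller gets a plain value it can read without synchronization, while the consumer keeps mutating its own private counters.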

otelMetrics := in.Metrics()
var unsupported int64
for i := 0; i < otelMetrics.Len(); i++ {
if !c.addMetric(otelMetrics.At(i), &ms) {
Contributor

Instead of having a method that both mutates an argument and returns a value, can't we say unsupported += otelMetrics.Len() - ms.Len()?

Member Author

No, because two metrics may end up in the same metricset. (ms is a collection of metricsets)
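A small sketch of why the subtraction would miscount. The types here are simplified stand-ins, not the pdata or apm-server types, and grouping by timestamp plus labels is an assumption about how metrics are merged into a metricset.

```go
package main

import "fmt"

// metric is a simplified stand-in for one OTLP metric data point.
type metric struct {
	name      string
	timestamp int64
	labels    string // labels pre-serialized to a string, for brevity
	supported bool   // false for histograms/summaries in this sketch
}

// metricsetKey identifies a metricset; the grouping rule is an assumption.
type metricsetKey struct {
	timestamp int64
	labels    string
}

// metricsets maps each key to the samples (name -> value) sharing that key.
type metricsets map[metricsetKey]map[string]float64

// add stores the metric in its metricset and reports whether it was supported.
func (ms metricsets) add(m metric, value float64) bool {
	if !m.supported {
		return false
	}
	key := metricsetKey{timestamp: m.timestamp, labels: m.labels}
	if ms[key] == nil {
		ms[key] = make(map[string]float64)
	}
	ms[key][m.name] = value
	return true
}

// countUnsupported counts dropped metrics one by one, as the PR's loop does.
func countUnsupported(in []metric, ms metricsets) int64 {
	var unsupported int64
	for _, m := range in {
		if !ms.add(m, 1) {
			unsupported++
		}
	}
	return unsupported
}

func main() {
	in := []metric{
		{"cpu", 1, "host=a", true},
		{"mem", 1, "host=a", true}, // same metricset as "cpu"
		{"latency_hist", 1, "host=a", false},
	}
	ms := metricsets{}
	fmt.Println("counted per metric:", countUnsupported(in, ms))
	fmt.Println("len(in) - len(ms): ", len(in)-len(ms))
}
```

Here two supported metrics share one metricset, so the length difference reports two dropped metrics when only one was actually dropped.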

func (c *Consumer) addMetric(metric pdata.Metric, ms *metricsets) bool {
switch metric.DataType() {
case pdata.MetricDataTypeIntGauge:
dps := metric.IntGauge().DataPoints()
Contributor

Maybe a dumb question, but can't we make all these data types conform to an interface defining LabelsMap(), Timestamp() and Value()?
All these cases are the same

Member Author

Yes, but:

  • That won't work when we add support for distributions and summaries
  • Ideally we would also support distinguishing doubles from ints, storing the latter in a long field type

I'd prefer to leave it as it is for now (this is also how it looks in opentelemetry-collector-contrib), and refactor it later if it's a pain.
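To make the trade-off concrete, here is a rough sketch of the suggested interface using local stand-in types rather than the real pdata types. All names (dataPoint, intPoint, doublePoint, histogramPoint, values) are hypothetical, and the method set is simplified from the one proposed in the review.

```go
package main

import "fmt"

// dataPoint is a hypothetical common interface, in the spirit of the suggestion.
type dataPoint interface {
	Labels() map[string]string
	Timestamp() int64
	Value() float64
}

// intPoint and doublePoint are stand-ins for the int and double data points.
type intPoint struct {
	labels map[string]string
	ts     int64
	v      int64
}

func (p intPoint) Labels() map[string]string { return p.labels }
func (p intPoint) Timestamp() int64          { return p.ts }

// Value widens to float64, which loses the int/double distinction that a
// long field type would need.
func (p intPoint) Value() float64 { return float64(p.v) }

type doublePoint struct {
	labels map[string]string
	ts     int64
	v      float64
}

func (p doublePoint) Labels() map[string]string { return p.labels }
func (p doublePoint) Timestamp() int64          { return p.ts }
func (p doublePoint) Value() float64            { return p.v }

// histogramPoint carries bucket bounds and counts instead of a single value,
// so it has no natural Value() and does not satisfy dataPoint.
type histogramPoint struct {
	ts     int64
	bounds []float64
	counts []uint64
}

// values handles every single-valued point through one code path.
func values(points []dataPoint) []float64 {
	out := make([]float64, 0, len(points))
	for _, p := range points {
		out = append(out, p.Value())
	}
	return out
}

func main() {
	pts := []dataPoint{intPoint{ts: 1, v: 3}, doublePoint{ts: 1, v: 1.5}}
	fmt.Println(values(pts))
}
```

The interface removes the duplicated cases for gauges and sums, but both objections in the reply show up directly: the integer type is erased by Value(), and the histogram type cannot implement it at all.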

@axw axw requested a review from jalvz February 11, 2021 12:57
@axw
Member Author

axw commented Feb 15, 2021

jenkins run the tests please

@axw axw merged commit 6d8ad81 into elastic:master Feb 16, 2021
@axw axw deleted the otel-metrics branch February 16, 2021 02:56
@simitt simitt self-assigned this Mar 2, 2021
@simitt
Contributor

simitt commented Mar 4, 2021

Tested with BC 2; metrics are accepted and ingested as expected.

Regarding testing of distributions or summaries: how did you test that when implementing? I was instrumenting a Go app, but the otel Go metrics implementation only supports counters so far.

@axw
Member Author

axw commented Mar 5, 2021

@simitt it does support histograms and summaries. You need to define an aggregator in the metrics controller, and then use a ValueRecorder meter. Frankly speaking their metrics API is obtuse, so you can't be faulted for missing that :)

Here's how I tested histograms:

controller := controller.New(
	processor.New(
		simple.NewWithHistogramDistribution([]float64{1, 100, 1000, 10000}),
		exporter,
	),
	controller.WithPusher(exporter),
	controller.WithCollectPeriod(time.Minute),
)
if err := controller.Start(context.Background()); err != nil {
	return err
}
meterProvider := controller.MeterProvider()
meter := meterProvider.Meter("test-meter")
float64Counter := metric.Must(meter).NewFloat64Counter("float64_counter")
float64Counter.Add(context.Background(), 1)
// This will be dropped, as we do not support consuming histograms yet.
int64Recorder := metric.Must(meter).NewInt64ValueRecorder("int64_recorder")
int64Recorder.Record(context.Background(), 123)

I believe you need to change NewWithHistogramDistribution to NewWithInexpensiveDistribution to aggregate the recordings as a summary instead.

@simitt
Contributor

simitt commented Mar 5, 2021

I was missing that one could use simple.NewWithInexpensiveDistribution() - thanks.

Tested with BC3:

  • compared metrics for data sent via otel exporter and elastic exporter - no differences found
  • works as expected when using the otel exporter to directly send data to APM Server
  • when using the elastic exporter, no beats_stats.metrics.apm-server.otlp.grpc.metrics.consumer.unsupported_dropped are recorded (probably expected that this is not supported, but mentioning it due to the testing instructions above)
  • otlp data are not indexed, as they are missing in the template definition. This is probably fine as no aggregations are currently done on the data.

Development

Successfully merging this pull request may close these issues.

Native OTLP intake