[Stack Monitoring] Add OpenTelemetry metrics to Monitoring Collection plugin #135999

Merged

crespocarlos
Contributor

Summary

This PR adds an OpenTelemetry (Otel) metrics integration to Kibana and closes https://github.com/elastic/observability-dev/issues/2220. It's based on the PoC in #133171 but drops the instrumentation code, keeping only the code needed to enable the integration.

Screenshots

With Prometheus endpoint enabled

With OpenTelemetry Metrics API export over the OpenTelemetry Protocol (OTLP) enabled

Review notes

  • Since we're going with a versioned API for the Prometheus endpoint, I refactored the monitoring_collection plugin's routes folder to organize it better, following the convention used in the monitoring plugin.
  • Unfortunately, following the steps in README.md alone won't produce any metrics from the Otel integration; the code has to be instrumented first. For testing purposes, copy the contents of the alerting/monitoring folder from the PoC PR https://github.com/elastic/kibana/pull/133171/files

Todo

While testing this, I found a possible bug in the Prometheus package that causes histogram-type metrics to fail ingestion.

crespocarlos added the Team:Infra Monitoring UI - DEPRECATED, release_note:skip, Feature:Stack Monitoring, backport:skip and v8.4.0 labels Jul 8, 2022
crespocarlos changed the title from "2220 otel metrics alerting plugin" to "[Stack Monitoring] Add OpenTelemetry metrics to Monitoring Collection plugin" Jul 8, 2022
crespocarlos marked this pull request as ready for review July 8, 2022 13:21
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

crespocarlos requested a review from a team July 8, 2022 13:21
@matschaffer
Contributor

@crespocarlos I'm not sure if we're gaining much by keeping my PoC commit history here. Did you include it intentionally?

Contributor

matschaffer left a comment

Did you explore adding an API integration test? I'm not sure, but if there's a way to inject a dummy plugin during the integration testing cycle, we could use that as a test-bed for instrumentation techniques.

package.json Outdated
@@ -263,6 +264,12 @@
"@mapbox/mapbox-gl-draw": "1.3.0",
"@mapbox/mapbox-gl-rtl-text": "0.2.3",
"@mapbox/vector-tile": "1.3.1",
"@opentelemetry/api-metrics": "0.29.2",
"@opentelemetry/exporter-metrics-otlp-grpc": "^0.29.2",
Contributor

We should be able to set these all to 0.30.0 I think.

Contributor Author

done!

## OpenTelemetry Metrics

TODO: explain how to instrument the code with `@opentelemetry/api-metrics` so that the steps below will work with metrics
Contributor

We'll need to do this to review the PR, so it might be good to include something simple like a plugin setup counter.

Contributor Author

Yeah, I've added a very simple example. Let me know if it's clear enough.

@crespocarlos
Contributor Author

crespocarlos commented Jul 11, 2022

@crespocarlos I'm not sure if we're gaining much by keeping my PoC commit history here. Did you include it intentionally?

I didn't pay attention to that. I can squash those commits.

Did you explore adding an API integration test? I'm not sure, but if there's a way to inject a dummy plugin during the integration testing cycle, we could use that as a test-bed for instrumentation techniques.

I did. At first it seemed too complex to get everything in place, especially getting alerting to produce any data (I hadn't thought about a dummy plugin). But I can definitely spend more time on it.

crespocarlos force-pushed the 2220-otel-metrics-alerting-plugin branch from 6ffd8f6 to 2638cbb July 11, 2022 08:36
@crespocarlos
Contributor Author

@elasticmachine merge upstream

Comment on lines 40 to 44
`--monitoring_collection.opentelemetry.metrics=${JSON.stringify({
prometheus: {
enabled: true,
},
})}`,
Contributor

Suggested change
- `--monitoring_collection.opentelemetry.metrics=${JSON.stringify({
- prometheus: {
- enabled: true,
- },
- })}`,
+ `--monitoring_collection.opentelemetry.metrics.prometheus.enabled=true`,

Pretty sure this should work too, and it's simpler.

Contributor

Yep. Confirmed locally.
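A minimal sketch of why the two spellings are interchangeable, assuming dotted keys are merged into one nested config tree. `expandFlag` and `ConfigTree` are hypothetical names used only for illustration; this is not how Kibana's CLI or config service is actually implemented.

```typescript
type ConfigTree = { [key: string]: unknown };

// Hypothetical helper (not Kibana's real CLI parser): expands a
// "--dotted.path=value" flag into a nested object. The value is JSON-parsed
// when possible ("true" -> true, "{...}" -> object), otherwise kept as a string.
function expandFlag(flag: string): ConfigTree {
  const match = /^--([^=]+)=(.*)$/.exec(flag);
  if (!match) {
    throw new Error(`Unsupported flag format: ${flag}`);
  }
  const [, path, rawValue] = match;

  let value: unknown;
  try {
    value = JSON.parse(rawValue);
  } catch {
    value = rawValue;
  }

  // Wrap the value in one object per path segment, innermost key first.
  return path
    .split('.')
    .reduceRight<unknown>((acc, key) => ({ [key]: acc }), value) as ConfigTree;
}

const jsonFlag = `--monitoring_collection.opentelemetry.metrics=${JSON.stringify({
  prometheus: { enabled: true },
})}`;
const dottedFlag = '--monitoring_collection.opentelemetry.metrics.prometheus.enabled=true';

// Both spellings produce the same nested config object.
console.log(JSON.stringify(expandFlag(jsonFlag)) === JSON.stringify(expandFlag(dottedFlag))); // prints true
```

Running the value through JSON.parse is what turns `=true` into a boolean in the dotted form, so both flags end up with `prometheus.enabled === true`.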

export class PrometheusExporter extends MetricReader {
private readonly _prefix?: string;
private readonly _appendTimestamp: boolean;
private _serializer: PrometheusSerializer;
Contributor

I think this can be just _serializer - pretty sure the underscore is just copy-pasta from the otel js repo during my PoC work.

Contributor Author

crespocarlos Jul 12, 2022

You mean without the type? It can, but then TS will treat it as any. Better to declare the type as we're currently doing.

Contributor

Without the underscore, I mean :)

Contributor Author

For some reason my brain didn't read the underscore in your original comment lol. Yeah, I'll remove the underscore from these properties.

Contributor Author

It doesn't hurt to leave them, but there are more examples without _ than with in the codebase.

Contributor

matschaffer left a comment

Looking good overall, just some recommendations for polishing.

@@ -5,7 +5,7 @@
* 2.0.
*/

- import { registerDynamicRoute } from './dynamic_route';
+ import { registerDynamicRoute } from '.';
Contributor

Looks like the idea here is to get both the prometheus and dynamic route under a v1 directory. I'd probably leave that alone to keep the scope of the PR just on the otel stuff.

Contributor Author

crespocarlos Jul 12, 2022

I wanted to leave it alone too, but it looked too ugly, which is why I moved it to the v1 folder.

Contributor Author

I based it on how the monitoring plugin organizes its api folder. Leaving both routes in the root folder felt weird, and moving only prometheus to v1 seemed weird too.

package.json Outdated
"@opentelemetry/exporter-metrics-otlp-grpc": "^0.30.0",
"@opentelemetry/exporter-prometheus": "^0.30.0",
"@opentelemetry/resources": "^1.3.1",
"@opentelemetry/sdk-metrics-base": "^0.29.2",
Contributor

Let's get all the 0.29s out of here and just focus on 0.30.
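A small sketch of a check that would catch leftover 0.29 ranges, assuming the goal is to keep every experimental `@opentelemetry/*` 0.x package on the same minor while leaving stable 1.x packages (such as `@opentelemetry/resources`) alone. `findMisalignedOtelDeps` is a hypothetical helper, not something this PR adds.

```typescript
// Hypothetical check (not part of the PR): list @opentelemetry/* dependencies
// that are still on a 0.x release other than the target minor, e.g. "0.30".
// Stable 1.x packages follow their own versioning, so anything outside 0.x
// is ignored.
function findMisalignedOtelDeps(
  dependencies: Record<string, string>,
  targetMinor: string
): string[] {
  return Object.entries(dependencies)
    .filter(([name]) => name.startsWith('@opentelemetry/'))
    .filter(([, range]) => {
      // Drop a leading ^ or ~ so "^0.29.2" is compared as "0.29.2".
      const version = range.replace(/^[\^~]/, '');
      return version.startsWith('0.') && !version.startsWith(`${targetMinor}.`);
    })
    .map(([name, range]) => `${name}@${range}`);
}

// Dependency ranges copied from the package.json hunk above.
const deps: Record<string, string> = {
  '@opentelemetry/exporter-metrics-otlp-grpc': '^0.30.0',
  '@opentelemetry/exporter-prometheus': '^0.30.0',
  '@opentelemetry/resources': '^1.3.1',
  '@opentelemetry/sdk-metrics-base': '^0.29.2',
};

console.log(findMisalignedOtelDeps(deps, '0.30').join(', ')); // prints @opentelemetry/sdk-metrics-base@^0.29.2
```

The trailing dot in the prefix check keeps a target such as `0.3` from accidentally matching `0.30.x`.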

crespocarlos requested a review from matschaffer July 12, 2022 08:42

const credentials = url.startsWith('https://')
? grpc.credentials.createSsl()
: grpc.credentials.createInsecure();
Contributor

Pretty sure with 0.30 we can drop this (and the direct grpc dependency).

Contributor

I opened crespocarlos#1 with the proposed change; I still need to test it with and without SSL.

Contributor

Sadly looks like the grpc dependency is still required to set headers, but at least we don't have to do our own secure/insecure check anymore.
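Since headers are what still need the grpc dependency, a sketch of reading them from the environment may help with the follow-up. It assumes the `OTEL_EXPORTER_OTLP_HEADERS` / `OTEL_EXPORTER_OTLP_METRICS_HEADERS` format from the OTLP exporter spec (comma-separated `key=value` pairs with percent-encoded values). `parseOtlpHeaders` is hypothetical, and the resulting object would presumably still need to be copied into a `grpc.Metadata` instance before it reaches the exporter.

```typescript
// Hypothetical helper: parse an OTLP headers string in the
// "key1=value1,key2=value2" format (values percent-encoded) into a plain object.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) {
    return headers;
  }
  for (const entry of raw.split(',')) {
    const separator = entry.indexOf('=');
    // Skip entries with no "=" at all.
    if (separator === -1) {
      continue;
    }
    const key = entry.slice(0, separator).trim();
    // Skip entries whose key is empty after trimming.
    if (key === '') {
      continue;
    }
    const value = entry.slice(separator + 1).trim();
    headers[key] = decodeURIComponent(value);
  }
  return headers;
}

console.log(JSON.stringify(parseOtlpHeaders('api-key=secret,x-tenant=team%20a')));
// prints {"api-key":"secret","x-tenant":"team a"}
```

Note that decodeURIComponent throws on malformed percent-escapes, so a real implementation would need to decide whether to skip or reject such values.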

@matschaffer
Contributor

@elasticmachine merge upstream

Contributor

matschaffer left a comment

Let's open a follow-up issue for the OTLP exporter environment variables (https://opentelemetry.io/docs/reference/specification/protocol/exporter/). This is plenty good to merge (once you've sorted out the merge conflict, of course).

Great work @crespocarlos !

metrics.setGlobalMeterProvider(meterProvider);

const otlpConfig = this.config.opentelemetry?.metrics.otlp;
const url = otlpConfig?.url ?? process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT;
Contributor

OTEL_EXPORTER_OTLP_ENDPOINT should be supported too. It'll be convenient once we align both traces and metrics on the Otel spec. We could open that as a follow-up issue.

Contributor Author

I'll add OTEL_EXPORTER_OTLP_ENDPOINT; I forgot about it. It's a small change.
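The precedence discussed here can be sketched as a pure function, assuming explicit Kibana config wins over `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`, which in turn wins over `OTEL_EXPORTER_OTLP_ENDPOINT`. `resolveOtlpMetricsUrl` and its types are hypothetical, and the environment is passed in rather than read from `process.env` so the logic is easy to test. The generic value is used as-is, which suits a gRPC exporter; per the OTLP exporter spec, HTTP exporters instead append a signal path such as `/v1/metrics` to it.

```typescript
interface OtlpMetricsConfig {
  url?: string;
}

type Env = Record<string, string | undefined>;

// Hypothetical resolver: explicit Kibana config wins, then the
// metrics-specific env var, then the generic OTLP endpoint env var.
// ?? only skips null/undefined, so an env var set to "" still wins.
function resolveOtlpMetricsUrl(
  otlpConfig: OtlpMetricsConfig | undefined,
  env: Env
): string | undefined {
  return (
    otlpConfig?.url ??
    env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT ??
    env.OTEL_EXPORTER_OTLP_ENDPOINT
  );
}

// In Kibana this would presumably be called with process.env.
console.log(
  resolveOtlpMetricsUrl(undefined, { OTEL_EXPORTER_OTLP_ENDPOINT: 'http://localhost:4317' })
); // prints http://localhost:4317
```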

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

crespocarlos merged commit b58d07e into elastic:main Jul 14, 2022
crespocarlos deleted the 2220-otel-metrics-alerting-plugin branch July 14, 2022 11:29
5 participants