-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Stack Monitoring] Add OpenTelemetry metrics to Monitoring Collection plugin #135999
[Stack Monitoring] Add OpenTelemetry metrics to Monitoring Collection plugin #135999
Conversation
Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI) |
@crespocarlos I'm not sure if we're gaining much by keeping my PoC commit history here. Did you include it intentionally? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you explore adding an API integration test? I'm not sure, but if there's a way to inject a dummy plugin during the integration testing cycle, we could use that as a test-bed for instrumentation techniques.
package.json
Outdated
@@ -263,6 +264,12 @@ | |||
"@mapbox/mapbox-gl-draw": "1.3.0", | |||
"@mapbox/mapbox-gl-rtl-text": "0.2.3", | |||
"@mapbox/vector-tile": "1.3.1", | |||
"@opentelemetry/api-metrics": "0.29.2", | |||
"@opentelemetry/exporter-metrics-otlp-grpc": "^0.29.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to set these all to 0.30.0 I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
## OpenTelemetry Metrics | ||
|
||
TODO: explain how to instrument the code with `@opentelemetry/api-metrics` so that the steps below will work with metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to do this to review the PR, so might be good to include something simple like a plugin setup counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. I've added a very simple example to it. let me know if it's clear enough
I didn't pay attention to that. I can squash those
I did. |
6ffd8f6
to
2638cbb
Compare
@elasticmachine merge upstream |
`--monitoring_collection.opentelemetry.metrics=${JSON.stringify({ | ||
prometheus: { | ||
enabled: true, | ||
}, | ||
})}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`--monitoring_collection.opentelemetry.metrics=${JSON.stringify({ | |
prometheus: { | |
enabled: true, | |
}, | |
})}`, | |
`--monitoring_collection.opentelemetry.metrics.prometheus.enabled=true`, |
Pretty sure this should work too and maybe simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Confirmed locally.
export class PrometheusExporter extends MetricReader { | ||
private readonly _prefix?: string; | ||
private readonly _appendTimestamp: boolean; | ||
private _serializer: PrometheusSerializer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be just _serializer
- pretty sure the underscore is just copy-pasta from the otel js repo during my PoC work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean without the type? It can, but then TS will understand that this an any
. Better to define the type as we're currently doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without the underscore I mean :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason my brain didn't read the underscore from your original comment lol. Yeah, I'll remove the underscore from these properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't hurt to leave them, but there are more examples without _ than with in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good overall, just some recommendations for polishing.
@@ -5,7 +5,7 @@ | |||
* 2.0. | |||
*/ | |||
|
|||
import { registerDynamicRoute } from './dynamic_route'; | |||
import { registerDynamicRoute } from '.'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the idea here is to get both the prometheus and dynamic route under a v1
directory. I'd probably leave that alone to keep the scope of the PR just on the otel stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to leave it alone too, but it looked too ugly, that's why I decided to move that to v1 folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I based it on how monitoring
plugin organizes its api folder. Leaving both in the root folder felt weird, and moving only prometheus
to v1 seemed weird too.
package.json
Outdated
"@opentelemetry/exporter-metrics-otlp-grpc": "^0.30.0", | ||
"@opentelemetry/exporter-prometheus": "^0.30.0", | ||
"@opentelemetry/resources": "^1.3.1", | ||
"@opentelemetry/sdk-metrics-base": "^0.29.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's get all the 0.29s out of here and just focus on 0.30
`--monitoring_collection.opentelemetry.metrics=${JSON.stringify({ | ||
prometheus: { | ||
enabled: true, | ||
}, | ||
})}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Confirmed locally.
|
||
const credentials = url.startsWith('https://') | ||
? grpc.credentials.createSsl() | ||
: grpc.credentials.createInsecure(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure with 0.30 we can drop this (and the grpc direct dependency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened crespocarlos#1 with the proposed change - still need to test it with/without SSL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly looks like the grpc dependency is still required to set headers, but at least we don't have to do our own secure/insecure check anymore.
x-pack/plugins/monitoring_collection/server/routes/api/v1/dynamic_route/index.test.ts
Outdated
Show resolved
Hide resolved
@elasticmachine merge upstream |
x-pack/plugins/monitoring_collection/server/routes/api/v1/prometheus/index.ts
Show resolved
Hide resolved
|
||
const credentials = url.startsWith('https://') | ||
? grpc.credentials.createSsl() | ||
: grpc.credentials.createInsecure(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened crespocarlos#1 with the proposed change - still need to test it with/without SSL
Review: [Stack Monitoring] Add OpenTelemetry metrics to Monitoring Collection plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's open a follow up issue for the https://opentelemetry.io/docs/reference/specification/protocol/exporter/ env vars. This is plenty good to merge (once you've sorted the merge conflict of course).
Great work @crespocarlos !
metrics.setGlobalMeterProvider(meterProvider); | ||
|
||
const otlpConfig = this.config.opentelemetry?.metrics.otlp; | ||
const url = otlpConfig?.url ?? process.env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTEL_EXPORTER_OTLP_ENDPOINT
should be allowable too. It'll be convenient once we align traces and metrics both on the otel spec. Could open that as a follow up issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add the OTEL_EXPORTER_OTLP_ENDPOINT
- I forgot about it. It's a small change.
💚 Build Succeeded
Metrics [docs]
History
To update your PR or re-run it, just comment with: |
Summary
This PR creates Otel integration in Kibana and closes https://github.com/elastic/observability-dev/issues/2220. It's based on this PoC #133171, but removes the instrumentation code part, leaving only the essential code to enable the integration.
Screenshots
With Prometheus endpoint enabled
With the OpenTelemetry Metrics API exported as OpenTelemetry protocol enabled
Review notes
monitoring_collection
plugin routes folder to organise it better, following the standard found inmonitoring
pluginalerting/monitoring
folder from the PoC PR https://github.com/elastic/kibana/pull/133171/filesTodo
While testing this, I found a possible bug in the Prometheus package that causes histogram type fail to be ingested