[Stack Monitoring] PoC for kibana instrumentation using opentelemetry metrics sdk #128755
Comments
Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)
This should go in @elastic/infra-monitoring-ui cycle 9 once it gets created.
Reposting an early diagram from @chrisronline on how the Kibana internal API might look.
Love this effort! I'll just add some thoughts about the parts in the diagram above. I don't have a strong opinion on the options listed in this issue, but I do want to stress the desire to make writes and reads as easy as possible for Kibana plugin owners. In an ideal world, they would be able to either directly use some OpenTelemetry SDK to write metrics (in my example above, I abstracted this detail away by adding write APIs to the …). The other part of this that I think is important to mention is how the Stack Monitoring plugin evolves as a result. IMO, it should become a pure read plugin that subscribes to the same read APIs that other Kibana plugins use. It still has a significant purpose, because it is the place where users see metrics from a bird's-eye view, which is very helpful for understanding how problems correlate. I know these things are probably on everyone's mind around this effort, but I don't see them mentioned explicitly, so I want to make sure we have a plan for this too.
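The write/read split described above could look roughly like the following minimal sketch. Everything in it is hypothetical: `MetricsWriter`, `MetricsReader`, and `InMemoryMetricsService` are illustrative names, not an existing Kibana API, and the in-memory store only stands in for whatever SDK or storage would actually sit behind the facade. The point is that a producing plugin depends on one narrow interface while Stack Monitoring (or any other consumer) depends on another.

```typescript
// Hypothetical facade for plugin-facing metrics; names are illustrative only.
type Labels = Record<string, string>;

interface MetricPoint {
  name: string;
  labels: Labels;
  value: number;
}

// What a plugin that *produces* metrics would depend on.
interface MetricsWriter {
  incrementCounter(name: string, labels?: Labels, by?: number): void;
}

// What a plugin that *consumes* metrics (e.g. Stack Monitoring) would depend on.
interface MetricsReader {
  read(name: string): MetricPoint[];
}

// Builds a stable key so identical label sets map to the same series.
function seriesKey(name: string, labels: Labels): string {
  const pairs = Object.keys(labels)
    .sort()
    .map((k) => `${k}=${labels[k]}`);
  return `${name}{${pairs.join(',')}}`;
}

// In-memory stand-in for the real backend (an OTel SDK, Elasticsearch, ...).
class InMemoryMetricsService implements MetricsWriter, MetricsReader {
  private readonly series = new Map<string, MetricPoint>();

  incrementCounter(name: string, labels: Labels = {}, by = 1): void {
    if (by < 0) {
      throw new Error('counters only go up; use a non-negative increment');
    }
    const key = seriesKey(name, labels);
    const existing = this.series.get(key);
    if (existing) {
      existing.value += by;
    } else {
      this.series.set(key, { name, labels: { ...labels }, value: by });
    }
  }

  read(name: string): MetricPoint[] {
    // Return copies so readers cannot mutate the stored series.
    return Array.from(this.series.values())
      .filter((p) => p.name === name)
      .map((p) => ({ name: p.name, labels: { ...p.labels }, value: p.value }));
  }
}
```

A producing plugin would be handed only the `MetricsWriter` view and Stack Monitoring only the `MetricsReader` view of the same service, which keeps Stack Monitoring a pure reader.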
Did you consider an abstraction where plugin authors would just have a read API for observability data, and the location of that data (local versus remote Elasticsearch) would be injected by the "Platform Observability" configuration?
That's exactly what I think we should do: in my model above, that's the other purpose of … Now, how that configuration gets there is another story. Following the Stack Monitoring path, we'd just need to document the need to configure it appropriately, but maybe there is something clever that Elastic Agent can do here. I'm not well versed in that area.
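The injection idea from this exchange can be sketched as a factory that resolves the data location once, at setup time, and hands plugins a reader whose interface never mentions clusters or hosts. This is a minimal sketch under assumptions: `DataLocation`, `ObservabilityReader`, and `createObservabilityReader` are hypothetical names, and the backends are left as injected functions rather than real Elasticsearch clients.

```typescript
// Hypothetical sketch: plugins read observability data without knowing where it lives.
type Labels = Record<string, string>;

interface MetricPoint {
  name: string;
  labels: Labels;
  value: number;
}

// The only interface plugin code programs against; no cluster details leak through.
interface ObservabilityReader {
  query(metricName: string): Promise<MetricPoint[]>;
}

// Where the data lives, as set by a (hypothetical) "Platform Observability" config block.
type DataLocation =
  | { kind: 'local' }
  | { kind: 'remote'; url: string; apiKey: string };

// A backend fetches metric points from one concrete location.
type Backend = (metricName: string) => Promise<MetricPoint[]>;

interface BackendFactories {
  local: () => Backend;
  remote: (url: string, apiKey: string) => Backend;
}

// Resolves the location once and returns a location-agnostic reader.
function createObservabilityReader(
  location: DataLocation,
  factories: BackendFactories,
): ObservabilityReader {
  const backend =
    location.kind === 'local'
      ? factories.local()
      : factories.remote(location.url, location.apiKey);
  return { query: (metricName) => backend(metricName) };
}
```

Switching from a local to a remote monitoring cluster then becomes a configuration change only; plugin code that calls `query` stays the same.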
As for the approach, I'm planning to replicate #123726 for a good metric comparison. If there's anything in that new Response Ops work that we can't do in the OTel metrics space, we should highlight it as early as possible.
Noting that open-telemetry/opentelemetry-js#2929 is merged, so we may be able to use a version >0.27 here. That was the PR blocking gRPC support in the 0.28 release. The current version as of writing is 0.29.
So I got some data coming in from something alongside …
Issues so far:
Yeah, we definitely need to move metric creation up. I put it in the …
Doc counts still seem really high; not sure what's up with that. Update: apm-server delivers once a minute with …
Success! This is option 3 running in ESS, enabled by adding this to the Kibana configuration:

```yaml
monitoring_collection.opentelemetry.metrics:
  otlp:
    url: "https://MY-MONITORING-CLUSTER.apm.us-west2.gcp.elastic-cloud.com"
    headers:
      Authorization: "Bearer REDACTED"
  prometheus.enabled: true
```

The Prometheus endpoint is active too. I'm trying to see if I can get the ESS-included agent to poll it, but I'm not sure that's possible. We might have to attach a self-managed agent.
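To make the shape of the `monitoring_collection.opentelemetry.metrics` settings explicit, here is a minimal sketch that mirrors those YAML keys as a TypeScript type and derives which collection paths they enable. The type and function names are hypothetical; this is not Kibana's actual config schema, only an illustration of how the `otlp` block (push, option 3) and `prometheus.enabled` (pull, option 2) can be switched on independently. It also masks header values such as the `Authorization` bearer token before anything is logged.

```typescript
// Hypothetical TypeScript view of the YAML keys; not Kibana's actual config schema.
interface OpenTelemetryMetricsConfig {
  otlp?: {
    url?: string;
    headers?: Record<string, string>;
  };
  // The dotted YAML key `prometheus.enabled` is modeled here as a nested object.
  prometheus?: { enabled?: boolean };
}

type ExporterKind = 'otlp' | 'prometheus';

// Lists which collection paths the configuration turns on.
function enabledExporters(config: OpenTelemetryMetricsConfig): ExporterKind[] {
  const exporters: ExporterKind[] = [];
  // Push path (option 3): only meaningful once a target URL is set.
  if (config.otlp?.url) exporters.push('otlp');
  // Pull path (option 2): expose an endpoint that an agent can scrape.
  if (config.prometheus?.enabled === true) exporters.push('prometheus');
  return exporters;
}

// Masks header values (e.g. the Authorization bearer token) before logging the config.
function redactHeaders(headers: Record<string, string> = {}): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const name of Object.keys(headers)) redacted[name] = '[REDACTED]';
  return redacted;
}
```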
We have a demo and notes posted internally (https://drive.google.com/file/d/1uAOvX9IXi5Y3D2QhrMu2pMm8yplxXxbn/view?usp=sharing), which I think meet the acceptance criteria for this issue. The PoC PR is still open, and I'll open new issues to work toward merging it as the conversation evolves.
We discussed a number of possible implementations for ongoing Kibana instrumentation in (internal) https://github.com/elastic/observability-dev/issues/2054.
In this issue we'll build a proof of concept for how that might work.
Here are the two options we'd like to PoC. Both should be very similar at the code level; the main difference is the collection mechanism (pull from Metricbeat vs. push to apm-server).
Option 2: OpenTelemetry Metrics API Prometheus endpoint with Elastic Agent Prometheus input
Here we use the official OpenTelemetry metrics SDK and expose the metrics in Prometheus format for Elastic Agent to poll via the underlying Metricbeat prometheus module.
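For option 2, what the agent's Prometheus input scrapes is plain text in the Prometheus exposition format. The sketch below hand-renders counters in that format purely for illustration; in the PoC this output would come from the official OpenTelemetry SDK (the `@opentelemetry/exporter-prometheus` package), not from code like this. The metric name `kibana_rule_executions_total` is hypothetical.

```typescript
// Minimal renderer for the Prometheus text exposition format (counters only).
// Illustrative: in the PoC this output comes from the OpenTelemetry SDK.
type Labels = Record<string, string>;

interface CounterSeries {
  labels: Labels;
  value: number;
}

interface Counter {
  name: string; // e.g. a hypothetical kibana_rule_executions_total
  help: string;
  series: CounterSeries[];
}

// Label values escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// HELP text escapes backslash and newline.
function escapeHelp(text: string): string {
  return text.replace(/\\/g, '\\\\').replace(/\n/g, '\\n');
}

// Renders a sorted label set as {k="v",...}, or nothing when there are no labels.
function renderLabels(labels: Labels): string {
  const keys = Object.keys(labels).sort();
  if (keys.length === 0) return '';
  const pairs = keys.map((k) => `${k}="${escapeLabelValue(labels[k])}"`);
  return `{${pairs.join(',')}}`;
}

// Produces the text body an agent's Prometheus input would scrape.
function renderCounters(counters: Counter[]): string {
  const lines: string[] = [];
  for (const counter of counters) {
    lines.push(`# HELP ${counter.name} ${escapeHelp(counter.help)}`);
    lines.push(`# TYPE ${counter.name} counter`);
    for (const s of counter.series) {
      lines.push(`${counter.name}${renderLabels(s.labels)} ${s.value}`);
    }
  }
  return lines.join('\n') + '\n';
}
```

Because this is a pull model, Kibana only holds current values; how often they are sampled is decided by the polling agent's period.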
Option 3: OpenTelemetry Metrics API exported via the OpenTelemetry Protocol
Here we use the official OpenTelemetry metrics SDK and push the metrics via the OpenTelemetry Protocol (OTLP). OTLP is natively supported by Elastic APM, so we use it to receive the data. There are some caveats for OpenTelemetry collection, but none of them should hinder the collection of platform observability metrics today.
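Option 3 inverts the flow: Kibana snapshots its metric values on a fixed interval and pushes them to apm-server. In the official JS SDK, this role is played by `PeriodicExportingMetricReader` (whose `exportIntervalMillis` defaults to 60 seconds) driving an OTLP metric exporter. The dependency-free sketch below only illustrates that push cycle; `PeriodicPushReader` and `PushExporter` are hypothetical names, and the exporter is an injected function rather than real OTLP serialization.

```typescript
// Minimal sketch of push-based collection: snapshot values on an interval and hand
// them to an exporter. Stand-in for the SDK's periodic reader plus OTLP exporter.
type Labels = Record<string, string>;

interface DataPoint {
  name: string;
  labels: Labels;
  value: number;
}

// In the real setup this would serialize to OTLP and send it to apm-server.
type PushExporter = (batch: DataPoint[]) => void;

class PeriodicPushReader {
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly collect: () => DataPoint[];
  private readonly exporter: PushExporter;
  private readonly intervalMillis: number;

  constructor(collect: () => DataPoint[], exporter: PushExporter, intervalMillis = 60000) {
    this.collect = collect;
    this.exporter = exporter;
    this.intervalMillis = intervalMillis;
  }

  // One cycle: take a snapshot of the current (cumulative) values and push it.
  exportOnce(): void {
    const batch = this.collect();
    if (batch.length > 0) this.exporter(batch);
  }

  start(): void {
    if (this.timer !== undefined) return;
    this.timer = setInterval(() => this.exportOnce(), this.intervalMillis);
  }

  stop(): void {
    if (this.timer === undefined) return;
    clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

With cumulative values, every cycle pushes the current total for each series, so the receiving side gets one snapshot per series per interval whether or not the value changed.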
Ideally this apm-server is managed by Elastic Agent, but that work is still TBD. See "2022-01 - Elastic Agent Pipeline Runtime Environment" for the latest info.
Some consumers to keep in mind (see internal companion issue):
Steps
AC: Recording of the PoC as a walkthrough