Instrument Elasticsearch with APM #84369

pugnascotia · 2022-02-24T20:24:39Z

elasticmachine · 2022-02-24T20:24:42Z

Pinging @elastic/es-delivery (Team:Delivery)

elasticmachine · 2022-02-24T20:24:42Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

Part of #84369. Split out from #87696. Rework how some work is executed by creating child tasks for it, so that APM tracing produces more meaningful parent and child tasks in the UI. This also improves how Elasticsearch models the work.

lizozom · 2022-07-20T14:05:01Z

Do you have a rough estimate what would be the initial release version of this?

Part of #84369. Split out from #87696. Introduce tracing interfaces in advance of adding APM support to Elasticsearch. The only implementation at this point is a no-op class.
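As a rough illustration of "tracing interfaces with only a no-op implementation", the sketch below shows the general shape such an API could take. The names `SketchTracer`, `RecordingTracer`, `TracerSketchDemo` and the method signatures are hypothetical and are not taken from the Elasticsearch source; the point is only that calling code depends on the interface, so a real tracer can be swapped in later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified tracing interface (not the actual Elasticsearch API).
interface SketchTracer {
    void startTrace(String spanId, String name, Map<String, Object> attributes);

    void stopTrace(String spanId);

    // No-op implementation: tracing calls compile and run but record nothing.
    SketchTracer NOOP = new SketchTracer() {
        @Override
        public void startTrace(String spanId, String name, Map<String, Object> attributes) {}

        @Override
        public void stopTrace(String spanId) {}
    };
}

// Illustrative implementation that records events, standing in for a real tracer.
final class RecordingTracer implements SketchTracer {
    final List<String> events = new ArrayList<>();

    @Override
    public void startTrace(String spanId, String name, Map<String, Object> attributes) {
        events.add("start:" + spanId);
    }

    @Override
    public void stopTrace(String spanId) {
        events.add("stop:" + spanId);
    }
}

final class TracerSketchDemo {
    // The traced code only sees the interface, so it behaves the same with any tracer.
    static String runTraced(SketchTracer tracer, String spanId) {
        tracer.startTrace(spanId, "demo-operation", Map.of("kind", "example"));
        try {
            return "done";
        } finally {
            // Always close the span, even if the traced work throws.
            tracer.stopTrace(spanId);
        }
    }

    public static void main(String[] args) {
        System.out.println(runTraced(SketchTracer.NOOP, "span-1"));
    }
}
```

With the no-op tracer the call sites can be merged ahead of any real implementation without changing behaviour.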

Split out from #88443. Part of #84369. Use the tracing API that was added in #87921 in TaskManager. This won't actually do anything until we provide a tracer with an actual implementation.

Part of elastic#84369. Split out from elastic#88443. This PR wraps parts of the code in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal. Note that in some places we actually clear the tracing context completely. This is done where the operation to be performed should have no association with the current trace context. For example, when creating a new index via a REST request, the resulting background tasks for the index should not be associated with the REST request in perpetuity.

Part of #84369. Split out from #88443. This PR wraps parts of the logic in `InternalExecutePolicyAction` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.

Part of #84369. Split out from #88443. This PR wraps parts of the logic in `AsyncTaskManagementService` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.

Part of #84369. Split out from #88443. This PR wraps parts of the logic in `TransportSubmitAsyncSearchAction` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.

Part of #84369. ML uses the task framework to register a task for each loaded model. These tasks are not executed in the usual sense, and it does not make sense to trace them using APM. Therefore, make it possible to register a task without also starting tracing.
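A minimal sketch of the idea of registering a task without starting a trace, assuming a boolean flag on registration. `SketchTaskManager`, its `register` signature, and the action names are hypothetical stand-ins for illustration, not the real `TaskManager` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a task manager; names and signatures are assumptions.
final class SketchTaskManager {
    // Records which spans were started, in place of calls to a real tracer.
    final List<String> startedSpans = new ArrayList<>();
    private final AtomicLong ids = new AtomicLong();

    // Registers a task and starts a span only when traceRequest is true.
    long register(String action, boolean traceRequest) {
        long id = ids.incrementAndGet();
        if (traceRequest) {
            startedSpans.add("task-" + id + ":" + action);
        }
        return id;
    }

    public static void main(String[] args) {
        SketchTaskManager manager = new SketchTaskManager();
        manager.register("indices:data/read/search", true); // traced as usual
        manager.register("ml-model-load", false);           // registered, but never traced
        System.out.println(manager.startedSpans);
    }
}
```

The task still receives an ID and can be listed or cancelled like any other; only the span creation is skipped.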

Part of #84369. Split out from #88443. This PR wraps parts of the code in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal. In order to avoid future diff noise, the wrapped code has mostly been refactored into methods. Note that in some places we actually clear the tracing context completely. This is done where the operation to be performed should have no association with the current trace context. For example, when creating a new index via a REST request, the resulting background tasks for the index should not be associated with the REST request in perpetuity.
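The constraint described above (a header key may be set only once per context, so traced code needs a fresh context) can be sketched with a toy model. Everything here is an illustrative assumption: `SketchThreadContext`, `Restorable`, the `parent_traceparent` key, and the decision to keep the previous trace header under that key are not taken from the Elasticsearch source. Only the `traceparent` header name comes from the W3C Trace Context standard.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a thread context whose headers may each be set only once.
final class SketchThreadContext {
    static final String TRACE_HEADER = "traceparent";
    static final String PARENT_TRACE_HEADER = "parent_traceparent"; // hypothetical key

    // Closing a Restorable puts the previous headers back.
    interface Restorable extends AutoCloseable {
        @Override
        void close();
    }

    private Map<String, String> headers = new HashMap<>();

    void putHeader(String key, String value) {
        // Setting the same key twice is treated as a programming error.
        if (headers.putIfAbsent(key, value) != null) {
            throw new IllegalArgumentException("value for key [" + key + "] already present");
        }
    }

    String getHeader(String key) {
        return headers.get(key);
    }

    // New trace context: the current trace header becomes the parent, leaving the
    // trace key free so that a tracer can set it again for the child operation.
    Restorable newTraceContext() {
        Map<String, String> saved = headers;
        Map<String, String> fresh = new HashMap<>(saved);
        String parent = fresh.remove(TRACE_HEADER);
        fresh.remove(PARENT_TRACE_HEADER);
        if (parent != null) {
            fresh.put(PARENT_TRACE_HEADER, parent);
        }
        headers = fresh;
        return () -> headers = saved;
    }

    // Cleared trace context: the wrapped work has no link to the current trace at all.
    Restorable clearTraceContext() {
        Map<String, String> saved = headers;
        Map<String, String> fresh = new HashMap<>(saved);
        fresh.remove(TRACE_HEADER);
        fresh.remove(PARENT_TRACE_HEADER);
        headers = fresh;
        return () -> headers = saved;
    }
}
```

In this model a child operation would run inside `try (var ignored = ctx.newTraceContext()) { ... }`, while detached background work, such as the index tasks mentioned above, would use `clearTraceContext()` instead.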

Part of #84369. Implement the `Tracer` interface by providing a module that uses OpenTelemetry, along with Elastic's APM agent for Java. See the file `TRACING.md` for background on the changes and the reasoning for some of the implementation decisions.

The configuration mechanism is the most fiddly part of this PR. The Security Manager permissions required by the APM Java agent make it prohibitive to start an agent from within Elasticsearch programmatically, so it must be configured when the ES JVM starts. That means that the startup CLI needs to assemble the required JVM options.

To complicate matters further, the APM agent needs a secret token in order to ship traces to the APM server. We can't use Java system properties to configure this, since otherwise the secret will be readable to all code in Elasticsearch. It therefore has to be configured in a dedicated config file. This in itself is awkward, since we don't want to leave secrets in config files. Therefore, we pull the APM secret token from the keystore, write it to a config file, then delete the config file after ES starts.

There's a further issue with the config file. Any options we set in the APM agent config file cannot later be reconfigured via system properties, so we need to make sure that only "static" configuration goes into the config file.

I generated most of the files under `qa/apm` using an APM test utility (I can't remember which one now, unfortunately). The goal is to set up a complete system so that traces can be captured in APM server, and the results in Elasticsearch inspected.
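The secret-handling flow described above (token from the keystore, written to a short-lived config file, file deleted once the node is up) can be sketched as follows. This is not the PR's code: the class `ApmConfigSketch`, its method names, and the file-name prefix are hypothetical, and the use of the agent's `secret_token` option and `elastic.apm.config_file` system property is an assumption about how the agent would be pointed at the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of assembling agent options without exposing the secret.
final class ApmConfigSketch {
    // Write the secret to a temporary config file instead of a system property,
    // so that it is not readable through System.getProperty by arbitrary code.
    static Path writeTempConfig(Path dir, String secretToken) throws IOException {
        Path file = Files.createTempFile(dir, ".apm-agent-", ".properties");
        Files.write(file, List.of("secret_token=" + secretToken), StandardCharsets.UTF_8);
        return file;
    }

    // JVM options assembled by the startup CLI: they reference the file, never the secret.
    static List<String> jvmOptions(Path agentJar, Path configFile) {
        return List.of(
            "-javaagent:" + agentJar,
            "-Delastic.apm.config_file=" + configFile
        );
    }

    // Called once the node has started and the agent has read its configuration.
    static void deleteAfterStartup(Path configFile) throws IOException {
        Files.deleteIfExists(configFile);
    }
}
```

Only static settings would belong in such a file, since (as noted above) anything set there cannot later be reconfigured via system properties.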

pugnascotia · 2022-08-03T13:15:13Z

> Do you have a rough estimate what would be the initial release version of this?

@lizozom It should be in 8.5.0.

pugnascotia · 2022-11-14T09:59:03Z

I think we've done everything that we intended to be covered by this issue 🎉

nicholas-r-king · 2023-10-30T21:13:44Z

@pugnascotia is this ever going to be released publicly or will customers ever be able to take advantage of tracing/APM metrics for monitoring our Elasticsearch clusters?

pugnascotia · 2023-11-07T08:46:44Z

It depends what you mean by "publicly" - on-prem customers could use this today. Cloud customers cannot, since APM data collection is not multi-tenant. If a Cloud customer has an issue that requires APM data to resolve, they'd have to engage with Support. This would likely be necessary in any case since the APM data is a very low-level tool for investigating Elasticsearch issues.

nicholas-r-king · 2023-11-08T04:04:21Z

We are an enterprise customer with several on-prem installations (aws/govcloud) and have a use for this.

pugnascotia · 2023-11-08T09:58:57Z

In that case you can definitely configure APM yourselves. We don't have user-facing documentation yet, but you can consult TRACING.md for how to get started.

ramdaspotale · 2024-09-25T10:29:24Z

Hi @pugnascotia, is there a similar feature available for Kibana?

I run Elasticsearch on our own infrastructure. Is it possible to pass additional resource attributes such as data_stream.dataset and data_stream.namespace, so that I can send these traces to a separate data stream in Elasticsearch?

philippkahr · 2024-09-25T10:46:34Z

@ramdaspotale, yes: there is a traces@custom ingest pipeline where you can do whatever you want to the data (the same goes for metrics and logs). You would do that after the data is sent, not inside APM.

Just keep in mind that changing the data_stream and namespace can have negative consequences if you do not do it correctly (index templates, component templates).

Use the reroute processor (https://www.elastic.co/guide/en/elasticsearch/reference/current/reroute-processor.html).

ramdaspotale · 2024-09-25T12:11:34Z

Hi @philippkahr - I am using the elastic/apm-data#201 feature from APM Server to segregate traces coming from different applications in our environment, so that each application can be searched separately without overloading ES.

Using the traces@custom ingest pipeline with the reroute processor would add some load on ES, given how busy our prod Elasticsearch is.

As this feature is already available in APM 8.13, I was curious whether I could use it in this scenario as well: add data_stream.dataset and data_stream.namespace, and rest assured that the routing happens automatically.

DaveCTurner · 2024-09-25T12:19:02Z

Thanks very much for your interest in Elasticsearch @ramdaspotale.

This appears to be a user question, and we'd like to direct these kinds of things to the Elasticsearch forum. If you can move this conversation there, we'd appreciate it. This allows us to use GitHub for verified bug reports, feature requests, and pull requests.

There's an active community in the forum that should be able to help get an answer to your question. As such, I hope you don't mind that I am marking this thread as resolved.

pugnascotia added the >feature, :Core/Infra/Core (Core issues without another label), and :Delivery/Tooling (Developer tooling and automation) labels Feb 24, 2022

pugnascotia self-assigned this Feb 24, 2022

elasticmachine added the Team:Delivery (Meta label for Delivery team) and Team:Core/Infra (Meta label for core/infra team) labels Feb 24, 2022

This was referenced Mar 7, 2022

Search API response time breakdown #21073

Open

Better tooling/logs for troubleshooting long running CCS requests #73922

Open

felixbarny mentioned this issue Mar 31, 2022

Elasticsearch instrumentation elastic/apm-agent-java#2550

Closed

7 tasks

This was referenced Apr 11, 2022

[Stack Monitoring] Investigate any already-collected data we might have regarding ingest pipelines elastic/kibana#129351

Closed

[Stack Monitoring] Discuss: Collection options for ingest pipeline monitoring elastic/kibana#130078

Closed

pugnascotia mentioned this issue Jun 15, 2022

Integrate ES with APM #87696

Closed

2 tasks

javanna mentioned this issue Jun 16, 2022

Count shards skipped on the coordinating node #86690

Closed

This was referenced Jun 22, 2022

Refactor tasks to improve APM support #87917

Merged

Introduce tracing interfaces #87921

Merged

javanna mentioned this issue Jul 8, 2022

Improve query performance analysis UX #88370

Open

pugnascotia mentioned this issue Jul 11, 2022

Provide tracing implementation using OpenTelemetry + APM agent #88443

Merged

pugnascotia mentioned this issue Jul 28, 2022

Use tracing API in TaskManager #88885

Merged

This was referenced Jul 28, 2022

Wrap code in new tracing contexts where required #88920

Merged

Wrap async search action logic in a new trace context #88937

Merged

This was referenced Aug 2, 2022

Wrap enrich execute action in new tracing context #89021

Merged

Wrap ML model loading task in new tracing context #89024

Merged

pugnascotia mentioned this issue Aug 2, 2022

Wrap async QL task execution in new tracing context #89029

Merged

pugnascotia closed this as completed Nov 14, 2022

javanna mentioned this issue Nov 16, 2022

Add a 'hot queries' API for sampling query details #34807

Closed

elastic locked as resolved and limited conversation to collaborators Sep 25, 2024
