Integrate ES with APM #87696
Closed: pugnascotia wants to merge 109 commits into elastic:main from pugnascotia:apm-integration-with-agent
Commits (109):
- 9325cbf (tlrx) Base module for APM Tracing (#80705)
- 22ecf6f (tlrx) First set of deps (#80720)
- ba34d70 (DaveCTurner) Integrate tracer with task manager (#80721)
- 13eddb9 (DaveCTurner) Merge branch 'master' into feature/apm-integration
- d5a2503 (tlrx) Use OpenTelemetry with HTTP/gRPC exporters in apm-integration (#80762)
- 34239d4 (ywangd) Add Traceable interface (#80788)
- 2f51e5f (dimitris-athanasiou) Capture task span context in thread context to parent nested tasks (#…
- a23bf66 (DaveCTurner) [APM] Add multi-shard search test case (#80792)
- 69caefd (DaveCTurner) Merge branch 'master' into feature/apm-integration
- 3a304f2 (DaveCTurner) Remove unused TracingPlugin interface (#80799)
- 8bd8a24 (SylvainJuge) single service + few attributes
- 110bb00 (SylvainJuge) tune a few minor things
- 4a1a899 (idegtiarenko) adding dynamic setting `xpack.apm.tracing.enabled` (#80796)
- d03a457 (DaveCTurner) Merge branch 'feature/apm-integration' into sylvain
- 9f68a26 (DaveCTurner) Spotless
- 76a9414 (ywangd) Merge remote-tracking branch 'origin/master' into feature/apm-integra…
- f3f9835 (ywangd) Add tracing for authorization (#80815)
- a7e2359 (SylvainJuge) Merge branch 'feature/apm-integration' of github.com:elastic/elastics…
- ed6223c (SylvainJuge) use otel sem attributes when we can
- 79f0da3 (DaveCTurner) Merge branch 'master' into feature/apm-integration
- d95c634 (DaveCTurner) Trace recoveries and cluster state updates (#80875)
- 0d58db7 (dimitris-athanasiou) Add `xpack.apm.tracing.names.include` setting for filtering (#80871)
- 04c76c6 (pugnascotia) Merge remote-tracking branch 'upstream/master' into feature/apm-integ…
- 7e0c606 (pugnascotia) Merge remote-tracking branch 'upstream/master' into feature/apm-integ…
- 8132527 (pugnascotia) Fix compilation issue
- 84b558d (pugnascotia) Update SHAs
- 3badd42 (pugnascotia) Compilation fix
- 3d35bd4 (pugnascotia) Tweaks
- 2e3aba1 (pugnascotia) Formatting
- 36c8943 (pugnascotia) Fix 3rd party errors
- eed58a6 (pugnascotia) Merge remote-tracking branch 'upstream/master' into feature/apm-integ…
- 99b948c (pugnascotia) WIP - hacks to make distributed tracing work
- 2dec258 (pugnascotia) WIP - trying to get REST tracing working
- 4e7f9dc (pugnascotia) WIP - more messing around
- 11dc5e1 (pugnascotia) HACK HACK HACK
- e7cca58 (pugnascotia) OMG I think it's working
- 461226b (pugnascotia) Seems to be working now :tada:
- b646a69 (pugnascotia) Hacks to try to use the APM Java agent
- fddc9e8 (pugnascotia) Merge remote-tracking branch 'upstream/master' into feature/apm-integ…
- 64adb49 (pugnascotia) Formatting
- cd6f1ef (pugnascotia) Improve REST tracing
- f04c6ff (pugnascotia) Update to latest APM agent
- b772714 (pugnascotia) Don't log graphviz by default
- 2f45a51 (pugnascotia) Tweaks
- ec612a6 (pugnascotia) Rework trace header stashing
- 6ee831f (pugnascotia) Merge branch 'feature/apm-integration' into apm-integration-with-agent
- b633f7e (pugnascotia) Fixes
- 4810a7f (pugnascotia) Managed to get traces to ship if I hack the APM agent
- bda6ea2 (pugnascotia) Move java agent CLI option into plugin descriptor
- fb87e79 (pugnascotia) Tweak for adding java opts via modules
- b76910b (pugnascotia) Header fixes
- a7266b3 (pugnascotia) Add run script
- 48bebd3 (pugnascotia) Header fixes
- 1436637 (pugnascotia) Tweaks
- dfb8f8b (pugnascotia) Detach tracing when starting an index's background tasks
- d94ec59 (pugnascotia) Detach tracing when starting an index's background tasks
- d834dd5 (pugnascotia) Start a doc about tracing
- b611835 (pugnascotia) Span attribte tweaks
- 53058b3 (pugnascotia) Span attribte tweaks
- 0989f17 (pugnascotia) Add extra docker tag
- 1be7e7e (pugnascotia) Tweaks
- fd5f3cb (pugnascotia) Bump APM agent
- ef6ae5f (pugnascotia) Get tracing across nodes working again
- a0978c9 (pugnascotia) Compilation fixes
- 48a6486 (pugnascotia) Merge remote-tracking branch 'upstream/master' into feature/apm-integ…
- ca896fb (pugnascotia) Bump version in run.sh
- 7e0c535 (pugnascotia) Merge branch 'feature/apm-integration' into apm-integration-with-agent
- 01d9c42 (pugnascotia) Fix
- fdbe843 (pugnascotia) Fixes for using latest agent version
- 6a23b24 (pugnascotia) Fully configure the APM via config file in the module
- 694180b (pugnascotia) Tidy up
- 89f90c2 (pugnascotia) Mass-refactoring
- d2efa7e (pugnascotia) Fixes
- 6bb80b6 (pugnascotia) Formatting
- eefb53a (pugnascotia) Beginnings of an end-to-end APM test
- e9695d2 (pugnascotia) Get the APM integration test working
- 7164dd8 (pugnascotia) Test fixes
- e6d5e4f (pugnascotia) Merge remote-tracking branch 'upstream/master' into apm-integration-w…
- e56427b (pugnascotia) Add support for opening Scope via the Tracer
- aad7f4c (pugnascotia) Make it possible to configure APM agent via settings API
- 08da3a3 (pugnascotia) Fix apm settings to work under assertions
- f63a154 (pugnascotia) Updates to TRACING.md
- 31ff299 (pugnascotia) Tweaks
- fd6a9a9 (pugnascotia) More testing
- 562e6ed (pugnascotia) More testing
- f8431e7 (pugnascotia) More TaskManager unit tests
- c55632a (pugnascotia) Add unit testing
- d329b7a (pugnascotia) Make qa test work again
- 854c8c0 (pugnascotia) Merge remote-tracking branch 'origin/apm-integration-with-agent' into…
- 9dcf369 (pugnascotia) More notes on tracing
- e0f70de (pugnascotia) Merge remote-tracking branch 'upstream/master' into apm-integration-w…
- 3c4e323 (pugnascotia) Add an exclude filter and filtering unit tests
- 76ebf99 (pugnascotia) Switch to automaton instead of regexes
- ccc47c6 (pugnascotia) Redact sensitive http headers
- 0b911af (pugnascotia) Merge remote-tracking branch 'upstream/master' into apm-integration-w…
- f0dbe4a (pugnascotia) Post-merge fixes
- 55772a2 (pugnascotia) Tweak log4j security policy
- 8be329f (pugnascotia) Switch to auto-generating an APM config
- feb3b3f (pugnascotia) Shorten APM settings prefix
- 762d6fe (pugnascotia) Bump APM agent to 1.32.0
- 9b2685d (pugnascotia) Upgrade opentelemetry
- e21346c (pugnascotia) Javadoc
- 7c3084e (pugnascotia) Merge remote-tracking branch 'upstream/master' into apm-integration-w…
- 8f98e7e (pugnascotia) Update TRACING.md
- 2d3b30b (pugnascotia) General fixing and polishing
- 15a1baf (pugnascotia) Merge remote-tracking branch 'upstream/master' into apm-integration-w…
- 52fd4d3 (pugnascotia) Remove debug gradle config
- 5e0892b (pugnascotia) Fix typo
- 4912eec (pugnascotia) Remove run script
@@ -0,0 +1,141 @@
# Tracing in Elasticsearch

Elasticsearch is instrumented using the [OpenTelemetry][otel] API, which allows
us to gather traces and analyze what Elasticsearch is doing.

## How is tracing implemented?

The Elasticsearch server code contains a [`tracing`][tracing] package, which is
an abstraction over the OpenTelemetry API. All locations in the code that
perform instrumentation and tracing must use these abstractions.

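As a rough illustration of that split, the sketch below shows instrumented code depending only on a small facade, with the backend supplied separately. `SpanTracer`, `RecordingSpanTracer` and their methods are hypothetical names invented for this example. They are not the actual contents of the `tracing` package, and the recording implementation merely stands in for one built on the OpenTelemetry API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical facade: instrumented code depends only on this interface,
// never on OpenTelemetry types directly.
interface SpanTracer {
    void startSpan(String spanId, String name, Map<String, Object> attributes);

    void stopSpan(String spanId);

    // Used when tracing is disabled: every call is a no-op.
    SpanTracer NOOP = new SpanTracer() {
        @Override
        public void startSpan(String spanId, String name, Map<String, Object> attributes) {}

        @Override
        public void stopSpan(String spanId) {}
    };
}

// Stand-in for a real backend (e.g. one built on the OpenTelemetry API).
// It only records events so that the calls can be inspected.
class RecordingSpanTracer implements SpanTracer {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startSpan(String spanId, String name, Map<String, Object> attributes) {
        events.add("start " + spanId + " " + name);
    }

    @Override
    public void stopSpan(String spanId) {
        events.add("stop " + spanId);
    }

    List<String> events() {
        return events;
    }
}
```

Because call sites only see the facade, the backend can be replaced, or tracing switched off with the no-op instance, without touching instrumented code.
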
Separately, there is the [`apm-integration`](./x-pack/plugin/apm-integration/)
module, which works with the OpenTelemetry API directly to record trace data.
Underneath the OTel API, we use Elastic's [APM agent for Java][agent], which
attaches at runtime to the Elasticsearch JVM and removes the need for
Elasticsearch to hard-code the use of an SDK.

## How is tracing configured?

* The `xpack.apm.enabled` setting must be set to `true`.
* You must supply credentials for the APM server (see below).

All APM settings live under `xpack.apm`. All settings related to the Java agent
go under `xpack.apm.agent`, and anything you set there is propagated to the
agent.

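A minimal sketch of the prefix mapping, assuming every `xpack.apm.agent.<option>` setting corresponds directly to the agent option `<option>`. `AgentSettings` and its methods are hypothetical. The `elastic.apm.` system-property prefix is the naming the Elastic APM Java agent uses when it reads options from system properties.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: derive APM agent options from Elasticsearch settings.
// Assumption: each "xpack.apm.agent.<option>" setting maps 1:1 to the agent option "<option>".
final class AgentSettings {
    static final String AGENT_PREFIX = "xpack.apm.agent.";

    private AgentSettings() {}

    static Map<String, String> toAgentOptions(Map<String, String> esSettings) {
        Map<String, String> agentOptions = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> e : esSettings.entrySet()) {
            String key = e.getKey();
            // Ignore settings outside the agent namespace, e.g. xpack.apm.enabled.
            if (key.startsWith(AGENT_PREFIX) && key.length() > AGENT_PREFIX.length()) {
                agentOptions.put(key.substring(AGENT_PREFIX.length()), e.getValue());
            }
        }
        return agentOptions;
    }

    // The Elastic APM Java agent reads system properties named "elastic.apm.<option>".
    static String systemPropertyName(String agentOption) {
        return "elastic.apm." + agentOption;
    }
}
```

For example, `xpack.apm.agent.transaction_sample_rate` would become the agent option `transaction_sample_rate`, published as the system property `elastic.apm.transaction_sample_rate`.
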
For agent settings that can be changed dynamically, you can use the cluster
settings REST API. For example, to change the sampling rate:

    curl -XPUT \
      -H "Content-type: application/json" \
      -u "$USERNAME:$PASSWORD" \
      -d '{ "persistent": { "xpack.apm.agent.transaction_sample_rate": "0.75" } }' \
      https://localhost:9200/_cluster/settings

### More details about configuration

For context, the APM agent pulls configuration from [multiple
sources][agent-config], with a hierarchy that means, for example, that options
set in the config file cannot be overridden via system properties.

Now, in order to send tracing data to the APM server, ES needs to be configured
with either a `secret_key` or an `api_key`. We could configure these in the
agent via system properties, but then their values would be available to any
Java code that can read system properties.

Instead, when Elasticsearch bootstraps itself, it compiles all APM settings
together, including any `secret_key` or `api_key` values from the ES keystore,
and writes out a temporary APM config file containing all static configuration
(i.e. values that cannot change after the agent starts). This file is deleted
soon after ES starts up. Settings that are not sensitive and can be changed
dynamically are configured via system properties. Calls to the ES settings REST
API are translated into system property writes, which the agent later picks up
and applies.

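The bootstrap flow described above might look roughly like the following sketch. It assumes the agent options (including keystore secrets) have already been collected into a map, and that the caller supplies the set of options that are safe to change dynamically. `ApmBootstrap.configure` is a hypothetical name, and this is not the code Elasticsearch actually runs.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Sketch of splitting agent options between a temporary config file and system properties.
// Which options count as "dynamic" is supplied by the caller (an assumption of this sketch).
final class ApmBootstrap {
    private ApmBootstrap() {}

    /**
     * Writes static options (including secrets) to a temporary properties file and
     * publishes dynamic, non-sensitive options as "elastic.apm.*" system properties.
     * Returns the temporary file, which the caller should delete once the agent has read it.
     */
    static Path configure(Map<String, String> agentOptions, Set<String> dynamicOptions) throws IOException {
        Properties staticConfig = new Properties();
        for (Map.Entry<String, String> e : agentOptions.entrySet()) {
            if (dynamicOptions.contains(e.getKey())) {
                System.setProperty("elastic.apm." + e.getKey(), e.getValue());
            } else {
                staticConfig.setProperty(e.getKey(), e.getValue());
            }
        }
        Path configFile = Files.createTempFile("elasticapm", ".properties");
        try (Writer writer = Files.newBufferedWriter(configFile)) {
            staticConfig.store(writer, "Generated APM agent configuration");
        }
        return configFile;
    }
}
```

Deleting the returned file soon after startup keeps secrets on disk only briefly, and as long as no secret is marked dynamic, they never appear in system properties. The sketch itself does not check that dynamic options are non-sensitive.
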
## Where is tracing data sent?

You need to have an APM server running somewhere. For example, you can
create a deployment in Elastic Cloud with Elastic's APM integration.

## What do we trace?

We primarily trace "tasks". The tasks framework in Elasticsearch allows work to
be scheduled for execution, cancelled, executed in a different thread pool, and
so on. Tracing a task results in a "span", which represents the execution of the
task in the tracing system. We also instrument REST requests, which are not (at
present) modelled by tasks.

A span can be associated with a parent span, which allows all spans in, for
example, a REST request to be grouped together. Spans can track work across
different Elasticsearch nodes.

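To make the task-to-span relationship concrete, here is a sketch of bookkeeping that starts a span when a task is registered, links it to its parent task's span, and ends it when the task is unregistered. `TaskSpans`, the `task-<id>` span naming and the use of local parent task IDs are assumptions for illustration. Across nodes, the parent link would presumably travel with the request's thread context rather than live in a local map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch: one span per task, linked to the span of its parent task if there is one.
// Task ids and the "task-<id>" span naming scheme are illustrative assumptions.
final class TaskSpans {
    private final Map<Long, String> activeSpans = new HashMap<>();
    private final Map<String, String> parentOf = new HashMap<>();

    /** Called when a task is registered; starts its span and records the parent link. */
    String onTaskRegistered(long taskId, Optional<Long> parentTaskId) {
        String spanId = "task-" + taskId;
        activeSpans.put(taskId, spanId);
        // Only link to the parent if the parent task's span is still active.
        parentTaskId.map(activeSpans::get).ifPresent(parentSpan -> parentOf.put(spanId, parentSpan));
        return spanId;
    }

    /** Called when a task completes or is cancelled; ends its span. */
    void onTaskUnregistered(long taskId) {
        activeSpans.remove(taskId);
    }

    Optional<String> parentSpan(String spanId) {
        return Optional.ofNullable(parentOf.get(spanId));
    }

    boolean isActive(long taskId) {
        return activeSpans.containsKey(taskId);
    }
}
```
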
Elasticsearch also supports distributed tracing via [W3C Trace Context][w3c]
headers. If clients of Elasticsearch send these headers with their requests,
then that data will be forwarded to the APM server in order to yield a trace
across systems.

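For reference, a W3C `traceparent` header (version `00`) has the form `version-traceid-parentid-traceflags`, for example `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`. The sketch below parses that form. `TraceParent` is a hypothetical class, not how Elasticsearch or the APM agent handles the header, and it rejects future versions outright rather than handling them as the spec allows.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of parsing a version-00 W3C "traceparent" header:
//   2 hex (version) "-" 32 hex (trace-id) "-" 16 hex (parent-id) "-" 2 hex (trace-flags)
final class TraceParent {
    private static final Pattern VERSION_00 =
        Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String parentId;
    final boolean sampled;

    private TraceParent(String traceId, String parentId, boolean sampled) {
        this.traceId = traceId;
        this.parentId = parentId;
        this.sampled = sampled;
    }

    /** Returns the parsed header, or empty if it is missing or malformed. */
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = VERSION_00.matcher(header.trim());
        if (m.matches() == false) {
            return Optional.empty(); // this sketch only understands version 00
        }
        String traceId = m.group(1);
        String parentId = m.group(2);
        if (isAllZeros(traceId) || isAllZeros(parentId)) {
            return Optional.empty(); // all-zero IDs are invalid per the spec
        }
        int flags = Integer.parseInt(m.group(3), 16);
        return Optional.of(new TraceParent(traceId, parentId, (flags & 0x01) != 0));
    }

    private static boolean isAllZeros(String hex) {
        return hex.chars().allMatch(c -> c == '0');
    }
}
```

The low bit of the trace flags is the `sampled` flag, which tells downstream systems whether the caller recorded this trace.
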
## Thread contexts and nested spans

When a span is started, Elasticsearch tracks information about that span in the
current [thread context][thread-context]. If a new thread context is created,
then the current span information is propagated but renamed, so that (1) it
doesn't interfere when new trace information is set in the context, and (2) the
previous trace information is available to establish a parent / child span
relationship.

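The "propagate but rename" rule can be sketched with a plain map standing in for the thread context's headers. The key names `traceparent` and `parent.traceparent` are assumptions for illustration, and the real `ThreadContext` may use different keys.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of renaming trace headers when a nested thread context is created.
// Key names are hypothetical, not the real ThreadContext keys.
final class TraceHeaders {
    static final String TRACE_PARENT = "traceparent";
    static final String PARENT_TRACE_PARENT = "parent.traceparent";

    private TraceHeaders() {}

    /** Returns the headers for a new, nested context derived from {@code current}. */
    static Map<String, String> newTraceContext(Map<String, String> current) {
        Map<String, String> next = new HashMap<>(current);
        String active = next.remove(TRACE_PARENT);
        if (active != null) {
            // (1) the slot is now free for the new span's own trace information, and
            // (2) the old value survives under another key so the new span can name its parent.
            next.put(PARENT_TRACE_PARENT, active);
        }
        return next;
    }
}
```
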
Sometimes we need to detach new spans from their parent. For example, creating
an index starts some related background tasks, but these shouldn't be associated
with the REST request; otherwise, all the background task spans will be
associated with the REST request for as long as Elasticsearch is running.
`ThreadContext` provides the `clearTraceContext()` method for this purpose.

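Detaching can be sketched the same way: copy the context but drop every trace-related key, so that any span started afterwards begins a new trace. `DetachedContext` and the key names are hypothetical. In Elasticsearch itself, this is the job of `ThreadContext`'s `clearTraceContext()` method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of detaching work from the current trace. Key names are hypothetical.
final class DetachedContext {
    static final Set<String> TRACE_KEYS = Set.of("traceparent", "tracestate", "parent.traceparent");

    private DetachedContext() {}

    /** Copies the context, dropping trace keys so that new spans start their own trace. */
    static Map<String, String> withoutTrace(Map<String, String> current) {
        Map<String, String> detached = new HashMap<>(current);
        detached.keySet().removeAll(TRACE_KEYS);
        return detached;
    }
}
```

A background task started under the detached context would then be the root of its own trace, rather than a child of the REST request that created the index.
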
## How do I trace something that isn't a task?

First work out if you can turn it into a task. No, really.

If you can't do that, you'll need to ensure that your class can get access to a
`Tracer` instance (either by injection, or by passing it in when your class is
created). Then call the appropriate methods on the tracer when a span should
start and end.

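A sketch of the start/stop pattern for non-task work. It uses a hypothetical two-method `SimpleTracer` interface in place of the real `Tracer`, whose method names may differ. The `finally` block ensures the span is stopped even if the work throws.

```java
import java.util.function.Supplier;

// Hypothetical minimal tracer interface for this sketch; the real Tracer may differ.
interface SimpleTracer {
    void startSpan(String spanId, String name);

    void stopSpan(String spanId);
}

final class ManualTracing {
    private ManualTracing() {}

    /**
     * Starts a span, runs {@code work} synchronously, and always stops the span,
     * even if the work throws.
     */
    static <T> T traced(SimpleTracer tracer, String spanId, String name, Supplier<T> work) {
        tracer.startSpan(spanId, name);
        try {
            return work.get();
        } finally {
            tracer.stopSpan(spanId);
        }
    }
}
```

This only works for synchronous work. For asynchronous operations, the span would have to be stopped from the completion callback instead.
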
## What additional attributes should I set?

That's up to you. Be careful not to capture anything that could leak sensitive
or personal information.

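One way to avoid leaking sensitive values is to redact them before they become attributes. The sketch below does this for HTTP headers. The denylist and the `http.request.headers.` attribute prefix are illustrative assumptions, not the list or naming Elasticsearch uses.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: redact sensitive HTTP header values before recording them as span attributes.
// The denylist is illustrative, not exhaustive.
final class AttributeRedaction {
    static final Set<String> SENSITIVE_HEADERS = Set.of("authorization", "cookie", "set-cookie");
    static final String REDACTED = "[REDACTED]";

    private AttributeRedaction() {}

    static Map<String, String> headerAttributes(Map<String, String> headers) {
        Map<String, String> attributes = new TreeMap<>();
        headers.forEach((name, value) -> {
            // HTTP header names are case-insensitive, so normalize before checking.
            String lower = name.toLowerCase(Locale.ROOT);
            attributes.put("http.request.headers." + lower, SENSITIVE_HEADERS.contains(lower) ? REDACTED : value);
        });
        return attributes;
    }
}
```

Redacting rather than dropping keeps a record that the header was present without recording its value.
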
## What is "scope" and when should I use it?

Usually you won't need to.

That said, sometimes you may want more details to be captured about a particular
section of code. You can think of "scope" as representing the currently active
tracing context. Using scope allows the APM agent to do the following:

* Automatically correlate the "active span" with logging, where logs have also
  been captured.
* Capture any exceptions thrown while the span is active, and link those
  exceptions to the span.
* Use the sampling profiler, because samples can be linked to the active span
  (if any), so the agent can automatically get extra spans without manual
  instrumentation.

However, a scope must be closed in the same thread in which it was opened, which
cannot be guaranteed when using tasks.

In the OpenTelemetry documentation, spans, scope and context are fairly
straightforward to use, since `Scope` is an `AutoCloseable` and so can be
easily created and cleaned up using try-with-resources blocks. Unfortunately,
Elasticsearch is a complex piece of software, and also extremely asynchronous,
so the typical OpenTelemetry examples do not work.

Nonetheless, it is possible to manually use scope where we need more detail by
explicitly opening a scope via the `Tracer`.

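The same-thread constraint is what makes scope awkward in asynchronous code. The sketch below uses hypothetical names and is not the OpenTelemetry or Elasticsearch API. It shows a scope that marks a span as active for the current thread, restores the previously active span on close, and refuses to be closed from any other thread. In synchronous code, try-with-resources keeps the open and close on one thread automatically.

```java
// Sketch: a scope marks a span as "active" on the current thread and must be
// closed on that same thread. Names are hypothetical.
final class ThreadBoundScope implements AutoCloseable {
    private static final ThreadLocal<String> ACTIVE_SPAN = new ThreadLocal<>();

    private final Thread owner = Thread.currentThread();
    private final String previous;

    private ThreadBoundScope(String spanId) {
        this.previous = ACTIVE_SPAN.get(); // remember what was active, to restore on close
        ACTIVE_SPAN.set(spanId);
    }

    static ThreadBoundScope open(String spanId) {
        return new ThreadBoundScope(spanId);
    }

    /** The span that is active on the calling thread, or null if none. */
    static String activeSpan() {
        return ACTIVE_SPAN.get();
    }

    @Override
    public void close() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("scope must be closed on the thread that opened it");
        }
        if (previous == null) {
            ACTIVE_SPAN.remove();
        } else {
            ACTIVE_SPAN.set(previous);
        }
    }
}
```
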
[otel]: https://opentelemetry.io/
[thread-context]: ./server/src/main/java/org/elasticsearch/common/util/concurrent/ThreadContext.java
[w3c]: https://www.w3.org/TR/trace-context/
[tracing]: ./server/src/main/java/org/elasticsearch/tracing/
[config]: ./x-pack/plugin/apm-integration/src/main/config/elasticapm.properties
[agent-config]: https://www.elastic.co/guide/en/apm/agent/java/master/configuration.html
[agent]: https://www.elastic.co/guide/en/apm/agent/java/current/index.html
@@ -65,11 +65,14 @@
 import java.io.UncheckedIOException;
 import java.net.URL;
 import java.nio.charset.StandardCharsets;
+import java.nio.file.FileVisitResult;
 import java.nio.file.Files;
 import java.nio.file.NoSuchFileException;
 import java.nio.file.Path;
+import java.nio.file.SimpleFileVisitor;
 import java.nio.file.StandardCopyOption;
 import java.nio.file.StandardOpenOption;
+import java.nio.file.attribute.BasicFileAttributes;
 import java.time.Instant;
 import java.util.ArrayList;
 import java.util.Arrays;
@@ -1356,19 +1359,12 @@ private void createConfiguration() {
                 StandardOpenOption.CREATE
             );
 
-            final List<Path> configFiles;
-            try (Stream<Path> stream = Files.list(getDistroDir().resolve("config"))) {
-                configFiles = stream.collect(Collectors.toList());
-            }
-            logToProcessStdout("Copying additional config files from distro " + configFiles);
-            for (Path file : configFiles) {
-                Path dest = configFile.getParent().resolve(file.getFileName());
-                if (Files.exists(dest) == false) {
-                    Files.copy(file, dest);
-                }
-            }
+            final Path distConfigDir = getDistroDir().resolve("config");
Review comment: These changes were ultimately unnecessary, but I've kept them because they make it possible to retain the config file hierarchy when starting a test cluster.
+            final RecursiveCopyFileVisitor visitor = new RecursiveCopyFileVisitor(distConfigDir);
+            Files.walkFileTree(distConfigDir, visitor);
+            logToProcessStdout("Copied additional config files from distro: " + visitor.getCopiedFiles());
         } catch (IOException e) {
-            throw new UncheckedIOException("Could not write config file: " + configFile, e);
+            throw new UncheckedIOException("Could not write config file: " + e.getMessage(), e);
         }
 
         tweakJvmOptions(configFileRoot);
@@ -1686,4 +1682,37 @@ private static class LinkCreationException extends UncheckedIOException {
             super(message, cause);
         }
     }
+
+    private class RecursiveCopyFileVisitor extends SimpleFileVisitor<Path> {
+        private final Path sourceDir;
+        private final List<Path> copiedFiles;
+
+        RecursiveCopyFileVisitor(Path sourceDir) {
+            this.sourceDir = sourceDir;
+            this.copiedFiles = new ArrayList<>();
+        }
+
+        public List<Path> getCopiedFiles() {
+            return copiedFiles;
+        }
+
+        @Override
+        public FileVisitResult preVisitDirectory(Path sourceDir, BasicFileAttributes attrs) throws IOException {
+            final Path relativePath = this.sourceDir.relativize(sourceDir);
+            final Path destPath = configFile.getParent().resolve(relativePath);
+            if (Files.notExists(destPath)) {
+                Files.createDirectory(destPath);
+            }
+            return FileVisitResult.CONTINUE;
+        }
+
+        @Override
+        public FileVisitResult visitFile(Path sourcePath, BasicFileAttributes attrs) throws IOException {
+            final Path relativePath = sourceDir.relativize(sourcePath);
+            final Path destPath = configFile.getParent().resolve(relativePath);
+            Files.copy(sourcePath, destPath, StandardCopyOption.REPLACE_EXISTING);
+            copiedFiles.add(sourcePath);
+            return FileVisitResult.CONTINUE;
+        }
+    }
 }