Label thread names with tracing info #77791

grcevski · 2021-09-15T14:42:00Z

This PR is a proposal to solve #74580 by adding the X-Opaque-Id and trace.id header values in the Thread name field while the wrapped Runnable is running.

The mentioned issue above has some discussion on the possible alternatives and their pros/cons.

The change is localized to the private wrapped runnable implementations inside ThreadContext. Looking at usages I think this should be good for all client use cases, but I'm not 100% sure, I might've missed a class of threads or thread pools that can hold the tracing headers and not be handled by the two wrapped runnable implementations.

I also added a disclaimer in HotThreads that the tracing ids reported may not be accurate for short-lived request executions.

Closes #74580

Elasticsearch clients can supply X-Opaque-Id or trace.id tracing headers in requests to help with debugging issues. This change adds any tracing headers into the thread name for the duration of the request, such that this information can be visible in HotThreads or while using other profiling tools (e.g. JFR/Mission Control, VisualVM, etc. ). Closes elastic#74580

elasticmachine · 2021-09-15T14:43:52Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

grcevski · 2021-09-15T16:28:25Z

@elasticmachine update branch

rjernst

One minor thought on testing. I don't have any strong opinions about whether modifying the thread name is ok, but I'm eager to hear @ywelsch's thoughts.

rjernst · 2021-09-16T22:40:27Z

server/src/test/java/org/elasticsearch/common/util/concurrent/ThreadContextTests.java

+                protected void doRun() {
+                    // Assertions are errors, turn this into exception to make sure we get the error on failure
+                    try {
+                        assertTrue(Thread.currentThread().getName().contains("some_id"));


Using assertThat(name, containsString("some_id")); would make any failures more readable and actionable, since we would see what failed to match

Ah, good idea! Thanks!

ywelsch

I think this might not cover all relevant cases (comment in-line). I'm also not very fond of the thread name String manipulation.

Taking a step back, I wonder if what we're ultimately interested in is to somehow marry _tasks API output with _hot_threads output, i.e., have a notion of "actively executing tasks" in the system, which is reflected in both tasks and _hot_threads APIs (and possibly other places). X-Opaque-ID allows to somehow join up this information. If ultimately the more structured information is useful, perhaps an approach as I've outlined in #74580 (comment) might be a better starting point. I'm also becoming less convinced that jstack/JFR/MissionControl or VisualVM support via thread name is really useful unless these tools help visualize how the thread name / tracing id has changed over time (e.g. does Mission control visualize this somehow?). X-Opaque-ID is useful in the context of _hot_threads as it directly gives you an idea about expensive user operations (so should be as accurate as possible when it comes to the X-Opaque-ID).

ywelsch · 2021-09-20T11:46:47Z

server/src/main/java/org/elasticsearch/common/util/concurrent/ThreadContext.java

+        }
+    }
+
+    public interface WithTracingIdsInThreadName {


What's the point of this interface? Aren't these just all simple static methods?

You are right, these methods are effectively static methods. I used an interface with default methods to encapsulate the two 'static' methods for use only in the wrapped runnables, since they rely logically on the second undoing the naming of the first.

ywelsch · 2021-09-20T13:03:44Z

server/src/main/java/org/elasticsearch/common/util/concurrent/ThreadContext.java

    }

    /**
     * Wraps a Runnable to preserve the thread context.
     */
-    private class ContextPreservingRunnable implements WrappedRunnable {
+    private class ContextPreservingRunnable implements WrappedRunnable, WithTracingIdsInThreadName {


I wonder if by just applying this to ContextPreservingRunnable and ContextPreservingAbstractRunnable we're missing some important places. Think for example about requests that are handled on the transport thread (using executor SAME), and which then set the headers using setHeaders (in particular the tracing ones). I also wonder about other nested calls of stashContext etc and whether there's a risk now that these would run / show under the wrong tracing id.

Thanks Yannik, this is exactly the sort of thing I wasn't sure about. Let me investigate the uses of the SAME executor and the transport thread to check if this works at all.

grcevski · 2021-09-20T17:20:50Z

Taking a step back, I wonder if what we're ultimately interested in is to somehow marry _tasks API output with _hot_threads output, i.e., have a notion of "actively executing tasks" in the system, which is reflected in both tasks and _hot_threads APIs (and possibly other places). X-Opaque-ID allows to somehow join up this information. If ultimately the more structured information is useful, perhaps an approach as I've outlined in #74580 (comment) might be a better starting point. I'm also becoming less convinced that jstack/JFR/MissionControl or VisualVM support via thread name is really useful unless these tools help visualize how the thread name / tracing id has changed over time (e.g. does Mission control visualize this somehow?). X-Opaque-ID is useful in the context of _hot_threads as it directly gives you an idea about expensive user operations (so should be as accurate as possible when it comes to the X-Opaque-ID).

I agree that if multiple places in the code would benefit from having a notion of all currently active tasks/operations/threads, then having a shared common repository like a concurrent Map would be a better solution. We'll just need to oversize the map to avoid false contention for quick requests.

MissionControl or VisualVM don't track thread names over time, but with real-time monitoring the thread names will still be useful, or when folks do thread dumps within the tool. I think naming the threads will be useful for many 3rd party tools, those 2 are just some examples. Heapdump analysis tools will benefit, you could tell the resident set per specific request-id. Another example is jstack, in case Elasticsearch is in a bad state and cannot complete the hot_threads call, jstack will likely still be able to dump the thread stacks, along with the thread names.

grcevski · 2021-09-21T18:30:46Z

Hi @ywelsch and @rjernst, I investigated the TransportService and it's uses of ThreadContext and it's true, there are few cases when we send a transport request or send a transport response that are not wrapped in one of the two context runnables, but simply run on the netty thread pool. The netty thread pool threads in these cases will not be labelled with any of the tracing ids.

If these cases with the TransportService are important to capture with tracing ids, then my solution is not workable as-is. Please let me know what you think, if this is a no-go, I think we should close this PR.

Wrapping only around the runnables is a very localized change, but it works only for the Elasticsearch thread pools. I guess it's independent of if we want to use a ConcurrentHashMap or name the threads, it's simple but doesn't cover exactly 100% of ThreadContext uses.

The alternative is to tap into the ThreadContext wrapping and unwrapping methods directly, using a ConcurrentHashMap rather than the thread name. I think @pgomulka already has a prototype for this approach.

elasticsearchmachine · 2022-01-12T09:54:37Z

Hi @grcevski, I've created a changelog YAML for you.

elasticsearchmachine added the v8.0.0 label Sep 15, 2021

grcevski added :Core/Infra/Core Core issues without another label >enhancement >feature Team:Core/Infra Meta label for core/infra team v7.16.0 labels Sep 15, 2021

grcevski removed the >feature label Sep 15, 2021

grcevski requested review from rjernst and ywelsch September 15, 2021 14:45

grcevski changed the title ~~Label thread pool names with tracing info~~ Label thread names with tracing info Sep 15, 2021

Merge branch 'master' into feature/hotthreads_tracing_id

94c6220

rjernst reviewed Sep 16, 2021

View reviewed changes

ywelsch reviewed Sep 20, 2021

View reviewed changes

danhermann added v8.1.0 and removed v7.16.0 labels Oct 27, 2021

arteam removed the v8.0.0 label Jan 12, 2022

Update docs/changelog/77791.yaml

f47afb0

grcevski closed this Jan 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Label thread names with tracing info #77791

Label thread names with tracing info #77791

grcevski commented Sep 15, 2021

elasticmachine commented Sep 15, 2021

grcevski commented Sep 15, 2021

rjernst left a comment

rjernst Sep 16, 2021

grcevski Sep 20, 2021

ywelsch left a comment

ywelsch Sep 20, 2021

grcevski Sep 20, 2021

ywelsch Sep 20, 2021

grcevski Sep 20, 2021

grcevski commented Sep 20, 2021

grcevski commented Sep 21, 2021

elasticsearchmachine commented Jan 12, 2022

Label thread names with tracing info #77791

Label thread names with tracing info #77791

Conversation

grcevski commented Sep 15, 2021

elasticmachine commented Sep 15, 2021

grcevski commented Sep 15, 2021

rjernst left a comment

Choose a reason for hiding this comment

rjernst Sep 16, 2021

Choose a reason for hiding this comment

grcevski Sep 20, 2021

Choose a reason for hiding this comment

ywelsch left a comment

Choose a reason for hiding this comment

ywelsch Sep 20, 2021

Choose a reason for hiding this comment

grcevski Sep 20, 2021

Choose a reason for hiding this comment

ywelsch Sep 20, 2021

Choose a reason for hiding this comment

grcevski Sep 20, 2021

Choose a reason for hiding this comment

grcevski commented Sep 20, 2021

grcevski commented Sep 21, 2021

elasticsearchmachine commented Jan 12, 2022