-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Label thread names with tracing info #77791
Conversation
Elasticsearch clients can supply X-Opaque-Id or trace.id tracing headers in requests to help with debugging issues. This change adds any tracing headers into the thread name for the duration of the request, such that this information can be visible in HotThreads or while using other profiling tools (e.g. JFR/Mission Control, VisualVM, etc. ). Closes elastic#74580
Pinging @elastic/es-core-infra (Team:Core/Infra) |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor thought on testing. I don't have any strong opinions about whether modifying the thread name is ok, but I'm eager to hear @ywelsch's thoughts.
protected void doRun() { | ||
// Assertions are errors, turn this into exception to make sure we get the error on failure | ||
try { | ||
assertTrue(Thread.currentThread().getName().contains("some_id")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using assertThat(name, containsString("some_id"));
would make any failures more readable and actionable, since we would see what failed to match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good idea! Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might not cover all relevant cases (comment in-line). I'm also not very fond of the thread name String manipulation.
Taking a step back, I wonder if what we're ultimately interested in is to somehow marry _tasks
API output with _hot_threads
output, i.e., have a notion of "actively executing tasks" in the system, which is reflected in both tasks
and _hot_threads
APIs (and possibly other places). X-Opaque-ID
allows to somehow join up this information. If ultimately the more structured information is useful, perhaps an approach as I've outlined in #74580 (comment) might be a better starting point. I'm also becoming less convinced that jstack/JFR/MissionControl or VisualVM support via thread name is really useful unless these tools help visualize how the thread name / tracing id has changed over time (e.g. does Mission control visualize this somehow?). X-Opaque-ID
is useful in the context of _hot_threads
as it directly gives you an idea about expensive user operations (so should be as accurate as possible when it comes to the X-Opaque-ID
).
} | ||
} | ||
|
||
public interface WithTracingIdsInThreadName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this interface? Aren't these just all simple static methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, these methods are effectively static methods. I used an interface with default methods to encapsulate the two 'static' methods for use only in the wrapped runnables, since they rely logically on the second undoing the naming of the first.
} | ||
|
||
/** | ||
* Wraps a Runnable to preserve the thread context. | ||
*/ | ||
private class ContextPreservingRunnable implements WrappedRunnable { | ||
private class ContextPreservingRunnable implements WrappedRunnable, WithTracingIdsInThreadName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if by just applying this to ContextPreservingRunnable
and ContextPreservingAbstractRunnable
we're missing some important places. Think for example about requests that are handled on the transport thread (using executor SAME
), and which then set the headers using setHeaders
(in particular the tracing ones). I also wonder about other nested calls of stashContext
etc and whether there's a risk now that these would run / show under the wrong tracing id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Yannik, this is exactly the sort of thing I wasn't sure about. Let me investigate the uses of the SAME executor and the transport thread to check if this works at all.
I agree that if multiple places in the code would benefit from having a notion of all currently active tasks/operations/threads, then having a shared common repository like a concurrent Map would be a better solution. We'll just need to oversize the map to avoid false contention for quick requests. MissionControl or VisualVM don't track thread names over time, but with real-time monitoring the thread names will still be useful, or when folks do thread dumps within the tool. I think naming the threads will be useful for many 3rd party tools, those 2 are just some examples. Heapdump analysis tools will benefit, you could tell the resident set per specific request-id. Another example is jstack, in case Elasticsearch is in a bad state and cannot complete the hot_threads call, jstack will likely still be able to dump the thread stacks, along with the thread names. |
Hi @ywelsch and @rjernst, I investigated the TransportService and it's uses of ThreadContext and it's true, there are few cases when we send a transport request or send a transport response that are not wrapped in one of the two context runnables, but simply run on the netty thread pool. The netty thread pool threads in these cases will not be labelled with any of the tracing ids. If these cases with the TransportService are important to capture with tracing ids, then my solution is not workable as-is. Please let me know what you think, if this is a no-go, I think we should close this PR. Wrapping only around the runnables is a very localized change, but it works only for the Elasticsearch thread pools. I guess it's independent of if we want to use a ConcurrentHashMap or name the threads, it's simple but doesn't cover exactly 100% of ThreadContext uses. The alternative is to tap into the ThreadContext wrapping and unwrapping methods directly, using a ConcurrentHashMap rather than the thread name. I think @pgomulka already has a prototype for this approach. |
Hi @grcevski, I've created a changelog YAML for you. |
This PR is a proposal to solve #74580 by adding the X-Opaque-Id and trace.id header values in the Thread name field while the wrapped Runnable is running.
The mentioned issue above has some discussion on the possible alternatives and their pros/cons.
The change is localized to the private wrapped runnable implementations inside ThreadContext. Looking at usages I think this should be good for all client use cases, but I'm not 100% sure, I might've missed a class of threads or thread pools that can hold the tracing headers and not be handled by the two wrapped runnable implementations.
I also added a disclaimer in HotThreads that the tracing ids reported may not be accurate for short-lived request executions.
Closes #74580