Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Label thread names with tracing info #77791

Closed

Conversation

grcevski
Copy link
Contributor

This PR is a proposal to solve #74580 by adding the X-Opaque-Id and trace.id header values in the Thread name field while the wrapped Runnable is running.

The mentioned issue above has some discussion on the possible alternatives and their pros/cons.

The change is localized to the private wrapped runnable implementations inside ThreadContext. Looking at usages I think this should be good for all client use cases, but I'm not 100% sure, I might've missed a class of threads or thread pools that can hold the tracing headers and not be handled by the two wrapped runnable implementations.

I also added a disclaimer in HotThreads that the tracing ids reported may not be accurate for short-lived request executions.

Closes #74580

Elasticsearch clients can supply X-Opaque-Id or trace.id tracing headers
in requests to help with debugging issues. This change adds any tracing
headers into the thread name for the duration of the request, such
that this information can be visible in HotThreads or while using other
profiling tools (e.g. JFR/Mission Control, VisualVM, etc. ).

Closes elastic#74580
@grcevski grcevski added :Core/Infra/Core Core issues without another label >enhancement >feature Team:Core/Infra Meta label for core/infra team v7.16.0 labels Sep 15, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@grcevski grcevski removed the >feature label Sep 15, 2021
@grcevski grcevski requested review from rjernst and ywelsch September 15, 2021 14:45
@grcevski grcevski changed the title Label thread pool names with tracing info Label thread names with tracing info Sep 15, 2021
@grcevski
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor thought on testing. I don't have any strong opinions about whether modifying the thread name is ok, but I'm eager to hear @ywelsch's thoughts.

protected void doRun() {
// Assertions are errors, turn this into exception to make sure we get the error on failure
try {
assertTrue(Thread.currentThread().getName().contains("some_id"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using assertThat(name, containsString("some_id")); would make any failures more readable and actionable, since we would see what failed to match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good idea! Thanks!

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might not cover all relevant cases (comment in-line). I'm also not very fond of the thread name String manipulation.

Taking a step back, I wonder if what we're ultimately interested in is to somehow marry _tasks API output with _hot_threads output, i.e., have a notion of "actively executing tasks" in the system, which is reflected in both tasks and _hot_threads APIs (and possibly other places). X-Opaque-ID allows to somehow join up this information. If ultimately the more structured information is useful, perhaps an approach as I've outlined in #74580 (comment) might be a better starting point. I'm also becoming less convinced that jstack/JFR/MissionControl or VisualVM support via thread name is really useful unless these tools help visualize how the thread name / tracing id has changed over time (e.g. does Mission control visualize this somehow?). X-Opaque-ID is useful in the context of _hot_threads as it directly gives you an idea about expensive user operations (so should be as accurate as possible when it comes to the X-Opaque-ID).

}
}

public interface WithTracingIdsInThreadName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this interface? Aren't these just all simple static methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, these methods are effectively static methods. I used an interface with default methods to encapsulate the two 'static' methods for use only in the wrapped runnables, since they rely logically on the second undoing the naming of the first.

}

/**
* Wraps a Runnable to preserve the thread context.
*/
private class ContextPreservingRunnable implements WrappedRunnable {
private class ContextPreservingRunnable implements WrappedRunnable, WithTracingIdsInThreadName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if by just applying this to ContextPreservingRunnable and ContextPreservingAbstractRunnable we're missing some important places. Think for example about requests that are handled on the transport thread (using executor SAME), and which then set the headers using setHeaders (in particular the tracing ones). I also wonder about other nested calls of stashContext etc and whether there's a risk now that these would run / show under the wrong tracing id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yannik, this is exactly the sort of thing I wasn't sure about. Let me investigate the uses of the SAME executor and the transport thread to check if this works at all.

@grcevski
Copy link
Contributor Author

Taking a step back, I wonder if what we're ultimately interested in is to somehow marry _tasks API output with _hot_threads output, i.e., have a notion of "actively executing tasks" in the system, which is reflected in both tasks and _hot_threads APIs (and possibly other places). X-Opaque-ID allows to somehow join up this information. If ultimately the more structured information is useful, perhaps an approach as I've outlined in #74580 (comment) might be a better starting point. I'm also becoming less convinced that jstack/JFR/MissionControl or VisualVM support via thread name is really useful unless these tools help visualize how the thread name / tracing id has changed over time (e.g. does Mission control visualize this somehow?). X-Opaque-ID is useful in the context of _hot_threads as it directly gives you an idea about expensive user operations (so should be as accurate as possible when it comes to the X-Opaque-ID).

I agree that if multiple places in the code would benefit from having a notion of all currently active tasks/operations/threads, then having a shared common repository like a concurrent Map would be a better solution. We'll just need to oversize the map to avoid false contention for quick requests.

MissionControl or VisualVM don't track thread names over time, but with real-time monitoring the thread names will still be useful, or when folks do thread dumps within the tool. I think naming the threads will be useful for many 3rd party tools, those 2 are just some examples. Heapdump analysis tools will benefit, you could tell the resident set per specific request-id. Another example is jstack, in case Elasticsearch is in a bad state and cannot complete the hot_threads call, jstack will likely still be able to dump the thread stacks, along with the thread names.

@grcevski
Copy link
Contributor Author

Hi @ywelsch and @rjernst, I investigated the TransportService and it's uses of ThreadContext and it's true, there are few cases when we send a transport request or send a transport response that are not wrapped in one of the two context runnables, but simply run on the netty thread pool. The netty thread pool threads in these cases will not be labelled with any of the tracing ids.

If these cases with the TransportService are important to capture with tracing ids, then my solution is not workable as-is. Please let me know what you think, if this is a no-go, I think we should close this PR.

Wrapping only around the runnables is a very localized change, but it works only for the Elasticsearch thread pools. I guess it's independent of if we want to use a ConcurrentHashMap or name the threads, it's simple but doesn't cover exactly 100% of ThreadContext uses.

The alternative is to tap into the ThreadContext wrapping and unwrapping methods directly, using a ConcurrentHashMap rather than the thread name. I think @pgomulka already has a prototype for this approach.

@danhermann danhermann added v8.1.0 and removed v7.16.0 labels Oct 27, 2021
@arteam arteam removed the v8.0.0 label Jan 12, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @grcevski, I've created a changelog YAML for you.

@grcevski grcevski closed this Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >enhancement Team:Core/Infra Meta label for core/infra team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include tracing headers in hot_threads output?
7 participants