Include tracing headers in hot_threads output? #74580
Comments
Pinging @elastic/es-core-infra (Team:Core/Infra)
Thanks @jtibshirani for the write-up. @pgomulka is going to spend a bit of time investigating how we would implement this, and we'll discuss it at the next core/infra team meeting.
I am not sure we can achieve this with the current infrastructure. Accessing X-Opaque-Id from other threads would require public access to ThreadLocal.getMap(Thread t) or Thread.threadLocals, both of which are package-private to the java.lang package. I am also not sure it would be good design anyway to reach into another thread's private state. @original-brownbear and @ywelsch Do you think this is achievable with |
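To illustrate the limitation described above, here is a minimal sketch (plain JDK, not Elasticsearch code) showing that a value stored in a ThreadLocal by one thread is not visible through the same ThreadLocal from another thread. The class name `ThreadLocalVisibility` and the `OPAQUE_ID` field are hypothetical names chosen for this example.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class: not part of Elasticsearch.
final class ThreadLocalVisibility {
    // Stand-in for a per-thread request header value.
    static final ThreadLocal<String> OPAQUE_ID = new ThreadLocal<>();

    // Starts a worker that stores a value in its own ThreadLocal slot, then
    // returns what the calling thread sees through the same ThreadLocal.
    static String readFromCallerWhileWorkerHolds(String workerValue) {
        CountDownLatch valueSet = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            OPAQUE_ID.set(workerValue);
            valueSet.countDown();
            try {
                release.await(); // keep the worker alive while the caller reads
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            valueSet.await();
            // Reads the caller's slot, not the worker's.
            String seenByCaller = OPAQUE_ID.get();
            release.countDown();
            worker.join();
            return seenByCaller;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromCallerWhileWorkerHolds("example-id"));
    }
}
```

As long as the calling thread has not set `OPAQUE_ID` itself, the read returns null even though the worker is still holding a value, which is why a sampler thread cannot simply read another thread's context this way.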
You would also first have to get ahold of the |
Hi @ywelsch, I had a chat with @pgomulka on this today and I'm wondering if it would be possible to avoid creating a shared concurrent map by modifying the Thread names when we stash and restore the ThreadContext? Essentially, all we really need to do is show the X-Opaque-ID in the thread dump and we can use the Thread name field for it. Right now a thread name would be something like:
I'm new here, so I may not understand the full scope of this kind of change, but I can see some advantages if it is possible. We wouldn't have to run additional atomic operations on every request for the sake of an infrequent API call, HotThreads wouldn't need to be modified, and having the X-Opaque-Id in the thread name may come in handy during debugging or when using external tools like JFR/Mission Control or VisualVM.
Note that Also, wouldn't the rename require you to parse the current thread name first, e.g. in order to find out what thread pool it belonged to (write/generic/...) so that you can emit the correct updated name? I agree that having the |
This is a very valid point, thanks for bringing it up. We would have to extract all In a sense, if we implemented this feature, its output would have to be taken with a grain of salt anyway. If there were ten 50 ms API calls on the same executor thread during the default 500 ms that the HotThreads sampler thread sleeps, we'd miss at least 9 of those X-Opaque-Ids with either of the solutions we have here. The only way to be 100% accurate would be sliding-time-window ID tracking in a per-thread list, with a 1 second window, assuming nobody uses a HotThreads interval longer than 500 ms. I think that kind of tracking would be very expensive inside ThreadContext. In short, I think that with short-running requests we'll always risk misleading users about the X-Opaque-Id. Perhaps the solution is to word the messaging around the ID in the report so that it speaks to this potential uncertainty?
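The arithmetic in the comment above can be checked with a deliberately simplified model: requests run back to back on one thread, and a sampler looks at the thread once at the start of each interval. This is an assumption made for illustration, not a description of how HotThreads actually samples; `SamplingModel` and `observedRequests` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of interval sampling: not HotThreads itself.
final class SamplingModel {
    // Requests 0..requestCount-1 run back to back, each lasting requestMillis.
    // The sampler observes the thread at t = 0, interval, 2 * interval, ...
    // Returns how many distinct requests the sampler ever catches running.
    static int observedRequests(int requestCount, long requestMillis, long sampleIntervalMillis) {
        long end = requestCount * requestMillis;
        Set<Long> seen = new HashSet<>();
        for (long t = 0; t < end; t += sampleIntervalMillis) {
            seen.add(t / requestMillis); // index of the request running at time t
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int observed = observedRequests(10, 50, 500);
        System.out.println("observed=" + observed + " missed=" + (10 - observed));
    }
}
```

Under this model, ten 50 ms requests and a 500 ms interval yield one observed request and nine missed ones, whichever mechanism exposes the ID to the sampler.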
My thinking here was to simply append the X-Opaque-Id to whatever the thread name already is. By default the name is set by the pool, so we'd add to it. However, now that you mention this, I see a way to avoid storing and restoring the original thread name in a ThreadLocal: call threadName.split("X-Opaque-Id") on the restore path and revert the name to the first element of the resulting array.
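A minimal sketch of the append-and-restore idea, assuming a marker string separates the pool-assigned name from the ID. The marker text, the class `OpaqueIdThreadNames`, and its methods are hypothetical and are not the actual Elasticsearch change. It uses indexOf/substring rather than split, which has the same effect of keeping only the part before the marker while avoiding regex handling.

```java
// Hypothetical helper: illustrates the idea only, not the real implementation.
final class OpaqueIdThreadNames {
    // Assumed separator between the original thread name and the ID.
    static final String MARKER = " X-Opaque-Id=";

    // Returns the name with the ID appended, replacing any previous tag.
    static String tag(String currentName, String opaqueId) {
        return untag(currentName) + MARKER + opaqueId;
    }

    // Returns the part of the name before the marker (the original name).
    static String untag(String currentName) {
        int idx = currentName.indexOf(MARKER);
        return idx < 0 ? currentName : currentName.substring(0, idx);
    }

    // Runs the task with the current thread renamed, then restores the name.
    static void runTagged(String opaqueId, Runnable task) {
        Thread thread = Thread.currentThread();
        thread.setName(tag(thread.getName(), opaqueId));
        try {
            task.run();
        } finally {
            thread.setName(untag(thread.getName()));
        }
    }

    public static void main(String[] args) {
        runTagged("example-id", () -> System.out.println(Thread.currentThread().getName()));
        System.out.println(Thread.currentThread().getName());
    }
}
```

Because the original name is recovered from the tagged name itself, nothing extra has to be stored per thread, though this relies on the marker never appearing in a pool-assigned name.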
Elasticsearch clients can supply X-Opaque-Id or trace.id tracing headers in requests to help with debugging issues. This change adds any tracing headers to the thread name for the duration of the request, so that this information is visible in HotThreads and in other profiling tools (e.g. JFR/Mission Control, VisualVM). Closes elastic#74580
When searches are causing high CPU load, it's common to consult hot_threads to find the source of expensive searches. Working backwards to the source can be difficult, as it requires detailed knowledge of how searches are executed, often at the Lucene level.

We support a special header X-Opaque-Id that allows clients to tag a request with context information. Kibana is currently working on passing X-Opaque-Id on all search requests, to surface the application and component that initiated the request (elastic/kibana#101587). The header is already included in search slow logs and tasks output. Could we also add it to hot_threads to help debug the source of expensive queries?

Note: I'm not familiar with the hot_threads implementation and am not sure how feasible this actually is.