Include tracing headers in hot_threads output? #74580

jtibshirani · 2021-06-24T20:37:23Z

When searches are causing high CPU load, it's common to consult hot_threads to find the source of expensive searches. Working backwards to the source can be difficult, as it requires detailed knowledge of how searches are executed, often at the Lucene level.

We support a special header X-Opaque-Id that allows clients to tag a request with context information. Kibana is currently working on passing X-Opaque-Id on all search requests, to surface the application and component that initiated the request (elastic/kibana#101587). The header is already included in search slow logs and tasks output. Could we also add it to hot_threads to help debug the source of expensive queries?

Note: I'm not familiar with the hot_threads implementation and am not sure how feasible this actually is.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-06-24T20:37:25Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

rdnm · 2021-07-01T07:44:15Z

Thanks @jtibshirani for the write up. @pgomulka is going to spend a bit of time investigating how we would implement this, and we'll discuss at the next core/infra team meeting.

pgomulka · 2021-07-08T10:10:46Z

I am not sure we can achieve this with current infrastructure. Accessing X-Opaque-ID from other threads would require a public access to ThreadLocal.getMap(Thread t) or Thread.threadLocals - which are package private to java.lang package. I am also not sure that would be a great design anyway to access foreign threads private parts..

@original-brownbear and @ywelsch Do you think this is achievable with ThreadContext? Or maybe you think there might be a totally different approach to this?

ywelsch · 2021-07-08T12:41:09Z

You would also first have to get ahold of the ThreadContext associated with the thread. And you would have to get ahold of the Thread object, as HotThreads is only operating on ThreadInfo, not Thread. From a quick look, the simplest way to implement something along the lines of what @jtibshirani asked would be to create a static concurrent map in ThreadContext, keyed by thread id (long Thread.getId()), update it whenever ThreadContext.threadLocal is updated, and then get the relevant information in HotThreads by doing a lookup on this map using ThreadInfo.getThreadId.

grcevski · 2021-09-07T21:00:05Z

Hi @ywelsch, I had a chat with @pgomulka on this today and I'm wondering if it would be possible to avoid creating a shared concurrent map by modifying the Thread names when we stash and restore the ThreadContext? Essentially, all we really need to do is show the X-Opaque-ID in the thread dump and we can use the Thread name field for it. Right now a thread name would be something like: elasticsearch[elastic][generic][T#2], which comes from the name of the thread pool I think, so the idea is to:

Save the original thread name on context stash into a thread local
Call Thread.currentThread().setName() and rename the current worker thread name to elasticsearch[elastic][generic][T#2]: X-Opaque-ID: 123
Put back the original name on context restore

I'm new here so I may not understand the full scope of this kind of a change, but I can see some advantages if it was possible. We wouldn't have to run additional atomic operations of every request for an infrequent API call, HotThreads doesn't need to be modified and having the X-Opaque-ID in the thread name may come in handy during debugging, or while using other external tools like JFR/MissionControl or VisualVM.

ywelsch · 2021-09-08T11:32:25Z

Note that hot_threads is more than just a thread dump. It takes multiple samples for each thread in order to identify common parts in the stack traces over a period of time for a given thread. HotThreads currently assumes that the thread name is stable, just picking the name from one sample. If X-Opaque-ID is encoded into the thread name, and it changes from sample to sample, couldn't this now mislead users into thinking that the thread had spent all this effort in the context of a given opaque id whereas this was only applicable to a (small) subset of the samples?

Also, wouldn't the rename require you to parse the current thread name first, e.g. in order to find out what thread pool it belonged to (write/generic/...) so that you can emit the correct updated name?

I agree that having the X-Opaque-ID in the thread name for debugging / external tools could be handy as well (unless these tools operate on multiple samples as well and assume a stable thread name to identify identical threads). I'm not sure what a good practice here is. I don't hold a strong opinion on this either way.

grcevski · 2021-09-08T17:03:37Z

Note that hot_threads is more than just a thread dump. It takes multiple samples for each thread in order to identify common parts in the stack traces over a period of time for a given thread. HotThreads currently assumes that the thread name is stable, just picking the name from one sample. If X-Opaque-ID is encoded into the thread name, and it changes from sample to sample, couldn't this now mislead users into thinking that the thread had spent all this effort in the context of a given opaque id whereas this was only applicable to a (small) subset of the samples?

This is a very valid point, thanks for bringing it up. We would have to extract all X-Opaque-Ids as we capture the thread stacks from the thread name and clean-up the opaque id from the thread name when we report it. I think it's similar to what we would need to do if we used a map, but with less String manipulation and clean-up.

In a sense, if we implemented this feature it would have to be taken with a grain of salt anyway. If there were 10, 50ms API calls on the same executor thread, for the default duration that the HotThreads sampler thread sleeps (500ms), we'd miss at least 9 of those Opaque-Ids with both solutions we have here. The only way to get 100% accurate would be via sliding time window id tracking in a list (per thread), with 1 second window, assuming nobody uses more than 500ms HotThreads interval. I think that kind of tracking would be very expensive inside ThreadContext.

In short, I think with short running requests we'll always be misleading the users on the X-Opaque-Id. Perhaps the solution is to word the messaging around the Id in the report so that it speaks to this potential uncertainty?

Also, wouldn't the rename require you to parse the current thread name first, e.g. in order to find out what thread pool it belonged to (write/generic/...) so that you can emit the correct updated name?

My thinking here was to simply append the X-Opaque-Id to whatever the thread name already is. By default it's set by the pool, so we'd add to it. However, now that you mention this, I see a way to get around having to store/restore the original thread name in a ThreadLocal, by doing String.split(threadName, 'X-Opaque-Id') on the restore path and simply reverting the name to the first String in the resulting array.

Elasticsearch clients can supply X-Opaque-Id or trace.id tracing headers in requests to help with debugging issues. This change adds any tracing headers into the thread name for the duration of the request, such that this information can be visible in HotThreads or while using other profiling tools (e.g. JFR/Mission Control, VisualVM, etc. ). Closes elastic#74580

jtibshirani added >enhancement :Core/Infra/Core Core issues without another label labels Jun 24, 2021

elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jun 24, 2021

jtibshirani mentioned this issue Jun 24, 2021

Add Per thread memory allocations in hot_threads API #70345

Closed

rdnm added the team-discuss label Jun 30, 2021

rdnm assigned pgomulka Jul 1, 2021

grcevski mentioned this issue Sep 10, 2021

Add tracing id's in thread name for wrapped runnable #77587

Closed

grcevski mentioned this issue Sep 15, 2021

Label thread names with tracing info #77791

Closed

grcevski mentioned this issue Jan 20, 2022

Include index and pipeline names in write hot threads #82834

Open

rdnm removed the team-discuss label Jun 8, 2022

rjernst unassigned pgomulka Aug 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include tracing headers in hot_threads output? #74580

Include tracing headers in hot_threads output? #74580

jtibshirani commented Jun 24, 2021 •

edited

Loading

elasticmachine commented Jun 24, 2021

rdnm commented Jul 1, 2021

pgomulka commented Jul 8, 2021

ywelsch commented Jul 8, 2021

grcevski commented Sep 7, 2021

ywelsch commented Sep 8, 2021

grcevski commented Sep 8, 2021

Include tracing headers in hot_threads output? #74580

Include tracing headers in hot_threads output? #74580

Comments

jtibshirani commented Jun 24, 2021 • edited Loading

elasticmachine commented Jun 24, 2021

rdnm commented Jul 1, 2021

pgomulka commented Jul 8, 2021

ywelsch commented Jul 8, 2021

grcevski commented Sep 7, 2021

ywelsch commented Sep 8, 2021

grcevski commented Sep 8, 2021

jtibshirani commented Jun 24, 2021 •

edited

Loading