Search requests resource tracking framework #1179

tushar-kharbanda72 · 2021-08-31T08:27:57Z

Is your feature request related to a problem? Please describe.

#1042 aims to build back-pressure support for Search requests. This framework will act as a basic building block for building an effective search back-pressure mechanism.

Describe the solution you'd like

Build a resource tracking framework for search requests (queries) that tracks resource consumption on OpenSearch nodes for various search operations at different levels of granularity:

i. Individual Search Request (REST) - On the coordinator node, across phases (such as the search and query phases), for end-to-end resource tracking from the coordinator's perspective.
ii. Shard Search Requests (Transport) - On the data node, per phase, for discrete search task tracking.
iii. Shard-level Aggregated View - Total consumption of resources mapped to every shard for searches on the node.
iv. Node-level Aggregated View - Total consumption of resources for all search requests on the node.
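
The two aggregated views (iii and iv) can be pictured as roll-ups over per-task usage records. The following is only a minimal sketch under that assumption: `TaskUsage` and `UsageViews` are hypothetical names made up for illustration and are not OpenSearch classes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-task usage record; the field names are illustrative only.
final class TaskUsage {
    final String shardId;    // e.g. "[test-index][0]"
    final long memoryBytes;  // heap bytes allocated by the task so far
    final long cpuNanos;     // CPU time consumed by the task so far

    TaskUsage(String shardId, long memoryBytes, long cpuNanos) {
        this.shardId = shardId;
        this.memoryBytes = memoryBytes;
        this.cpuNanos = cpuNanos;
    }
}

// Hypothetical helper that derives the aggregated views from task records.
final class UsageViews {
    // Shard-level view: total memory per shard for searches on this node.
    static Map<String, Long> memoryByShard(List<TaskUsage> tasks) {
        Map<String, Long> totals = new TreeMap<>();
        for (TaskUsage t : tasks) {
            totals.merge(t.shardId, t.memoryBytes, Long::sum);
        }
        return totals;
    }

    // Node-level view: total memory across all search tasks on this node.
    static long memoryOnNode(List<TaskUsage> tasks) {
        long total = 0;
        for (TaskUsage t : tasks) {
            total += t.memoryBytes;
        }
        return total;
    }
}
```

The same roll-up would apply to CPU time; only memory is shown to keep the sketch short.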

Characteristics:

Resources to track for each of the proposed metrics above:
- Memory consumption (JVM): Along the lines of #1009, we want to enable memory tracking.
- CPU consumption: TBD

Frequent checkpointing and footprint updates during a search phase: Resource tracking should run continuously as the search request progresses rather than waiting for a phase to complete. A single phase can itself consume a significant amount of resources, so tracking within a phase as it progresses is necessary for these metrics to reflect the actual state of the node.

Data node feedback to build coordinator state: Give the data node the ability to piggyback its current shard utilisation state on the response it sends to the coordinator. This can later feed into coordinator state to make adaptive, short-circuit routing decisions (covered as point 2 below).

tushar-kharbanda72 · 2021-11-25T13:48:00Z

I had time this week to think more about this. Sharing my high-level thoughts on the solution I have in mind:

For end-to-end tracking of a search request's resource utilisation, there are two major things to handle:

1. Track all memory allocations while the search request is executing (work done on the Search ThreadPool across different threads). This should work for all shard search phases and for the search request on the coordinator as well.
2. Track the response overhead (payload received from data nodes) while the request is in progress on the coordinator. The overhead of deserialising bytes and constructing the response object is incurred on the Transport Worker thread.

Tracking allocations on Search ThreadPool

- Make the TaskId of the request available in the Thread Context so that the details remain available even when the request is handled by different threads. This can be carried as a transient header and set by TaskManager when a Task is created and registered.
- TaskResourceTracker: Create a TaskResourceTracker entity in which every task that needs resource tracking support is registered. Optional metadata can be attached to each task, such as action type and shard/index name/id. When a task is picked up by different threads, those threads are registered under the task they are executing work for. Once the task completes, it is unregistered from TaskResourceTracker. So it will hold a structure like Map<TaskInfo, List>, implemented with a ConcurrentHashMap to support high throughput.
- ResourceTrackingRunnable: Decorate the runnable that gets executed on the Search ThreadPool (this can be added to other ThreadPools as well if required). This runnable is responsible for registering the thread in TaskResourceTracker under a task id (which will be available in the Thread Context). While registering the thread, it captures the thread's heap allocations and CPU time at that moment. Once the runnable completes or errors out, it captures the heap allocations and CPU time again and notifies TaskResourceTracker that this thread has finished processing.
- ResourceWatcher: A scheduled task responsible for taking snapshots at regular intervals (5 secs). It updates the heap allocation and CPU time for each thread still registered in TaskResourceTracker (not the ones that have finished). It then builds views from that data, for example memory allocations per node/shard/index/ThreadPool/actionType, whichever are required.
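
The tracker and the decorating runnable described above might look roughly like this. This is a minimal sketch, not the actual implementation: the class shapes and method names are assumptions, the task id is passed through the constructor instead of being read from the Thread Context, and it relies on the HotSpot-specific `com.sun.management.ThreadMXBean` for per-thread allocation counters.

```java
import com.sun.management.ThreadMXBean;
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker: accumulates per-task totals reported by worker threads.
final class TaskResourceTracker {
    static final class Usage {
        final AtomicLong memoryBytes = new AtomicLong();
        final AtomicLong cpuNanos = new AtomicLong();
    }

    private final Map<Long, Usage> usageByTask = new ConcurrentHashMap<>();

    void register(long taskId) {
        usageByTask.putIfAbsent(taskId, new Usage());
    }

    void unregister(long taskId) {
        usageByTask.remove(taskId);
    }

    Usage usage(long taskId) {
        return usageByTask.get(taskId);
    }

    // Called by a worker thread once it has finished its share of the task.
    void threadFinished(long taskId, long memoryDelta, long cpuDelta) {
        Usage usage = usageByTask.get(taskId);
        if (usage != null) {
            usage.memoryBytes.addAndGet(memoryDelta);
            usage.cpuNanos.addAndGet(cpuDelta);
        }
    }
}

// Hypothetical decorator: measures what the wrapped work allocates and burns.
final class ResourceTrackingRunnable implements Runnable {
    // HotSpot-specific bean; the cast is an assumption about the running JVM.
    private static final ThreadMXBean THREADS =
        (ThreadMXBean) ManagementFactory.getThreadMXBean();

    private final TaskResourceTracker tracker;
    private final long taskId; // assumption: supplied directly, not via Thread Context
    private final Runnable delegate;

    ResourceTrackingRunnable(TaskResourceTracker tracker, long taskId, Runnable delegate) {
        this.tracker = tracker;
        this.taskId = taskId;
        this.delegate = delegate;
    }

    @Override
    public void run() {
        long threadId = Thread.currentThread().getId();
        long memoryStart = THREADS.getThreadAllocatedBytes(threadId);
        long cpuStart = THREADS.getThreadCpuTime(threadId);
        try {
            delegate.run();
        } finally {
            // Report the deltas whether the work completed or threw.
            long memoryDelta = THREADS.getThreadAllocatedBytes(threadId) - memoryStart;
            long cpuDelta = THREADS.getThreadCpuTime(threadId) - cpuStart;
            tracker.threadFinished(taskId, memoryDelta, cpuDelta);
        }
    }
}
```

Wrapping would happen wherever work is submitted to the pool, for example `executor.execute(new ResourceTrackingRunnable(tracker, taskId, work))`. The counters are approximate and can be unavailable on some JVMs, so a real implementation would need to check for that.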

Without the resource watcher, we would only get resource utilisation info once the task or thread completes, which is not helpful because the task is about to leave the node and no longer needs resources. Also, if a rogue query has a phase that consumes a lot of resources and runs for a long time, we should know that while the phase is still in progress so that a short-circuiting decision can be made if required.
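
To illustrate why periodic snapshots matter, here is a minimal sketch of a watcher that refreshes in-flight memory totals for threads that have not finished yet. All names are hypothetical, only memory is tracked, and it again assumes the HotSpot-specific `com.sun.management.ThreadMXBean`.

```java
import com.sun.management.ThreadMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watcher: samples threads that are still working for a task.
final class ResourceWatcher {
    private static final ThreadMXBean THREADS =
        (ThreadMXBean) ManagementFactory.getThreadMXBean();

    // One entry per thread currently executing work for a task.
    private static final class ActiveThread {
        final long taskId;
        final long memoryAtStart;

        ActiveThread(long taskId, long memoryAtStart) {
            this.taskId = taskId;
            this.memoryAtStart = memoryAtStart;
        }
    }

    private final Map<Long, ActiveThread> active = new ConcurrentHashMap<>();
    private volatile Map<Long, Long> latestMemoryByTask = new HashMap<>();

    void threadStarted(long threadId, long taskId) {
        active.put(threadId, new ActiveThread(taskId, THREADS.getThreadAllocatedBytes(threadId)));
    }

    void threadFinished(long threadId) {
        active.remove(threadId);
    }

    // Recomputes in-flight memory per task from the threads still registered.
    void snapshot() {
        Map<Long, Long> totals = new HashMap<>();
        for (Map.Entry<Long, ActiveThread> entry : active.entrySet()) {
            long now = THREADS.getThreadAllocatedBytes(entry.getKey());
            if (now < 0) {
                continue; // thread is gone or the measurement is unsupported
            }
            ActiveThread thread = entry.getValue();
            totals.merge(thread.taskId, now - thread.memoryAtStart, Long::sum);
        }
        latestMemoryByTask = totals;
    }

    long memoryInFlight(long taskId) {
        return latestMemoryByTask.getOrDefault(taskId, 0L);
    }

    // Schedules snapshot() at a fixed interval; the caller shuts the scheduler down.
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::snapshot, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

With the 5 second interval from the proposal this would be `watcher.start(5)`. A long-running phase then becomes visible through `memoryInFlight(taskId)` well before its thread reports completion.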

Tracking response overhead on coordinator

In InboundHandler, constructing the Java object from bytes allocates on the heap, and we track the overhead of creating the response object. At that point we don't know the task id, so we map the overhead against the response object's address. As soon as the thread context is restored and the task id is available, we move that response overhead so it is tracked under that task. Any objects within the response that have delayed initialisation will be constructed while executing on the Search ThreadPool, so they will be tracked automatically.
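
The two-step attribution described above (record against the response object first, move to the task once its id is known) could be sketched as follows. `ResponseOverheadTracker` and its methods are hypothetical names, and an identity-keyed map stands in for "response object address".

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker for deserialisation overhead on the coordinator.
final class ResponseOverheadTracker {
    // Keyed by object identity, not equals(), to mimic tracking by address.
    private final Map<Object, Long> pendingByResponse =
        Collections.synchronizedMap(new IdentityHashMap<>());
    private final Map<Long, Long> overheadByTask = new ConcurrentHashMap<>();

    // Step 1: the task id is unknown, so park the bytes against the response.
    void recordDeserialization(Object response, long bytes) {
        pendingByResponse.merge(response, bytes, Long::sum);
    }

    // Step 2: the task id is now known, so move the parked bytes under it.
    void attributeToTask(Object response, long taskId) {
        Long bytes = pendingByResponse.remove(response);
        if (bytes != null) {
            overheadByTask.merge(taskId, bytes, Long::sum);
        }
    }

    long overheadForTask(long taskId) {
        return overheadByTask.getOrDefault(taskId, 0L);
    }

    int pendingCount() {
        return pendingByResponse.size();
    }
}
```

Note that the pending map holds strong references, so a response that is never attributed would stay reachable; a real implementation would need a cleanup path for that case.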

These metrics can be exposed via the stats API. Full details can be discussed once there's alignment on this high-level proposal.

These 2 should give us much needed visibility into resource utilisation due to Search requests on a node.

tushar-kharbanda72 · 2021-11-25T13:50:12Z

I need to run a performance benchmark to see if it hurts the performance under normal/high load. Will create a patch and should be able to get some results on this by next week.

asafm · 2022-03-02T15:06:22Z

This is great work towards supplying observability of search requests, specifically per shard (once done). I was wondering what the state of this issue is?

tushar-kharbanda72 · 2022-03-23T19:15:31Z

> This is great works towards supplying observability of search requests, specifically per shard (once done). I was wondering what is the state of this issue?

Thanks for showing interest in this feature, @asafm. We're working on a Task resource tracking framework that reports resource consumption for any task running on a cluster, which covers not only search requests but all sorts of requests we want to track. We're code complete, currently in the review phase, and trying to get this feature out with the OpenSearch 2.0 release.

First PR for initial frame: #2089

tushar-kharbanda72 · 2022-03-29T14:45:16Z

This PR completes the initial task resource tracking framework. Users can get insights into the resource consumption of tasks running on the cluster by using the List Tasks API. The List Tasks API refreshes the resource consumption info before returning the response so that it is accurate.

Users can further use X-Opaque-Id to get insights into how many resources their queries are using on different nodes.

curl --location --request GET 'http://127.0.0.1:9200/_tasks?actions=*'

{
    "nodes": {
        "JsXVdDkXRAOg3m3v6NNhrA": {
            "tasks": {
                "JsXVdDkXRAOg3m3v6NNhrA:74": {
                    "node": "JsXVdDkXRAOg3m3v6NNhrA",
                    "id": 74,
                    "type": "direct",
                    "action": "indices:data/read/search[phase/query]",
                    "description": "shardId[[test-index][0]]",
                    "start_time_in_millis": 1648563530779,
                    "running_time_in_nanos": 3777544037,
                    "cancellable": true,
                    "parent_task_id": "JsXVdDkXRAOg3m3v6NNhrA:73",
                    "headers": {},
                    "resource_stats": {
                        "total": {
                            "cpu_time_in_nanos": 6429000,
                            "memory_in_bytes": 307424
                        }
                    }
                }
            }
        }
    }
}
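
For the X-Opaque-Id usage mentioned above, a client could tag its requests and then read the task list with the same header set. This is a minimal sketch using the JDK's `java.net.http` client; the `ListTasksExample` class and the opaque id value `my-dashboard-query` are made up for illustration, and the URL is the one from the curl command above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that builds a List Tasks request carrying X-Opaque-Id.
final class ListTasksExample {
    static HttpRequest listTasksRequest(String baseUrl, String opaqueId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_tasks?actions=*"))
            .header("X-Opaque-Id", opaqueId)
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a node listening locally, as in the curl example.
        HttpRequest request = listTasksRequest("http://127.0.0.1:9200", "my-dashboard-query");
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Search requests sent with the same header value should then be identifiable in the `headers` object of each task entry in the response.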

tushar-kharbanda72 · 2022-03-29T14:50:05Z

The entity that serves the different aggregated views (shard/node level, with different phases aggregated together) for rejections and cancellations is out of scope for this issue and will move to the following issue: Server side Cancellation of in-flight search requests based on resource consumption #1181
Piggybacking the resource utilisation info per node will also move to a separate issue.

This issue now covers only the Task resource tracking framework.

dblock · 2022-04-25T18:48:45Z

The PR in #3046 was reverted.

alexahorgan · 2022-08-03T14:51:17Z

Demo feedback (8/3/22):

Outcome:
Approved, ship it.

Action Items/Follow up:

Partner with documentation for use cases for release, consider creating a series of blog posts.
Identify current priorities for future query experiences with a PM.

tushar-kharbanda72 added the enhancement label (Enhancement or improvement to existing feature or request) on Aug 31, 2021

This was referenced Aug 31, 2021:
- Adaptive replica Selection - Factor in node resource utilisation #1183 (Open)
- [Meta] BackPressure in the OpenSearch Query (Search) path #1042 (Open)

tushar-kharbanda72 mentioned this issue Dec 2, 2021:
- Search Task Resource Tracking PoC #1643 (Closed)

tushar-kharbanda72 mentioned this issue Jan 11, 2022:
- Add Task Id in thread context for Transport and Rest requests #1882 (Closed)

tushar-kharbanda72 mentioned this issue Mar 29, 2022:
- Support task resource tracking in OpenSearch #2639 (Merged)

tushar-kharbanda72 mentioned this issue Apr 8, 2022:
- Improvements in preserving task id in ThreadContext #2819 (Open)

ketanv3 mentioned this issue Jul 22, 2022:
- Support task resource tracking in OpenSearch #3982 (Merged)

This was referenced Nov 9, 2022:
- Add search backpressure cancellation at the coordinator level #5173 (Closed)
- Search Query Runtime Cost Calculation #5174 (Open)

rishabhmaurya mentioned this issue Mar 3, 2023:
- [RFC] Performance metrics framework #6533 (Open)

ansjcy mentioned this issue Feb 8, 2024:
- [RFC] Real-time Insights into Top N Queries by Latency and Resource Usage #11186 (Closed)

ansjcy mentioned this issue Feb 20, 2024:
- [Query Insights] Capture query-level resource usage metrics #12399 (Closed)
