Search requests resource tracking framework #1179
Comments
Got time this week to think more about this. Sharing my high-level thoughts on the solution I have in mind. If we want end-to-end tracking of a search request's resource utilisation, there are 2 major things to handle:
Tracking allocations on the Search ThreadPool
Tracking response overhead on the coordinator
In InboundHandler, while we're constructing the Java object from bytes, we allocate on the heap. We track the overhead of creating the response object. At that point we don't know the task id, so we map the overhead against the response object address; as soon as the thread context is restored and we have the thread id available, we move that response overhead to be tracked under that task only. If any objects within the response have delayed initialisation, they will be constructed while executing on the Search ThreadPool, so they get tracked automatically.
These metrics can be exposed via the stats API. Full details can be discussed once there's alignment on this high-level proposal. Together, these 2 should give us much-needed visibility into resource utilisation due to Search requests on a node.
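The response-overhead hand-off described above could be sketched roughly as follows. This is only an illustration under assumptions: `ResponseOverheadTracker` and all of its methods are hypothetical names, not OpenSearch classes, and Java object identity (via `IdentityHashMap`) stands in for "the response object address".

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not part of OpenSearch.
final class ResponseOverheadTracker {
    // Bytes allocated while deserializing a response, keyed by object identity
    // because the owning task is not known yet.
    private final Map<Object, Long> pendingByResponse = new IdentityHashMap<>();
    // Bytes attributed to each task once ownership is known.
    private final Map<Long, AtomicLong> bytesByTask = new ConcurrentHashMap<>();

    // Called right after the response object has been built from bytes.
    synchronized void recordPending(Object response, long allocatedBytes) {
        pendingByResponse.merge(response, allocatedBytes, Long::sum);
    }

    // Called once the thread context is restored and the task is known.
    // Moves the parked overhead to the task and returns the bytes moved.
    long attributeToTask(Object response, long taskId) {
        Long pending;
        synchronized (this) {
            pending = pendingByResponse.remove(response);
        }
        if (pending == null) {
            return 0L;
        }
        bytesByTask.computeIfAbsent(taskId, id -> new AtomicLong()).addAndGet(pending);
        return pending;
    }

    long bytesForTask(long taskId) {
        AtomicLong total = bytesByTask.get(taskId);
        return total == null ? 0L : total.get();
    }

    synchronized int pendingCount() {
        return pendingByResponse.size();
    }
}
```

One design caveat with this sketch: the pending map holds strong references, so a response that is never attributed to a task would stay in the map; a real implementation would need some eviction or cleanup path.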
I need to run a performance benchmark to see whether this hurts performance under normal or high load. I will create a patch and should be able to get some results on this by next week.
This is great work towards supplying observability of search requests, specifically per shard (once done). I was wondering what the state of this issue is?
Thanks for showing interest in this feature @asafm. We're working on a Task resource tracking framework where you can get resource consumption for any task running on a cluster, which addresses not only search requests but all sorts of requests we want to track. We're code complete, currently in the review phase, and trying to get this feature out with the OpenSearch 2.0 release. First PR for initial frame: #2089
This PR completes the initial task resource tracking framework. Users can get insights into the resource consumption of tasks running on the cluster by using the List Tasks API. The List Tasks API refreshes the resource consumption info before returning the response so that it is accurate. Users can further use
This current issue is now only for the Task resource tracking framework.
The PR in #3046 was reverted.
Demo feedback (8/3/22): Outcome: Action Items/Follow up:
Is your feature request related to a problem? Please describe.
#1042 aims to build back-pressure support for Search requests. This framework will act as a basic building block for building an effective search back-pressure mechanism.
Describe the solution you'd like
Build a resource tracking framework for search requests (queries), which tracks resource consumption on OpenSearch nodes, for various Search operations at different levels of granularity -
i. Individual Search Request (Rest) - On the Coordinator Node across phases (such as search and query phase) for end to end resource tracking from coordinator perspective.
ii. Shard Search Requests (Transport) - On the Data Node per phase, for discrete search task tracking.
iii. Shard level Aggregated View - Total consumption of resources mapped to every shard for searches on the node.
iv. Node level Aggregated View - Total consumption of resources for all search requests on the node.
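To make the aggregated views (iii and iv) concrete, here is a minimal sketch of rolling per-task usage up to shard and node totals. Everything in it is an assumption for illustration: `ResourceUsage`, `SearchUsageRollup`, the string shard key, and the choice of CPU time and heap bytes as the tracked resources are hypothetical and not taken from OpenSearch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value type; the tracked resources (CPU time, heap bytes) are assumptions.
final class ResourceUsage {
    static final ResourceUsage ZERO = new ResourceUsage(0L, 0L);
    final long cpuNanos;
    final long heapBytes;

    ResourceUsage(long cpuNanos, long heapBytes) {
        this.cpuNanos = cpuNanos;
        this.heapBytes = heapBytes;
    }

    ResourceUsage plus(ResourceUsage other) {
        return new ResourceUsage(cpuNanos + other.cpuNanos, heapBytes + other.heapBytes);
    }
}

// Hypothetical rollup of shard search task usage into shard-level and node-level views.
final class SearchUsageRollup {
    private final Map<String, ResourceUsage> byShard = new HashMap<>();
    private ResourceUsage nodeTotal = ResourceUsage.ZERO;

    // Record the usage of one shard search task against its shard and the node.
    synchronized void recordShardTask(String shardId, ResourceUsage usage) {
        byShard.merge(shardId, usage, ResourceUsage::plus);
        nodeTotal = nodeTotal.plus(usage);
    }

    // Shard level aggregated view (iii).
    synchronized ResourceUsage forShard(String shardId) {
        return byShard.getOrDefault(shardId, ResourceUsage.ZERO);
    }

    // Node level aggregated view (iv).
    synchronized ResourceUsage forNode() {
        return nodeTotal;
    }
}
```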
Characteristics:
Resources to track for each of the above proposed metrics
Frequent checkpointing and footprint updates during a search phase: Resource tracking should run continuously as the search request progresses and need not wait for a particular phase to complete. A single phase can itself consume a significant amount of resources, so tracking within a phase as it progresses is needed for these metrics to reflect the actual state of the node.
Data Node feedback to build Coordinator state: Give the data node the capability to piggyback on the response and send its current shard utilisation state to the coordinator. This can later feed into coordinator state to make adaptive and short-circuit routing decisions (covered as point 2 below).
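The checkpointing characteristic above could be approximated with the JVM's per-thread counters. The sketch below is one possible approach, not the framework's implementation: `PhaseCheckpointer` is a hypothetical class, and it assumes a HotSpot-style JVM where the platform `ThreadMXBean` can be cast to `com.sun.management.ThreadMXBean` to read per-thread allocated bytes.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; not part of OpenSearch.
final class PhaseCheckpointer {
    // Assumption: the JVM exposes the com.sun.management extension of ThreadMXBean.
    private static final com.sun.management.ThreadMXBean THREADS =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    private final long threadId;
    private long lastCpuNanos;
    private long lastAllocatedBytes;
    private long cpuNanos;
    private long allocatedBytes;

    // Starts tracking the given worker thread from its current counter values.
    PhaseCheckpointer(Thread worker) {
        this.threadId = worker.getId();
        this.lastCpuNanos = THREADS.getThreadCpuTime(threadId);
        this.lastAllocatedBytes = THREADS.getThreadAllocatedBytes(threadId);
    }

    // Folds the usage since the previous checkpoint into the running totals.
    // Can be called repeatedly while a phase is still executing.
    synchronized void checkpoint() {
        long cpuNow = THREADS.getThreadCpuTime(threadId);
        long bytesNow = THREADS.getThreadAllocatedBytes(threadId);
        // Both calls return -1 when measurement is disabled or the thread is not alive.
        if (cpuNow >= 0 && lastCpuNanos >= 0) {
            cpuNanos += Math.max(0L, cpuNow - lastCpuNanos);
        }
        if (bytesNow >= 0 && lastAllocatedBytes >= 0) {
            allocatedBytes += Math.max(0L, bytesNow - lastAllocatedBytes);
        }
        lastCpuNanos = cpuNow;
        lastAllocatedBytes = bytesNow;
    }

    synchronized long cpuNanos() {
        return cpuNanos;
    }

    synchronized long allocatedBytes() {
        return allocatedBytes;
    }
}
```

Because the counters are read by thread id, `checkpoint()` could be called either by the worker itself at intervals inside a phase or by a separate polling thread; which of these fits better would depend on the benchmark results.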
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.