[RFC] Backpressure in Search Path #1329
Comments
Thanks for bringing this up, @tushar-kharbanda72. There is no doubt that applying back pressure on the search flows would lead to more stable clusters. There are quite a few successful implementations related to this proposal, e.g. Akka's Adaptive Load Balancing (https://doc.akka.io/docs/akka/current/cluster-metrics.html#adaptive-load-balancing), which to some extent addresses the same problem (it could be a good case study).
I assume the only way to claim that a node uses all its resources only for search is when it has a single role. Also, should the resource tracking framework consult the overall node load (memory/CPU at least)? Co-located deployments are sadly not that rare :(
I think in general it makes a lot of sense. But here is a counter-example (seen in the wild): the query phase takes a very long time, followed by the fetch phase. At that moment the client has already lost any hope of getting the response (it may even be gone), but the server is still working on it, and live queries are going to be rejected because the fetch phase for this "dead" query is still ongoing. Would it make sense to take into account the overall age of the query (when kicking off the fetch phase) to weigh its completion relevance to the client?
The approach(es) you have mentioned make sense. Perhaps it would be good to invest in a full-fledged query planner which would assign costs to each query element, and to the overall query at the end? It may not be easy, for sure, but an iterative approach could be taken to refine it along the way. Thank you.
Perhaps we can check if the client connection is still active, and if not, propagate it to the data nodes to cancel that query.
This is a good proposal 👍. Resource consumption tracking is great, but it's a lot of work, so I would consider simpler approaches in parallel.
Thanks @dblock for going through the proposal. We have the PRs related to task resource tracking in progress, and that should complete soon; we are targeting March for that. We'll then fast-follow with the initial low-effort rejection strategies that you and others mentioned, which we'll include as part of Milestone 2. Milestone 3 is a lot of work, and we'll have to evaluate its need and criticality after the initial improvements are merged in.
Adding the Meta Issue and the child issues opened previously to this RFC in order to track deliverables:
@tushar-kharbanda72 @getsaurabh02 since you have a META issue to track the progress of this initiative, could you close the RFC? Also, what is the target release for this project?
Is this still on track for 2.2?
This (milestone 2) will come in 2.3 - we are merging in the changes for the resource tracking framework in 2.2 (milestone 1).
Thanks @rramachand21!
@rramachand21 is there an issue cut for milestone 2? I'd like to sync it up with the related documentation issue.
@rramachand21 please provide a feature document for this so we can get it documented if it's still on track for 2.3. The docs issue is opensearch-project/documentation-website#795 Currently I'm blocked. |
@tushar-kharbanda72 is this still on track for 2.4 release? code freeze on 11/3 |
@rramachand21 is this on track for 2.4 release? code freeze today |
@rramachand21 Is there anything pending on this? If not, can we close this issue, since 2.4 is already in production, and update it with the latest version you are tracking?
@rramachand21 is this on track for 2.6.0 release? The code freeze for this is today (Feb 21, 2023) |
Hey @tushar-kharbanda72, is this feature still on track for 2.7? The code freeze date is April 17, 2023. |
@ketanv3 could you please update the latest on this? |
Hi @tushar-kharbanda72, this issue will be marked for the next release.
Tagging it to next release: |
Introduction
BackPressure in the Search Path aims to enhance the overall resiliency of OpenSearch. The current protection mechanisms on OpenSearch nodes, such as ThreadPoolQueueSize and CircuitBreakers, are not sufficient to protect the cluster against a traffic surge, partial failures, a slow node, or a single rogue (resource-guzzling) query. Search BackPressure aims to introduce constructs for fair rejections, minimal wastage of useful work already done, search request cost estimation, and the ability to stabilise the cluster when under duress.
Problem Statement
In OpenSearch, there are only a few gating mechanisms to prevent a node from melting down while under duress serving search requests: essentially, queue rejections and circuit breakers. However, these gating mechanisms all have static limits, take local decisions, and are often too late to act due to configuration issues.
Problem with existing constructs
Other problems which OS users can run into
Proposed Solution
Summary
We propose to introduce a BackPressure framework in the search path to address the above concerns and improve the overall resiliency of the OpenSearch cluster. For this we'll introduce a resource tracking framework for end-to-end memory/CPU tracking of a search request, covering the different phases of execution across all shard requests. Adaptive replica selection will consider these resource metrics from different nodes while ranking them, to reduce requests to target nodes already running into resource contention. The framework will also use the same resource utilisation metrics to decide whether to reject or cancel an in-flight request, with added fairness. Task-level prioritisation based on the current state of a request can be achieved, such as preferring requests already in the fetch phase over those yet to be picked up for the query phase. This will help stabilise a node under duress by completing faster those requests which already have some useful work done. In addition, the framework would have a search request cost-estimation model to support a pro-active form of backpressure, which decides for an incoming request/task whether the node will be able to serve a query of that cost. We have divided this proposal into 3 milestones as follows:
Milestone 1
Goals
1.1 Resource Tracking Framework
Build a resource tracking framework for search requests at both the Rest level (client requests) and the Transport level (shard search requests), which tracks resource consumption on OpenSearch nodes for various search operations at different levels of granularity:
Individual Search Request (Rest) - On the Coordinator Node across phases (such as query and fetch phase) for end to end resource tracking from coordinator perspective.
Shard Search Requests (Transport) - On the Data Node per phase, for discrete search task tracking.
Shard level Aggregated View - Total consumption of resources mapped to every shard for searches on the node.
Node level Aggregated View - Total consumption of resources for all search request on the node.
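To make the granularity levels above concrete, here is a minimal sketch of how per-task usage could roll up into shard-level and node-level views. All class, field, and method names are invented for illustration; this is not the actual OpenSearch implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical tracker: every resource sample recorded for a shard search task
// is simultaneously aggregated per shard and for the whole node.
class SearchResourceTracker {
    static final class Usage {
        final LongAdder cpuNanos = new LongAdder();
        final LongAdder heapBytes = new LongAdder();
    }

    private final Map<Long, Usage> perTask = new ConcurrentHashMap<>();    // shard search tasks
    private final Map<String, Usage> perShard = new ConcurrentHashMap<>(); // shard-level aggregated view
    private final Usage nodeTotal = new Usage();                           // node-level aggregated view

    void record(long taskId, String shardId, long cpuNanos, long heapBytes) {
        Usage task = perTask.computeIfAbsent(taskId, id -> new Usage());
        task.cpuNanos.add(cpuNanos);
        task.heapBytes.add(heapBytes);
        Usage shard = perShard.computeIfAbsent(shardId, id -> new Usage());
        shard.cpuNanos.add(cpuNanos);
        shard.heapBytes.add(heapBytes);
        nodeTotal.cpuNanos.add(cpuNanos);
        nodeTotal.heapBytes.add(heapBytes);
    }

    long nodeHeapBytes() {
        return nodeTotal.heapBytes.sum();
    }

    long shardHeapBytes(String shardId) {
        Usage u = perShard.get(shardId);
        return u == null ? 0L : u.heapBytes.sum();
    }
}
```

The key design point is that a single `record` call feeds all three views, so the aggregated numbers stay consistent with the per-task numbers without a separate roll-up pass.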
Characteristics:
1.2 Adaptive replica Selection - Factor in node resource utilisation
Currently, adaptive replica selection factors in the number of tasks in the search thread-pool queue when ranking the target shards. Since the cost of each search request can vary significantly, the queue count doesn't accurately tell how many more search requests the node has the resources to accept and complete. To be more accurate, we can factor in the resource utilisation of search requests on these nodes and take better routing decisions on the coordinator.
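As a rough illustration of blending the existing queue signal with node resource utilisation, a ranking function might look like the following. The weighting and formula are entirely made up for the sketch; the actual adaptive replica selection ranking in OpenSearch is more involved.

```java
// Illustrative rank: lower is better. A node with a short queue but high
// CPU/heap utilisation should rank worse than queue length alone suggests.
class NodeRank {
    static double rank(int searchQueueSize, double cpuUtil, double heapUtil) {
        double queueFactor = Math.log1p(searchQueueSize);     // dampen raw queue count
        double resourceFactor = Math.max(cpuUtil, heapUtil);  // worst contended resource, 0.0-1.0
        return queueFactor * (1.0 + 2.0 * resourceFactor);    // invented weighting
    }
}
```

For the same queue length, a node at 90% utilisation would rank markedly worse than one at 10%, steering the coordinator away from nodes already under contention.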
Milestone 2
Goals
2.1 Server-side rejection of incoming search requests
Currently, search rejections are based solely on the number of tasks in the queue of the search ThreadPool. That doesn't provide fairness in rejections: multiple smaller queries can exhaust this limit while not being resource intensive, meaning the node could actually take in many more requests, and vice versa. Essentially, the queue count is not a reflection of the actual work.
Hence, based on the metrics in point 1.1 above, we want to build a framework which can perform more informed rejections based on point-in-time resource utilisation. The new model will take the admission decision for search requests on the node. These admission decisions or rejection limits can have different levels:
This can be further evolved to support a shard-level priority model, where users can set a priority on an index or on every request, so that the framework can consume them when taking admission/rejection decisions.
If the user has configured partial results to be true, then these rejections, combined with the coordinator's inability to retry the request on another shard on a different node, might result in the user getting a partial response.
The above will provide the required isolation of accounting and fairness in rejections, which is currently missing. This is still a reactive back-pressure mechanism, as it only focuses on the current consumption and does not estimate the future work still to be done for these search requests.
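A resource-aware admission check of the kind described above could look roughly like this. The class name, the choice of heap as the gating resource, and the two-level (node and shard) limits are all assumptions made for the sketch.

```java
// Hypothetical admission controller: a request is admitted only while
// point-in-time heap usage stays under both the node-level and the
// shard-level limits (expressed as fractions of total heap).
class SearchAdmissionController {
    private final double nodeHeapLimit;   // limit for all searches on the node
    private final double shardHeapLimit;  // limit for searches on a single shard

    SearchAdmissionController(double nodeHeapLimit, double shardHeapLimit) {
        this.nodeHeapLimit = nodeHeapLimit;
        this.shardHeapLimit = shardHeapLimit;
    }

    boolean admit(double nodeHeapUsedFraction, double shardHeapUsedFraction) {
        if (nodeHeapUsedFraction >= nodeHeapLimit) {
            return false; // node-level breach: reject regardless of shard
        }
        if (shardHeapUsedFraction >= shardHeapLimit) {
            return false; // shard-level breach: isolates a hot shard's accounting
        }
        return true;
    }
}
```

The shard-level check is what gives the isolation of accounting mentioned above: one hot shard exhausting its share cannot cause rejections for requests targeting other shards.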
2.2 Server-side cancellation of in-flight search requests based on resource consumption
This is the 3rd level, which kicks in after we have started rejecting all search requests coming to a node. Here, we take the decision to cancel on-going requests if resource usage for that shard/node has started breaching the assigned limits (point 2.1) and there has been no recovery for a certain time threshold. The BackPressure model should support identifying the queries which are the most resource-guzzling with minimal wasteful work; these can then be cancelled to recover a node under load while continuing to do useful work.
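One way to operationalise "most resource-guzzling with minimal wasteful work" is to score each running task by the resources it would free, discounted by how far along it already is, and cancel the highest-scoring one. This is purely an illustrative policy; the names and scoring formula are invented.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative victim selection: prefer tasks holding lots of resources
// but with little completed work, so cancellation wastes the least effort.
class CancellationPolicy {
    record TaskStats(long taskId, long heapBytes, double progress) {} // progress in [0, 1]

    static Optional<TaskStats> pickVictim(List<TaskStats> running) {
        return running.stream()
            .max(Comparator.comparingDouble(t -> t.heapBytes() * (1.0 - t.progress())));
    }
}
```

Under this scoring, a request already in its fetch phase (high progress) is spared in favour of a heavier request still early in its query phase, matching the prioritisation goal stated in the proposal summary.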
Milestone 3
Goals
3.1 Query resource consumption estimation
Improve the framework added as part of point 2 to also estimate the cost of new search queries, based on their potential resource utilisation. This can be achieved by looking at past query patterns with similar query constructs and the actual data on the shard. It will help in building a pro-active back-pressure model for search requests, where estimates are compared against the available resources during the admission decision, for granular control.
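A very simple version of learning from past query patterns is an exponential moving average of observed cost, keyed by some notion of a query's "shape" (e.g. a fingerprint of its aggregation and sort constructs). Everything here — the class, the shape key, the smoothing factor — is an assumption for illustration, far short of a real cost model.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative estimator: keeps an exponential moving average (EMA) of the
// observed cost per query shape, and falls back to a default for unseen shapes.
class QueryCostEstimator {
    private static final double ALPHA = 0.2; // EMA smoothing factor (invented)

    private final Map<String, Double> avgCostByShape = new HashMap<>();

    void observe(String queryShape, double actualCost) {
        // First observation stores the cost directly; later ones blend it in.
        avgCostByShape.merge(queryShape, actualCost,
            (old, now) -> old + ALPHA * (now - old));
    }

    double estimate(String queryShape, double defaultCost) {
        return avgCostByShape.getOrDefault(queryShape, defaultCost);
    }
}
```

At admission time, the estimate for an incoming query's shape would be compared against the node's currently available headroom, giving the pro-active rejection described above rather than a purely reactive one.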