-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Implement parallel execution of sub-queries for hybrid search #279
Comments
This is not going to be ready for 2.14 (cc: @VijayanB ) moving to 2.15 |
Hybrid Query Hot SpotsWhile closely following execution of Hybrid Query, we found out that in three places (listed below), independent sub-queries are being executed synchronously. We could introduce multi threading inside our abstraction to parallelize these execution without making changes to existing api to improve query latency. Query re-writeQuery re-write is performed as first step during search to improve the search quality. HybridQuery performs rewrite by individually calling re-write api on every query sequentially, and, this will be performed in a loop, till no-more rewrite is possible. However, query re-write doesn’t just re-writes existing query object. For example, Lucene knn, during re-write, performs approximate search and build TopDocs, whereas, other Query can just convert into internal representation of Query without performing actual lucene search. Create Scorer by Weight for every SegmentsOnce Query re-write is successful, Weight is created from Query. As mentioned earlier, Weight is required to store state of the Query. For given Weight, to perform search, we need to create an instance of Scorer for every segments to iterate documents inside segments and score accordingly. HybridQueryWeight, internally , iterates weights from sub-queries sequentially, to build HybridQueryScorer, which provides an abstraction over list of QueryScorer from its sub-queries. The cost of creating Scorer instance can vary and it is depends on type of query. For example, TermWeight, before creating scorer, from list of unique terms that are created during index, it will first navigate to the term value that is specified in the query, and, then it provides posting list iterator to TermScorer. Similarly, KNNWeight before creating KNNScorer, it performs approximate/exact search for given segments, and, provides list of KnnQueryResult as iterator to KNNScorer. Hence, creating scorer from Weight can be expensive depends on number of segments, size of segment and type of Query. Calculate Score by Scorer for every matched documentsOnce HybridScorer is created, search calls collector manager to collect doc id. HybridCollectorManager, calls hybridscores method to get list of scores from every subquery by calling individual subquery scorer’s score api. Like above, here, hybridscores api calls score api sequentially from scorer’s method. The cost of this score method depends on scorer. For example, BM25Similarity scorer, calculates score when calling during score api, whereas, Lucene Knn or Native Knn just returns the score that was previously calculated during query re-write and scorer creation. However, converting hybridscore implementation from single thread to multi thread improves latency of some query where score calculation is expensive actually at the time of score api. Tasks
|
Closing this out since changes are merged |
Is your feature request related to a problem?
Currently individual queries of Hybrid query (aka sub-queries) are executed sequentially (Initial implementation that is done under #123). That may be sub-optimal as system may waste time on waiting.
Obvious approach for such execution is to run every sub-query execution using parallel thread and wait for results from all thread, this will be at shard level. Results then will be combined and sent to coordinator node all at once, as it's done today.
What solution would you like?
Sub-queries may be executed in parallel. Actual improvement must be verified by running benchmarks on sequential and parallel approaches.
The text was updated successfully, but these errors were encountered: