[Feature Request] Provide capability for not adding top docs collector in the query search path #13170
Comments
@martin-gaievski I don't think we need to modify |
@reta if we modify the
On this, I would suggest we have a function that returns the topDocsCollector instance, rather than a flag that says whether the topDocsCollector should be added or not. |
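As a rough illustration of the "function instead of flag" idea, here is a minimal, self-contained sketch. It does not use the OpenSearch or Lucene APIs: `HitCollector`, `CollectorProvidingSearcher`, `firstCollector()`, and the other names are hypothetical and exist only for this example. The assumption is that the searcher exposes a method returning the first collector, and a plugin overrides it to return nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a collector; NOT the Lucene Collector API.
interface HitCollector {
    String name();
}

// Hypothetical stand-in for the default top docs collector.
class DefaultTopDocs implements HitCollector {
    @Override
    public String name() {
        return "topDocs";
    }
}

// Hypothetical searcher contract: a function that returns the first collector,
// rather than a boolean flag. The default keeps the top docs collector.
interface CollectorProvidingSearcher {
    default Optional<HitCollector> firstCollector() {
        return Optional.of(new DefaultTopDocs());
    }
}

// A plugin-style searcher (assumed) that opts out by returning no first collector.
class HybridProvidingSearcher implements CollectorProvidingSearcher {
    @Override
    public Optional<HitCollector> firstCollector() {
        return Optional.empty();
    }
}

class ProvidedChain {
    // Builds the ordered list of collector names for a given searcher.
    static List<String> names(CollectorProvidingSearcher searcher, List<HitCollector> extras) {
        List<String> out = new ArrayList<>();
        searcher.firstCollector().ifPresent(c -> out.add(c.name()));
        for (HitCollector c : extras) {
            out.add(c.name());
        }
        return out;
    }

    public static void main(String[] args) {
        List<HitCollector> extras = List.of(() -> "hybrid");
        System.out.println(names(new CollectorProvidingSearcher() {}, extras));
        System.out.println(names(new HybridProvidingSearcher(), extras));
    }
}
```

With the default searcher the chain starts with the top docs collector; with the overriding searcher only the custom collector remains.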
@navneet1v we may have an overloaded version:
The current |
@reta when you say we may have an overloaded version, are you saying we already have it, or that we can build such a version? I'm a little confused here. I think it's the latter one, right? |
@navneet1v sorry, I should have been more clear: we could add the overloaded version (no flags or functions). |
Thanks. I myself don't agree with flags, but we can do override functions. :D Thanks @reta for helping out here. @martin-gaievski can we start making the change? |
It would be great if you folks could confirm my understanding of the approach with a new overloaded function with an additional param
|
:+1:
That is correct,
I think no, the implementation would follow this chain: |
@martin-gaievski I know you did a deep dive on the impact of having both the top docs collector and the hybrid search collector running. Can you paste the flame graph and benchmark results here for visibility? |
Sure Navneet, after we implemented one optimization on the plugin side I got the following numbers:
Based on profiling info (flamegraphs are attached), 45 to 80% of CPU time is taken by the TopDocsCollector-related methods, mainly by
For reference: the benchmark was done for 2.13 using the noaa OSB workload. The following search queries were used for the benchmark.
Bool queries:
Equivalent hybrid queries are:
|
Thanks Martin for providing the details. Let's start working on the POC to remove the TopDocsCollector and see how much improvement we can get here. |
Is there scope for making this work a little more generic? I mean, for example, to disable or enable any collector? |
@vibrantvarun I don't think the idea is to enable or disable any collector, instead the |
Is your feature request related to a problem? Please describe
Right now, as part of the search code path, both the Default and Concurrent QueryPhaseSearchers add TopDocsCollector as the first collector in the chain (code ref1 DefaultQueryPhaseSearcher, code ref2 ConcurrentQueryPhaseSearcher).
For certain types of extensions this may degrade search performance. In particular, that is true if a custom collector and collector manager are added and those collectors collect scores with custom logic. In that case some compute (like filtering) is done twice, and depending on the extension logic the first pass can be thrown-away work.
An example of such an extension is the hybrid query. Please check the collector manager and collector added in the neural-search plugin.
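To make the duplicated work concrete, here is a minimal, self-contained sketch. It is not OpenSearch or Lucene code: `DocCollector`, `CountingTopDocsCollector`, `CustomScoreCollector`, and `runChain` are hypothetical stand-ins, and the model assumes that every matching document is passed to every collector in the chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector; NOT the Lucene Collector API.
interface DocCollector {
    void collect(int docId, float score);
}

// Stand-in for the default top docs collector: it just counts the hits it sees.
class CountingTopDocsCollector implements DocCollector {
    int collected = 0;

    @Override
    public void collect(int docId, float score) {
        collected++;
    }
}

// Stand-in for a plugin collector that gathers scores with its own logic.
class CustomScoreCollector implements DocCollector {
    final List<Float> scores = new ArrayList<>();

    @Override
    public void collect(int docId, float score) {
        scores.add(score);
    }
}

class CollectorChainDemo {
    // Feeds every matching document to every collector in the chain
    // and returns the total number of collect() calls made.
    static int runChain(List<DocCollector> chain, float[] docScores) {
        int calls = 0;
        for (int docId = 0; docId < docScores.length; docId++) {
            for (DocCollector collector : chain) {
                collector.collect(docId, docScores[docId]);
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        float[] docScores = {0.9f, 0.4f, 0.7f};
        List<DocCollector> both = List.of(new CountingTopDocsCollector(), new CustomScoreCollector());
        List<DocCollector> customOnly = List.of(new CustomScoreCollector());
        System.out.println("with top docs collector: " + runChain(both, docScores) + " collect calls");
        System.out.println("custom collector only: " + runChain(customOnly, docScores) + " collect calls");
    }
}
```

In this toy model the per-document work doubles when both collectors run, and the first collector's output is never used if the plugin relies only on its own collector.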
Describe the solution you'd like
I suggest core add an interface, or a method in the existing QueryPhaseSearcher interface (or the public core class QueryPhaseSearcherWrapper), that allows an extension or plugin to set a flag; depending on that flag the top docs collector will be added or skipped. The default implementation should add the top docs collector so the system stays backward compatible.
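A minimal sketch of what such a flag could look like is shown below. This is not the actual OpenSearch interface: `SketchQueryPhaseSearcher`, `shouldAddTopDocsCollector()`, and `CollectorChainBuilder` are hypothetical names invented for illustration, and collectors are represented by plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical searcher contract; NOT the real QueryPhaseSearcher interface.
interface SketchQueryPhaseSearcher {
    // Default keeps the current behavior, so existing implementations are unaffected.
    default boolean shouldAddTopDocsCollector() {
        return true;
    }
}

// An existing implementation that does not override anything.
class DefaultSketchSearcher implements SketchQueryPhaseSearcher {
}

// A plugin-provided implementation (assumed) that opts out of the top docs collector.
class HybridSketchSearcher implements SketchQueryPhaseSearcher {
    @Override
    public boolean shouldAddTopDocsCollector() {
        return false;
    }
}

class CollectorChainBuilder {
    // Builds the ordered collector chain; collectors are modeled as names only.
    static List<String> buildChain(SketchQueryPhaseSearcher searcher, List<String> customCollectors) {
        List<String> chain = new ArrayList<>();
        if (searcher.shouldAddTopDocsCollector()) {
            chain.add("topDocs"); // first in the chain, as in the current code path
        }
        chain.addAll(customCollectors);
        return chain;
    }

    public static void main(String[] args) {
        List<String> custom = List.of("hybrid");
        System.out.println(buildChain(new DefaultSketchSearcher(), custom));
        System.out.println(buildChain(new HybridSketchSearcher(), custom));
    }
}
```

Using a default method means implementations that never heard of the flag keep the top docs collector, which is the backward-compatibility property asked for above.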
Related component
Search
Describe alternatives you've considered
Right now there aren't many alternatives; the custom collector is executed after the top docs collector. This leads to performance degradation of search queries, which has been reported to us by multiple customers.
Additional context
Custom query collector and collector manager were added as part of the neural-search plugin in 2.13 release (original PR)