Enabling Multiple QueryPhaseSearcher in OpenSearch #7020
Comments
@navneet1v there are probably few things to mention:
|
@reta Thanks for the response. I understand that we can compose queryphase searcher using delegation / composition but that new implementation should be registered with SearchModule(here) so that QueryPhase class can use new QueryPhaseSearcher here. Please correct me if my understanding is wrong. @nknize to add more. |
Oh I see, you basically mean if |
@navneet1v Does that mean each of the |
@sohami No, that is not what I am saying. This is what will be done if we use the solution provided by @reta. @reta Correct me if I am wrong. My proposal is to provide, at the search level, the capability for customers to choose which type of searcher they want, just as they do for codecs.
This can be one solution. But I believe the whole purpose of QueryPhaseSearcher is to run the queries and get the docIds; how that search is performed is exactly what a QueryPhaseSearcher defines. For example, the concurrent QueryPhaseSearcher runs the search in parallel across segments. For the Normalization and Score Combination feature, I want to remove all the default doc collectors and provide a single collector that collects documents separately for each query. |
Thanks @navneet1v , to be fair it is hard (for me) to understand what exactly you are going to do and why you need multiple |
Hey @reta, we do have an RFC describing the high-level approach for the normalization feature that requires a QueryPhaseSearcher: opensearch-project/neural-search#126. It is specifically covered in the "Obtaining Relevant Information for Normalization and score Combination" section. We've done a POC that drafts the described approach; the code is at https://github.com/navneet1v/neural-search/tree/normalization-poc and includes a custom implementation of QueryPhaseSearcher. Concurrent search was recently moved from sandbox to general core (#7203), which makes this issue more relevant: currently, if concurrent search is enabled, plugins cannot register a custom implementation of QueryPhaseSearcher. |
Thanks @martin-gaievski
I think this is totally fine and was always possible, but do you need multiple |
If you mean that we need both for the exact same query execution, I think we do (@navneet1v please verify this). There can be some logic that chooses which of the registered searchers to apply. At first glance, the searcher for the Normalization feature is not compatible with the concurrent searcher: Normalization uses its own low-level classes, such as the doc collector. The Normalization searcher is also specific to only one query type; for everything else the system can apply the concurrent searcher. The problem is that without a searcher we cannot have an alternative doc collector, because the searcher orchestrates the collection of docs. A mechanism for selecting between searchers at runtime is missing (for instance, selection could be based on the query type, similar to how plugins register different queries by a clause name under the "query" tag, which forms a collection of supported queries). I see a feature flag is used for the concurrent searcher, but that is an all-or-nothing approach: a customer cannot have two searchers in one cluster. And now that the concurrent searcher is part of core, it is no longer possible to register another searcher. |
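The runtime selection by query type suggested above could look roughly like the following. This is only a sketch under stated assumptions: `Searcher` and `QueryTypeSearcherRegistry` are hypothetical stand-ins invented for illustration, not OpenSearch classes, and the query type `custom_query` is a made-up name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a query phase searcher; NOT the OpenSearch interface.
interface Searcher {
    String search(String queryType);
}

// Picks a searcher by query type and falls back to a default for everything else.
final class QueryTypeSearcherRegistry implements Searcher {
    private final Map<String, Searcher> byQueryType = new LinkedHashMap<>();
    private final Searcher fallback;

    QueryTypeSearcherRegistry(Searcher fallback) {
        this.fallback = fallback;
    }

    // One searcher per query type, similar in spirit to registering queries by clause name.
    void register(String queryType, Searcher searcher) {
        if (byQueryType.putIfAbsent(queryType, searcher) != null) {
            throw new IllegalStateException("searcher already registered for " + queryType);
        }
    }

    @Override
    public String search(String queryType) {
        // Unregistered query types are handled by the fallback (e.g. the concurrent searcher).
        return byQueryType.getOrDefault(queryType, fallback).search(queryType);
    }
}
```

With a registry like this, a plugin would register its searcher only for its own query type, and every other query would keep going through whatever core uses by default.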
When I am looking at this code change to enable Concurrent Searcher, I see 2 conditions:
In the normalization case, condition 1 will never be true, so the concurrent searcher will never be picked. @reta please let me know if this understanding is wrong. This is based on the code I am seeing. |
I see, that will allow plugins to take priority with a custom searcher. I do see one issue: if we register a custom searcher from a plugin, then the concurrent searcher will be skipped even if the feature flag is enabled. One thing we can do in the plugin is check the feature flag and register the custom searcher depending on the flag value. |
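A minimal sketch of that plugin-side check, assuming the plugin can read the concurrent-search flag as a boolean. `PhaseSearcher` and `NormalizationPluginSketch` are hypothetical names used only for illustration; they are not the OpenSearch plugin API.

```java
import java.util.Optional;

// Hypothetical stand-in for a query phase searcher; NOT the OpenSearch interface.
interface PhaseSearcher {
    String name();
}

// Hypothetical plugin that only offers its searcher when concurrent search is off.
final class NormalizationPluginSketch {
    private final boolean concurrentSearchEnabled;

    NormalizationPluginSketch(boolean concurrentSearchEnabled) {
        this.concurrentSearchEnabled = concurrentSearchEnabled;
    }

    Optional<PhaseSearcher> getQueryPhaseSearcher() {
        if (concurrentSearchEnabled) {
            // Step aside so the concurrent searcher is not skipped.
            return Optional.empty();
        }
        PhaseSearcher searcher = () -> "normalization";
        return Optional.of(searcher);
    }
}
```

The trade-off is the all-or-nothing behaviour discussed in this thread: with the flag on, the custom searcher is not available at all.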
That is correct, only one
Oh, I see now what @navneet1v meant by "multiple QueryPhaseSearcher in OpenSearch": we basically will use the single |
Yes that is correct. :) I was thinking if we can make some dynamic settings that can help us pick what should be the QueryPhaseSearcher. I am open to options. |
Gotcha, my apologies for confusion, for some reason I thought about composing multiple |
I am glad we cleared this up. So, is there something we can do or propose here to arrive at a working solution? |
Sure, we also have the notion of search processors now, that could be useful in solving this particular problem (hinting what |
@martin-gaievski let's look at what options we have here and what a potential solution could be. |
@navneet1v @martin-gaievski @reta - It seems like the use-case here is to control at request level what gets executed and skip the execution mechanism in core which is performed in I am not sure if |
@sohami In our use case it's OK to apply the new collector to only one query type. This is actually how we did it in the POC: for everything that isn't our custom query type, the core logic executes. I've checked the code refs you've provided, and it seems that phases are predefined and cannot be added or changed from a plugin. I was thinking about something like the fetch sub-phases that are registered in the search module like here. |
Restarting the conversation on this thread. We need to make some changes in OpenSearch core so that an index-level QueryPhaseSearcher can be provided and core can decide which QueryPhaseSearcher to use. This is similar to what we did to enable Concurrent Segment Search as an index-level and cluster-level setting. See OpenSearch/server/src/main/java/org/opensearch/search/query/QueryPhaseSearcherWrapper.java, lines 34 to 39 in e5c4f9d.
|
Is your feature request related to a problem? Please describe.
As part of enabling the Normalization and Score Combination feature in OpenSearch (RFC: opensearch-project/neural-search#126), we need to implement a new QueryPhaseSearcher (see the section "Obtaining Relevant Information for Normalization and score Combination" in the RFC). But currently OpenSearch can have only a single QueryPhaseSearcher implementation. If more than one is provided, OpenSearch fails to bootstrap.
Concurrent Search creates a new QueryPhaseSearcher, and once the Concurrent Search plugin is moved from sandbox to the main distribution, these runtime exceptions will start to occur.
Describe the solution you'd like
At a high level, I was thinking we could do something similar to what we did to support multiple codecs in OpenSearch at the index level, where we added the CodecServiceFactory interface via EnginePlugin so that plugins can provide their own codec service through a CodecServiceFactory implementation.
I can come up with a proposal for this, but wanted to check whether this enhancement is already being worked on.
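To make the codec-style idea concrete, here is a rough sketch of a per-index factory for searchers. Everything in it is an assumption for illustration: the interfaces, the resolver, and the setting key `index.search.query_phase_searcher` are hypothetical and do not exist in OpenSearch.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a per-index searcher; NOT the OpenSearch interface.
interface IndexSearcherStrategy {
    String describe();
}

// Hypothetical analogue of a per-index factory that plugins would implement.
interface SearcherStrategyFactory {
    Optional<IndexSearcherStrategy> create(Map<String, String> indexSettings);
}

// Example factory that applies only when the index setting names it.
final class NamedStrategyFactory implements SearcherStrategyFactory {
    private final String name;

    NamedStrategyFactory(String name) {
        this.name = name;
    }

    @Override
    public Optional<IndexSearcherStrategy> create(Map<String, String> indexSettings) {
        if (!name.equals(indexSettings.get(IndexLevelSearcherResolver.SETTING))) {
            return Optional.empty();
        }
        IndexSearcherStrategy strategy = () -> name;
        return Optional.of(strategy);
    }
}

final class IndexLevelSearcherResolver {
    // Hypothetical setting key, made up for this sketch.
    static final String SETTING = "index.search.query_phase_searcher";

    // Asks every factory; at most one may claim the index, otherwise the default is used.
    static IndexSearcherStrategy resolve(Map<String, String> indexSettings,
                                         List<SearcherStrategyFactory> factories,
                                         IndexSearcherStrategy defaultStrategy) {
        IndexSearcherStrategy chosen = null;
        for (SearcherStrategyFactory factory : factories) {
            Optional<IndexSearcherStrategy> candidate = factory.create(indexSettings);
            if (candidate.isPresent()) {
                if (chosen != null) {
                    throw new IllegalStateException("more than one searcher claimed this index");
                }
                chosen = candidate.get();
            }
        }
        return chosen != null ? chosen : defaultStrategy;
    }
}
```

In this shape, several plugins can each contribute a factory without breaking startup, and a conflict is reported only when two factories claim the same index.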
Describe alternatives you've considered
There is no alternative I can think of.
Additional context
This is more of a proactive reach-out to find a solution now rather than once we hit the problem. I can see that both the Concurrent Search and Normalization features are in development.
Concurrent Search: #2587
Normalization and Score Combination: opensearch-project/neural-search#123
cc: @nknize, @reta, @martin-gaievski , @vamshin , @sohami