-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Add rescoreContext for neuralsparse query's two-phase speed up. #695
Conversation
Signed-off-by: conggguan <[email protected]>
7e840a2
to
746458d
Compare
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
7783b41
to
8b12f42
Compare
Signed-off-by: conggguan <[email protected]>
8b12f42
to
feb9606
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My big concern is high number of integ tests with model deployment, the rest is less critical
src/main/java/org/opensearch/neuralsearch/search/util/NeuralSparseTwoPhaseUtil.java
Outdated
Show resolved
Hide resolved
if (query2weight.isEmpty()) { | ||
return; | ||
} else if (query2weight.size() == 1) { | ||
Map.Entry<Query, Float> entry = query2weight.entrySet().stream().findFirst().get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entrySet().iterator().next()
should be more optimal. I just has fixed a performance bottleneck in hybrid query caused by Stream.findFirst()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also you don't need else
, can start from if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete the extra else,
replaced the findFirst.
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQuery.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseTwoPhaseParameters.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseTwoPhaseParameters.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/search/util/NeuralSparseTwoPhaseUtil.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/query/NeuralSparseQueryIT.java
Outdated
Show resolved
Hide resolved
…eryToken, add some comments, replace some String with String.format. Signed-off-by: conggguan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets ensure that all BWC tests are successful before this change can be merged.
/** | ||
* Use this setting to manage if a neuralSparseQuery build a two-phase query of not. | ||
*/ | ||
public static final Setting<Boolean> NEURAL_SPARSE_TWO_PHASE_DEFAULT_ENABLED = Setting.boolSetting( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why these settings are at node scope? This will make the settings available for all the indices in the cluster. The setting should at the index level as for different indices user may choose to set or unset a particular value for a setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for setting it as a node-scope setting is that we believe the feature's settings are relatively stable. In our initial investigations, we observed great performance and accuracy with the default settings. We also think that significant differences in two-phase parameters might cause variations in accuracy across different indexes. Therefore, we made this a node-scope setting.
Additionally, for customization in a single query, we provide a one-time two-phase parameter that allows users to temporarily override this setting in an individual query."
{
"query": {
"neural_sparse":{
"VECTOR_FIELD":
{
"model_id": "model_id",
"query_text": "abcd",
"two_phase_settings":
{
"enabled": true
"window_size_expansion": 5,
"pruning_ratio": 0.4,
}
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the settings are stable and we already provide user a way to set the settings in the query itself then we don't need to explicitly provide setting at cluster level because the way I look at this is these settings are of no use. They just taking space in cluster state. Hence I would recommend not having the settings. Plus more I look at this, these are not setting these are more of Neural Sparse query attributes. Hence, we should just remove the settings completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The settings is stable for a concrete sparse model. If the sparse model is trained with thresholding in loss function, they can have different hyperparameters to work with. While the default settings (5, 0.4) works best for our sparse models.
Let's make the decision for keeping these settings based on pros and cons:
pros:
- make it more convinient for users with customized sparse models
- make it more convinient to disable the 2-phase search, there are use cases to use OpenSearch to benchmark the sparse model. For these cases we need accurate search relevance scores
cons:
- having 3 more cluster settings, taking dozens of bytes in memory
Many settings in OpenSearch are rarely changed. Most users won't feel this settings, but for users who need the settings, it is very important to be able to configure them. I don't think this is a critical for performance, hence one single embedding vector will take much more space than these settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are defining these settings based on the pros you mentioned then these settings are much more suited at index level and not at cluster level. Because if I have 2 indices using 2 different sparse models, then this setting will be of use. But if we make them at index level then atleast they are useful as I can change setting for 1 index.
Moreover if you think little hard on that point, more than 1 model can be used in 2 different sparse fields on same index. In that case not even the index settings be useful. Hence if you see the settings are not very useful in that perspective because for most of the cases default works(as mentioned by @conggguan ) and for cases where default doesn't work user can update the query.
cons:
having 3 more cluster settings, taking dozens of bytes in memory
Replicate this dozen of bytes over more than 100 nodes then you will see the impact. Nevertheless the point for that comment was, if the settings are not useful as I mentioned above then just adding them in cluster setting is just waste of resources. I cannot comment on other settings which customer don't change and still present in Opensearch. But all I will say is if something is not correct in current code then that is not a justification to follow the same wrong pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a setting is supposed to be an index setting and we making it as part of cluster settings just on the name of performance, do we know how much performance boost we are getting here ? Plus index settings are also stored on the node. So, when settings change you might get hit for 1 query but other queries will be fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a setting is supposed to be an index setting and we making it as part of cluster settings just on the name of performance, do we know how much performance boost we are getting here?
The performance impact is very straightforward in code so we haven't benchmark it. For cluster settings, we just do value assignment; and for index settings, we need to iterate the environment settings. And this will happen for every neural sparse query.
Plus index settings are also stored on the node. So, when settings change you might get hit for 1 query but other queries will be fast.
Do you mean we cache these settings for each index? We need to implement another complexed controller module to manage the cache for each index and do update even manage their lifecycle with index create/delete. I think this is completely overdesigned. You just mentioned the overhead to add one setting, implmenting index settings will make this overhead much larger because in every index we have these settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we cache these settings for each index? We need to implement another complexed controller module to manage the cache for each index and do update even manage their lifecycle with index create/delete. I think this is completely overdesigned. You just mentioned the overhead to add one setting, implmenting index settings will make this overhead much larger because in every index we have these settings.
I am saying they are already cached at the node level. You don't need to cache it. So any time there is change in settings(index/cluster settings) for any index those settings are pushed to all the data nodes.
For cluster settings, we just do value assignment; and for index settings, we need to iterate the environment settings. And this will happen for every neural sparse query.
Not sure what you mean here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I get your point, I checked the settings API and found I misunderstand how it works.
So you put +1 to index settings now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to indexscope.
searchContext.addRescore(rescoreContext); | ||
} | ||
|
||
private static float populateQueryWeightsMapAndGetWindowSizeExpansion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function needs to be split into 2 parts.
- Where are are just changing the rescore query/recore context.
- Where we are changing the actual Query clause provided by user to set what should be the currentQuery that needs to be executed.
We should also add some warnings which should tell us that query clauses was not boost, bool or neuralsparse hence this two phase optimization is not applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For First comments, I refactor the function addRescoreContextFromNeuralSparseQuery. You can take a look for further optimization.
For the warn logs, I think, If we want to detect all of compound queries that if there has a neuralSparseQuery which couldn't be speed up, maybe it cost some performance loss, such as when a hybrid query, constant score query, dismax query of a filter boolean query come in, we must to traversal all the sub queries of them and make a warn log.
So I think maybe make a illustration in OpenSearch Document is better for neural-search plugin's performance and log's simplicity.
If you think there is a very strong need to add warn log ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we only consider the supported compound queires, we can early exit for unsupported cases.
However, if we want to log warn as you mentioned, we need to go through to the end for every compound queries.
Because here is the common entry for all queries, if users are not using semantic search and need to do many complexed compound queries, going through all of them will have performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.
Moreover now this check is going to happen for every bool query(simple or complex), it has a bigger blast radius and can impact performance of every user query. As the call is recursive so it will iterate over every query clause if it is bool.
Another point, if we are not adding any log around whether twoPhaseOptimization got added or not, how we are expecting admin/operators to Opensearch to figure out whether for a given query optimization worked or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.
This is why we need to early exit. We iterate deeper only if its the query clause of specific type. I.e. for a complexed query clause tree, once we found current node is not of specific types we just exit to minimize the performance impact.
With this early exit, we just go through all boolean and boost, and we don't find some use case have deep nested boolean and boost but without nodes we can early exit from.
But give an warning log for every query has large impact for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.
This is why we need to early exit. We iterate deeper only if its the query clause of specific type. I.e. for a complexed query clause tree, once we found current node is not of specific types we just exit to minimize the performance impact.
With this early exit, we just go through all boolean and boost, and we don't find some use case have deep nested boolean and boost but without nodes we can early exit from.
But give an warning log for every query has large impact for performance.
So in my message there 2 points. Seems like you read only one.
Here is the thing:
- The code is not going to do any early exit if there is bool clause every where. I am hoping that is clear.
- Comparing the latencies of 2 different queries is not an acceptable user explanation here why the queries didn't speed up. So if want operators to understand why a speed up didn't happen we better figure out a way. Otherwise there can be a flurry of questions around why the query is not speeding up. Thats where helpful log message are useful. If there is a better way than log I would suggest to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in my message there 2 points. Seems like you read only one.
I gave 2 response for the 2 points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reply for both of them is added. Do you agree with point 1 or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it clear for what's the blocker issue now?
If the blocker issue is the performance impact to iterate the queries:
We already have query visitors in OpenSearch and in neural-search (NeuralSearchQueryVisitor)? In these query visitors we go through the full query, too. And we don't see any call out from developers and customers about the performance impact about them.
If the blocker issue is the warn log:
We can use search profile API to get the finalized lucene query. For 2-phase query, we can find NeuralSparseQuery.currentQuery equals NeuralSparseQuery.highScoreTokenQuery. And when we don't use 2-phase, the NeuralSparseQuery.currentQuery is NeuralSparseQuery.highScoreTokenQuery + NeuralSparseQuery.lowScoreTokenQuery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.
This is why we need to early exit. We iterate deeper only if its the query clause of specific type. I.e. for a complexed query clause tree, once we found current node is not of specific types we just exit to minimize the performance impact.
With this early exit, we just go through all boolean and boost, and we don't find some use case have deep nested boolean and boost but without nodes we can early exit from.
But give an warning log for every query has large impact for performance.So in my message there 2 points. Seems like you read only one.
Here is the thing:
1. The code is not going to do any early exit if there is bool clause every where. I am hoping that is clear. 2. Comparing the latencies of 2 different queries is not an acceptable user explanation here why the queries didn't speed up. So if want operators to understand why a speed up didn't happen we better figure out a way. Otherwise there can be a flurry of questions around why the query is not speeding up. Thats where helpful log message are useful. If there is a better way than log I would suggest to do that.
For 1. I think if a deep compound query that starts with a boolean's filter/must/must not , hybrid or dis_max or constant score, there is a early exit existing, to reduce the influence for unsupported compound query.
For 2. I think explain in documentation is a enough way for user, maybe lots of warn log are little extra for user, and I think, different perform time diff in different compound query are indeed exist and normal, such as, a must clause's query are more fast than a should, because Lucene make la lot of optimization work, executing only a fraction of what they need to in different query types, I think that's one of the things that optimization is all about, finding logic that can be avoided and bypassing them as much as possible, but optimization isn't a functional change, and generating a large amount of warn logs can be a little bit of a problem if we're just avoiding some optimization operation due to a system feature rather than a system bug and thus give the impression that neural sparse is not a reliable or mature new feature. If we had to do that, we have to increase the potential burden on the searcher, I think it would be better to delay this change until the RFC I mentioned in the open search core is implemented?
src/test/java/org/opensearch/neuralsearch/query/NeuralSparseQueryIT.java
Outdated
Show resolved
Hide resolved
28e3539
to
48fbe0a
Compare
48fbe0a
to
3a9da75
Compare
Signed-off-by: conggguan <[email protected]>
Some BWC test failed because of there is a version check of stream. // stream in |
cdfa4ca
to
047382f
Compare
Signed-off-by: conggguan <[email protected]>
047382f
to
cacbdab
Compare
public static volatile Float DEFAULT_WINDOW_SIZE_EXPANSION; | ||
public static volatile Float DEFAULT_PRUNING_RATIO; | ||
public static volatile Boolean DEFAULT_ENABLED; | ||
public static volatile Integer MAX_WINDOW_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why public and volatile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if field is static and final then the name should be all caps. But that is not the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why public and volatile?
Using a volatile field and settings update consumer is a common approach in OpenSearch to get real-time cluster settings from memory, without accessing the cluster settings object. You can just find many references in other plugins.
The public can be removed.
static final ParseField PRUNING_RATIO = new ParseField("pruning_ratio"); | ||
@VisibleForTesting | ||
static final ParseField ENABLED = new ParseField("enabled"); | ||
private static final Version MINIMAL_SUPPORTED_VERSION_TWO_PHASE_SEARCH = Version.CURRENT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version.Current is not right way. Wondering how it worked for other features? Because this a problem for every feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at what I respond you yesterday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that, but my question is simple. This is not the first feature who is facing this problem. So can you tell me how this particular use case is special
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bwc framework is introduced after 2.11, and it's test cases(neural query, neural sparse query, and hybrid query) are already exists when we write this framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bwc framework is introduced after 2.11, and it's test cases(neural query, neural sparse query, and hybrid query) are already exists when we write this framework.
So what difference does it make?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of what you think the same case, and we can take a look at whether they are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure minimal supported version should be current. It mean with each release it will be incremented and not supported from previous release? Say we merge this before 2.15, then in 2.16 branch 2.16 is minimal supported meaning 2.15 is not supported
…ort version check. Signed-off-by: conggguan <[email protected]>
@conggguan please add BWC test for the change. It gives more confidence. Also I see you have significant changes in constructors that too version specific. BWC test is a must for these type of changes. cc: @navneet1v |
Signed-off-by: conggguan <[email protected]>
6c6ac13
to
5ede643
Compare
} | ||
|
||
@Override | ||
protected Query doToQuery(QueryShardContext context) throws IOException { | ||
protected Query doToQuery(QueryShardContext context) { | ||
if (NeuralSparseTwoPhaseParameters.isClusterOnSameVersionForTwoPhaseSearchSupport()) completeTwoPhasePara( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix formatting, it should be easier to read if you put method call inside the if
block into curly braces
@AllArgsConstructor | ||
@Getter | ||
@NonNull | ||
public final class NeuralSparseQuery extends Query { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to have BWC test for higher confidence, can you add it @conggguan? I think there was same ask from Varun in this PR
static final ParseField ENABLED = new ParseField("enabled"); | ||
private static final Version MINIMAL_SUPPORTED_VERSION_TWO_PHASE_SEARCH = Version.CURRENT; | ||
private Float window_size_expansion; | ||
private Float pruning_ratio; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use lowerCamelCase for variable names, btw it also affects the getter method name
return builder.build(); | ||
} | ||
|
||
private void completeTwoPhasePara(IndexSettings indexSettings) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there some letters missing in the method name, or it's a weird name? :)
return builder.build(); | ||
} | ||
|
||
private void completeTwoPhasePara(IndexSettings indexSettings) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix formatting in this method, every method call should be inside the curly braces and on a new line
static final ParseField PRUNING_RATIO = new ParseField("pruning_ratio"); | ||
@VisibleForTesting | ||
static final ParseField ENABLED = new ParseField("enabled"); | ||
private static final Version MINIMAL_SUPPORTED_VERSION_TWO_PHASE_SEARCH = Version.CURRENT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure minimal supported version should be current. It mean with each release it will be incremented and not supported from previous release? Say we merge this before 2.15, then in 2.16 branch 2.16 is minimal supported meaning 2.15 is not supported
Description
This change implement for #646
Feature support query
Feature implementation optimize plan
Intro
Now implement two-phase feature by adding rescoreContext in the begin of HybridQueryPhaseSearcher's. It is harmless but little hack, so there is a long term plan to replace this implementation.
RFC in opensearch core
To add a correct rescoreContext to searchContext, we need both searchContext and whole compound query, but now if we can only get these together in the HybridQueryPhaseSearcher. In QueryBuilder, the searchContext is private, and whole query are not visible.
What we need to implement this feature is the whole query and searchContext, so if OpenSearch Core can open a API which can help me get it before the query performed, we can move the rescoreContext function to a more general interface of core.
RoadMap of this optimizing
Issues Resolved
Resolve #646
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.