-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Neural Sparse Query Two Phase Search pipeline #747
[Feature] Neural Sparse Query Two Phase Search pipeline #747
Conversation
Signed-off-by: conggguan <[email protected]>
Signed-off-by: conggguan <[email protected]>
…rocess funciton. Signed-off-by: conggguan <[email protected]>
…uilder. Signed-off-by: conggguan <[email protected]>
75a33fd
to
9a1d52c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comments for this PR:
- please think of simpler and more optimal low level design, logic is a bit overcomplicated
- formatting, please put calls under if or else conditions into a braces even if it's one liner
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
float baseBoost | ||
) { | ||
baseBoost *= queryBuilder.boost(); | ||
neuralSparseQueryBuilderFloatMap.put(queryBuilder.getCopyNeuralSparseQueryBuilderForTwoPhase(ratio), baseBoost); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another comment - seems we're creating copy of the NeuralSparceQueryBuilder every time we reach here. In worse case how many times we can hit this place? Do we really need a new copy every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, consolidating all query tokens to construct a single NeuralSparseQueryBuilder is a viable strategy. This approach could potentially increase the overall efficiency of the queries, but it might also complicate the logic. Queries involving a large number of NeuralSparseQuery components may not be very common. Therefore, do you think make a more complicate but more effective logic is essential?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if can keep correctness and improve performance I would vote for such approach. Also that aligns with one of the PRs goals which is improve query latency. Is it possible for you to run a benchmark to see how much latency improvement we can get if we reuse a single object?
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
fd3d634
to
0a7bd74
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #747 +/- ##
============================================
- Coverage 85.02% 84.53% -0.50%
- Complexity 790 807 +17
============================================
Files 60 61 +1
Lines 2430 2534 +104
Branches 410 427 +17
============================================
+ Hits 2066 2142 +76
- Misses 202 220 +18
- Partials 162 172 +10 ☔ View full report in Codecov by Sentry. |
0a7bd74
to
91f6a25
Compare
Signed-off-by: conggguan <[email protected]>
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
if (queryBuilder instanceof BoolQueryBuilder) { | ||
BoolQueryBuilder boolQueryBuilder = (BoolQueryBuilder) queryBuilder; | ||
float updatedBoost = baseBoost * boolQueryBuilder.boost(); | ||
for (QueryBuilder subQuery : boolQueryBuilder.should()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In last PR we also supported BoostQuery, and use clause.isScoring() to determine whether to support the boolean clause. Are they same here?
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
src/test/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessorIT.java
Show resolved
Hide resolved
Signed-off-by: conggguan <[email protected]>
5fa1e74
to
7846f5d
Compare
Signed-off-by: conggguan <[email protected]>
7846f5d
to
a9adb72
Compare
Signed-off-by: conggguan <[email protected]>
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
12ce9ce
to
c5d0e99
Compare
Signed-off-by: conggguan <[email protected]>
c5d0e99
to
25edb27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
if (queryTokensSupplier != null) { | ||
builder.append(queryTokensSupplier.get()); | ||
} | ||
if (twoPhaseSharedQueryToken != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to do the null check here
Signed-off-by: conggguan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a bwc for this change, are we covered by anything that already exists or we need a new test?
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NeuralSparseTwoPhaseProcessor.java
Outdated
Show resolved
Hide resolved
I think we need a BWC test, and it would be better to perform it after this code is merged. Currently, I can't build a BWC test to invoke the code from this PR since it hasn't been merged yet. I will add a BWC test as soon as this PR is merged. Is this a good solution? |
Signed-off-by: conggguan <[email protected]>
…e search pipeline to neural sparse query builder. Signed-off-by: conggguan <[email protected]>
That works, although I'm not sure what issue you're facing as bwc should be able to use code from active PR. Please make sure the PR with BWC is merged in the same release, which in case of this change is 2.15. |
* Poc of pipeline Signed-off-by: conggguan <[email protected]> * Complete some settings for two phase pipeline. Signed-off-by: conggguan <[email protected]> * Change the implement of two-phase from QueryBuilderVistor to custom process funciton. Signed-off-by: conggguan <[email protected]> * Add It and fix some bug on the state of multy same neuralsparsequerybuilder. Signed-off-by: conggguan <[email protected]> * Simplify some logic, and correct some format. Signed-off-by: conggguan <[email protected]> * Optimize some format. Signed-off-by: conggguan <[email protected]> * Add some test case. Signed-off-by: conggguan <[email protected]> * Optimize some logic for zhichao-aws's comments. Signed-off-by: conggguan <[email protected]> * Optimize a line without application. Signed-off-by: conggguan <[email protected]> * Add some comments, remove some redundant lines, fix some format. Signed-off-by: conggguan <[email protected]> * Remove a redundant null check, fix a if format. Signed-off-by: conggguan <[email protected]> * Fix a typo for a comment, camelcase format for some variable. Signed-off-by: conggguan <[email protected]> * Add some comments to illustrate the influence of the modify on 2-phase search pipeline to neural sparse query builder. Signed-off-by: conggguan <[email protected]> --------- Signed-off-by: conggguan <[email protected]> Signed-off-by: conggguan <[email protected]> (cherry picked from commit 2b21110)
* Poc of pipeline Signed-off-by: conggguan <[email protected]> * Complete some settings for two phase pipeline. Signed-off-by: conggguan <[email protected]> * Change the implement of two-phase from QueryBuilderVistor to custom process funciton. Signed-off-by: conggguan <[email protected]> * Add It and fix some bug on the state of multy same neuralsparsequerybuilder. Signed-off-by: conggguan <[email protected]> * Simplify some logic, and correct some format. Signed-off-by: conggguan <[email protected]> * Optimize some format. Signed-off-by: conggguan <[email protected]> * Add some test case. Signed-off-by: conggguan <[email protected]> * Optimize some logic for zhichao-aws's comments. Signed-off-by: conggguan <[email protected]> * Optimize a line without application. Signed-off-by: conggguan <[email protected]> * Add some comments, remove some redundant lines, fix some format. Signed-off-by: conggguan <[email protected]> * Remove a redundant null check, fix a if format. Signed-off-by: conggguan <[email protected]> * Fix a typo for a comment, camelcase format for some variable. Signed-off-by: conggguan <[email protected]> * Add some comments to illustrate the influence of the modify on 2-phase search pipeline to neural sparse query builder. Signed-off-by: conggguan <[email protected]> --------- Signed-off-by: conggguan <[email protected]> Signed-off-by: conggguan <[email protected]> (cherry picked from commit 2b21110) Co-authored-by: conggguan <[email protected]>
Description
This change implement for #646
Feature support query
Issues Resolved
Resolve #646
Check List
documentation-website issue
opensearch-project/documentation-website#7289
BWC PR
On the way...
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.