Neural sparse query two-phase search processor's bwc test #777
Merged: zhichao-aws merged 30 commits into opensearch-project:main from conggguan:search-pipeline-bwc on Jul 9, 2024
Changes from 24 commits
Commits
3d2bd18 Poc of pipeline
b8ef828 Complete some settings for two phase pipeline.
f678e93 Change the implement of two-phase from QueryBuilderVistor to custom p…
9a1d52c Add It and fix some bug on the state of multy same neuralsparsequeryb…
3bb10fe Simplify some logic, and correct some format.
dbc4269 Optimize some format.
a93c8cd Merge branch 'opensearch-project:main' into search-pipeline (conggguan)
f190834 Add some test case.
5ee07d1 Optimize some logic for zhichao-aws's comments.
a9adb72 Merge branch 'main' into search-pipeline (conggguan)
0f5eab9 Optimize a line without application.
25edb27 Add some comments, remove some redundant lines, fix some format.
61cac40 Remove a redundant null check, fix a if format.
83abb31 Fix a typo for a comment, camelcase format for some variable.
a53966c Add some comments to illustrate the influence of the modify on 2-phas…
eb17594 Add restart and rolling upgrade bwc test for neural sparse two phase …
18e3e65 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
248dfb4 Spotless on qa.
e362373 Update change log for two-phase BWC test.
347e42e Remove redundant lines of two-phase BWC test.
641152f Merge branch 'bwc-copy' into search-pipeline-bwc
801d96a Merge from main.
55544cb Add changelog.
c901832 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
93e957f Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
b9cfdb6 Add the PR link and number for the CHANGELOG.md.
363bd18 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
820cbac [Fix] NeuralSparseTwoPhaseProcessorIT created wrong ingest pipeline, …
aa42c07 Merge branch 'opensearch-project:main' into search-pipeline-bwc (conggguan)
0540635 Merge branch 'main' into search-pipeline-bwc (conggguan)
64 changes: 64 additions & 0 deletions
...pgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralSparseTwoPhaseProcessorIT.java
@@ -0,0 +1,64 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */
package org.opensearch.neuralsearch.bwc;

import org.opensearch.common.settings.Settings;
import org.opensearch.neuralsearch.query.NeuralSparseQueryBuilder;
import org.opensearch.neuralsearch.util.TestUtils;

import java.nio.file.Files;
import java.nio.file.Path;

import static org.opensearch.neuralsearch.util.TestUtils.NODES_BWC_CLUSTER;
import static org.opensearch.neuralsearch.util.TestUtils.TEXT_EMBEDDING_PROCESSOR;

public class NeuralSparseTwoPhaseProcessorIT extends AbstractRestartUpgradeRestTestCase {

    private static final String NEURAL_SPARSE_INGEST_PIPELINE_NAME = "nstp-nlp-ingest-pipeline-dense";
    private static final String NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME = "nstp-nlp-two-phase-search-pipeline-sparse";
    private static final String TEST_ENCODING_FIELD = "passage_embedding";
    private static final String TEST_TEXT_FIELD = "passage_text";
    private static final String TEXT_1 = "Hello world a b";

    public void testNeuralSparseQueryTwoPhaseProcessor_NeuralSearch_E2EFlow() throws Exception {
        waitForClusterHealthGreen(NODES_BWC_CLUSTER);
        NeuralSparseQueryBuilder neuralSparseQueryBuilder = new NeuralSparseQueryBuilder().fieldName(TEST_ENCODING_FIELD).queryText(TEXT_1);
        if (isRunningAgainstOldCluster()) {
            String modelId = uploadSparseEncodingModel();
            loadModel(modelId);
            neuralSparseQueryBuilder.modelId(modelId);
            createPipelineProcessor(modelId, NEURAL_SPARSE_INGEST_PIPELINE_NAME);
            createIndexWithConfiguration(
                getIndexNameForTest(),
                Files.readString(Path.of(classLoader.getResource("processor/IndexMappingMultipleShard.json").toURI())),
                NEURAL_SPARSE_INGEST_PIPELINE_NAME
            );
            addDocument(getIndexNameForTest(), "0", TEST_TEXT_FIELD, TEXT_1, null, null);
            createNeuralSparseTwoPhaseSearchProcessor(NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME);
            updateIndexSettings(
                getIndexNameForTest(),
                Settings.builder().put("index.search.default_pipeline", NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME)
            );
            Object resultWith2PhasePipeline = search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits");
            assertNotNull(resultWith2PhasePipeline);
        } else {
            String modelId = null;
            try {
                modelId = TestUtils.getModelId(getIngestionPipeline(NEURAL_SPARSE_INGEST_PIPELINE_NAME), TEXT_EMBEDDING_PROCESSOR);
                loadModel(modelId);
                neuralSparseQueryBuilder.modelId(modelId);
                Object resultWith2PhasePipeline = search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits");
                assertNotNull(resultWith2PhasePipeline);
            } finally {
                wipeOfTestResources(
                    getIndexNameForTest(),
                    NEURAL_SPARSE_INGEST_PIPELINE_NAME,
                    modelId,
                    NEURAL_SPARSE_TWO_PHASE_SEARCH_PIPELINE_NAME
                );
            }
        }
    }
}
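After the upgrade, the test recovers the model ID from the ingest pipeline created on the old cluster via `TestUtils.getModelId(...)`, keyed by a processor type. The following is a minimal, self-contained sketch of that kind of lookup, not the project's implementation. The class `PipelineModelIdLookup`, the method `findModelId`, and the assumed pipeline shape (`processors` list, one map per processor, `model_id` inside) are all hypothetical and chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a helper that reads a model ID out of an ingest pipeline definition. */
final class PipelineModelIdLookup {

    private PipelineModelIdLookup() {}

    /**
     * Returns the model_id of the first processor of the given type, if there is one.
     * Assumed shape (an assumption, not taken from the PR):
     * {"processors": [{"<processor type>": {"model_id": "..."}}]}
     */
    static Optional<String> findModelId(Map<String, Object> pipeline, String processorType) {
        Object processors = pipeline.get("processors");
        if (!(processors instanceof List<?>)) {
            return Optional.empty();
        }
        for (Object entry : (List<?>) processors) {
            if (!(entry instanceof Map<?, ?>)) {
                continue;
            }
            // Each entry maps a processor type to its configuration.
            Object config = ((Map<?, ?>) entry).get(processorType);
            if (config instanceof Map<?, ?>) {
                Object modelId = ((Map<?, ?>) config).get("model_id");
                if (modelId instanceof String) {
                    return Optional.of((String) modelId);
                }
            }
        }
        return Optional.empty();
    }
}
```

In this sketch the processor type passed to the lookup has to match the type the pipeline was created with; a lookup with a different key yields an empty result rather than a model ID.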
16 changes: 16 additions & 0 deletions
...tart-upgrade/src/test/resources/processor/NeuralSparseTwoPhaseProcessorConfiguration.json
@@ -0,0 +1,16 @@
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "This processor is making two-phase rescorer.",
        "enabled": true,
        "two_phase_parameter": {
          "prune_ratio": %f,
          "expansion_rate": %f,
          "max_window_size": %d
        }
      }
    }
  ]
}
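The configuration file above is a format template rather than valid JSON: `%f` and `%d` are placeholders that have to be filled in before the pipeline is created. How `createNeuralSparseTwoPhaseSearchProcessor` does this is not shown in this diff, so the following is only a sketch of one plausible way to render such a template. The class name `TwoPhaseConfigTemplate`, the method `render`, and the values in the usage note are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper that fills the %f/%d placeholders of the two-phase processor template. */
final class TwoPhaseConfigTemplate {

    // Same structure as NeuralSparseTwoPhaseProcessorConfiguration.json, inlined to keep the sketch self-contained.
    private static final String TEMPLATE = String.join(
        "\n",
        "{",
        "  \"request_processors\": [",
        "    {",
        "      \"neural_sparse_two_phase_processor\": {",
        "        \"tag\": \"neural-sparse\",",
        "        \"description\": \"This processor is making two-phase rescorer.\",",
        "        \"enabled\": true,",
        "        \"two_phase_parameter\": {",
        "          \"prune_ratio\": %f,",
        "          \"expansion_rate\": %f,",
        "          \"max_window_size\": %d",
        "        }",
        "      }",
        "    }",
        "  ]",
        "}"
    );

    private TwoPhaseConfigTemplate() {}

    /** Renders the template; Locale.ROOT keeps '.' as the decimal separator so the result stays valid JSON. */
    static String render(double pruneRatio, double expansionRate, int maxWindowSize) {
        return String.format(Locale.ROOT, TEMPLATE, pruneRatio, expansionRate, maxWindowSize);
    }
}
```

For example, `TwoPhaseConfigTemplate.render(0.4, 5.0, 10000)` produces a request body in which the three placeholders are replaced by those illustrative values. Passing an explicit locale matters here because the default locale on some machines formats `%f` with a decimal comma.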
78 changes: 78 additions & 0 deletions
...pgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralSparseTwoPhaseProcessorIT.java
@@ -0,0 +1,78 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */
package org.opensearch.neuralsearch.bwc;

import org.opensearch.common.settings.Settings;
import org.opensearch.neuralsearch.query.NeuralSparseQueryBuilder;
import org.opensearch.neuralsearch.util.TestUtils;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import static org.opensearch.neuralsearch.util.TestUtils.NODES_BWC_CLUSTER;
import static org.opensearch.neuralsearch.util.TestUtils.SPARSE_ENCODING_PROCESSOR;

public class NeuralSparseTwoPhaseProcessorIT extends AbstractRollingUpgradeTestCase {
    // Prefix the names to avoid conflicts with other IT classes, since resources are not wiped after the first round
    private static final String SPARSE_INGEST_PIPELINE_NAME = "nstp-nlp-ingest-pipeline-sparse";
    private static final String SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME = "nstp-nlp-two-phase-search-pipeline-sparse";
    private static final String TEST_ENCODING_FIELD = "passage_embedding";
    private static final String TEST_TEXT_FIELD = "passage_text";
    private static final String TEXT_1 = "Hello world a b";
    private String sparseModelId = "";

    // Tests that NeuralSparseTwoPhaseProcessor supports the two-phase speed-up of neural_sparse queries
    // The feature was introduced in 2.15
    public void testNeuralSparseTwoPhaseProcessorIT_NeuralSparseSearch_E2EFlow() throws Exception {
        waitForClusterHealthGreen(NODES_BWC_CLUSTER);
        // The model_id is set once the ID has been obtained
        NeuralSparseQueryBuilder neuralSparseQueryBuilder = new NeuralSparseQueryBuilder().fieldName(TEST_ENCODING_FIELD).queryText(TEXT_1);

        switch (getClusterType()) {
            case OLD:
                sparseModelId = uploadSparseEncodingModel();
                loadModel(sparseModelId);
                neuralSparseQueryBuilder.modelId(sparseModelId);
                createPipelineForSparseEncodingProcessor(sparseModelId, SPARSE_INGEST_PIPELINE_NAME);
                createIndexWithConfiguration(
                    getIndexNameForTest(),
                    Files.readString(Path.of(classLoader.getResource("processor/SparseIndexMappings.json").toURI())),
                    SPARSE_INGEST_PIPELINE_NAME
                );
                addSparseEncodingDoc(getIndexNameForTest(), "0", List.of(), List.of(), List.of(TEST_TEXT_FIELD), List.of(TEXT_1));
                createNeuralSparseTwoPhaseSearchProcessor(SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME);
                updateIndexSettings(
                    getIndexNameForTest(),
                    Settings.builder().put("index.search.default_pipeline", SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME)
                );
                assertNotNull(search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits"));
                break;
            case MIXED:
                sparseModelId = TestUtils.getModelId(getIngestionPipeline(SPARSE_INGEST_PIPELINE_NAME), SPARSE_ENCODING_PROCESSOR);
                loadModel(sparseModelId);
                neuralSparseQueryBuilder.modelId(sparseModelId);
                assertNotNull(search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits"));
                break;
            case UPGRADED:
                try {
                    sparseModelId = TestUtils.getModelId(getIngestionPipeline(SPARSE_INGEST_PIPELINE_NAME), SPARSE_ENCODING_PROCESSOR);
                    loadModel(sparseModelId);
                    neuralSparseQueryBuilder.modelId(sparseModelId);
                    assertNotNull(search(getIndexNameForTest(), neuralSparseQueryBuilder, 1).get("hits"));
                } finally {
                    wipeOfTestResources(
                        getIndexNameForTest(),
                        SPARSE_INGEST_PIPELINE_NAME,
                        sparseModelId,
                        SPARSE_SEARCH_TWO_PHASE_PIPELINE_NAME
                    );
                }
                break;
            default:
                throw new IllegalStateException("Unexpected value: " + getClusterType());
        }
    }
}
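The rolling-upgrade test runs the same method once per cluster state: setup happens only in `OLD`, every phase verifies the search, and cleanup is deferred to `UPGRADED` inside a `finally` block. The sketch below models only that control flow in isolation, with no OpenSearch dependencies. `RollingUpgradePhases`, its `ClusterType` enum, and the logged step names are hypothetical and stand in for the real test helpers.

```java
import java.util.List;

/** Hypothetical model of the phase dispatch used by a rolling-upgrade BWC test. */
final class RollingUpgradePhases {

    /** Mirrors the three cluster states the test switches on. */
    enum ClusterType {
        OLD,
        MIXED,
        UPGRADED
    }

    private RollingUpgradePhases() {}

    /**
     * Runs one phase and appends the steps that completed to the given log.
     * verifySearch stands in for the search assertion and may throw.
     */
    static void run(ClusterType type, Runnable verifySearch, List<String> log) {
        switch (type) {
            case OLD:
                // Resources are created once and deliberately left in place for later phases.
                log.add("setup");
                verifySearch.run();
                log.add("verified");
                break;
            case MIXED:
                verifySearch.run();
                log.add("verified");
                break;
            case UPGRADED:
                try {
                    verifySearch.run();
                    log.add("verified");
                } finally {
                    // Cleanup runs even if verification throws, so a failure does not leak resources.
                    log.add("cleanup");
                }
                break;
            default:
                throw new IllegalStateException("Unexpected value: " + type);
        }
    }
}
```

Because cleanup happens only in the last phase, resource names need to be unique per test class, which is what the `nstp-` prefix on the pipeline names in the test is for.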
16 changes: 16 additions & 0 deletions
...ling-upgrade/src/test/resources/processor/NeuralSparseTwoPhaseProcessorConfiguration.json
@@ -0,0 +1,16 @@
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "This processor is making two-phase rescorer.",
        "enabled": true,
        "two_phase_parameter": {
          "prune_ratio": %f,
          "expansion_rate": %f,
          "max_window_size": %d
        }
      }
    }
  ]
}
Please make a proper changelog entry; the PR number and link are missing.
Okay, I have added it to the changelog.