From b188880a7787100e959621763c944b77e36c9bec Mon Sep 17 00:00:00 2001 From: conggguan Date: Thu, 6 Jun 2024 17:04:54 +0800 Subject: [PATCH 01/12] Add doc for neural-sparse-query-two-phase-processor. Signed-off-by: conggguan --- _search-plugins/neural-sparse-search.md | 5 + ...neural-sparse-query-two-phase-processor.md | 124 ++++++++++++++++++ 2 files changed, 129 insertions(+) create mode 100644 _search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index fd86b3f6b0..585bd8d6c0 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -30,6 +30,7 @@ To use neural sparse search, follow these steps: 1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). 1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). 1. [Search the index using neural search](#step-4-search-the-index-using-neural-sparse-search). +1. [Create and enable two-phase processor (Optional)](#step-5-create-and-enable-two-phase-processor-optional). ## Step 1: Create an ingest pipeline @@ -261,6 +262,10 @@ GET my-nlp-index/_search } } ``` +## Step 5: Create and enable two-phase processor (Optional) +'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. It can speed up the neural sparse query's time cost with negligible accurency loss +For more information, you can refer to [neural-sparse-query-two-phase-processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/). + ## Setting a default model on an index or field diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md new file mode 100644 index 0000000000..992dea951c --- /dev/null +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -0,0 +1,124 @@ +--- +layout: default +title: NeuralSparse query two-phase processor +nav_order: 13 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# NeuralSparse query two-phase processor + +The `neural_sparse_two_phase_processor` search request processor is designed to set a speed-up pipeline for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps. In the first step, it uses high-weight tokens to score the documents and filters out the top documents; in the second step, it uses low-weight tokens to fine-tune the scores of the top documents. + +## Request fields + +The following table lists all available request fields. + +Field | Data type | Description +:--- | :--- | :--- +`enabled` | Boolean | Controls whether the two-phase is enabled with a default value of `true`. +`two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. Optional. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. +`two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's max score * prune_ratio. Default value is 0.4. Valid range is [0,1]. 
+`two_phase_parameter.expansion_rate` | Float | A rate that specifies how many documents will be fine-tuned during the second phase. The second phase doc number equals query size (default 10) * expansion rate. Default value is 5.0. Valid range is greater than 1.0. +`two_phase_parameter.max_window_size` | Int | A limit number of the two-phase fine-tune documents. Default value is 10000. Valid range is greater than 50. +`tag` | String | The processor's identifier. Optional. +`description` | String | A description of the processor. Optional. + +## Example + +### Create search pipeline + +The following example request creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor. The processor sets a custom model ID at the index level and provides different default model IDs for two specific fields in the index: + +```json +PUT /_search/pipeline/two_phase_search_pipeline +{ + "request_processors": [ + { + "neural_sparse_two_phase_processor": { + "tag": "neural-sparse", + "description": "This processor is making two-phase processor.", + "enabled": true, + "two_phase_parameter": { + "prune_ratio": custom_prune_ratio, + "expansion_rate": custom_expansion_rate, + "max_window_size": custom_max_window_size + } + } + } + ] +} +``` +{% include copy-curl.html %} + +### Set search pipeline + +Then choose the proper index and set the `index.search.default_pipeline` to the pipeline name. +```json +PUT /index-name/_settings +{ + "index.search.default_pipeline" : "two_phase_search_pipeline" +} +``` +{% include copy-curl.html %} + +## Limitation +### Version support +`neural_sparse_two_phase_processor` is introduced in OpenSearch 2.15. You can use this pipeline in a cluster whose minimal version is greater than or equals to 2.15. + +### Compound query support +There is 6 types of [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/). And we only support bool query now. +- [x] bool (Boolean) +- [ ] boosting +- [ ] constant_score +- [ ] dis_max (disjunction max) +- [ ] function_score +- [ ] hybrid + +Notice, neural sparse query or bool query with a boost parameter (not same as boosting query) are also supported. + +#### Supported Example +##### Single neural sparse query + +``` +GET /my-nlp-index/_search +{ + "query": { + "neural_sparse": { + "passage_embedding": { + "query_text": "Hi world" + } + } + } +} +``` +{% include copy-curl.html %} +##### Neural sparse query nested in bool query + +``` +GET /my-nlp-index/_search +{ + "query": { + "bool": { + "should": [ + { + "neural_sparse": { + "passage_embedding": { + "query_text": "Hi world", + "model_id": + }, + "boost": 2.0 + } + } + ] + } + } +} +``` +{% include copy-curl.html %} + +## Metrics + +In doc-only mode, the two-phase processor will reduce the query latency by 20% to 50%, depending on the index configuration and two-phase parameters. +In bi-encoder mode, the two-phase processor can decrease the query latency by up to 90%, also depending on the index configuration and two-phase parameters. \ No newline at end of file From 7a0df87e97b3af540d41f4dace439cc3eeb92dcf Mon Sep 17 00:00:00 2001 From: conggguan Date: Tue, 11 Jun 2024 13:33:04 +0800 Subject: [PATCH 02/12] Make some edits for the comments. 
Signed-off-by: conggguan --- _search-plugins/neural-sparse-search.md | 33 +++++++++++++++++-- ...neural-sparse-query-two-phase-processor.md | 19 ++++++++--- 2 files changed, 46 insertions(+), 6 deletions(-) diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index 585bd8d6c0..5d6f3a00e3 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -263,8 +263,37 @@ GET my-nlp-index/_search } ``` ## Step 5: Create and enable two-phase processor (Optional) -'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. It can speed up the neural sparse query's time cost with negligible accurency loss -For more information, you can refer to [neural-sparse-query-two-phase-processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/). + +This step is optional but strongly recommended, as it significantly improves the performance of neural sparse queries with almost no side effects. + +'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. It can speed up the neural sparse query's time cost with negligible accurency loss. + +You can quickly launch a pipeline based on the following API example. For more detailed information on the parameter settings and basic principles of this pipeline, please refer to [neural-sparse-query-two-phase-processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/). + +```json +PUT /_search/pipeline/two_phase_search_pipeline +{ + "request_processors": [ + { + "neural_sparse_two_phase_processor": { + "tag": "neural-sparse", + "description": "This processor is making two-phase processor." + } + } + ] +} +``` +{% include copy-curl.html %} + +Then choose the proper index and set the `index.search.default_pipeline` to the pipeline name. Replace the `index-name` in url with your index name. +```json +PUT /index-name/_settings +{ + "index.search.default_pipeline" : "two_phase_search_pipeline" +} +``` +{% include copy-curl.html %} + ## Setting a default model on an index or field diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 992dea951c..38ec5ae1c8 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -9,7 +9,7 @@ grand_parent: Search pipelines # NeuralSparse query two-phase processor -The `neural_sparse_two_phase_processor` search request processor is designed to set a speed-up pipeline for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps. In the first step, it uses high-weight tokens to score the documents and filters out the top documents; in the second step, it uses low-weight tokens to fine-tune the scores of the top documents. +The `neural_sparse_two_phase_processor` search request processor is designed to set a speed-up pipeline for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps. 
In the first step, it uses high-weight tokens to score the documents and filters out the top documents; in the second step, it uses low-weight tokens to rescore the scores of the top documents. ## Request fields @@ -118,7 +118,18 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} -## Metrics +## P99 Latency Metrics +On an OpenSearch cluster set up on 3 m5.4xlarge AWS EC2 instances, we conducted neural sparse query's P99 latency tests on indexes corresponding to over ten datasets. +### Doc-only mode latency metric +In doc-only mode, the two-phase processor can significantly decrease query latency. Analyzing the data: +- Average latency without 2-phase: 53.56 ms +- Average latency with 2-phase: 38.61 ms -In doc-only mode, the two-phase processor will reduce the query latency by 20% to 50%, depending on the index configuration and two-phase parameters. -In bi-encoder mode, the two-phase processor can decrease the query latency by up to 90%, also depending on the index configuration and two-phase parameters. \ No newline at end of file +This results in an overall reduction of approximately 27.92% in latency. Most index show a significant decrease in latency with the 2-phase processor, with reductions ranging from 5.14% to 84.6%, the specific latency optimization values depend on the data distribution within the indexes. +### Bi-encoder mode latency metric + +In bi-encoder mode, the two-phase processor can significantly decrease query latency. Analyzing the data: +- Average latency without 2-phase: 300.79 ms +- Average latency with 2-phase: 121.64 ms + +This results in an overall reduction of approximately 59.56% in latency. Most index show a significant decrease in latency with the 2-phase processor, with reductions ranging from 1.56% to 82.84%, the specific latency optimization values depend on the data distribution within the indexes. From 22d47391b30ac5859dee994deac129b034674f03 Mon Sep 17 00:00:00 2001 From: conggguan Date: Thu, 13 Jun 2024 10:39:25 +0800 Subject: [PATCH 03/12] Fix some typo and style-job. Signed-off-by: conggguan --- ...neural-sparse-query-two-phase-processor.md | 22 ++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 38ec5ae1c8..9c7f97417c 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -8,6 +8,8 @@ grand_parent: Search pipelines --- # NeuralSparse query two-phase processor +Introduced 2.15 +{: .label .label-purple } The `neural_sparse_two_phase_processor` search request processor is designed to set a speed-up pipeline for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps. In the first step, it uses high-weight tokens to score the documents and filters out the top documents; in the second step, it uses low-weight tokens to rescore the scores of the top documents. @@ -27,6 +29,8 @@ Field | Data type | Description ## Example +The following example request creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor. 
+ ### Create search pipeline The following example request creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor. The processor sets a custom model ID at the index level and provides different default model IDs for two specific fields in the index: @@ -64,21 +68,28 @@ PUT /index-name/_settings {% include copy-curl.html %} ## Limitation + +The 'neural_sparse_two_phase_processor' has limitation with both OpenSearch cluster version and compound queries. In cases where compound queries are not supported, this pipeline will not alter the original logic. + ### Version support + `neural_sparse_two_phase_processor` is introduced in OpenSearch 2.15. You can use this pipeline in a cluster whose minimal version is greater than or equals to 2.15. ### Compound query support -There is 6 types of [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/). And we only support bool query now. -- [x] bool (Boolean) +There is 6 types of [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/). And we only support boolean query now. +- [x] boolean - [ ] boosting - [ ] constant_score - [ ] dis_max (disjunction max) - [ ] function_score - [ ] hybrid -Notice, neural sparse query or bool query with a boost parameter (not same as boosting query) are also supported. +Notice, neural sparse query or boolean query with a boost parameter (not same as boosting query) are also supported. + +#### Supported example + +Both single neural sparse queries and boolean queries with a boost parameter are supported. -#### Supported Example ##### Single neural sparse query ``` @@ -88,13 +99,14 @@ GET /my-nlp-index/_search "neural_sparse": { "passage_embedding": { "query_text": "Hi world" + "model_id": } } } } ``` {% include copy-curl.html %} -##### Neural sparse query nested in bool query +##### Neural sparse query nested in boolean query ``` GET /my-nlp-index/_search From d11c412445a7ccb70953273cb492d03bacfbfd8b Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Thu, 13 Jun 2024 16:03:49 -0500 Subject: [PATCH 04/12] Update neural-sparse-query-two-phase-processor.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- ...neural-sparse-query-two-phase-processor.md | 55 ++++++++++--------- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 9c7f97417c..53c08cd215 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -1,17 +1,19 @@ --- layout: default -title: NeuralSparse query two-phase processor +title: Neural spare query two-phase processor nav_order: 13 -has_children: false parent: Search processors grand_parent: Search pipelines --- -# NeuralSparse query two-phase processor +# Neural sparse query two-phase processor Introduced 2.15 {: .label .label-purple } -The `neural_sparse_two_phase_processor` search request processor is designed to set a speed-up pipeline for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps. 
In the first step, it uses high-weight tokens to score the documents and filters out the top documents; in the second step, it uses low-weight tokens to rescore the scores of the top documents. +The `neural_sparse_two_phase_processor` search processer is designed to set a speed-up search pipelines for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps: + +1. High-weight tokens score the documents and filter out the top documents. +2. Low-weight tokens rescore the scores of the top documents. ## Request fields @@ -19,17 +21,17 @@ The following table lists all available request fields. Field | Data type | Description :--- | :--- | :--- -`enabled` | Boolean | Controls whether the two-phase is enabled with a default value of `true`. -`two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. Optional. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. -`two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's max score * prune_ratio. Default value is 0.4. Valid range is [0,1]. -`two_phase_parameter.expansion_rate` | Float | A rate that specifies how many documents will be fine-tuned during the second phase. The second phase doc number equals query size (default 10) * expansion rate. Default value is 5.0. Valid range is greater than 1.0. -`two_phase_parameter.max_window_size` | Int | A limit number of the two-phase fine-tune documents. Default value is 10000. Valid range is greater than 50. +`enabled` | Boolean | Controls whether two-phase is enabled. Default is `true`. +`two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. Optional. +`two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's max score * prune_ratio. Valid range is [0,1]. Default is `0.4` +`two_phase_parameter.expansion_rate` | Float | A rate that specifies how many documents will be fine-tuned during the second phase. The second phase doc number equals query size (default 10) * expansion rate. Valid range is greater than 1.0. Default is `5.0` +`two_phase_parameter.max_window_size` | Int | A limit number of the two-phase fine-tune documents. Valid range is greater than 50. Default is `10000`. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. ## Example -The following example request creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor. +The following example creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor. ### Create search pipeline @@ -58,7 +60,8 @@ PUT /_search/pipeline/two_phase_search_pipeline ### Set search pipeline -Then choose the proper index and set the `index.search.default_pipeline` to the pipeline name. 
+After the two-phase pipeline is created, set the `index.search.default_pipeline` setting to the pipeline name of the index for which you want to use the pipeline: + ```json PUT /index-name/_settings { @@ -69,26 +72,21 @@ PUT /index-name/_settings ## Limitation -The 'neural_sparse_two_phase_processor' has limitation with both OpenSearch cluster version and compound queries. In cases where compound queries are not supported, this pipeline will not alter the original logic. +The 'neural_sparse_two_phase_processor' contains the following limitations: ### Version support -`neural_sparse_two_phase_processor` is introduced in OpenSearch 2.15. You can use this pipeline in a cluster whose minimal version is greater than or equals to 2.15. +`neural_sparse_two_phase_processor` can only be used with OpenSearch 2.15 or greater. ### Compound query support -There is 6 types of [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/). And we only support boolean query now. -- [x] boolean -- [ ] boosting -- [ ] constant_score -- [ ] dis_max (disjunction max) -- [ ] function_score -- [ ] hybrid -Notice, neural sparse query or boolean query with a boost parameter (not same as boosting query) are also supported. +As of OpenSearch 2.15, only the Boolean [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/) is supported + +Neural sparse queries and boolean queries with a boost parameter (not a boosting query) are also supported. #### Supported example -Both single neural sparse queries and boolean queries with a boost parameter are supported. +The following examples show neural sparse queries with the supported query types. ##### Single neural sparse query @@ -106,6 +104,7 @@ GET /my-nlp-index/_search } ``` {% include copy-curl.html %} + ##### Neural sparse query nested in boolean query ``` @@ -131,17 +130,21 @@ GET /my-nlp-index/_search {% include copy-curl.html %} ## P99 Latency Metrics -On an OpenSearch cluster set up on 3 m5.4xlarge AWS EC2 instances, we conducted neural sparse query's P99 latency tests on indexes corresponding to over ten datasets. +On an OpenSearch cluster set up on 3 m5.4xlarge Amazon EC2 instances, OpenSearch conducts neural sparse query's P99 latency tests on indexes corresponding to over ten datasets. + ### Doc-only mode latency metric -In doc-only mode, the two-phase processor can significantly decrease query latency. Analyzing the data: + +In doc-only mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics: + - Average latency without 2-phase: 53.56 ms - Average latency with 2-phase: 38.61 ms -This results in an overall reduction of approximately 27.92% in latency. Most index show a significant decrease in latency with the 2-phase processor, with reductions ranging from 5.14% to 84.6%, the specific latency optimization values depend on the data distribution within the indexes. +This results in an overall reduction of approximately 27.92% in latency. Most indexes show a significant decrease in latency with the 2-phase processor, with reductions ranging from 5.14% to 84.6. The specific latency optimization values depend on the data distribution within the indexes. + ### Bi-encoder mode latency metric In bi-encoder mode, the two-phase processor can significantly decrease query latency. Analyzing the data: - Average latency without 2-phase: 300.79 ms - Average latency with 2-phase: 121.64 ms -This results in an overall reduction of approximately 59.56% in latency. 
Most index show a significant decrease in latency with the 2-phase processor, with reductions ranging from 1.56% to 82.84%, the specific latency optimization values depend on the data distribution within the indexes. +This results in an overall reduction of approximately 59.56% in latency. Most indexes show a significant decrease in latency with the 2-phase processor, with reductions ranging from 1.56% to 82.84%. The specific latency optimization values depend on the data distribution within the indexes. From 33a7c6e43b90308d80719467b7b6e8e0fb8471bb Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Thu, 13 Jun 2024 16:06:22 -0500 Subject: [PATCH 05/12] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _search-plugins/neural-sparse-search.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index 5d6f3a00e3..d18d20a65a 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -30,7 +30,7 @@ To use neural sparse search, follow these steps: 1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). 1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). 1. [Search the index using neural search](#step-4-search-the-index-using-neural-sparse-search). -1. [Create and enable two-phase processor (Optional)](#step-5-create-and-enable-two-phase-processor-optional). +1. _Optional_ [Create and enable the two-phase processor](#step-5-create-and-enable-the-two-phase-processor-optional). ## Step 1: Create an ingest pipeline @@ -266,9 +266,9 @@ GET my-nlp-index/_search This step is optional but strongly recommended, as it significantly improves the performance of neural sparse queries with almost no side effects. -'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. It can speed up the neural sparse query's time cost with negligible accurency loss. +The 'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. It can speed up the neural sparse query's time cost with negligible accurency loss. -You can quickly launch a pipeline based on the following API example. For more detailed information on the parameter settings and basic principles of this pipeline, please refer to [neural-sparse-query-two-phase-processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/). +To quickly launch a search pipeline with neural sparse search, use the following example: ```json PUT /_search/pipeline/two_phase_search_pipeline @@ -285,7 +285,7 @@ PUT /_search/pipeline/two_phase_search_pipeline ``` {% include copy-curl.html %} -Then choose the proper index and set the `index.search.default_pipeline` to the pipeline name. Replace the `index-name` in url with your index name. 
+Then choose the index you want to set up with the search pipeline and set the `index.search.default_pipeline` to the pipeline name, as shown in the following example: ```json PUT /index-name/_settings { From e1fd8ddcf35732b2e831130b40ed8bd09bc773c5 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Thu, 13 Jun 2024 16:09:48 -0500 Subject: [PATCH 06/12] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _search-plugins/neural-sparse-search.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index d18d20a65a..1963f8d733 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -262,7 +262,7 @@ GET my-nlp-index/_search } } ``` -## Step 5: Create and enable two-phase processor (Optional) +## Step 5: Create and enable the two-phase processor (Optional) This step is optional but strongly recommended, as it significantly improves the performance of neural sparse queries with almost no side effects. From 074f340ba58da6222ead7b6bfa44a8739264fd56 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 14 Jun 2024 07:16:02 -0500 Subject: [PATCH 07/12] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _search-plugins/neural-sparse-search.md | 3 +-- .../neural-sparse-query-two-phase-processor.md | 10 +++++----- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index 1963f8d733..391ca7128b 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -264,9 +264,8 @@ GET my-nlp-index/_search ``` ## Step 5: Create and enable the two-phase processor (Optional) -This step is optional but strongly recommended, as it significantly improves the performance of neural sparse queries with almost no side effects. -The 'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. It can speed up the neural sparse query's time cost with negligible accurency loss. +The 'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries. To quickly launch a search pipeline with neural sparse search, use the following example: diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 53c08cd215..8737e3cf02 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -10,7 +10,7 @@ grand_parent: Search pipelines Introduced 2.15 {: .label .label-purple } -The `neural_sparse_two_phase_processor` search processer is designed to set a speed-up search pipelines for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). 
It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps: +The `neural_sparse_two_phase_processor` search processor is designed to set a speed-up search pipelines for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps: 1. High-weight tokens score the documents and filter out the top documents. 2. Low-weight tokens rescore the scores of the top documents. @@ -82,13 +82,13 @@ The 'neural_sparse_two_phase_processor' contains the following limitations: As of OpenSearch 2.15, only the Boolean [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/) is supported -Neural sparse queries and boolean queries with a boost parameter (not a boosting query) are also supported. +Neural sparse queries and Boolean queries with a boost parameter (not boosting queries) are also supported. -#### Supported example +## Examples The following examples show neural sparse queries with the supported query types. -##### Single neural sparse query +### Single neural sparse query ``` GET /my-nlp-index/_search @@ -105,7 +105,7 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} -##### Neural sparse query nested in boolean query +### Neural sparse query nested in Boolean query ``` GET /my-nlp-index/_search From 61f74ee7f2a9c843203680b8c09d2e1e6523b6e5 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 14 Jun 2024 09:29:53 -0500 Subject: [PATCH 08/12] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _search-plugins/neural-sparse-search.md | 4 +-- ...neural-sparse-query-two-phase-processor.md | 32 +++++++++---------- 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index 391ca7128b..eec0057987 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -267,7 +267,7 @@ GET my-nlp-index/_search The 'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries. 
-To quickly launch a search pipeline with neural sparse search, use the following example: +To quickly launch a search pipeline with neural sparse search, use the following example pipeline: ```json PUT /_search/pipeline/two_phase_search_pipeline @@ -284,7 +284,7 @@ PUT /_search/pipeline/two_phase_search_pipeline ``` {% include copy-curl.html %} -Then choose the index you want to set up with the search pipeline and set the `index.search.default_pipeline` to the pipeline name, as shown in the following example: +Then choose the index you want to configure with the search pipeline and set the `index.search.default_pipeline` to the pipeline name, as shown in the following example: ```json PUT /index-name/_settings { diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 8737e3cf02..65b848401f 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -10,10 +10,10 @@ grand_parent: Search pipelines Introduced 2.15 {: .label .label-purple } -The `neural_sparse_two_phase_processor` search processor is designed to set a speed-up search pipelines for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by breaking down the original method of scoring all documents with all tokens into two steps: +The `neural_sparse_two_phase_processor` search processor is designed to provide faster search pipelines for [neural sparse search]({{site.url}}{{site.baseurl}}/search-plugins/neural-sparse-search/). It accelerates the neural sparse query by dividing the original method of scoring all documents with all tokens into two steps: 1. High-weight tokens score the documents and filter out the top documents. -2. Low-weight tokens rescore the scores of the top documents. +2. Low-weight tokens rescore the top documents. ## Request fields @@ -21,7 +21,7 @@ The following table lists all available request fields. Field | Data type | Description :--- | :--- | :--- -`enabled` | Boolean | Controls whether two-phase is enabled. Default is `true`. +`enabled` | Boolean | Controls whether the two-phase processor is enabled. Default is `true`. `two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. Optional. `two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's max score * prune_ratio. Valid range is [0,1]. Default is `0.4` `two_phase_parameter.expansion_rate` | Float | A rate that specifies how many documents will be fine-tuned during the second phase. The second phase doc number equals query size (default 10) * expansion rate. Valid range is greater than 1.0. 
Default is `5.0` @@ -60,7 +60,7 @@ PUT /_search/pipeline/two_phase_search_pipeline ### Set search pipeline -After the two-phase pipeline is created, set the `index.search.default_pipeline` setting to the pipeline name of the index for which you want to use the pipeline: +After the two-phase pipeline is created, set the `index.search.default_pipeline` setting to the name of the pipeline for the index on which you want to use the two-phase pipeline: ```json PUT /index-name/_settings @@ -72,15 +72,15 @@ PUT /index-name/_settings ## Limitation -The 'neural_sparse_two_phase_processor' contains the following limitations: +The `neural_sparse_two_phase_processor` has the following limitations. ### Version support -`neural_sparse_two_phase_processor` can only be used with OpenSearch 2.15 or greater. +The `neural_sparse_two_phase_processor` can only be used with OpenSearch 2.15 or later. ### Compound query support -As of OpenSearch 2.15, only the Boolean [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/) is supported +As of OpenSearch 2.15, only the Boolean [compound query]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/) is supported. Neural sparse queries and Boolean queries with a boost parameter (not boosting queries) are also supported. @@ -105,7 +105,7 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} -### Neural sparse query nested in Boolean query +### Neural sparse query nested in a Boolean query ``` GET /my-nlp-index/_search @@ -129,22 +129,22 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} -## P99 Latency Metrics -On an OpenSearch cluster set up on 3 m5.4xlarge Amazon EC2 instances, OpenSearch conducts neural sparse query's P99 latency tests on indexes corresponding to over ten datasets. +## P99 latency metrics +On an OpenSearch cluster set up on 3 m5.4xlarge Amazon Elastic Compute Cloud (Amazon EC2) instances, OpenSearch conducts neural sparse query P99 latency tests on indexes corresponding to more than 10 datasets. ### Doc-only mode latency metric In doc-only mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics: - Average latency without 2-phase: 53.56 ms -- Average latency with 2-phase: 38.61 ms +- Average latency with the two-phase processor: 38.61 ms -This results in an overall reduction of approximately 27.92% in latency. Most indexes show a significant decrease in latency with the 2-phase processor, with reductions ranging from 5.14% to 84.6. The specific latency optimization values depend on the data distribution within the indexes. +This results in an overall latency reduction of approximately 27.92%. Most indexes show a significant latency reduction when using the two-phase processor, with reductions ranging from 5.14 to 84.6%. The specific latency optimization values depend on the data distribution within the indexes. ### Bi-encoder mode latency metric -In bi-encoder mode, the two-phase processor can significantly decrease query latency. Analyzing the data: -- Average latency without 2-phase: 300.79 ms -- Average latency with 2-phase: 121.64 ms +In bi-encoder mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics: +- Average latency without the two-phase processor: 300.79 ms +- Average latency with the two-phase processor: 121.64 ms -This results in an overall reduction of approximately 59.56% in latency. 
Most indexes show a significant decrease in latency with the 2-phase processor, with reductions ranging from 1.56% to 82.84%. The specific latency optimization values depend on the data distribution within the indexes. +This results in an overall latency reduction of approximately 59.56%. Most indexes show a significant latency reduction when using the two-phase processor, with reductions ranging from 1.56 to 82.84%. The specific latency optimization values depend on the data distribution within the indexes. From d05dc86c502dd553fe04c807ae523ac855d9bd3e Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 14 Jun 2024 09:33:13 -0500 Subject: [PATCH 09/12] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- _search-plugins/neural-sparse-search.md | 2 +- .../neural-sparse-query-two-phase-processor.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md index eec0057987..41367d56e0 100644 --- a/_search-plugins/neural-sparse-search.md +++ b/_search-plugins/neural-sparse-search.md @@ -265,7 +265,7 @@ GET my-nlp-index/_search ## Step 5: Create and enable the two-phase processor (Optional) -The 'neural_sparse_two_phase_processor' is a new feature which introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries. +The `neural_sparse_two_phase_processor` is a new feature introduced in OpenSearch 2.15. Using the two-phase processor can significantly improve the performance of neural sparse queries. To quickly launch a search pipeline with neural sparse search, use the following example pipeline: diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 65b848401f..9f353ee107 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -23,8 +23,8 @@ Field | Data type | Description :--- | :--- | :--- `enabled` | Boolean | Controls whether the two-phase processor is enabled. Default is `true`. `two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. Optional. -`two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's max score * prune_ratio. Valid range is [0,1]. Default is `0.4` -`two_phase_parameter.expansion_rate` | Float | A rate that specifies how many documents will be fine-tuned during the second phase. The second phase doc number equals query size (default 10) * expansion rate. Valid range is greater than 1.0. Default is `5.0` +`two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's maximum score multiplied by its `prune_ratio`. Valid range is [0,1]. Default is `0.4` +`two_phase_parameter.expansion_rate` | Float | The rate at which documents will be fine-tuned during the second phase. 
The second-phase document number equals the query size (default is 10) multiplied by its expansion rate. Valid range is greater than 1.0. Default is `5.0` `two_phase_parameter.max_window_size` | Int | A limit number of the two-phase fine-tune documents. Valid range is greater than 50. Default is `10000`. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. @@ -35,7 +35,7 @@ The following example creates a search pipeline with a `neural_sparse_two_phase_ ### Create search pipeline -The following example request creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor. The processor sets a custom model ID at the index level and provides different default model IDs for two specific fields in the index: +The following example request creates a search pipeline with a `neural_sparse_two_phase_processor` search request processor. The processor sets a custom model ID at the index level and provides different default model IDs for two specific index fields: ```json PUT /_search/pipeline/two_phase_search_pipeline From d5f068bc8e285dd8db3ef49313e7f068b8926a53 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 14 Jun 2024 09:57:49 -0500 Subject: [PATCH 10/12] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../search-pipelines/neural-sparse-query-two-phase-processor.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 9f353ee107..dd9c6e7b8d 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -136,7 +136,7 @@ On an OpenSearch cluster set up on 3 m5.4xlarge Amazon Elastic Compute Cloud (Am In doc-only mode, the two-phase processor can significantly decrease query latency, as shown by the following latency metrics: -- Average latency without 2-phase: 53.56 ms +- Average latency without the two-phase processor: 53.56 ms - Average latency with the two-phase processor: 38.61 ms This results in an overall latency reduction of approximately 27.92%. Most indexes show a significant latency reduction when using the two-phase processor, with reductions ranging from 5.14 to 84.6%. The specific latency optimization values depend on the data distribution within the indexes. 
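A quick way to gauge the latency effect described above on your own data is to run the same neural sparse query once through the two-phase pipeline and once without it, then compare the `took` values in the two responses. The following sketch assumes the pipeline has not been set as the index default, so it is applied explicitly through the per-request `search_pipeline` query parameter; `my-nlp-index`, the `passage_embedding` field, and the model ID are placeholder values, not names from this patch series.

```json
GET /my-nlp-index/_search?search_pipeline=two_phase_search_pipeline
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "Hi world",
        "model_id": "<your sparse encoding model ID>"
      }
    }
  }
}
```

Sending the same request body to `GET /my-nlp-index/_search` without the `search_pipeline` parameter gives the single-phase baseline; repeating both requests several times helps smooth out caching effects before comparing the reported `took` times.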
From 7f4b04fdc76b7adfb379aa7be93e9814676cbe90 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 14 Jun 2024 09:58:16 -0500 Subject: [PATCH 11/12] Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../search-pipelines/neural-sparse-query-two-phase-processor.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index dd9c6e7b8d..217c23b1a4 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -130,7 +130,7 @@ GET /my-nlp-index/_search {% include copy-curl.html %} ## P99 latency metrics -On an OpenSearch cluster set up on 3 m5.4xlarge Amazon Elastic Compute Cloud (Amazon EC2) instances, OpenSearch conducts neural sparse query P99 latency tests on indexes corresponding to more than 10 datasets. +Using an OpenSearch cluster set up on three m5.4xlarge Amazon Elastic Compute Cloud (Amazon EC2) instances, OpenSearch conducts neural sparse query P99 latency tests on indexes corresponding to more than 10 datasets. ### Doc-only mode latency metric From 5fa23062e8aa86d957868dd50e3f2ef8824b101c Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Fri, 14 Jun 2024 10:11:08 -0500 Subject: [PATCH 12/12] Update _search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --- .../search-pipelines/neural-sparse-query-two-phase-processor.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index 217c23b1a4..53d69c1cc2 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -25,7 +25,7 @@ Field | Data type | Description `two_phase_parameter` | Object | A map of key-value pairs representing the two-phase parameters and their associated values. You can specify the value of `prune_ratio`, `expansion_rate`, `max_window_size`, or any combination of these three parameters. Optional. `two_phase_parameter.prune_ratio` | Float | A ratio that represents how to split the high-weight tokens and low-weight tokens. The threshold is the token's maximum score multiplied by its `prune_ratio`. Valid range is [0,1]. Default is `0.4` `two_phase_parameter.expansion_rate` | Float | The rate at which documents will be fine-tuned during the second phase. The second-phase document number equals the query size (default is 10) multiplied by its expansion rate. Valid range is greater than 1.0. Default is `5.0` -`two_phase_parameter.max_window_size` | Int | A limit number of the two-phase fine-tune documents. Valid range is greater than 50. Default is `10000`. +`two_phase_parameter.max_window_size` | Int | The maximum number of documents that can be processed using the two-phase processor. Valid range is greater than 50. Default is `10000`. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional.
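Putting the pieces of this patch series together, a complete setup might look like the following sketch: create the two-phase search pipeline with explicit parameter values, then attach it to an index as the default search pipeline. The parameter values shown are simply the documented defaults (`prune_ratio` of 0.4, `expansion_rate` of 5.0, `max_window_size` of 10000), and `my-nlp-index` is an assumed index name.

```json
PUT /_search/pipeline/two_phase_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Example two-phase configuration using the documented default values",
        "enabled": true,
        "two_phase_parameter": {
          "prune_ratio": 0.4,
          "expansion_rate": 5.0,
          "max_window_size": 10000
        }
      }
    }
  ]
}
```

```json
PUT /my-nlp-index/_settings
{
  "index.search.default_pipeline": "two_phase_search_pipeline"
}
```

After these two calls, every query against `my-nlp-index`, including the neural sparse examples shown in the patches above, is routed through the two-phase processor; `GET /_search/pipeline/two_phase_search_pipeline` returns the stored configuration if you want to confirm the parameter values.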