From f220b27a196665f1fe706d404dece45c438c4d02 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 29 Sep 2023 18:11:57 -0400 Subject: [PATCH 01/10] Add documentation about setting a default model for neural search Signed-off-by: Fanit Kolchina --- _search-plugins/neural-search.md | 231 +++++++++++------- .../search-pipelines/neural-query-enricher.md | 47 ++++ 2 files changed, 193 insertions(+), 85 deletions(-) create mode 100644 _search-plugins/search-pipelines/neural-query-enricher.md diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md index 0bf232eb83..2a6abf50fa 100644 --- a/_search-plugins/neural-search.md +++ b/_search-plugins/neural-search.md @@ -1,6 +1,6 @@ --- layout: default -title: Neural Search plugin +title: Neural search nav_order: 200 has_children: false has_toc: false @@ -8,43 +8,66 @@ redirect_from: - /neural-search-plugin/index/ --- -# Neural Search plugin +# Neural search -The Neural Search plugin is Generally Available as of OpenSearch 2.9 +The Neural Search plugin is Generally Available as of OpenSearch 2.9. {: .note} -The OpenSearch Neural Search plugin enables the integration of machine learning (ML) language models into your search workloads. During ingestion and search, the Neural Search plugin transforms text into vectors. Then, Neural Search uses the transformed vectors in vector-based search. +Neural search facilitates vector search during ingestion and search. During ingestion, neural search transforms text into vector embeddings and indexes a document containing both the text and its vector embeddings in a k-NN index. When you use a neural query during search, neural search converts the query text into vector embeddings, uses vector search to compare the query and document embeddings, and returns the closest results. The Neural Search plugin comes bundled with OpenSearch. For more information, see [Managing plugins]({{site.url}}{{site.baseurl}}/opensearch/install/plugins#managing-plugins). -## Ingest data with Neural Search +## Using neural search -In order to ingest vectorized documents, you need to create a Neural Search ingest _pipeline_. An ingest pipeline consists of a series of processors that manipulate documents during ingestion, allowing the documents to be vectorized. The following API operation creates a Neural Search ingest pipeline: +To use neural search, follow these steps: -``` +1. [Create an ingest pipeline](#step-1-create-an-ingest-pipeline). +1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). +1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). +1. [Search the index using neural search](#step-4-search-the-index-using-neural-search). + +## Step 1: Create an ingest pipeline + +To generate vector embeddings for text fields, you need to create a neural search [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/). An ingest pipeline consists of a series of processors that manipulate documents during ingestion, allowing the documents to be vectorized. + +### Path and HTTP method + +The following API operation creates a neural search ingest pipeline: + +```json PUT _ingest/pipeline/ ``` -In the pipeline request body, the `text_embedding` processor, the only processor supported by Neural Search, converts a document's text to vector embeddings. `text_embedding` uses `field_map`s to determine what fields from which to generate vector embeddings and also which field to store the embedding. 
- ### Path parameter -Use `pipeline_name` to create a name for your Neural Search ingest pipeline. +Use `pipeline_name` to create a name for your neural search ingest pipeline. ### Request fields +In the pipeline request body, you must set up a `text_embedding` processor, the only processor supported by neural search, which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields into which to store the embeddings: + +```json +"text_embedding": { + "model_id": "bxoDJ7IHGM14UqatWc_2j", + "field_map": { + "": "" + } +} +``` + +The following table lists the `text_embedding` processor request fields. + Field | Data type | Description :--- | :--- | :--- -description | string | A description of the processor. -model_id | string | The ID of the model that will be used in the embedding interface. The model must be indexed in OpenSearch before it can be used in Neural Search. For more information, see [Model Serving Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-serving-framework/) -input_field_name | string | The field name used to cache text for text embeddings. -output_field_name | string | The name of the field in which output text is stored. +`model_id` | String | The ID of the model that will be used to generate the embeddings. The model must be indexed in OpenSearch before it can be used in neural search. For more information, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/) and [Semantic search]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search/). +`field_map.` | String | The name of the field from which to obtain text for generating text embeddings. +`field_map.` | String | The name of the vector field in which to store the generated text embeddings. ### Example request -Use the following example request to create a pipeline: +The following example request creates an ingest pipeline where the text from `passage_text` will be converted into text embeddings and the embeddings will be stored in `passage_embedding`: -``` +```json PUT _ingest/pipeline/nlp-pipeline { "description": "An example neural search pipeline", @@ -60,117 +83,110 @@ PUT _ingest/pipeline/nlp-pipeline ] } ``` +{% include copy-curl.html %} -### Example response +## Step 2: Create an index for ingestion -OpenSearch responds with an acknowledgment of the pipeline's creation. - -```json -PUT _ingest/pipeline/nlp-pipeline -{ - "acknowledged" : true -} -``` - -## Create an index for ingestion - -In order to use the text embedding processor defined in your pipelines, create an index with mapping data that aligns with the maps specified in your pipeline. For example, the `output_fields` defined in the `field_map` field of your processor request must map to the k-NN vector fields with a dimension that matches the model. Similarly, the `text_fields` defined in your processor should map to the `text_fields` in your index. +In order to use the text embedding processor defined in your pipelines, create a k-NN index with mapping data that aligns with the maps specified in your pipeline. For example, the `` defined in the `field_map` of your processor must be mapped as a k-NN vector field with a dimension that matches the model dimension. Similarly, the `` defined in your processor should be mapped as `text` in your index. ### Example request - -The following example request creates an index that attaches to a Neural Search pipeline. 
Because the index maps to k-NN vector fields, the index setting field `index-knn` is set to `true`. To match the maps defined in the Neural Search pipeline, `mapping` settings use [k-NN method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions). +The following example request creates a k-NN index that is set up with a default ingest pipeline: ```json PUT /my-nlp-index-1 { - "settings": { - "index.knn": true, - "default_pipeline": "" - }, - "mappings": { - "properties": { - "passage_embedding": { - "type": "knn_vector", - "dimension": int, - "method": { - "name": "string", - "space_type": "string", - "engine": "string", - "parameters": json_object - } - }, - "passage_text": { - "type": "text" - }, + "settings": { + "index.knn": true, + "default_pipeline": "nlp-pipeline" + }, + "mappings": { + "properties": { + "passage_embedding": { + "type": "knn_vector", + "dimension": 768, + "method": { + "name": "hnsw", + "space_type": "l2", + "engine": "lucene", + "parameters": {} } + }, + "passage_text": { + "type": "text" + } } + } } ``` +{% include copy-curl.html %} -### Example response +For more information about creating a k-NN index and the methods it supports, see [k-NN index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/). -OpenSearch responds with information about your new index: +## Step 3: Ingest documents into the index -```json -{ - "acknowledged" : true, - "shards_acknowledged" : true, - "index" : "my-nlp-index-1" -} -``` - -## Ingest documents into Neural Search - -OpenSearch's [Ingest API]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) manages document ingestion, similar to other OpenSearch indexes. For example, you can ingest a document that contains the `passage_text: "Hello world"` with a simple POST method: +To ingest documents into the index created in the previous section, send a POST request for each document: ```json -POST /my-nlp-index-1/_doc +POST /my-nlp-index-1/_doc/1 { "passage_text": "Hello world" } ``` +{% include copy-curl.html %} + +Before the document is ingested into the index, the ingest pipeline runs the `text_embedding` processor on the document, generating text embeddings for the `passage_text` field. The indexed document contains the `passage_text` field that has the original text and the `passage_embedding` field that has the vector embeddings. + +## Step 4: Search the index using neural search -With the text_embedding processor in place through a Neural Search ingest pipeline, the example indexes "Hello world" as a `text_field` and converts "Hello world" into an associated k-NN vector field. +To perform vector search on your index, use the `neural` query clause either in the [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-model) or [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) queries. You can refine the results by using a [k-NN search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). -## Search a neural index +### Neural query request fields -To convert a text query into a k-NN vector query by using a language model, use the `neural` query fields in your query. The neural query request fields can be used in both the [k-NN plugin API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api/#search-model) and [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/). 
Furthermore, you can use a [k-NN search filter]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/) to refine your neural search query. +Include the following request fields under the `neural` query clause: -### Neural request fields +```json +"neural": { + "": { + "query_text": "Hello world", + "model_id": "bxoDJ7IHGM14UqatWc_2j", + "k": 100 + } +} +``` -Include the following request fields under the `neural` field in your query: +The top-level `vector_field` specifies the vector field against which to run a search query. The following table lists the other neural query fields. Field | Data type | Description :--- | :--- | :--- -vector_field | string | The vector field against which to run a search query. -query_text | string | The query text from which to produce queries. -model_id | string | The ID of the model that will be used in the embedding interface. The model must be indexed in OpenSearch before it can be used in Neural Search. -k | integer | The number of results the k-NN search returns. +`query_text` | String | The query text from which to generate text embeddings. +`model_id` | String | The ID of the model that will be used to generate text embeddings from the query text. The model must be indexed in OpenSearch before it can be used in neural search. +`k` | Integer | The number of results the k-NN search returns. ### Example request -The following example request uses a search query that returns vectors for the "Hello World" query text: - +The following example request uses a Boolean query to combine a filter clause and two query clauses---a neural query and a `match` query. The `script_score` query assigns custom weights to the query clauses: ```json -GET my_index/_search +GET /my-nlp-index-1/_search { "query": { - "bool" : { + "bool": { "filter": { "range": { - "distance": { "lte" : 20 } + "distance": { + "lte": 20 + } } }, - "should" : [ + "should": [ { "script_score": { "query": { "neural": { - "passage_vector": { - "query_text": "Hello world", - "model_id": "xzy76xswsd", + "passage_embedding": { + "query_text": "Hi world", + "model_id": "bxoDJ7IHGM14UqatWc_2j", "k": 100 } } @@ -179,12 +195,13 @@ GET my_index/_search "source": "_score * 1.5" } } - } - , + }, { "script_score": { "query": { - "match": { "passage_text": "Hello world" } + "match": { + "passage_text": "Hi world" + } }, "script": { "source": "_score * 1.7" @@ -196,7 +213,51 @@ GET my_index/_search } } ``` +{% include copy-curl.html %} + +### Setting a default model on an index or field + +To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index. + +First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a `neural_query_enricher` request processor. To set a default model on an index, provide the model ID in the `default_model_id` parameter. 
To set a default model on a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map: + +```json +PUT /_search/pipeline/default_model_pipeline +{ + "request_processors": [ + { + "neural_query_enricher" : { + "default_model_id": "u5j0qYoBMtvQlfhaxOsa", + "neural_field_default_id": { + "my_field_1": "uZj0qYoBMtvQlfhaYeud", + "my_field_2": "upj0qYoBMtvQlfhaZOuM" + } + } + } + ] +} +``` +{% include copy-curl.html %} +Then set the default model for your index: +```json +PUT /my-nlp-index-1/_settings +{ + "index.search.default_pipeline" : "default_model_pipeline" +} +``` +{% include copy-curl.html %} +You can now omit the model ID when searching: +```json +"query": { + "neural": { + "passage_embedding": { + "query_text": "Hi world", + "k": 100 + } + } +} +``` \ No newline at end of file diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md new file mode 100644 index 0000000000..92e6f4cc36 --- /dev/null +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -0,0 +1,47 @@ +--- +layout: default +title: Neural query enrich +nav_order: 12 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# Neural query enrich processor + +The `neural_query_enricher` search request processor is designed to set a default machine learning (ML) model ID at index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/). + +## Request fields + +The following table lists all available request fields. + +Field | Data type | Description +:--- | :--- | :--- +`default_model_id` | String | The model ID of the default model for an index. Optional. +`neural_field_default_id` | String | A map of key-value pairs representing document field names and their associated default model IDs. Optional. +`tag` | String | The processor's identifier. Optional. +`description` | String | A description of the processor. Optional. + +## Example + +The following request creates a search pipeline with a `neural_query_enricher` request processor. 
The processor sets a default model ID at index level and provides different default model IDs for two specific fields in the index: + +```json +PUT /_search/pipeline/default_model_pipeline +{ + "request_processors": [ + { + "neural_query_enricher" : { + "tag": "tag1", + "description": "Sets the default model ID at index and field levels", + "default_model_id": "u5j0qYoBMtvQlfhaxOsa", + "neural_field_default_id": { + "my_field_1": "uZj0qYoBMtvQlfhaYeud", + "my_field_2": "upj0qYoBMtvQlfhaZOuM" + } + } + } + ] +} +``` +{% include copy-curl.html %} From 2d22bd6736e90105a81b5dafa03ddb48948f8a2e Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 29 Sep 2023 18:24:57 -0400 Subject: [PATCH 02/10] Add new processor to the processor list Signed-off-by: Fanit Kolchina --- _search-plugins/search-pipelines/neural-query-enricher.md | 2 +- _search-plugins/search-pipelines/search-processors.md | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md index 92e6f4cc36..559726bf7b 100644 --- a/_search-plugins/search-pipelines/neural-query-enricher.md +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -18,7 +18,7 @@ The following table lists all available request fields. Field | Data type | Description :--- | :--- | :--- `default_model_id` | String | The model ID of the default model for an index. Optional. -`neural_field_default_id` | String | A map of key-value pairs representing document field names and their associated default model IDs. Optional. +`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 3bf4061cd9..5b9e264b32 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -23,8 +23,9 @@ The following table lists all supported search request processors. Processor | Description | Earliest available version :--- | :--- | :--- -[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8 [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8 +[`neural_query_enrich`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model at index or field level for neural search. | 2.11 +[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8 ## Search response processors @@ -34,8 +35,8 @@ The following table lists all supported search response processors. Processor | Description | Earliest available version :--- | :--- | :--- -[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. 
| 2.8 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9 +[`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8 ## Search phase results processors From bf9d4b5be519677d04b433915d999045b20cd5e1 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 29 Sep 2023 22:26:53 -0400 Subject: [PATCH 03/10] More tweaks Signed-off-by: Fanit Kolchina --- _search-plugins/neural-search.md | 4 ++-- _search-plugins/search-pipelines/search-processors.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md index 2a6abf50fa..4ac18521da 100644 --- a/_search-plugins/neural-search.md +++ b/_search-plugins/neural-search.md @@ -217,9 +217,9 @@ GET /my-nlp-index-1/_search ### Setting a default model on an index or field -To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index. +To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field. -First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a `neural_query_enricher` request processor. To set a default model on an index, provide the model ID in the `default_model_id` parameter. To set a default model on a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map: +First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model on an index, provide the model ID in the `default_model_id` parameter. To set a default model on a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map: ```json PUT /_search/pipeline/default_model_pipeline diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 5b9e264b32..52711d3705 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -24,7 +24,7 @@ The following table lists all supported search request processors. Processor | Description | Earliest available version :--- | :--- | :--- [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8 -[`neural_query_enrich`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model at index or field level for neural search. | 2.11 +[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model at index or field level for neural search. | 2.11 [`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. 
| 2.8 ## Search response processors From b717f28c88d09d6497007f7e40d264ade5443960 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 29 Sep 2023 22:50:11 -0400 Subject: [PATCH 04/10] Refactor search pipeline documentation Signed-off-by: Fanit Kolchina --- .../creating-search-pipeline.md | 156 ++++++++++++++++++ .../filter-query-processor.md | 2 +- _search-plugins/search-pipelines/index.md | 120 +------------- .../personalize-search-ranking.md | 2 +- .../rename-field-processor.md | 2 +- .../search-pipelines/script-processor.md | 2 +- .../search-pipeline-metrics.md | 2 +- .../search-pipelines/search-processors.md | 2 +- .../search-pipelines/using-search-pipeline.md | 2 +- 9 files changed, 166 insertions(+), 124 deletions(-) create mode 100644 _search-plugins/search-pipelines/creating-search-pipeline.md diff --git a/_search-plugins/search-pipelines/creating-search-pipeline.md b/_search-plugins/search-pipelines/creating-search-pipeline.md new file mode 100644 index 0000000000..92fdae23b7 --- /dev/null +++ b/_search-plugins/search-pipelines/creating-search-pipeline.md @@ -0,0 +1,156 @@ +--- +layout: default +title: Creating a search pipeline +nav_order: 10 +has_children: false +parent: Search pipelines +grand_parent: Search +--- + +# Creating a search pipeline + +Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful for debugging error messages, especially if you add multiple processors of the same type. + +#### Example request + +The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`: + +```json +PUT /_search/pipeline/my_pipeline +{ + "request_processors": [ + { + "filter_query" : { + "tag" : "tag1", + "description" : "This processor is going to restrict to publicly visible documents", + "query" : { + "term": { + "visibility": "public" + } + } + } + } + ], + "response_processors": [ + { + "rename_field": { + "field": "message", + "target_field": "notification" + } + } + ] +} +``` +{% include copy-curl.html %} + +## Ignoring processor failures + +By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline: + +```json +"filter_query" : { + "tag" : "tag1", + "description" : "This processor is going to restrict to publicly visible documents", + "ignore_failure": true, + "query" : { + "term": { + "visibility": "public" + } + } +} +``` + +If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-pipeline-metrics/). + +## Updating a search pipeline + +To update a search pipeline dynamically, replace the search pipeline using the Search Pipeline API. 
+ +#### Example request + +The following request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor: + +```json +PUT /_search/pipeline/my_pipeline +{ + "request_processors": [ + { + "filter_query": { + "tag": "tag1", + "description": "This processor returns only publicly visible documents", + "query": { + "term": { + "visibility": "public" + } + } + } + } + ], + "response_processors": [ + { + "rename_field": { + "field": "message", + "target_field": "notification" + } + } + ] +} +``` +{% include copy-curl.html %} + +## Search pipeline versions + +When creating your pipeline, you can specify a version for it in the `version` parameter: + +```json +PUT _search/pipeline/my_pipeline +{ + "version": 1234, + "request_processors": [ + { + "script": { + "source": """ + if (ctx._source['size'] > 100) { + ctx._source['explain'] = false; + } + """ + } + } + ] +} +``` +{% include copy-curl.html %} + +The version is provided in all subsequent responses to `get pipeline` requests: + +```json +GET _search/pipeline/my_pipeline +``` + +The response contains the pipeline version: + +
+ + Response + + {: .text-delta} + +```json +{ + "my_pipeline": { + "version": 1234, + "request_processors": [ + { + "script": { + "source": """ + if (ctx._source['size'] > 100) { + ctx._source['explain'] = false; + } + """ + } + } + ] + } +} +``` +
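One way to put the `version` field to work (a sketch only; the API does not require the number to increase, and the value below is arbitrary) is to bump it whenever you replace the pipeline so that a subsequent `GET` request shows which revision is live:

```json
PUT _search/pipeline/my_pipeline
{
  "version": 1235,
  "request_processors": [
    {
      "script": {
        "source": """
          if (ctx._source['size'] > 100) {
            ctx._source['explain'] = false;
          }
        """
      }
    }
  ]
}
```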
diff --git a/_search-plugins/search-pipelines/filter-query-processor.md b/_search-plugins/search-pipelines/filter-query-processor.md index 1fe396eb20..e9b175696a 100644 --- a/_search-plugins/search-pipelines/filter-query-processor.md +++ b/_search-plugins/search-pipelines/filter-query-processor.md @@ -20,7 +20,7 @@ Field | Data type | Description `query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. -`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. +`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. ## Example diff --git a/_search-plugins/search-pipelines/index.md b/_search-plugins/search-pipelines/index.md index a9ff3cd18e..7d611e8e9c 100644 --- a/_search-plugins/search-pipelines/index.md +++ b/_search-plugins/search-pipelines/index.md @@ -29,13 +29,10 @@ Both request and response processing for the pipeline are performed on the coord To learn more about available search processors, see [Search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/). -## Creating a search pipeline -Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful for debugging error messages, especially if you add multiple processors of the same type. +## Example -#### Example request - -The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`: +To create a search pipeline, send a request to the search pipeline endpoint, specifying the ordered list of processors, which will be applied sequentially: ```json PUT /_search/pipeline/my_pipeline @@ -65,26 +62,7 @@ PUT /_search/pipeline/my_pipeline ``` {% include copy-curl.html %} -### Ignoring processor failures - -By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline: - -```json -"filter_query" : { - "tag" : "tag1", - "description" : "This processor is going to restrict to publicly visible documents", - "ignore_failure": true, - "query" : { - "term": { - "visibility": "public" - } - } -} -``` - -If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics](#search-pipeline-metrics). 
- -## Using search pipelines +For more information about creating and updating a search pipeline, see [Creating a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/). To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter: @@ -148,98 +126,6 @@ GET /_search/pipeline/my* ``` {% include copy-curl.html %} -## Updating a search pipeline - -To update a search pipeline dynamically, replace the search pipeline using the Search Pipeline API. - -#### Example request - -The following request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor: - -```json -PUT /_search/pipeline/my_pipeline -{ - "request_processors": [ - { - "filter_query": { - "tag": "tag1", - "description": "This processor returns only publicly visible documents", - "query": { - "term": { - "visibility": "public" - } - } - } - } - ], - "response_processors": [ - { - "rename_field": { - "field": "message", - "target_field": "notification" - } - } - ] -} -``` -{% include copy-curl.html %} - -## Search pipeline versions - -When creating your pipeline, you can specify a version for it in the `version` parameter: - -```json -PUT _search/pipeline/my_pipeline -{ - "version": 1234, - "request_processors": [ - { - "script": { - "source": """ - if (ctx._source['size'] > 100) { - ctx._source['explain'] = false; - } - """ - } - } - ] -} -``` -{% include copy-curl.html %} - -The version is provided in all subsequent responses to `get pipeline` requests: - -```json -GET _search/pipeline/my_pipeline -``` - -The response contains the pipeline version: - -
- - Response - - {: .text-delta} - -```json -{ - "my_pipeline": { - "version": 1234, - "request_processors": [ - { - "script": { - "source": """ - if (ctx._source['size'] > 100) { - ctx._source['explain'] = false; - } - """ - } - } - ] - } -} -``` -
## Search pipeline metrics diff --git a/_search-plugins/search-pipelines/personalize-search-ranking.md b/_search-plugins/search-pipelines/personalize-search-ranking.md index 64b2ef2017..b73ebb7476 100644 --- a/_search-plugins/search-pipelines/personalize-search-ranking.md +++ b/_search-plugins/search-pipelines/personalize-search-ranking.md @@ -27,7 +27,7 @@ Field | Data type | Description `iam_role_arn` | String | If you use multiple roles to restrict permissions for different groups of users in your organization, specify the ARN of the role that has permission to access Amazon Personalize. If you use only the AWS credentials in your OpenSearch keystore, you can omit this field. Optional. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. -`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. +`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. ## Example diff --git a/_search-plugins/search-pipelines/rename-field-processor.md b/_search-plugins/search-pipelines/rename-field-processor.md index 5ad8a367dc..0068e3b747 100644 --- a/_search-plugins/search-pipelines/rename-field-processor.md +++ b/_search-plugins/search-pipelines/rename-field-processor.md @@ -21,7 +21,7 @@ Field | Data type | Description `target_field` | String | The new field name. Required. `tag` | String | The processor's identifier. `description` | String | A description of the processor. -`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. +`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. ## Example diff --git a/_search-plugins/search-pipelines/script-processor.md b/_search-plugins/search-pipelines/script-processor.md index f4bc7d43db..c007d6e6d6 100644 --- a/_search-plugins/search-pipelines/script-processor.md +++ b/_search-plugins/search-pipelines/script-processor.md @@ -34,7 +34,7 @@ Field | Data type | Description `lang` | String | The script language. Optional. Only `painless` is supported. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. -`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. 
+`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. ## Example diff --git a/_search-plugins/search-pipelines/search-pipeline-metrics.md b/_search-plugins/search-pipelines/search-pipeline-metrics.md index 840db42238..85d9e8e7c8 100644 --- a/_search-plugins/search-pipelines/search-pipeline-metrics.md +++ b/_search-plugins/search-pipelines/search-pipeline-metrics.md @@ -1,7 +1,7 @@ --- layout: default title: Search pipeline metrics -nav_order: 40 +nav_order: 50 has_children: false parent: Search pipelines grand_parent: Search diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 52711d3705..4c9e7dfba0 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -1,7 +1,7 @@ --- layout: default title: Search processors -nav_order: 50 +nav_order: 40 has_children: true parent: Search pipelines grand_parent: Search diff --git a/_search-plugins/search-pipelines/using-search-pipeline.md b/_search-plugins/search-pipelines/using-search-pipeline.md index e01d1dad51..7b721ecdb5 100644 --- a/_search-plugins/search-pipelines/using-search-pipeline.md +++ b/_search-plugins/search-pipelines/using-search-pipeline.md @@ -17,7 +17,7 @@ You can use a search pipeline in the following ways: ## Specifying an existing search pipeline for a request -After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index#creating-a-search-pipeline), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter: +After you [create a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/), you can use the pipeline with a query by specifying the pipeline name in the `search_pipeline` query parameter: ```json GET /my_index/_search?search_pipeline=my_pipeline From 543ce7003e2196da59efbe99542634a3878d304a Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Fri, 29 Sep 2023 23:01:55 -0400 Subject: [PATCH 05/10] Refactor retrieving search pipelines Signed-off-by: Fanit Kolchina --- _search-plugins/neural-search.md | 7 +-- _search-plugins/search-pipelines/index.md | 55 +---------------- .../retrieving-search-pipeline.md | 61 +++++++++++++++++++ 3 files changed, 65 insertions(+), 58 deletions(-) create mode 100644 _search-plugins/search-pipelines/retrieving-search-pipeline.md diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md index 4ac18521da..a5b79f33f7 100644 --- a/_search-plugins/neural-search.md +++ b/_search-plugins/neural-search.md @@ -10,12 +10,9 @@ redirect_from: # Neural search -The Neural Search plugin is Generally Available as of OpenSearch 2.9. -{: .note} +Neural search transforms text into vectors and facilitates vector search both at ingestion time and at search time. During ingestion, neural search transforms document text into vector embeddings and indexes both the text and its vector embeddings in a k-NN index. When you use a neural query during search, neural search converts the query text into vector embeddings, uses vector search to compare the query and document embeddings, and returns the closest results. 
-Neural search facilitates vector search during ingestion and search. During ingestion, neural search transforms text into vector embeddings and indexes a document containing both the text and its vector embeddings in a k-NN index. When you use a neural query during search, neural search converts the query text into vector embeddings, uses vector search to compare the query and document embeddings, and returns the closest results. - -The Neural Search plugin comes bundled with OpenSearch. For more information, see [Managing plugins]({{site.url}}{{site.baseurl}}/opensearch/install/plugins#managing-plugins). +The Neural Search plugin comes bundled with OpenSearch and is generally available as of OpenSearch 2.9. For more information, see [Managing plugins]({{site.url}}{{site.baseurl}}/opensearch/install/plugins#managing-plugins). ## Using neural search diff --git a/_search-plugins/search-pipelines/index.md b/_search-plugins/search-pipelines/index.md index 7d611e8e9c..088e398a04 100644 --- a/_search-plugins/search-pipelines/index.md +++ b/_search-plugins/search-pipelines/index.md @@ -32,7 +32,7 @@ To learn more about available search processors, see [Search processors]({{site. ## Example -To create a search pipeline, send a request to the search pipeline endpoint, specifying the ordered list of processors, which will be applied sequentially: +To create a search pipeline, send a request to the search pipeline endpoint, specifying an ordered list of processors, which will be applied sequentially: ```json PUT /_search/pipeline/my_pipeline @@ -73,58 +73,7 @@ GET /my_index/_search?search_pipeline=my_pipeline Alternatively, you can use a temporary pipeline with a request or set a default pipeline for an index. To learn more, see [Using a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/). -## Retrieving search pipelines - -To retrieve the details of an existing search pipeline, use the Search Pipeline API. - -To view all search pipelines, use the following request: - -```json -GET /_search/pipeline -``` -{% include copy-curl.html %} - -The response contains the pipeline that you set up in the previous section: -
- - Response - - {: .text-delta} - -```json -{ - "my_pipeline" : { - "request_processors" : [ - { - "filter_query" : { - "tag" : "tag1", - "description" : "This processor is going to restrict to publicly visible documents", - "query" : { - "term" : { - "visibility" : "public" - } - } - } - } - ] - } -} -``` -
- -To view a particular pipeline, specify the pipeline name as a path parameter: - -```json -GET /_search/pipeline/my_pipeline -``` -{% include copy-curl.html %} - -You can also use wildcard patterns to view a subset of pipelines, for example: - -```json -GET /_search/pipeline/my* -``` -{% include copy-curl.html %} +To learn about retrieving details for an existing search pipeline, see [Retrieving search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/retrieving-search-pipeline/). ## Search pipeline metrics diff --git a/_search-plugins/search-pipelines/retrieving-search-pipeline.md b/_search-plugins/search-pipelines/retrieving-search-pipeline.md new file mode 100644 index 0000000000..41b213e5ff --- /dev/null +++ b/_search-plugins/search-pipelines/retrieving-search-pipeline.md @@ -0,0 +1,61 @@ +--- +layout: default +title: Retrieving search pipelines +nav_order: 25 +has_children: false +parent: Search pipelines +grand_parent: Search +--- + +# Retrieving search pipelines + +To retrieve the details of an existing search pipeline, use the Search Pipeline API. + +To view all search pipelines, use the following request: + +```json +GET /_search/pipeline +``` +{% include copy-curl.html %} + +The response contains the pipeline that you set up in the previous section: +
+ + Response + + {: .text-delta} + +```json +{ + "my_pipeline" : { + "request_processors" : [ + { + "filter_query" : { + "tag" : "tag1", + "description" : "This processor is going to restrict to publicly visible documents", + "query" : { + "term" : { + "visibility" : "public" + } + } + } + } + ] + } +} +``` +
+ +To view a particular pipeline, specify the pipeline name as a path parameter: + +```json +GET /_search/pipeline/my_pipeline +``` +{% include copy-curl.html %} + +You can also use wildcard patterns to view a subset of pipelines, for example: + +```json +GET /_search/pipeline/my* +``` +{% include copy-curl.html %} From 225446b12bfbf65b1758981e21cc7ac335a3d01b Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Tue, 3 Oct 2023 10:06:28 -0400 Subject: [PATCH 06/10] Add working examples Signed-off-by: Fanit Kolchina --- _search-plugins/neural-search.md | 83 ++++++++++++------- .../search-pipelines/neural-query-enricher.md | 4 +- 2 files changed, 55 insertions(+), 32 deletions(-) diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md index a5b79f33f7..94dfb1ee2b 100644 --- a/_search-plugins/neural-search.md +++ b/_search-plugins/neural-search.md @@ -45,7 +45,7 @@ In the pipeline request body, you must set up a `text_embedding` processor, the ```json "text_embedding": { - "model_id": "bxoDJ7IHGM14UqatWc_2j", + "model_id": "", "field_map": { "": "" } @@ -65,15 +65,15 @@ Field | Data type | Description The following example request creates an ingest pipeline where the text from `passage_text` will be converted into text embeddings and the embeddings will be stored in `passage_embedding`: ```json -PUT _ingest/pipeline/nlp-pipeline +PUT /_ingest/pipeline/nlp-ingest-pipeline { - "description": "An example neural search pipeline", - "processors" : [ + "description": "An NLP ingest pipeline", + "processors": [ { "text_embedding": { - "model_id": "bxoDJ7IHGM14UqatWc_2j", + "model_id": "bQ1J8ooBpBj3wT4HVUsb", "field_map": { - "passage_text": "passage_embedding" + "passage_text": "passage_embedding" } } } @@ -91,21 +91,24 @@ In order to use the text embedding processor defined in your pipelines, create a The following example request creates a k-NN index that is set up with a default ingest pipeline: ```json -PUT /my-nlp-index-1 +PUT /my-nlp-index { "settings": { "index.knn": true, - "default_pipeline": "nlp-pipeline" + "default_pipeline": "nlp-ingest-pipeline" }, "mappings": { "properties": { + "id": { + "type": "text" + }, "passage_embedding": { "type": "knn_vector", "dimension": 768, "method": { - "name": "hnsw", - "space_type": "l2", "engine": "lucene", + "space_type": "l2", + "name": "hnsw", "parameters": {} } }, @@ -125,9 +128,19 @@ For more information about creating a k-NN index and the methods it supports, se To ingest documents into the index created in the previous section, send a POST request for each document: ```json -POST /my-nlp-index-1/_doc/1 +PUT /my-nlp-index/_doc/1 { - "passage_text": "Hello world" + "passage_text": "Hello world", + "id": "s1" +} +``` +{% include copy-curl.html %} + +```json +PUT /my-nlp-index/_doc/2 +{ + "passage_text": "Hi planet", + "id": "s2" } ``` {% include copy-curl.html %} @@ -145,8 +158,8 @@ Include the following request fields under the `neural` query clause: ```json "neural": { "": { - "query_text": "Hello world", - "model_id": "bxoDJ7IHGM14UqatWc_2j", + "query_text": "", + "model_id": "", "k": 100 } } @@ -165,16 +178,17 @@ Field | Data type | Description The following example request uses a Boolean query to combine a filter clause and two query clauses---a neural query and a `match` query. 
The `script_score` query assigns custom weights to the query clauses: ```json -GET /my-nlp-index-1/_search +GET /my-nlp-index/_search { + "_source": { + "excludes": [ + "passage_embedding" + ] + }, "query": { "bool": { - "filter": { - "range": { - "distance": { - "lte": 20 - } - } + "filter": { + "wildcard": { "id": "*1" } }, "should": [ { @@ -183,7 +197,7 @@ GET /my-nlp-index-1/_search "neural": { "passage_embedding": { "query_text": "Hi world", - "model_id": "bxoDJ7IHGM14UqatWc_2j", + "model_id": "bQ1J8ooBpBj3wT4HVUsb", "k": 100 } } @@ -224,7 +238,7 @@ PUT /_search/pipeline/default_model_pipeline "request_processors": [ { "neural_query_enricher" : { - "default_model_id": "u5j0qYoBMtvQlfhaxOsa", + "default_model_id": "bQ1J8ooBpBj3wT4HVUsb", "neural_field_default_id": { "my_field_1": "uZj0qYoBMtvQlfhaYeud", "my_field_2": "upj0qYoBMtvQlfhaZOuM" @@ -239,7 +253,7 @@ PUT /_search/pipeline/default_model_pipeline Then set the default model for your index: ```json -PUT /my-nlp-index-1/_settings +PUT /my-nlp-index/_settings { "index.search.default_pipeline" : "default_model_pipeline" } @@ -249,12 +263,21 @@ PUT /my-nlp-index-1/_settings You can now omit the model ID when searching: ```json -"query": { - "neural": { - "passage_embedding": { - "query_text": "Hi world", - "k": 100 +GET /my-nlp-index/_search +{ + "_source": { + "excludes": [ + "passage_embedding" + ] + }, + "query": { + "neural": { + "passage_embedding": { + "query_text": "Hi world", + "k": 100 + } } } } -``` \ No newline at end of file +``` +{% include copy-curl.html %} \ No newline at end of file diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md index 559726bf7b..bc35960363 100644 --- a/_search-plugins/search-pipelines/neural-query-enricher.md +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -17,8 +17,8 @@ The following table lists all available request fields. Field | Data type | Description :--- | :--- | :--- -`default_model_id` | String | The model ID of the default model for an index. Optional. -`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. +`default_model_id` | String | The model ID of the default model for an index. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. +`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. 
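As an illustration of the "you must specify at least one" rule, a pipeline that configures only field-level defaults and omits `default_model_id` is valid (a sketch; the pipeline name, field name, and model ID are the placeholder values used elsewhere on this page):

```json
PUT /_search/pipeline/field_defaults_only_pipeline
{
  "request_processors": [
    {
      "neural_query_enricher": {
        "neural_field_default_id": {
          "my_field_1": "uZj0qYoBMtvQlfhaYeud"
        }
      }
    }
  ]
}
```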
From 1420ff1ddfa972f2c5441da66f1e50204878910d Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Tue, 3 Oct 2023 13:21:18 -0400 Subject: [PATCH 07/10] Implement tech review comments Signed-off-by: Fanit Kolchina --- _search-plugins/neural-search.md | 4 ++-- _search-plugins/search-pipelines/neural-query-enricher.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md index 94dfb1ee2b..c17e8583f2 100644 --- a/_search-plugins/neural-search.md +++ b/_search-plugins/neural-search.md @@ -187,7 +187,7 @@ GET /my-nlp-index/_search }, "query": { "bool": { - "filter": { + "filter": { "wildcard": { "id": "*1" } }, "should": [ @@ -230,7 +230,7 @@ GET /my-nlp-index/_search To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field. -First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model on an index, provide the model ID in the `default_model_id` parameter. To set a default model on a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map: +First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model on an index, provide the model ID in the `default_model_id` parameter. To set a default model on a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence: ```json PUT /_search/pipeline/default_model_pipeline diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md index bc35960363..928827452a 100644 --- a/_search-plugins/search-pipelines/neural-query-enricher.md +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -17,8 +17,8 @@ The following table lists all available request fields. Field | Data type | Description :--- | :--- | :--- -`default_model_id` | String | The model ID of the default model for an index. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. -`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. +`default_model_id` | String | The model ID of the default model for an index. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence. +`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. 
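To make the precedence rule concrete: if a pipeline that sets both parameters is the index default, a neural query against `my_field_1` resolves to that field's model ID (`uZj0qYoBMtvQlfhaYeud`) rather than the index-level `default_model_id` (a sketch; the index name is hypothetical, and the field and model IDs are the placeholder values from the earlier example):

```json
GET /my-index/_search
{
  "query": {
    "neural": {
      "my_field_1": {
        "query_text": "Hello world",
        "k": 100
      }
    }
  }
}
```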
From e18b296e697f98582dd2a8c4cd49bc37169fd600 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Tue, 3 Oct 2023 15:37:04 -0400 Subject: [PATCH 08/10] Add responses to documentation Signed-off-by: Fanit Kolchina --- _search-plugins/neural-search.md | 77 +++++++++++++++++++++++++++++++- 1 file changed, 76 insertions(+), 1 deletion(-) diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md index c17e8583f2..972c4b4d29 100644 --- a/_search-plugins/neural-search.md +++ b/_search-plugins/neural-search.md @@ -226,6 +226,39 @@ GET /my-nlp-index/_search ``` {% include copy-curl.html %} +The response contains the matching document: + +```json +{ + "took" : 36, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 1.2251667, + "hits" : [ + { + "_index" : "my-nlp-index", + "_id" : "1", + "_score" : 1.2251667, + "_source" : { + "passage_text" : "Hello world", + "id" : "s1" + } + } + ] + } +} +``` + ### Setting a default model on an index or field To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field. @@ -280,4 +313,46 @@ GET /my-nlp-index/_search } } ``` -{% include copy-curl.html %} \ No newline at end of file +{% include copy-curl.html %} + +The response contains both documents: + +```json +{ + "took" : 41, + "timed_out" : false, + "_shards" : { + "total" : 1, + "successful" : 1, + "skipped" : 0, + "failed" : 0 + }, + "hits" : { + "total" : { + "value" : 2, + "relation" : "eq" + }, + "max_score" : 1.22762, + "hits" : [ + { + "_index" : "my-nlp-index", + "_id" : "2", + "_score" : 1.22762, + "_source" : { + "passage_text" : "Hi planet", + "id" : "s2" + } + }, + { + "_index" : "my-nlp-index", + "_id" : "1", + "_score" : 1.2251667, + "_source" : { + "passage_text" : "Hello world", + "id" : "s1" + } + } + ] + } +} +``` \ No newline at end of file From 6e974ab2f206e2e64de7460af4a5ce267a415f15 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 4 Oct 2023 11:52:02 -0400 Subject: [PATCH 09/10] Update _search-plugins/search-pipelines/neural-query-enricher.md Co-authored-by: Melissa Vagi Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/search-pipelines/neural-query-enricher.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md index 928827452a..0bd13fee49 100644 --- a/_search-plugins/search-pipelines/neural-query-enricher.md +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -24,7 +24,7 @@ Field | Data type | Description ## Example -The following request creates a search pipeline with a `neural_query_enricher` request processor. The processor sets a default model ID at index level and provides different default model IDs for two specific fields in the index: +The following request creates a search pipeline with a `neural_query_enricher` search request processor. 
The processor sets a default model ID at index level and provides different default model IDs for two specific fields in the index: ```json PUT /_search/pipeline/default_model_pipeline From 1cc6a4b8cc32e0dcba511423fabea0d6e4fd4bd1 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 4 Oct 2023 13:34:46 -0400 Subject: [PATCH 10/10] Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _search-plugins/neural-search.md | 8 ++++---- .../search-pipelines/creating-search-pipeline.md | 4 ++-- .../search-pipelines/filter-query-processor.md | 2 +- _search-plugins/search-pipelines/index.md | 2 +- .../search-pipelines/neural-query-enricher.md | 12 ++++++------ .../search-pipelines/rename-field-processor.md | 2 +- _search-plugins/search-pipelines/script-processor.md | 2 +- .../search-pipelines/search-processors.md | 2 +- 8 files changed, 17 insertions(+), 17 deletions(-) diff --git a/_search-plugins/neural-search.md b/_search-plugins/neural-search.md index 972c4b4d29..2b2c1e1c21 100644 --- a/_search-plugins/neural-search.md +++ b/_search-plugins/neural-search.md @@ -41,7 +41,7 @@ Use `pipeline_name` to create a name for your neural search ingest pipeline. ### Request fields -In the pipeline request body, you must set up a `text_embedding` processor, the only processor supported by neural search, which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields into which to store the embeddings: +In the pipeline request body, you must set up a `text_embedding` processor (the only processor supported by neural search), which will convert the text in a document field to vector embeddings. The processor's `field_map` determines the input fields from which to generate vector embeddings and the output fields in which to store the embeddings: ```json "text_embedding": { @@ -125,7 +125,7 @@ For more information about creating a k-NN index and the methods it supports, se ## Step 3: Ingest documents into the index -To ingest documents into the index created in the previous section, send a POST request for each document: +To ingest documents into the index created in the previous step, send a POST request for each document: ```json PUT /my-nlp-index/_doc/1 @@ -171,7 +171,7 @@ Field | Data type | Description :--- | :--- | :--- `query_text` | String | The query text from which to generate text embeddings. `model_id` | String | The ID of the model that will be used to generate text embeddings from the query text. The model must be indexed in OpenSearch before it can be used in neural search. -`k` | Integer | The number of results the k-NN search returns. +`k` | Integer | The number of results returned by the k-NN search. ### Example request @@ -263,7 +263,7 @@ The response contains the matching document: To eliminate passing the model ID with each neural query request, you can set a default model on a k-NN index or a field. -First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model on an index, provide the model ID in the `default_model_id` parameter. 
To set a default model on a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence: +First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model for an index, provide the model ID in the `default_model_id` parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence: ```json PUT /_search/pipeline/default_model_pipeline diff --git a/_search-plugins/search-pipelines/creating-search-pipeline.md b/_search-plugins/search-pipelines/creating-search-pipeline.md index 92fdae23b7..c33a763e15 100644 --- a/_search-plugins/search-pipelines/creating-search-pipeline.md +++ b/_search-plugins/search-pipelines/creating-search-pipeline.md @@ -9,7 +9,7 @@ grand_parent: Search # Creating a search pipeline -Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful for debugging error messages, especially if you add multiple processors of the same type. +Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful when debugging error messages, especially if you add multiple processors of the same type. #### Example request @@ -68,7 +68,7 @@ To update a search pipeline dynamically, replace the search pipeline using the S #### Example request -The following request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor: +The following example request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor: ```json PUT /_search/pipeline/my_pipeline diff --git a/_search-plugins/search-pipelines/filter-query-processor.md b/_search-plugins/search-pipelines/filter-query-processor.md index e9b175696a..6c68821a27 100644 --- a/_search-plugins/search-pipelines/filter-query-processor.md +++ b/_search-plugins/search-pipelines/filter-query-processor.md @@ -20,7 +20,7 @@ Field | Data type | Description `query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. -`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. 
Default is `false`. +`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. ## Example diff --git a/_search-plugins/search-pipelines/index.md b/_search-plugins/search-pipelines/index.md index 088e398a04..d4edc289d3 100644 --- a/_search-plugins/search-pipelines/index.md +++ b/_search-plugins/search-pipelines/index.md @@ -32,7 +32,7 @@ To learn more about available search processors, see [Search processors]({{site. ## Example -To create a search pipeline, send a request to the search pipeline endpoint, specifying an ordered list of processors, which will be applied sequentially: +To create a search pipeline, send a request to the search pipeline endpoint specifying an ordered list of processors, which will be applied sequentially: ```json PUT /_search/pipeline/my_pipeline diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md index 0bd13fee49..610b050342 100644 --- a/_search-plugins/search-pipelines/neural-query-enricher.md +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -1,15 +1,15 @@ --- layout: default -title: Neural query enrich +title: Neural query enricher nav_order: 12 has_children: false parent: Search processors grand_parent: Search pipelines --- -# Neural query enrich processor +# Neural query enricher processor -The `neural_query_enricher` search request processor is designed to set a default machine learning (ML) model ID at index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/). +The `neural_query_enricher` search request processor is designed to set a default machine learning (ML) model ID at the index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [ML Framework]({{site.url}}{{site.baseurl}}/ml-commons-plugin/ml-framework/). ## Request fields @@ -17,14 +17,14 @@ The following table lists all available request fields. Field | Data type | Description :--- | :--- | :--- -`default_model_id` | String | The model ID of the default model for an index. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence. -`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. You must specify at least one of `default_model_id` and `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence. +`default_model_id` | String | The model ID of the default model for an index. Optional. You must specify at least one `default_model_id` or `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence. +`neural_field_default_id` | Object | A map of key-value pairs representing document field names and their associated default model IDs. Optional. You must specify at least one `default_model_id` or `neural_field_default_id`. If both are provided, `neural_field_default_id` takes precedence. `tag` | String | The processor's identifier. Optional. 
`description` | String | A description of the processor. Optional. ## Example -The following request creates a search pipeline with a `neural_query_enricher` search request processor. The processor sets a default model ID at index level and provides different default model IDs for two specific fields in the index: +The following example request creates a search pipeline with a `neural_query_enricher` search request processor. The processor sets a default model ID at the index level and provides different default model IDs for two specific fields in the index: ```json PUT /_search/pipeline/default_model_pipeline diff --git a/_search-plugins/search-pipelines/rename-field-processor.md b/_search-plugins/search-pipelines/rename-field-processor.md index 0068e3b747..cb01125df5 100644 --- a/_search-plugins/search-pipelines/rename-field-processor.md +++ b/_search-plugins/search-pipelines/rename-field-processor.md @@ -21,7 +21,7 @@ Field | Data type | Description `target_field` | String | The new field name. Required. `tag` | String | The processor's identifier. `description` | String | A description of the processor. -`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. +`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. ## Example diff --git a/_search-plugins/search-pipelines/script-processor.md b/_search-plugins/search-pipelines/script-processor.md index c007d6e6d6..e1e629e398 100644 --- a/_search-plugins/search-pipelines/script-processor.md +++ b/_search-plugins/search-pipelines/script-processor.md @@ -34,7 +34,7 @@ Field | Data type | Description `lang` | String | The script language. Optional. Only `painless` is supported. `tag` | String | The processor's identifier. Optional. `description` | String | A description of the processor. Optional. -`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. +`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. ## Example diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 4c9e7dfba0..808ddf5457 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -24,7 +24,7 @@ The following table lists all supported search request processors. Processor | Description | Earliest available version :--- | :--- | :--- [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. 
| 2.8 -[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model at index or field level for neural search. | 2.11 +[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search at the index or field level. | 2.11 [`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8 ## Search response processors