forked from opensearch-project/documentation-website
Commit
Add documentation for new reranking feature in 2.12 (opensearch-proje…
…ct#6368)

* Create reranking.md document new reranking feature in 2.12
  Signed-off-by: HenryL27 <[email protected]>
* Doc review and address comments
  Signed-off-by: Fanit Kolchina <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <[email protected]>
  Signed-off-by: kolchfa-aws <[email protected]>
* Update _search-plugins/search-pipelines/rerank-processor.md
  Co-authored-by: Nathan Bower <[email protected]>
  Signed-off-by: kolchfa-aws <[email protected]>
* Update _search-plugins/search-pipelines/rerank-processor.md
  Co-authored-by: Nathan Bower <[email protected]>
  Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: HenryL27 <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Tianjing Li <[email protected]>
1 parent 1fc1517, commit 24bf143
Showing 5 changed files with 241 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
--- | ||
layout: default | ||
title: Rerank | ||
nav_order: 25 | ||
has_children: false | ||
parent: Search processors | ||
grand_parent: Search pipelines | ||
--- | ||
|
||
# Rerank processor | ||
|
||
The `rerank` search response processor intercepts search results and passes them to a cross-encoder model for reranking. The model rescores the results, taking into account the scoring context. The processor then reorders the documents in the search results based on their new scores. | ||
|
||
## Request fields | ||
|
||
The following table lists all available request fields. | ||
|
||
Field | Data type | Description | ||
:--- | :--- | :--- | ||
`<reranker_type>` | Object | The reranker type provides the rerank processor with static information needed across all reranking calls. Required. | ||
`context` | Object | Provides the rerank processor with information necessary for generating reranking context at query time. | ||
`tag` | String | The processor's identifier. Optional. | ||
`description` | String | A description of the processor. Optional. | ||
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. | ||
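
As an illustration of how these fields fit together, the following Python sketch assembles a pipeline body with one `rerank` response processor. The helper name `build_rerank_pipeline` and the `tag` value are hypothetical and exist only for this example. The sketch assumes that the optional `tag`, `description`, and `ignore_failure` fields sit inside the `rerank` object next to the reranker type and `context`:

```python
import json


def build_rerank_pipeline(model_id, document_fields, tag=None,
                          description=None, ignore_failure=False):
    """Build a search pipeline body containing one `rerank` response processor."""
    processor = {
        # Reranker type: static information used for every reranking call.
        "ml_opensearch": {"model_id": model_id},
        # Context: tells the processor which document fields to read at query time.
        "context": {"document_fields": list(document_fields)},
    }
    # Optional fields are added only when the caller sets them.
    if tag is not None:
        processor["tag"] = tag
    if description is not None:
        processor["description"] = description
    if ignore_failure:
        processor["ignore_failure"] = True
    return {"response_processors": [{"rerank": processor}]}


body = build_rerank_pipeline(
    "gnDIbI0BfUsSoeNT_jAw",
    ["title", "text_representation"],
    tag="rerank_1",  # hypothetical identifier
    description="Rerank with a cross-encoder",
    ignore_failure=True,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a `PUT /_search/pipeline/<pipeline_name>` request.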
|
||
### The `ml_opensearch` reranker type | ||
|
||
The `ml_opensearch` reranker type is designed to work with the cross-encoder model provided by OpenSearch. For this reranker type, specify the following fields. | ||
|
||
Field | Data type | Description | ||
:--- | :--- | :--- | ||
`ml_opensearch` | Object | Provides the rerank processor with model information. Required. | ||
`ml_opensearch.model_id` | String | The model ID for the cross-encoder model. Required. For more information, see [Using ML models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/). | ||
`context.document_fields` | Array | An array of document fields that specifies the fields from which to retrieve context for the cross-encoder model. Required. | ||
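
To make the role of `context.document_fields` more concrete, the following Python sketch mimics the reranking flow on a list of hits. It is not the processor's implementation: the way fields are joined into a context string is an assumption, and `word_overlap` is a toy stand-in for a real cross-encoder model.

```python
def build_context(source, document_fields):
    """Join the configured fields of one document into a single context string."""
    # Assumption: missing fields are skipped and values are joined with a space.
    return " ".join(str(source[f]) for f in document_fields if f in source)


def rerank(hits, query_text, document_fields, score_fn):
    """Return a new hit list ordered by the scores that score_fn assigns."""
    rescored = []
    for hit in hits:
        context = build_context(hit["_source"], document_fields)
        rescored.append({**hit, "_score": score_fn(query_text, context)})
    # Highest score first. Python's sort is stable, so ties keep their original order.
    return sorted(rescored, key=lambda h: h["_score"], reverse=True)


def word_overlap(query_text, context):
    """Toy scorer (not a cross-encoder): number of query words found in the context."""
    context_words = set(context.lower().split())
    return float(sum(1 for w in set(query_text.lower().split()) if w in context_words))


hits = [
    {"_id": "1", "_score": 0.9,
     "_source": {"title": "Red apple", "text_representation": "A fruit"}},
    {"_id": "2", "_score": 0.4,
     "_source": {"title": "Green apple pie", "text_representation": "A green dessert"}},
]
reranked = rerank(hits, "green apple", ["title", "text_representation"], word_overlap)
print([hit["_id"] for hit in reranked])
```

In this sketch, the original scores are discarded and the order depends only on the scores returned by the scoring function.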
|
||
## Example | ||
|
||
The following example demonstrates using a search pipeline with a `rerank` processor. | ||
|
||
### Creating a search pipeline | ||
|
||
The following request creates a search pipeline with a `rerank` response processor: | ||
|
||
```json | ||
PUT /_search/pipeline/rerank_pipeline | ||
{ | ||
  "response_processors": [ | ||
    { | ||
      "rerank": { | ||
        "ml_opensearch": { | ||
          "model_id": "gnDIbI0BfUsSoeNT_jAw" | ||
        }, | ||
        "context": { | ||
          "document_fields": ["title", "text_representation"] | ||
        } | ||
      } | ||
    } | ||
  ] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
### Using a search pipeline | ||
|
||
Combine an OpenSearch query with an `ext` object that contains the query context for the cross-encoder model. Provide the `query_text` that will be used to rerank the results: | ||
|
||
```json | ||
POST /_search?search_pipeline=rerank_pipeline | ||
{ | ||
  "query": { | ||
    "match": { | ||
      "text_representation": "Where is Albuquerque?" | ||
    } | ||
  }, | ||
  "ext": { | ||
    "rerank": { | ||
      "query_context": { | ||
        "query_text": "Where is Albuquerque?" | ||
      } | ||
    } | ||
  } | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Instead of specifying `query_text`, you can provide the full path to the field in the search request that contains the text to use for reranking. For example, if the query text is specified in the `query` subfield of the `text_representation` object, specify its path in the `query_text_path` parameter: | ||
|
||
```json | ||
POST /_search?search_pipeline=rerank_pipeline | ||
{ | ||
  "query": { | ||
    "match": { | ||
      "text_representation": { | ||
        "query": "Where is Albuquerque?" | ||
      } | ||
    } | ||
  }, | ||
  "ext": { | ||
    "rerank": { | ||
      "query_context": { | ||
        "query_text_path": "query.match.text_representation.query" | ||
      } | ||
    } | ||
  } | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
The `query_context` object contains the following fields. | ||
|
||
Field name | Description | ||
:--- | :--- | ||
`query_text` | The natural language text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required. | ||
`query_text_path` | The full JSON path to the text of the question that you want to use to rerank the search results. Either `query_text` or `query_text_path` (not both) is required. The maximum number of characters in the path is `1000`. | ||
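
The rules in the preceding table can be sketched in Python as follows. This is an illustration only, not the processor's code: the function name `resolve_query_text` is hypothetical, and the sketch assumes that the path is a dot-separated list of object keys with no array indexing.

```python
MAX_PATH_LENGTH = 1000  # limit stated in the table above


def resolve_query_text(request_body):
    """Return the text used for reranking, following the query_context rules."""
    query_context = request_body["ext"]["rerank"]["query_context"]
    has_text = "query_text" in query_context
    has_path = "query_text_path" in query_context
    # Exactly one of the two fields must be present.
    if has_text == has_path:
        raise ValueError("Specify exactly one of query_text or query_text_path")
    if has_text:
        return query_context["query_text"]

    path = query_context["query_text_path"]
    if len(path) > MAX_PATH_LENGTH:
        raise ValueError("query_text_path exceeds 1000 characters")
    # Assumption: walk the request body one dot-separated key at a time.
    node = request_body
    for key in path.split("."):
        node = node[key]
    return node


request_body = {
    "query": {"match": {"text_representation": {"query": "Where is Albuquerque?"}}},
    "ext": {"rerank": {"query_context": {
        "query_text_path": "query.match.text_representation.query"
    }}},
}
print(resolve_query_text(request_body))
```

Under these assumptions, the path `query.match.text_representation.query` resolves to the same string that the `match` query searches for, so the query text does not need to be repeated in the `ext` object.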
|
||
For more information about setting up reranking, see [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/). |
118 changes: 118 additions & 0 deletions
_search-plugins/search-relevance/reranking-search-results.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
--- | ||
layout: default | ||
title: Reranking search results | ||
parent: Search relevance | ||
has_children: false | ||
nav_order: 60 | ||
--- | ||
|
||
# Reranking search results | ||
Introduced 2.12 | ||
{: .label .label-purple } | ||
|
||
You can rerank search results using a cross-encoder reranker in order to improve search relevance. To implement reranking, you need to configure a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline intercepts search results and applies the [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) to them. The `rerank` processor evaluates the search results and sorts them based on the new scores provided by the cross-encoder model. | ||
|
||
**PREREQUISITE**<br> | ||
Before configuring reranking, you must set up a cross-encoder model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). | ||
{: .note} | ||
|
||
## Running a search with reranking | ||
|
||
To run a search with reranking, follow these steps: | ||
|
||
1. [Configure a search pipeline](#step-1-configure-a-search-pipeline). | ||
1. [Create an index for ingestion](#step-2-create-an-index-for-ingestion). | ||
1. [Ingest documents into the index](#step-3-ingest-documents-into-the-index). | ||
1. [Search using reranking](#step-4-search-using-reranking). | ||
|
||
## Step 1: Configure a search pipeline | ||
|
||
First, configure a search pipeline with a [`rerank` processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/). | ||
|
||
The following example request creates a search pipeline with an `ml_opensearch` rerank processor. In the request, provide a model ID for the cross-encoder model and the document fields to use as context: | ||
|
||
```json | ||
PUT /_search/pipeline/my_pipeline | ||
{ | ||
  "description": "Pipeline for reranking with a cross-encoder", | ||
  "response_processors": [ | ||
    { | ||
      "rerank": { | ||
        "ml_opensearch": { | ||
          "model_id": "gnDIbI0BfUsSoeNT_jAw" | ||
        }, | ||
        "context": { | ||
          "document_fields": [ | ||
            "passage_text" | ||
          ] | ||
        } | ||
      } | ||
    } | ||
  ] | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
For more information about the request fields, see [Request fields]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#request-fields). | ||
|
||
## Step 2: Create an index for ingestion | ||
|
||
To use the `rerank` processor defined in your pipeline, create an OpenSearch index and set the pipeline created in the previous step as the index's default search pipeline: | ||
|
||
```json | ||
PUT /my-index | ||
{ | ||
  "settings": { | ||
    "index.search.default_pipeline": "my_pipeline" | ||
  }, | ||
  "mappings": { | ||
    "properties": { | ||
      "passage_text": { | ||
        "type": "text" | ||
      } | ||
    } | ||
  } | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Step 3: Ingest documents into the index | ||
|
||
To ingest documents into the index created in the previous step, send the following bulk request: | ||
|
||
```json | ||
POST /_bulk | ||
{ "index": { "_index": "my-index" } } | ||
{ "passage_text" : "I said welcome to them and we entered the house" } | ||
{ "index": { "_index": "my-index" } } | ||
{ "passage_text" : "I feel welcomed in their family" } | ||
{ "index": { "_index": "my-index" } } | ||
{ "passage_text" : "Welcoming gifts are great" } | ||
|
||
``` | ||
{% include copy-curl.html %} | ||
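
If you generate the bulk request programmatically, the body must be newline-delimited JSON in which each action line is followed by its document line, and the body must end with a newline character. The following Python sketch builds such a body for the documents above. The helper name `to_bulk_body` is hypothetical:

```python
import json


def to_bulk_body(index_name, documents):
    """Serialize documents as newline-delimited JSON for the Bulk API."""
    lines = []
    for doc in documents:
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The Bulk API expects the body to end with a final newline.
    return "\n".join(lines) + "\n"


documents = [
    {"passage_text": "I said welcome to them and we entered the house"},
    {"passage_text": "I feel welcomed in their family"},
    {"passage_text": "Welcoming gifts are great"},
]
bulk_body = to_bulk_body("my-index", documents)
print(bulk_body, end="")
```

The resulting string can be sent as the body of a `POST /_bulk` request by any HTTP client.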
|
||
## Step 4: Search using reranking | ||
|
||
To perform a reranking search on your index, use any OpenSearch query and provide an additional `ext.rerank` field: | ||
|
||
```json | ||
POST /my-index/_search | ||
{ | ||
  "query": { | ||
    "match": { | ||
      "passage_text": "how to welcome in family" | ||
    } | ||
  }, | ||
  "ext": { | ||
    "rerank": { | ||
      "query_context": { | ||
        "query_text": "how to welcome in family" | ||
      } | ||
    } | ||
  } | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
Alternatively, instead of specifying `query_text`, you can use `query_text_path` to provide the full path to the field containing the query text. For more information, see [Rerank processor example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/#example). |