
Add documentation about setting a default model for neural search #5121

Merged 11 commits on Oct 4, 2023
340 changes: 248 additions & 92 deletions _search-plugins/neural-search.md


156 changes: 156 additions & 0 deletions _search-plugins/search-pipelines/creating-search-pipeline.md
---
layout: default
title: Creating a search pipeline
nav_order: 10
has_children: false
parent: Search pipelines
grand_parent: Search
---

# Creating a search pipeline

Search pipelines are stored in the cluster state. To create a search pipeline, you must configure an ordered list of processors in your OpenSearch cluster. You can have more than one processor of the same type in the pipeline. Each processor has a `tag` identifier that distinguishes it from the others. Tagging a specific processor can be helpful for debugging error messages, especially if you add multiple processors of the same type.

#### Example request

The following request creates a search pipeline with a `filter_query` request processor that uses a term query to return only public messages and a response processor that renames the field `message` to `notification`:

```json
PUT /_search/pipeline/my_pipeline
{
"request_processors": [
{
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"query" : {
"term": {
"visibility": "public"
}
}
}
}
],
"response_processors": [
{
"rename_field": {
"field": "message",
"target_field": "notification"
}
}
]
}
```
{% include copy-curl.html %}
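After creating a pipeline, you can verify that it was stored in the cluster state by retrieving it by name:

```json
GET /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}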

## Ignoring processor failures

By default, a search pipeline stops if one of its processors fails. If you want the pipeline to continue running when a processor fails, you can set the `ignore_failure` parameter for that processor to `true` when creating the pipeline:

```json
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"ignore_failure": true,
"query" : {
"term": {
"visibility": "public"
}
}
}
```

If the processor fails, OpenSearch logs the failure and continues to run all remaining processors in the search pipeline. To check whether there were any failures, you can use [search pipeline metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-pipeline-metrics/).
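For example, the following Nodes Stats API request returns search pipeline statistics, including per-processor failure counts (the exact response shape is described on the search pipeline metrics page):

```json
GET /_nodes/stats/search_pipeline
```
{% include copy-curl.html %}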

## Updating a search pipeline

To update a search pipeline dynamically, replace it using the Search Pipeline API.
> **Contributor:** Should we include a link to the search pipeline API documentation?
>
> **Collaborator (author):** There's not really a separate API page for this. In this case, I'm just referring to the `/_search/pipeline` endpoint, so it should be understandable.

#### Example request

The following request upserts `my_pipeline` by adding a `filter_query` request processor and a `rename_field` response processor:

```json
PUT /_search/pipeline/my_pipeline
{
"request_processors": [
{
"filter_query": {
"tag": "tag1",
"description": "This processor returns only publicly visible documents",
"query": {
"term": {
"visibility": "public"
}
}
}
}
],
"response_processors": [
{
"rename_field": {
"field": "message",
"target_field": "notification"
}
}
]
}
```
{% include copy-curl.html %}
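Note that a `PUT` request replaces the stored pipeline definition rather than merging with it, so always send the complete desired pipeline. For example, the following request (a hypothetical follow-up to the one above) leaves `my_pipeline` with only the response processor, because the request processors are omitted:

```json
PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "rename_field": {
        "field": "message",
        "target_field": "notification"
      }
    }
  ]
}
```
{% include copy-curl.html %}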

## Search pipeline versions

When creating your pipeline, you can specify a version for it in the `version` parameter:

```json
PUT _search/pipeline/my_pipeline
{
"version": 1234,
"request_processors": [
{
"script": {
"source": """
if (ctx._source['size'] > 100) {
ctx._source['explain'] = false;
}
"""
}
}
]
}
```
{% include copy-curl.html %}

The version is provided in all subsequent responses to `get pipeline` requests:

```json
GET _search/pipeline/my_pipeline
```

The response contains the pipeline version:

<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
"my_pipeline": {
"version": 1234,
"request_processors": [
{
"script": {
"source": """
if (ctx._source['size'] > 100) {
ctx._source['explain'] = false;
}
"""
}
}
]
}
}
```
</details>
Field | Data type | Description
:--- | :--- | :---
`query` | Object | A query in query domain-specific language (DSL). For a list of OpenSearch query types, see [Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/). Required.
`tag` | String | The processor's identifier. Optional.
`description` | String | A description of the processor. Optional.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
`ignore_failure` | Boolean | If `true`, OpenSearch [ignores a failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.

## Example

173 changes: 4 additions & 169 deletions _search-plugins/search-pipelines/index.md
Both request and response processing for the pipeline are performed on the coordinating node.

To learn more about available search processors, see [Search processors]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/search-processors/).

## Example

To create a search pipeline, send a request to the search pipeline endpoint, specifying an ordered list of processors, which will be applied sequentially:
> **Contributor:** Great job writing this sentence succinctly. Your wording is perfect.
```json
PUT /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}


## Using search pipelines
For more information about creating and updating a search pipeline, see [Creating a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/).

To use a pipeline with a query, specify the pipeline name in the `search_pipeline` query parameter:

```json
GET /my_index/_search?search_pipeline=my_pipeline
```

Alternatively, you can use a temporary pipeline with a request or set a default pipeline for an index. To learn more, see [Using a search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/using-search-pipeline/).
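For example, assuming you want every search on `my_index` to use `my_pipeline` without specifying it in each request, you can set it as the index's default pipeline through the `index.search.default_pipeline` index setting:

```json
PUT /my_index/_settings
{
  "index.search.default_pipeline": "my_pipeline"
}
```
{% include copy-curl.html %}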

## Retrieving search pipelines

To retrieve the details of an existing search pipeline, use the Search Pipeline API.

To view all search pipelines, use the following request:

```json
GET /_search/pipeline
```
{% include copy-curl.html %}

The response contains the pipeline that you set up in the previous section:
<details open markdown="block">
<summary>
Response
</summary>
{: .text-delta}

```json
{
"my_pipeline" : {
"request_processors" : [
{
"filter_query" : {
"tag" : "tag1",
"description" : "This processor is going to restrict to publicly visible documents",
"query" : {
"term" : {
"visibility" : "public"
}
}
}
}
]
}
}
```
</details>

To view a particular pipeline, specify the pipeline name as a path parameter:

```json
GET /_search/pipeline/my_pipeline
```
{% include copy-curl.html %}

You can also use wildcard patterns to view a subset of pipelines, for example:

```json
GET /_search/pipeline/my*
```
{% include copy-curl.html %}


To learn about retrieving details for an existing search pipeline, see [Retrieving search pipelines]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/retrieving-search-pipeline/).


## Search pipeline metrics
