From 8f60f3f35fa76b62e241d4b90ef667d6386b8680 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 22 Dec 2023 14:12:06 -0700 Subject: [PATCH 1/9] Add trim processor documentation Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 86 ++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) create mode 100644 _ingest-pipelines/processors/trim.md diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md new file mode 100644 index 0000000000..1705851235 --- /dev/null +++ b/_ingest-pipelines/processors/trim.md @@ -0,0 +1,86 @@ +--- +layout: default +title: Trim +parent: Ingest processors +nav_order: 300 +--- + +# Trim processor + +The `trim` processor is used to . + +The following is the syntax for the `trim` processor: + +```json + +``` +{% include copy-curl.html %} + +## Configuration parameters + +The following table lists the required and optional parameters for the `trim` processor. + +Parameter | Required/Optional | Description | +|-----------|-----------|-----------| + + +## Using the processor + +Follow these steps to use the processor in a pipeline. + +### Step 1: Create a pipeline + +The following query creates a pipeline, named , that uses the `trim` processor to : + +```json + +``` +{% include copy-curl.html %} + +### Step 2 (Optional): Test the pipeline + +It is recommended that you test your pipeline before you ingest documents. +{: .tip} + +To test the pipeline, run the following query: + +```json + +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms that the pipeline is working as expected: + +```json + +``` + +### Step 3: Ingest a document + +The following query ingests a document into an index named `testindex1`: + +```json + +``` +{% include copy-curl.html %} + +#### Response + +The request indexes the document into the index and will index all documents with . + +```json + +``` + +### Step 4 (Optional): Retrieve the document + +To retrieve the document, run the following query: + +```json + +``` +{% include copy-curl.html %} + + \ No newline at end of file From d9be1b006acbc94a3d6959fd5bbb0a4ca136b5ae Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 26 Apr 2024 15:28:44 -0600 Subject: [PATCH 2/9] Writing and editing Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 104 +++++++++++++++++++++++---- 1 file changed, 92 insertions(+), 12 deletions(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index 1705851235..e738132090 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -7,12 +7,17 @@ nav_order: 300 # Trim processor -The `trim` processor is used to . +The `trim` processor is used to remove leading and trailing white space characters from a specified field. The following is the syntax for the `trim` processor: ```json - +{ + "trim": { + "field": "field_to_trim", + "target_field": "trimmed_field" + } +} ``` {% include copy-curl.html %} @@ -22,7 +27,16 @@ The following table lists the required and optional parameters for the `trim` pr Parameter | Required/Optional | Description | |-----------|-----------|-----------| - +`field` | Required | The field containing the text to be trimmed. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). +`target_field` | Required | The field where the trimmed text is stored. If not specified, the trimmed text is stored in the same field as the original text. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). +`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified +field. If set to `true`, the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. +`override_target` | Optional | Determines what happens when `target_field` exists in the document. If set to `true`, the processor overwrites the existing `target_field` value with the new value. If set to `false`, the existing value remains and the processor does not overwrite it. Default is `false`. +`description` | Optional | A brief description of the processor. +`if` | Optional | A condition for running the processor. +`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, failures are ignored. Default is `false`. +`on_failure` | Optional | A list of processors to run if the processor fails. +`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. ## Using the processor @@ -30,10 +44,21 @@ Follow these steps to use the processor in a pipeline. ### Step 1: Create a pipeline -The following query creates a pipeline, named , that uses the `trim` processor to : +The following query creates a pipeline named `trim_pipeline` that uses the `trim` processor to remove leading and trailing white space from the `raw_text` field and store the trimmed text in the `trimmed_text` field: ```json - +PUT _ingest/pipeline/trim_pipeline +{ + "description": "Trim leading and trailing white space", + "processors": [ + { + "trim": { + "field": "raw_text", + "target_field": "trimmed_text" + } + } + ] +} ``` {% include copy-curl.html %} @@ -45,7 +70,16 @@ It is recommended that you test your pipeline before you ingest documents. To test the pipeline, run the following query: ```json - +POST _ingest/pipeline/trim_pipeline/_simulate +{ + "docs": [ + { + "_source": { + "raw_text": " Hello, world! " + } + } + ] +} ``` {% include copy-curl.html %} @@ -54,7 +88,23 @@ To test the pipeline, run the following query: The following example response confirms that the pipeline is working as expected: ```json - +{ + "docs": [ + { + "doc": { + "_index": "_index", + "_id": "_id", + "_source": { + "raw_text": " Hello, world! ", + "trimmed_text": "Hello, world!" + }, + "_ingest": { + "timestamp": "2024-04-26T20:58:17.418006805Z" + } + } + } + ] +} ``` ### Step 3: Ingest a document @@ -62,25 +112,55 @@ The following example response confirms that the pipeline is working as expected The following query ingests a document into an index named `testindex1`: ```json - +PUT testindex1/_doc/1?pipeline=trim_pipeline +{ + "message": " This is a test document. " +} ``` {% include copy-curl.html %} #### Response -The request indexes the document into the index and will index all documents with . +The request indexes the document into the index `testindex1` and indexes all documents with the `raw_text` field, which is processed by the `trim_pipeline` to populate the `trimmed_text` field. ```json - + "_index": "testindex1", + "_id": "1", + "_version": 68, + "result": "updated", + "_shards": { + "total": 2, + "successful": 1, + "failed": 0 + }, + "_seq_no": 70, + "_primary_term": 47 +} ``` +{% include copy-curl.html %} ### Step 4 (Optional): Retrieve the document To retrieve the document, run the following query: ```json - +GET testindex1/_doc/1 ``` {% include copy-curl.html %} - \ No newline at end of file +The response should include the `trimmed_text` field with the leading and trailing white space removed: + +```json +{ + "_index": "testindex1", + "_id": "1", + "_version": 69, + "_seq_no": 71, + "_primary_term": 47, + "found": true, + "_source": { + "raw_text": " This is a test document. ", + "trimmed_text": "This is a test document." + } +} +``` From 179e65abe8cef086d5e70c58657f1eb80b299561 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 26 Apr 2024 15:49:19 -0600 Subject: [PATCH 3/9] Writing and editing Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index e738132090..46b8d0d673 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -121,7 +121,7 @@ PUT testindex1/_doc/1?pipeline=trim_pipeline #### Response -The request indexes the document into the index `testindex1` and indexes all documents with the `raw_text` field, which is processed by the `trim_pipeline` to populate the `trimmed_text` field. +The request indexes the document into the index `testindex1` and indexes all documents with the `raw_text` field, which is processed by the `trim_pipeline`, to populate the `trimmed_text` field. ```json "_index": "testindex1", From 7cc8623d2aed2fb48753a469189166bf9ad35b86 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 7 Jun 2024 12:53:24 -0600 Subject: [PATCH 4/9] Update trim.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index 46b8d0d673..d00f337370 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -27,11 +27,10 @@ The following table lists the required and optional parameters for the `trim` pr Parameter | Required/Optional | Description | |-----------|-----------|-----------| -`field` | Required | The field containing the text to be trimmed. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). -`target_field` | Required | The field where the trimmed text is stored. If not specified, the trimmed text is stored in the same field as the original text. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). +`field` | Required | The field containing the text to be trimmed. +`target_field` | Required | The field where the trimmed text is stored. If not specified, then the field is updated in-place. `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. -`override_target` | Optional | Determines what happens when `target_field` exists in the document. If set to `true`, the processor overwrites the existing `target_field` value with the new value. If set to `false`, the existing value remains and the processor does not overwrite it. Default is `false`. `description` | Optional | A brief description of the processor. `if` | Optional | A condition for running the processor. `ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, failures are ignored. Default is `false`. From 883d10c2a4cc885094ccd2a81e2671d3c8d1176f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 10 Jun 2024 08:20:51 -0600 Subject: [PATCH 5/9] Update _ingest-pipelines/processors/trim.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index d00f337370..a5325bcdbd 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -28,7 +28,7 @@ The following table lists the required and optional parameters for the `trim` pr Parameter | Required/Optional | Description | |-----------|-----------|-----------| `field` | Required | The field containing the text to be trimmed. -`target_field` | Required | The field where the trimmed text is stored. If not specified, then the field is updated in-place. +`target_field` | Required | The field in which the trimmed text is stored. If not specified, then the field is updated in-place. `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. `description` | Optional | A brief description of the processor. From d3d6c61a40ee8876b1792905665387da382d6cea Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 10 Jun 2024 08:20:58 -0600 Subject: [PATCH 6/9] Update _ingest-pipelines/processors/trim.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index a5325bcdbd..b7a7a6ef5c 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -30,7 +30,7 @@ Parameter | Required/Optional | Description | `field` | Required | The field containing the text to be trimmed. `target_field` | Required | The field in which the trimmed text is stored. If not specified, then the field is updated in-place. `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified -field. If set to `true`, the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. +field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. `description` | Optional | A brief description of the processor. `if` | Optional | A condition for running the processor. `ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, failures are ignored. Default is `false`. From ccdd1d9199421ca26d486381fbd4dafb6ab2d281 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 10 Jun 2024 08:21:04 -0600 Subject: [PATCH 7/9] Update _ingest-pipelines/processors/trim.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index b7a7a6ef5c..f150af786c 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -33,7 +33,7 @@ Parameter | Required/Optional | Description | field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. `description` | Optional | A brief description of the processor. `if` | Optional | A condition for running the processor. -`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, failures are ignored. Default is `false`. +`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, then failures are ignored. Default is `false`. `on_failure` | Optional | A list of processors to run if the processor fails. `tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. From e9dc6ec3917a0c6f8a2fdce7aed2138769762369 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 10 Jun 2024 08:21:12 -0600 Subject: [PATCH 8/9] Update _ingest-pipelines/processors/trim.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index f150af786c..7cee19b46f 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -120,7 +120,7 @@ PUT testindex1/_doc/1?pipeline=trim_pipeline #### Response -The request indexes the document into the index `testindex1` and indexes all documents with the `raw_text` field, which is processed by the `trim_pipeline`, to populate the `trimmed_text` field. +The request indexes the document into the index `testindex1` and indexes all documents with the `raw_text` field, which is processed by the `trim_pipeline`, to populate the `trimmed_text` field, as shown in the following response: ```json "_index": "testindex1", From 9ee5722488fb86c20c747a7a2c0cca68f51faac0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 10 Jun 2024 08:21:21 -0600 Subject: [PATCH 9/9] Update _ingest-pipelines/processors/trim.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi --- _ingest-pipelines/processors/trim.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_ingest-pipelines/processors/trim.md b/_ingest-pipelines/processors/trim.md index 7cee19b46f..9c1999aeb2 100644 --- a/_ingest-pipelines/processors/trim.md +++ b/_ingest-pipelines/processors/trim.md @@ -147,7 +147,7 @@ GET testindex1/_doc/1 ``` {% include copy-curl.html %} -The response should include the `trimmed_text` field with the leading and trailing white space removed: +The response includes the `trimmed_text` field with the leading and trailing white space removed: ```json {