diff --git a/docs/plugins/ingest.asciidoc b/docs/plugins/ingest.asciidoc index 0d66f41ef681c..b9717485f6769 100644 --- a/docs/plugins/ingest.asciidoc +++ b/docs/plugins/ingest.asciidoc @@ -13,20 +13,4 @@ The core ingest plugins are: The ingest attachment plugin lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library https://tika.apache.org/[Tika]. -<>:: - -The `geoip` processor adds information about the geographical location of IP -addresses, based on data from the Maxmind databases. This processor adds this -information by default under the `geoip` field. The `geoip` processor is no -longer distributed as a plugin, but is now a module distributed by default with -Elasticsearch. See {ref}/geoip-processor.html[GeoIP processor] for more -details. - -<>:: - -A processor that extracts details from the User-Agent header value. The -`user_agent` processor is no longer distributed as a plugin, but is now a module -distributed by default with Elasticsearch. See -{ref}/user-agent-processor.html[User Agent processor] for more details. - include::ingest-attachment.asciidoc[] diff --git a/docs/reference/images/ingest/ingest-pipeline-list.png b/docs/reference/images/ingest/ingest-pipeline-list.png new file mode 100644 index 0000000000000..1ad12c1640d10 Binary files /dev/null and b/docs/reference/images/ingest/ingest-pipeline-list.png differ diff --git a/docs/reference/images/ingest/ingest-pipeline-processor.png b/docs/reference/images/ingest/ingest-pipeline-processor.png new file mode 100644 index 0000000000000..2de7449affd0c Binary files /dev/null and b/docs/reference/images/ingest/ingest-pipeline-processor.png differ diff --git a/docs/reference/images/ingest/test-a-pipeline.png b/docs/reference/images/ingest/test-a-pipeline.png new file mode 100644 index 0000000000000..117b83c120c8e Binary files /dev/null and b/docs/reference/images/ingest/test-a-pipeline.png differ diff --git a/docs/reference/index-modules.asciidoc b/docs/reference/index-modules.asciidoc index 19c025aafbfea..aeb8a779fe08e 100644 --- a/docs/reference/index-modules.asciidoc +++ b/docs/reference/index-modules.asciidoc @@ -273,6 +273,7 @@ are ignored for this index. The length of time that a <> remains available for <>. Defaults to `60s`. +[[index-default-pipeline]] `index.default_pipeline`:: The default <> pipeline for this index. Index requests will fail @@ -280,6 +281,7 @@ are ignored for this index. overridden using the `pipeline` parameter. The special pipeline name `_none` indicates no ingest pipeline should be run. +[[index-final-pipeline]] `index.final_pipeline`:: The final <> pipeline for this index. Index requests will fail if the final pipeline is set and the pipeline does not exist. diff --git a/docs/reference/ingest.asciidoc b/docs/reference/ingest.asciidoc index 127604ec6d700..2e5ea43ca490b 100644 --- a/docs/reference/ingest.asciidoc +++ b/docs/reference/ingest.asciidoc @@ -1,92 +1,684 @@ [[ingest]] -= Ingest node += Ingest pipelines -[partintro] --- -Use an ingest node to pre-process documents before the actual document indexing happens. -The ingest node intercepts bulk and index requests, it applies transformations, and it then -passes the documents back to the index or bulk APIs. +Ingest pipelines let you perform common transformations on your data before +indexing. For example, you can use pipelines to remove fields, extract values +from text, and enrich your data. -All nodes enable ingest by default, so any node can handle ingest tasks. 
To -create a dedicated ingest node, configure the <> -setting in `elasticsearch.yml` as follows: +A pipeline consists of a series of configurable tasks called +<>. Each processor runs sequentially, making specific +changes to incoming documents. After the processors have run, {es} adds the +transformed documents to your data stream or index. -[source,yaml] +image::images/ingest/ingest-process.svg[Ingest pipeline diagram,align="center"] + +You can create and manage ingest pipelines using {kib}'s **Ingest Node +Pipelines** feature or the <>. {es} stores pipelines in +the <>. + +[discrete] +[[ingest-prerequisites]] +=== Prerequisites + +* Nodes with the <> node role handle pipeline +processing. To use ingest pipelines, your cluster must have at least one node +with the `ingest` role. For heavy ingest loads, we recommend creating +<>. + +* If the {es} security features are enabled, you must have the `manage_pipeline` +<> to manage ingest pipelines. To use +{kib}'s **Ingest Node Pipelines** feature, you also need the +`cluster:monitor/nodes/info` cluster privileges. + +* Pipelines including the `enrich` processor require additional setup. See +<>. + +[discrete] +[[create-manage-ingest-pipelines]] +== Create and manage pipelines + +In {kib}, open the main menu and click **Stack Management** > **Ingest Node +Pipelines**. From the list view, you can: + +* View a list of your pipelines and drill down into details +* Edit or clone existing pipelines +* Delete pipelines + +To create a new pipeline, click **Create a pipeline**. For an example tutorial, +see <>. + +[role="screenshot"] +image::images/ingest/ingest-pipeline-list.png[Kibana's Ingest Node Pipelines list view,align="center"] + +You can also use the <> to create and manage pipelines. +The following <> request creates +a pipeline containing two <> processors followed by a +<> processor. The processors run sequentially +in the order specified. + +[source,console] +---- +PUT _ingest/pipeline/my-pipeline +{ + "description": "My pipeline description", + "processors": [ + { + "set": { + "field": "my-long-field", + "value": 10 + } + }, + { + "set": { + "field": "my-boolean-field", + "value": true + } + }, + { + "lowercase": { + "field": "my-keyword-field" + } + } + ] +} +---- +// TESTSETUP + +[discrete] +[[test-pipeline]] +=== Test a pipeline + +Before using a pipeline in production, we recommend you test it using sample +documents. When creating or editing a pipeline in {kib}, click **Add +documents**. In the **Documents** tab, provide sample documents and click **Run +the pipeline**. + +[role="screenshot"] +image::images/ingest/test-a-pipeline.png[Test a pipeline in Kibana,align="center"] + +You can also test pipelines using the <>. 
+
+[source,console]
+----
+POST _ingest/pipeline/my-pipeline/_simulate
+{
+  "docs": [
+    {
+      "_source": {
+        "my-keyword-field": "FOO"
+      }
+    },
+    {
+      "_source": {
+        "my-keyword-field": "BAR"
+      }
+    }
+  ]
+}
 ----
-node.roles: [ ingest ]
+
+The API returns transformed documents:
+
+[source,console-result]
+----
+{
+  "docs": [
+    {
+      "doc": {
+        "_index": "_index",
+        "_type": "_doc",
+        "_id": "_id",
+        "_source": {
+          "my-long-field": 10,
+          "my-boolean-field": true,
+          "my-keyword-field": "foo"
+        },
+        "_ingest": {
+          "timestamp": "2099-02-30T22:30:03.187Z"
+        }
+      }
+    },
+    {
+      "doc": {
+        "_index": "_index",
+        "_type": "_doc",
+        "_id": "_id",
+        "_source": {
+          "my-long-field": 10,
+          "my-boolean-field": true,
+          "my-keyword-field": "bar"
+        },
+        "_ingest": {
+          "timestamp": "2099-02-30T22:30:03.188Z"
+        }
+      }
+    }
+  ]
+}
 ----
+// TESTRESPONSE[s/"2099-02-30T22:30:03.187Z"/$body.docs.0.doc._ingest.timestamp/]
+// TESTRESPONSE[s/"2099-02-30T22:30:03.188Z"/$body.docs.1.doc._ingest.timestamp/]

-To disable ingest for a node, specify the `node.roles` setting and exclude
-`ingest` from the listed roles.
+[discrete]
+[[add-pipeline-to-indexing-request]]
+=== Add a pipeline to an indexing request

-To pre-process documents before indexing, <> that specifies a series of
-<>. Each processor transforms the document in some specific way. For example, a
-pipeline might have one processor that removes a field from the document, followed by
-another processor that renames a field. The <> then stores
-the configured pipelines.
+Use the `pipeline` query parameter to apply a pipeline to documents in
+<> or <> indexing requests.

-To use a pipeline, simply specify the `pipeline` parameter on an index or bulk request. This
-way, the ingest node knows which pipeline to use.
+[source,console]
+----
+POST my-data-stream/_doc?pipeline=my-pipeline
+{
+  "@timestamp": "2099-03-07T11:04:05.000Z",
+  "my-keyword-field": "foo"
+}

-For example:
-Create a pipeline
+PUT my-data-stream/_bulk?pipeline=my-pipeline
+{ "create":{ } }
+{ "@timestamp": "2099-03-08T11:04:05.000Z", "my-keyword-field" : "foo" }
+{ "create":{ } }
+{ "@timestamp": "2099-03-08T11:06:07.000Z", "my-keyword-field" : "bar" }
+----
+
+You can also use the `pipeline` parameter with the <> or <> APIs.

 [source,console]
--------------------------------------------------
-PUT _ingest/pipeline/my_pipeline_id
+----
+POST my-data-stream/_update_by_query?pipeline=my-pipeline
+
+POST _reindex
+{
+  "source": {
+    "index": "my-data-stream"
+  },
+  "dest": {
+    "index": "my-new-data-stream",
+    "op_type": "create",
+    "pipeline": "my-pipeline"
+  }
+}
+----
+// TEST[continued]
+
+[discrete]
+[[set-default-pipeline]]
+=== Set a default pipeline
+
+Use the <> index setting to set
+a default pipeline. {es} applies this pipeline if no `pipeline` parameter
+is specified.
+
+[discrete]
+[[set-final-pipeline]]
+=== Set a final pipeline
+
+Use the <> index setting to set a
+final pipeline. {es} applies this pipeline after the request or default
+pipeline, even if neither is specified.
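+
+For example, the following update index settings request sets `my-pipeline` as
+the default pipeline for an existing index. This is a minimal sketch:
+`my-index-000001` is a hypothetical index name, and `index.final_pipeline` can
+be set the same way. The special pipeline name `_none` indicates that no ingest
+pipeline should run.
+
+[source,console]
+----
+PUT my-index-000001/_settings
+{
+  "index.default_pipeline": "my-pipeline"
+}
+----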
+
+[discrete]
+[[access-source-fields]]
+=== Access source fields in a processor
+
+Processors have read and write access to an incoming document's source fields.
+To access a field in a processor, use its field name. The following `set`
+processor accesses `my-long-field`.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "my-long-field",
+        "value": 10
+      }
+    }
+  ]
+}
+----
+
+You can also prepend the `_source` prefix.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "_source.my-long-field",
+        "value": 10
+      }
+    }
+  ]
+}
+----
+
+Use dot notation to access object fields.
+
+IMPORTANT: If your document contains flattened objects, use the
+<> processor to expand them first. Other
+ingest processors cannot access flattened objects.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "dot_expander": {
+        "field": "my-object-field.my-property"
+      }
+    },
+    {
+      "set": {
+        "field": "my-object-field.my-property",
+        "value": 10
+      }
+    }
+  ]
+}
+----
+
+[[template-snippets]]
+To access field values, enclose the field name in double curly brackets `{{ }}`
+to create a https://mustache.github.io[Mustache] template snippet. You can use
+template snippets to dynamically set field names. The following processor uses
+the value of the `service` field as a field name, and sets that field to the
+value of the `code` field.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "{{service}}",
+        "value": "{{code}}"
+      }
+    }
+  ]
+}
+----
+
+[discrete]
+[[access-metadata-fields]]
+=== Access metadata fields in a processor
+
+Processors can access the following metadata fields by name:
+
+* `_index`
+* `_id`
+* `_routing`
+
+For example, the following `set` processor sets the document's routing value to
+the `geoip.country_iso_code` field value.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
 {
-  "description" : "describe pipeline",
   "processors" : [
     {
       "set" : {
+        "field": "_routing",
+        "value": "{{geoip.country_iso_code}}"
+      }
+    }
+  ]
+}
+----
+
+Use a Mustache template snippet to access metadata field values. For example,
+`{{_routing}}` retrieves a document's routing value.
+
+WARNING: If you <>
+document IDs, you cannot use `{{_id}}` in a processor. {es} assigns
+auto-generated `_id` values after ingest.
+
+[discrete]
+[[access-ingest-metadata]]
+=== Access ingest metadata in a processor
+
+Ingest processors can add and access ingest metadata using the `_ingest` key.
+
+Unlike source and metadata fields, {es} does not index ingest metadata fields by
+default. {es} also allows source fields that start with an `_ingest` key. If
+your data includes such source fields, use `_source._ingest` to access them.
+
+Pipelines only create the `_ingest.timestamp` ingest metadata field by default.
+This field contains a timestamp of when {es} received the document's indexing
+request. To index `_ingest.timestamp` or other ingest metadata fields, use the
+`set` processor.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "field": "received",
+        "value": "{{_ingest.timestamp}}"
+      }
+    }
+  ]
+}
+----
+
+[discrete]
+[[handling-pipeline-failures]]
+=== Handling pipeline failures
+
+A pipeline's processors run sequentially. By default, pipeline processing stops
+when one of these processors fails or encounters an error.
+
+To ignore a processor failure and run the pipeline's remaining processors, set
+`ignore_failure` to `true`.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "rename": {
         "field": "foo",
-        "value": "new"
+        "target_field": "bar",
+        "ignore_failure": true
       }
     }
   ]
 }
--------------------------------------------------
+----

-Index with defined pipeline
+Use the `on_failure` parameter to specify a list of processors to run
+immediately after a processor failure.
+If `on_failure` is specified, {es} then runs the pipeline's remaining
+processors, even if the `on_failure` configuration is empty.

 [source,console]
--------------------------------------------------
-PUT my-index-00001/_doc/my-id?pipeline=my_pipeline_id
+----
+PUT _ingest/pipeline/my-pipeline
 {
-  "foo": "bar"
+  "processors": [
+    {
+      "rename": {
+        "field": "foo",
+        "target_field": "bar",
+        "on_failure": [
+          {
+            "set": {
+              "field": "error.message",
+              "value": "field \"foo\" does not exist, cannot rename to \"bar\"",
+              "override": false
+            }
+          }
+        ]
+      }
+    }
+  ]
 }
--------------------------------------------------
-// TEST[continued]
+----

-Response:
+Nest a list of `on_failure` processors for nested error handling.

-[source,console-result]
---------------------------------------------------
-{
-  "_index" : "my-index-00001",
-  "_type" : "_doc",
-  "_id" : "my-id",
-  "_version" : 1,
-  "result" : "created",
-  "_shards" : {
-    "total" : 2,
-    "successful" : 2,
-    "failed" : 0
-  },
-  "_seq_no" : 0,
-  "_primary_term" : 1
-}
---------------------------------------------------
-// TESTRESPONSE[s/"successful" : 2/"successful" : 1/]
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "rename": {
+        "field": "foo",
+        "target_field": "bar",
+        "on_failure": [
+          {
+            "set": {
+              "field": "error.message",
+              "value": "field \"foo\" does not exist, cannot rename to \"bar\"",
+              "override": false,
+              "on_failure": [
+                {
+                  "set": {
+                    "field": "error.message.multi",
+                    "value": "Document encountered multiple ingest errors",
+                    "override": true
+                  }
+                }
+              ]
+            }
+          }
+        ]
+      }
+    }
+  ]
+}
+----

-An index may also declare a <> that will be used in the
-absence of the `pipeline` parameter.
+You can also specify `on_failure` for a pipeline.

-Finally, an index may also declare a <>
-that will be executed after any request or default pipeline (if any).
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [ ... ],
+  "on_failure": [
+    {
+      "set": {
+        "field": "_index",
+        "value": "failed-{{ _index }}"
+      }
+    }
+  ]
+}
+----
+// TEST[s/\.\.\./{"lowercase": {"field":"my-keyword-field"}}/]
+
+[discrete]
+[[conditionally-run-processor]]
+=== Conditionally run a processor
+
+Each processor supports an optional `if` condition, written as a
+{painless}/painless-guide.html[Painless script]. If provided, the processor only
+runs when the `if` condition is `true`.
+
+IMPORTANT: `if` condition scripts run in Painless's
+{painless}/painless-ingest-processor-context.html[ingest processor context]. In
+`if` conditions, `ctx` values are read-only.
+
+The following <> processor uses an `if` condition to drop
+documents with a `network_name` of `Guest`.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "drop": {
+        "if": "ctx?.network_name == 'Guest'"
+      }
+    }
+  ]
+}
+----
+
+If the static `script.painless.regex.enabled` cluster setting is enabled, you
+can use regular expressions in your `if` condition scripts. For supported
+syntax, see the {painless}/painless-regexes.html[Painless regexes]
+documentation.
+
+TIP: If possible, avoid using regular expressions. Expensive regular expressions
+can slow indexing speeds.
+
+[source,console]
+----
+PUT _ingest/pipeline/my-pipeline
+{
+  "processors": [
+    {
+      "set": {
+        "if": "ctx.href?.url =~ /^http[^s]/",
+        "field": "href.insecure",
+        "value": true
+      }
+    }
+  ]
+}
+----
+
+You must specify `if` conditions as valid JSON on a single line. However, you
+can use the {kibana-ref}/console-kibana.html#configuring-console[{kib}
+console]'s triple quote syntax to write and debug larger scripts.
+
+TIP: If possible, avoid using complex or expensive `if` condition scripts.
+Expensive condition scripts can slow indexing speeds.
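+
+For example, the following pipeline drops documents unless they have a
+multi-valued `tags` field with at least one value that contains the characters
+`prod` (case insensitive).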
+ +[source,console] +---- +PUT _ingest/pipeline/my-pipeline +{ + "processors": [ + { + "drop": { + "if": """ + Collection tags = ctx.tags; + if(tags != null){ + for (String tag : tags) { + if (tag.toLowerCase().contains('prod')) { + return false; + } + } + } + return true; + """ + } + } + ] } --------------------------------------------------- -// TESTRESPONSE[s/"successful" : 2/"successful" : 1/] +---- -An index may also declare a <> that will be used in the -absence of the `pipeline` parameter. +You can also specify a <> as the +`if` condition. + +[source,console] +---- +PUT _scripts/my-stored-script +{ + "script": { + "lang": "painless", + "source": """ + Collection tags = ctx.tags; + if(tags != null){ + for (String tag : tags) { + if (tag.toLowerCase().contains('prod')) { + return false; + } + } + } + return true; + """ + } +} + +PUT _ingest/pipeline/my-pipeline +{ + "processors": [ + { + "drop": { + "if": { "id": "my-stored-script" } + } + } + ] +} +---- -Finally, an index may also declare a <> -that will be executed after any request or default pipeline (if any). +Incoming documents often contain object fields. If a processor script attempts +to access a field whose parent object does not exist, {es} returns a +`NullPointerException`. To avoid these exceptions, use +{painless}/painless-operators-reference.html#null-safe-operator[null safe +operators], such as `?.`, and write your scripts to be null safe. -See <> for more information about creating, adding, and deleting pipelines. +For example, `ctx.network?.name.equalsIgnoreCase('Guest')` is not null safe. +`ctx.network?.name` can return null. Rewrite the script as +`'Guest'.equalsIgnoreCase(ctx.network?.name)`, which is null safe because +`Guest` is always non-null. --- +If you can't rewrite a script to be null safe, include an explicit null check. + +[source,console] +---- +PUT _ingest/pipeline/my-pipeline +{ + "processors": [ + { + "drop": { + "if": "ctx.network?.name != null && ctx.network.name.contains('Guest')" + } + } + ] +} +---- + +[discrete] +[[conditionally-apply-pipelines]] +=== Conditionally apply pipelines + +Combine an `if` condition with the <> processor +to apply other pipelines to documents based on your criteria. You can use this +pipeline as the <> in an +<> used to configure multiple data streams or +indices. + +The following pipeline applies different pipelines to incoming documents based +on the `service.name` field value. + +[source,console] +---- +PUT _ingest/pipeline/one-pipeline-to-rule-them-all +{ + "processors": [ + { + "pipeline": { + "if": "ctx.service?.name == 'apache_httpd'", + "name": "httpd_pipeline" + } + }, + { + "pipeline": { + "if": "ctx.service?.name == 'syslog'", + "name": "syslog_pipeline" + } + }, + { + "fail": { + "if": "ctx.service?.name != 'apache_httpd' && ctx.service?.name != 'syslog'", + "message": "This pipeline requires service.name to be either `syslog` or `apache_httpd`" + } + } + ] +} +---- + +[discrete] +[[get-pipeline-usage-stats]] +=== Get pipeline usage statistics + +Use the <> API to get global and per-pipeline +ingest statistics. Use these stats to determine which pipelines run most +frequently or spend the most time processing. 
+ +[source,console] +---- +GET _nodes/stats/ingest?filter_path=nodes.*.ingest +---- -include::ingest/ingest-node.asciidoc[] +include::ingest/common-log-format-example.asciidoc[] +include::ingest/enrich.asciidoc[] +include::ingest/processors.asciidoc[] diff --git a/docs/reference/ingest/apis/put-pipeline.asciidoc b/docs/reference/ingest/apis/put-pipeline.asciidoc index 32b5dbbc38460..6d6eb82ff1e0a 100644 --- a/docs/reference/ingest/apis/put-pipeline.asciidoc +++ b/docs/reference/ingest/apis/put-pipeline.asciidoc @@ -54,20 +54,19 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=master-timeout] ==== {api-response-body-title} `description`:: -(Required, string) +(Optional, string) Description of the ingest pipeline. `processors`:: + -- -(Required, array of <>) +(Required, array of <>) Array of processors used to pre-process documents before indexing. Processors are executed in the order provided. -See <> for processor object definitions -and a list of built-in processors. +See <>. -- `version`:: diff --git a/docs/reference/ingest/apis/simulate-pipeline.asciidoc b/docs/reference/ingest/apis/simulate-pipeline.asciidoc index 2f76c2b42f0eb..f7f6fcc80137d 100644 --- a/docs/reference/ingest/apis/simulate-pipeline.asciidoc +++ b/docs/reference/ingest/apis/simulate-pipeline.asciidoc @@ -108,13 +108,13 @@ Description of the ingest pipeline. `processors`:: + -- -(Optional, array of <>) +(Optional, array of <>) Array of processors used to pre-process documents during ingest. Processors are executed in the order provided. -See <> for processor object definitions +See <> for processor object definitions and a list of built-in processors. -- diff --git a/docs/reference/ingest/common-log-format-example.asciidoc b/docs/reference/ingest/common-log-format-example.asciidoc new file mode 100644 index 0000000000000..c9e110d6bf503 --- /dev/null +++ b/docs/reference/ingest/common-log-format-example.asciidoc @@ -0,0 +1,197 @@ +[[common-log-format-example]] +== Example: Parse logs in the Common Log Format +++++ +Example: Parse logs +++++ + +In this example tutorial, you’ll use an <> to parse +server logs in the {wikipedia}/Common_Log_Format[Common Log Format] before +indexing. Before starting, check the <> for +ingest pipelines. + +The logs you want to parse look similar to this: + +[source,js] +---- +212.87.37.154 - - [30/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" +200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) +AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\" +---- +// NOTCONSOLE + +These logs contain an IP address, timestamp, and user agent. You want to give +these three items their own field in {es} for faster searches and +visualizations. You also want to know where the request is coming from. + +. In {kib}, open the main menu and click **Stack Management** > **Ingest Node +Pipelines**. ++ +[role="screenshot"] +image::images/ingest/ingest-pipeline-list.png[Kibana's Ingest Node Pipelines list view,align="center"] + +. Click **Create a pipeline**. +. Provide a name and description for the pipeline. +. Add a <> to parse the log message: + +.. Click **Add a processor** and select the **Grok** processor type. +.. Set the field input to `message` and enter the following <>: ++ +[source,js] +---- +%{IPORHOST:client.ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:@timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:user_agent} +---- +// NOTCONSOLE ++ +.. 
Click **Add** to save the processor. + +. Add processors to map the date, IP, and user agent fields. Map the appropriate +field to each processor type: ++ +-- +* <>: `@timestamp` +* <>: `client.ip` +* <>: `user_agent` + +In the **Date** processor, specify the date format you want to use: +`dd/MMM/yyyy:HH:mm:ss Z`. +-- +Your form should look similar to this: ++ +[role="screenshot"] +image::images/ingest/ingest-pipeline-processor.png[Processors for Ingest Node Pipelines,align="center"] ++ +The four processors will run sequentially: + +Grok > Date > GeoIP > User agent + +You can reorder processors using the arrow icons. ++ +Alternatively, you can click the **Import processors** link and define the +processors as JSON: ++ +[source,console] +---- +{ + "processors": [ + { + "grok": { + "field": "message", + "patterns": ["%{IPORHOST:client.ip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:@timestamp}\\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:user_agent}"] + } + }, + { + "date": { + "field": "@timestamp", + "formats": [ "dd/MMM/yyyy:HH:mm:ss Z" ] + } + }, + { + "geoip": { + "field": "client.ip" + } + }, + { + "user_agent": { + "field": "user_agent" + } + } + ] +} +---- +// TEST[s/^/PUT _ingest\/pipeline\/my-pipeline\n/] + +. To test the pipeline, click **Add documents**. + +. In the **Documents** tab, provide a sample document for testing: ++ +[source,js] +---- +[ + { + "_source": { + "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"" + } + } +] +---- +// NOTCONSOLE + +. Click **Run the pipeline** and verify the pipeline worked as expected. + +. If everything looks correct, close the panel, and then click **Create +pipeline**. ++ +You’re now ready to load the logs data using the <>. + +. Index a document with the pipeline you created. ++ +[source,console] +---- +PUT my-index/_doc/1?pipeline=my-pipeline +{ + "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"" +} +---- +// TEST[continued] + +. 
To verify, run: ++ +[source,console] +---- +GET my-index/_doc/1 +---- +// TEST[continued] + +//// +[source,console-result] +---- +{ + "_index": "my-index", + "_type": "_doc", + "_id": "1", + "_version": 1, + "_seq_no": 0, + "_primary_term": 1, + "found": true, + "_source": { + "request": "/favicon.ico", + "geoip": { + "continent_name": "Europe", + "region_iso_code": "DE-BE", + "city_name": "Berlin", + "country_iso_code": "DE", + "country_name": "Germany", + "region_name": "Land Berlin", + "location": { + "lon": 13.4978, + "lat": 52.411 + } + }, + "auth": "-", + "ident": "-", + "verb": "GET", + "message": "212.87.37.154 - - [05/May/2099:16:21:15 +0000] \"GET /favicon.ico HTTP/1.1\" 200 3638 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"", + "referrer": "\"-\"", + "@timestamp": "2099-05-05T16:21:15.000Z", + "response": 200, + "bytes": 3638, + "client": { + "ip": "212.87.37.154" + }, + "httpversion": "1.1", + "user_agent": { + "original": "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\"", + "os": { + "name": "Mac OS X", + "version": "10.11.6", + "full": "Mac OS X 10.11.6" + }, + "name": "Chrome", + "device": { + "name": "Mac" + }, + "version": "52.0.2743.116" + } + } +} +---- +//// diff --git a/docs/reference/ingest/enrich.asciidoc b/docs/reference/ingest/enrich.asciidoc index 536371b9891ce..9185eba3d15c9 100644 --- a/docs/reference/ingest/enrich.asciidoc +++ b/docs/reference/ingest/enrich.asciidoc @@ -17,18 +17,13 @@ For example, you can use the enrich processor to: [[how-enrich-works]] === How the enrich processor works -An <> changes documents before they are actually -indexed. You can think of an ingest pipeline as an assembly line made up of a -series of workers, called <>. Each processor makes -specific changes, like lowercasing field values, to incoming documents before -moving on to the next. When all the processors in a pipeline are done, the -finished document is added to the target index. +Most processors are self-contained and only change _existing_ data in incoming +documents. image::images/ingest/ingest-process.svg[align="center"] -Most processors are self-contained and only change _existing_ data in incoming -documents. But the enrich processor adds _new_ data to incoming documents -and requires a few special components: +The enrich processor adds _new_ data to incoming documents and requires a few +special components: image::images/ingest/enrich/enrich-process.svg[align="center"] @@ -193,7 +188,7 @@ added as an array. See <> for a full list of configuration options. -You also can add other <> to your ingest pipeline. +You also can add other <> to your ingest pipeline. [[ingest-enrich-docs]] ==== Ingest and enrich documents diff --git a/docs/reference/ingest/ingest-node.asciidoc b/docs/reference/ingest/ingest-node.asciidoc deleted file mode 100644 index b745f5501b2e1..0000000000000 --- a/docs/reference/ingest/ingest-node.asciidoc +++ /dev/null @@ -1,911 +0,0 @@ -[[pipeline]] -== Pipeline Definition - -A pipeline is a definition of a series of <> that are to be executed -in the same order as they are declared. A pipeline consists of two main fields: a `description` -and a list of `processors`: - -[source,js] --------------------------------------------------- -{ - "description" : "...", - "processors" : [ ... 
] -} --------------------------------------------------- -// NOTCONSOLE - -The `description` is a special field to store a helpful description of -what the pipeline does. - -The `processors` parameter defines a list of processors to be executed in -order. - -[[accessing-data-in-pipelines]] -== Accessing Data in Pipelines - -The processors in a pipeline have read and write access to documents that pass through the pipeline. -The processors can access fields in the source of a document and the document's metadata fields. - -[discrete] -[[accessing-source-fields]] -=== Accessing Fields in the Source -Accessing a field in the source is straightforward. You simply refer to fields by -their name. For example: - -[source,js] --------------------------------------------------- -{ - "set": { - "field": "my_field", - "value": 582.1 - } -} --------------------------------------------------- -// NOTCONSOLE - -On top of this, fields from the source are always accessible via the `_source` prefix: - -[source,js] --------------------------------------------------- -{ - "set": { - "field": "_source.my_field", - "value": 582.1 - } -} --------------------------------------------------- -// NOTCONSOLE - -[discrete] -[[accessing-metadata-fields]] -=== Accessing Metadata Fields -You can access metadata fields in the same way that you access fields in the source. This -is possible because Elasticsearch doesn't allow fields in the source that have the -same name as metadata fields. - -The following metadata fields are accessible by a processor: - -* `_index` -* `_type` -* `_id` -* `_routing` - -The following example sets the `_id` metadata field of a document to `1`: - -[source,js] --------------------------------------------------- -{ - "set": { - "field": "_id", - "value": "1" - } -} --------------------------------------------------- -// NOTCONSOLE - -You can access a metadata field's value by surrounding it in double -curly brackets `"{{ }}"`. For example, `{{_index}}` retrieves the name of a -document's index. - -WARNING: If you <> -document IDs, you cannot use the `{{_id}}` value in an ingest processor. {es} -assigns auto-generated `_id` values after ingest. - -[discrete] -[[accessing-ingest-metadata]] -=== Accessing Ingest Metadata Fields -Beyond metadata fields and source fields, ingest also adds ingest metadata to the documents that it processes. -These metadata properties are accessible under the `_ingest` key. Currently ingest adds the ingest timestamp -under the `_ingest.timestamp` key of the ingest metadata. The ingest timestamp is the time when Elasticsearch -received the index or bulk request to pre-process the document. - -Any processor can add ingest-related metadata during document processing. Ingest metadata is transient -and is lost after a document has been processed by the pipeline. Therefore, ingest metadata won't be indexed. - -The following example adds a field with the name `received`. The value is the ingest timestamp: - -[source,js] --------------------------------------------------- -{ - "set": { - "field": "received", - "value": "{{_ingest.timestamp}}" - } -} --------------------------------------------------- -// NOTCONSOLE - -Unlike Elasticsearch metadata fields, the ingest metadata field name `_ingest` can be used as a valid field name -in the source of a document. Use `_source._ingest` to refer to the field in the source document. Otherwise, `_ingest` -will be interpreted as an ingest metadata field. 
- -[discrete] -[[accessing-template-fields]] -=== Accessing Fields and Metafields in Templates -A number of processor settings also support templating. Settings that support templating can have zero or more -template snippets. A template snippet begins with `{{` and ends with `}}`. -Accessing fields and metafields in templates is exactly the same as via regular processor field settings. - -The following example adds a field named `field_c`. Its value is a concatenation of -the values of `field_a` and `field_b`. - -[source,js] --------------------------------------------------- -{ - "set": { - "field": "field_c", - "value": "{{field_a}} {{field_b}}" - } -} --------------------------------------------------- -// NOTCONSOLE - -The following example uses the value of the `geoip.country_iso_code` field in the source -to set the index that the document will be indexed into: - -[source,js] --------------------------------------------------- -{ - "set": { - "field": "_index", - "value": "{{geoip.country_iso_code}}" - } -} --------------------------------------------------- -// NOTCONSOLE - -Dynamic field names are also supported. This example sets the field named after the -value of `service` to the value of the field `code`: - -[source,js] --------------------------------------------------- -{ - "set": { - "field": "{{service}}", - "value": "{{code}}" - } -} --------------------------------------------------- -// NOTCONSOLE - -[[ingest-conditionals]] -== Conditional Execution in Pipelines - -Each processor allows for an optional `if` condition to determine if that -processor should be executed or skipped. The value of the `if` is a -<> script that needs to evaluate -to `true` or `false`. - -For example the following processor will <> the document -(i.e. not index it) if the input document has a field named `network_name` -and it is equal to `Guest`. - -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/drop_guests_network -{ - "processors": [ - { - "drop": { - "if": "ctx.network_name == 'Guest'" - } - } - ] -} --------------------------------------------------- - -Using that pipeline for an index request: - -[source,console] --------------------------------------------------- -POST test/_doc/1?pipeline=drop_guests_network -{ - "network_name" : "Guest" -} --------------------------------------------------- -// TEST[continued] - -Results in nothing indexed since the conditional evaluated to `true`. - -[source,console-result] --------------------------------------------------- -{ - "_index": "test", - "_type": "_doc", - "_id": "1", - "_version": -3, - "result": "noop", - "_shards": { - "total": 0, - "successful": 0, - "failed": 0 - } -} --------------------------------------------------- - - -[[ingest-conditional-nullcheck]] -=== Handling Nested Fields in Conditionals - -Source documents often contain nested fields. Care should be taken -to avoid NullPointerExceptions if the parent object does not exist -in the document. For example `ctx.a.b.c` can throw an NullPointerExceptions -if the source document does not have top level `a` object, or a second -level `b` object. - -To help protect against NullPointerExceptions, null safe operations should be used. -Fortunately, Painless makes {painless}/painless-operators-reference.html#null-safe-operator[null safe] -operations easy with the `?.` operator. 
- -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/drop_guests_network -{ - "processors": [ - { - "drop": { - "if": "ctx.network?.name == 'Guest'" - } - } - ] -} --------------------------------------------------- - -The following document will get <> correctly: - -[source,console] --------------------------------------------------- -POST test/_doc/1?pipeline=drop_guests_network -{ - "network": { - "name": "Guest" - } -} --------------------------------------------------- -// TEST[continued] - -Thanks to the `?.` operator the following document will not throw an error. -If the pipeline used a `.` the following document would throw a NullPointerException -since the `network` object is not part of the source document. - -[source,console] --------------------------------------------------- -POST test/_doc/2?pipeline=drop_guests_network -{ - "foo" : "bar" -} --------------------------------------------------- -// TEST[continued] - -//// -Hidden example assertion: -[source,console] --------------------------------------------------- -GET test/_doc/2 --------------------------------------------------- -// TEST[continued] - -[source,console-result] --------------------------------------------------- -{ - "_index": "test", - "_type": "_doc", - "_id": "2", - "_version": 1, - "_seq_no": 22, - "_primary_term": 1, - "found": true, - "_source": { - "foo": "bar" - } -} --------------------------------------------------- -// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term": 1/"_primary_term" : $body._primary_term/] -//// - -The source document can also use dot delimited fields to represent nested fields. - -For example instead the source document defining the fields nested: - -[source,js] --------------------------------------------------- -{ - "network": { - "name": "Guest" - } -} --------------------------------------------------- -// NOTCONSOLE - -The source document may have the nested fields flattened as such: -[source,js] --------------------------------------------------- -{ - "network.name": "Guest" -} --------------------------------------------------- -// NOTCONSOLE - -If this is the case, use the <> -so that the nested fields may be used in a conditional. - -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/drop_guests_network -{ - "processors": [ - { - "dot_expander": { - "field": "network.name" - } - }, - { - "drop": { - "if": "ctx.network?.name == 'Guest'" - } - } - ] -} --------------------------------------------------- - -Now the following input document can be used with a conditional in the pipeline. - -[source,console] --------------------------------------------------- -POST test/_doc/3?pipeline=drop_guests_network -{ - "network.name": "Guest" -} --------------------------------------------------- -// TEST[continued] - -The `?.` operators works well for use in the `if` conditional -because the {painless}/painless-operators-reference.html#null-safe-operator[null safe operator] -returns null if the object is null and `==` is null safe (as well as many other -{painless}/painless-operators.html[painless operators]). - -However, calling a method such as `.equalsIgnoreCase` is not null safe -and can result in a NullPointerException. - -Some situations allow for the same functionality but done so in a null safe manner. 
-For example: `'Guest'.equalsIgnoreCase(ctx.network?.name)` is null safe because -`Guest` is always non null, but `ctx.network?.name.equalsIgnoreCase('Guest')` is not null safe -since `ctx.network?.name` can return null. - -Some situations require an explicit null check. In the following example there -is not null safe alternative, so an explicit null check is needed. - -[source,js] --------------------------------------------------- -{ - "drop": { - "if": "ctx.network?.name != null && ctx.network.name.contains('Guest')" - } -} --------------------------------------------------- -// NOTCONSOLE - -[[ingest-conditional-complex]] -=== Complex Conditionals -The `if` condition can be more complex than a simple equality check. -The full power of the <> is available and -running in the {painless}/painless-ingest-processor-context.html[ingest processor context]. - -IMPORTANT: The value of ctx is read-only in `if` conditions. - -A more complex `if` condition that drops the document (i.e. not index it) -unless it has a multi-valued tag field with at least one value that contains the characters -`prod` (case insensitive). - -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/not_prod_dropper -{ - "processors": [ - { - "drop": { - "if": "Collection tags = ctx.tags;if(tags != null){for (String tag : tags) {if (tag.toLowerCase().contains('prod')) { return false;}}} return true;" - } - } - ] -} --------------------------------------------------- - -The conditional needs to be all on one line since JSON does not -support new line characters. However, Kibana's console supports -a triple quote syntax to help with writing and debugging -scripts like these. - -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/not_prod_dropper -{ - "processors": [ - { - "drop": { - "if": """ - Collection tags = ctx.tags; - if(tags != null){ - for (String tag : tags) { - if (tag.toLowerCase().contains('prod')) { - return false; - } - } - } - return true; - """ - } - } - ] -} --------------------------------------------------- -// TEST[continued] - -or it can be built with a stored script: - -[source,console] --------------------------------------------------- -PUT _scripts/not_prod -{ - "script": { - "lang": "painless", - "source": """ - Collection tags = ctx.tags; - if(tags != null){ - for (String tag : tags) { - if (tag.toLowerCase().contains('prod')) { - return false; - } - } - } - return true; - """ - } -} -PUT _ingest/pipeline/not_prod_dropper -{ - "processors": [ - { - "drop": { - "if": { "id": "not_prod" } - } - } - ] -} --------------------------------------------------- -// TEST[continued] - -Either way, you can run it with: - -[source,console] --------------------------------------------------- -POST test/_doc/1?pipeline=not_prod_dropper -{ - "tags": ["application:myapp", "env:Stage"] -} --------------------------------------------------- -// TEST[continued] - -The document is <> since `prod` (case insensitive) -is not found in the tags. - -The following document is indexed (i.e. not dropped) since -`prod` (case insensitive) is found in the tags. 
- -[source,console] --------------------------------------------------- -POST test/_doc/2?pipeline=not_prod_dropper -{ - "tags": ["application:myapp", "env:Production"] -} --------------------------------------------------- -// TEST[continued] - -//// -Hidden example assertion: -[source,console] --------------------------------------------------- -GET test/_doc/2 --------------------------------------------------- -// TEST[continued] - -[source,console-result] --------------------------------------------------- -{ - "_index": "test", - "_type": "_doc", - "_id": "2", - "_version": 1, - "_seq_no": 34, - "_primary_term": 1, - "found": true, - "_source": { - "tags": [ - "application:myapp", - "env:Production" - ] - } -} --------------------------------------------------- -// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] -//// - - - -The <> with verbose can be used to help build out -complex conditionals. If the conditional evaluates to false it will be -omitted from the verbose results of the simulation since the document will not change. - -Care should be taken to avoid overly complex or expensive conditional checks -since the condition needs to be checked for each and every document. - -[[conditionals-with-multiple-pipelines]] -=== Conditionals with the Pipeline Processor -The combination of the `if` conditional and the <> can result in a simple, -yet powerful means to process heterogeneous input. For example, you can define a single pipeline -that delegates to other pipelines based on some criteria. - -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/logs_pipeline -{ - "description": "A pipeline of pipelines for log files", - "version": 1, - "processors": [ - { - "pipeline": { - "if": "ctx.service?.name == 'apache_httpd'", - "name": "httpd_pipeline" - } - }, - { - "pipeline": { - "if": "ctx.service?.name == 'syslog'", - "name": "syslog_pipeline" - } - }, - { - "fail": { - "if": "ctx.service?.name != 'apache_httpd' && ctx.service?.name != 'syslog'", - "message": "This pipeline requires service.name to be either `syslog` or `apache_httpd`" - } - } - ] -} --------------------------------------------------- - -The above example allows consumers to point to a single pipeline for all log based index requests. -Based on the conditional, the correct pipeline will be called to process that type of data. - -This pattern works well with a <> defined in an index mapping -template for all indexes that hold data that needs pre-index processing. - -[[conditionals-with-regex]] -=== Conditionals with the Regular Expressions -The `if` conditional is implemented as a Painless script, which requires -{painless}//painless-regexes.html[explicit support for regular expressions]. - -`script.painless.regex.enabled: true` must be set in `elasticsearch.yml` to use regular -expressions in the `if` condition. - -If regular expressions are enabled, operators such as `=~` can be used against a `/pattern/` for conditions. 
- -For example: - -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/check_url -{ - "processors": [ - { - "set": { - "if": "ctx.href?.url =~ /^http[^s]/", - "field": "href.insecure", - "value": true - } - } - ] -} --------------------------------------------------- - -[source,console] --------------------------------------------------- -POST test/_doc/1?pipeline=check_url -{ - "href": { - "url": "http://www.elastic.co/" - } -} --------------------------------------------------- -// TEST[continued] - -Results in: - -//// -Hidden example assertion: -[source,console] --------------------------------------------------- -GET test/_doc/1 --------------------------------------------------- -// TEST[continued] -//// - -[source,console-result] --------------------------------------------------- -{ - "_index": "test", - "_type": "_doc", - "_id": "1", - "_version": 1, - "_seq_no": 60, - "_primary_term": 1, - "found": true, - "_source": { - "href": { - "insecure": true, - "url": "http://www.elastic.co/" - } - } -} --------------------------------------------------- -// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] - - -Regular expressions can be expensive and should be avoided if viable -alternatives exist. - -For example in this case `startsWith` can be used to get the same result -without using a regular expression: - -[source,console] --------------------------------------------------- -PUT _ingest/pipeline/check_url -{ - "processors": [ - { - "set": { - "if": "ctx.href?.url != null && ctx.href.url.startsWith('http://')", - "field": "href.insecure", - "value": true - } - } - ] -} --------------------------------------------------- - -[[handling-failure-in-pipelines]] -== Handling Failures in Pipelines - -In its simplest use case, a pipeline defines a list of processors that -are executed sequentially, and processing halts at the first exception. This -behavior may not be desirable when failures are expected. For example, you may have logs -that don't match the specified grok expression. Instead of halting execution, you may -want to index such documents into a separate index. - -To enable this behavior, you can use the `on_failure` parameter. The `on_failure` parameter -defines a list of processors to be executed immediately following the failed processor. -You can specify this parameter at the pipeline level, as well as at the processor -level. If a processor specifies an `on_failure` configuration, whether -it is empty or not, any exceptions that are thrown by the processor are caught, and the -pipeline continues executing the remaining processors. Because you can define further processors -within the scope of an `on_failure` statement, you can nest failure handling. - -The following example defines a pipeline that renames the `foo` field in -the processed document to `bar`. If the document does not contain the `foo` field, the processor -attaches an error message to the document for later analysis within -Elasticsearch. 
- -[source,js] --------------------------------------------------- -{ - "description" : "my first pipeline with handled exceptions", - "processors" : [ - { - "rename" : { - "field" : "foo", - "target_field" : "bar", - "on_failure" : [ - { - "set" : { - "field" : "error.message", - "value" : "field \"foo\" does not exist, cannot rename to \"bar\"" - } - } - ] - } - } - ] -} --------------------------------------------------- -// NOTCONSOLE - -The following example defines an `on_failure` block on a whole pipeline to change -the index to which failed documents get sent. - -[source,js] --------------------------------------------------- -{ - "description" : "my first pipeline with handled exceptions", - "processors" : [ ... ], - "on_failure" : [ - { - "set" : { - "field" : "_index", - "value" : "failed-{{ _index }}" - } - } - ] -} --------------------------------------------------- -// NOTCONSOLE - -Alternatively instead of defining behaviour in case of processor failure, it is also possible -to ignore a failure and continue with the next processor by specifying the `ignore_failure` setting. - -In case in the example below the field `foo` doesn't exist the failure will be caught and the pipeline -continues to execute, which in this case means that the pipeline does nothing. - -[source,js] --------------------------------------------------- -{ - "description" : "my first pipeline with handled exceptions", - "processors" : [ - { - "rename" : { - "field" : "foo", - "target_field" : "bar", - "ignore_failure" : true - } - } - ] -} --------------------------------------------------- -// NOTCONSOLE - -The `ignore_failure` can be set on any processor and defaults to `false`. - -[discrete] -[[accessing-error-metadata]] -=== Accessing Error Metadata From Processors Handling Exceptions - -You may want to retrieve the actual error message that was thrown -by a failed processor. To do so you can access metadata fields called -`on_failure_message`, `on_failure_processor_type`, `on_failure_processor_tag` and -`on_failure_pipeline` (in case an error occurred inside a pipeline processor). -These fields are only accessible from within the context of an `on_failure` block. - -Here is an updated version of the example that you -saw earlier. But instead of setting the error message manually, the example leverages the `on_failure_message` -metadata field to provide the error message. - -[source,js] --------------------------------------------------- -{ - "description" : "my first pipeline with handled exceptions", - "processors" : [ - { - "rename" : { - "field" : "foo", - "to" : "bar", - "on_failure" : [ - { - "set" : { - "field" : "error.message", - "value" : "{{ _ingest.on_failure_message }}" - } - } - ] - } - } - ] -} --------------------------------------------------- -// NOTCONSOLE - - -include::enrich.asciidoc[] - - -[[ingest-processors]] -== Processors - -All processors are defined in the following way within a pipeline definition: - -[source,js] --------------------------------------------------- -{ - "PROCESSOR_NAME" : { - ... processor configuration options ... - } -} --------------------------------------------------- -// NOTCONSOLE - -Each processor defines its own configuration parameters, but all processors have -the ability to declare `tag`, `on_failure` and `if` fields. These fields are optional. - -A `tag` is simply a string identifier of the specific instantiation of a certain -processor in a pipeline. 
The `tag` field does not affect the processor's behavior, -but is very useful for bookkeeping and tracing errors to specific processors. - -The `if` field must contain a script that returns a boolean value. If the script evaluates to `true` -then the processor will be executed for the given document otherwise it will be skipped. -The `if` field takes an object with the script fields defined in <> -and accesses a read only version of the document via the same `ctx` variable used by scripts in the -<>. - -[source,js] --------------------------------------------------- -{ - "set": { - "if": "ctx.foo == 'someValue'", - "field": "found", - "value": true - } -} --------------------------------------------------- -// NOTCONSOLE - -See <> to learn more about the `if` field and conditional execution. - -See <> to learn more about the `on_failure` field and error handling in pipelines. - -The <> will provide a per node list of what processors are available. - -Custom processors must be installed on all nodes. The put pipeline API will fail if a processor specified in a pipeline -doesn't exist on all nodes. If you rely on custom processor plugins make sure to mark these plugins as mandatory by adding -`plugin.mandatory` setting to the `config/elasticsearch.yml` file, for example: - -[source,yaml] --------------------------------------------------- -plugin.mandatory: ingest-attachment --------------------------------------------------- - -A node will not start if this plugin is not available. - -The <> can be used to fetch ingest usage statistics, globally and on a per -pipeline basis. Useful to find out which pipelines are used the most or spent the most time on preprocessing. - -[discrete] -=== Ingest Processor Plugins - -Additional ingest processors can be implemented and installed as Elasticsearch {plugins}/intro.html[plugins]. -See {plugins}/ingest.html[Ingest plugins] for information about the available ingest plugins. 
-include::processors/append.asciidoc[]
-include::processors/bytes.asciidoc[]
-include::processors/circle.asciidoc[]
-include::processors/community-id.asciidoc[]
-include::processors/convert.asciidoc[]
-include::processors/csv.asciidoc[]
-include::processors/date.asciidoc[]
-include::processors/date-index-name.asciidoc[]
-include::processors/dissect.asciidoc[]
-include::processors/dot-expand.asciidoc[]
-include::processors/drop.asciidoc[]
-include::processors/enrich.asciidoc[]
-include::processors/fail.asciidoc[]
-include::processors/foreach.asciidoc[]
-include::processors/geoip.asciidoc[]
-include::processors/grok.asciidoc[]
-include::processors/gsub.asciidoc[]
-include::processors/html_strip.asciidoc[]
-include::processors/inference.asciidoc[]
-include::processors/join.asciidoc[]
-include::processors/json.asciidoc[]
-include::processors/kv.asciidoc[]
-include::processors/lowercase.asciidoc[]
-include::processors/network-direction.asciidoc[]
-include::processors/pipeline.asciidoc[]
-include::processors/remove.asciidoc[]
-include::processors/rename.asciidoc[]
-include::processors/script.asciidoc[]
-include::processors/set.asciidoc[]
-include::processors/set-security-user.asciidoc[]
-include::processors/sort.asciidoc[]
-include::processors/split.asciidoc[]
-include::processors/trim.asciidoc[]
-include::processors/uppercase.asciidoc[]
-include::processors/url-decode.asciidoc[]
-include::processors/uri-parts.asciidoc[]
-include::processors/user-agent.asciidoc[]
diff --git a/docs/reference/ingest/processors.asciidoc b/docs/reference/ingest/processors.asciidoc
new file mode 100644
index 0000000000000..bf3ce47852c4f
--- /dev/null
+++ b/docs/reference/ingest/processors.asciidoc
@@ -0,0 +1,71 @@
+[[processors]]
+== Ingest processor reference
+++++
+Processor reference
+++++
+
+{es} includes several configurable processors. To get a list of available
+processors, use the <> API.
+
+[source,console]
+----
+GET _nodes/ingest?filter_path=nodes.*.ingest.processors
+----
+
+The pages in this section contain reference documentation for each processor.
+
+[discrete]
+[[ingest-process-plugins]]
+=== Processor plugins
+
+You can install additional processors as {plugins}/ingest.html[plugins].
+
+You must install any plugin processors on all nodes in your cluster. Otherwise,
+{es} will fail to create pipelines containing the processor.
+
+Mark a plugin as mandatory by setting `plugin.mandatory` in
+`elasticsearch.yml`. A node will fail to start if a mandatory plugin is not
+installed.
+ +[source,yaml] +---- +plugin.mandatory: ingest-attachment +---- + +include::processors/append.asciidoc[] +include::processors/bytes.asciidoc[] +include::processors/circle.asciidoc[] +include::processors/community-id.asciidoc[] +include::processors/convert.asciidoc[] +include::processors/csv.asciidoc[] +include::processors/date.asciidoc[] +include::processors/date-index-name.asciidoc[] +include::processors/dissect.asciidoc[] +include::processors/dot-expand.asciidoc[] +include::processors/drop.asciidoc[] +include::processors/enrich.asciidoc[] +include::processors/fail.asciidoc[] +include::processors/foreach.asciidoc[] +include::processors/geoip.asciidoc[] +include::processors/grok.asciidoc[] +include::processors/gsub.asciidoc[] +include::processors/html_strip.asciidoc[] +include::processors/inference.asciidoc[] +include::processors/join.asciidoc[] +include::processors/json.asciidoc[] +include::processors/kv.asciidoc[] +include::processors/lowercase.asciidoc[] +include::processors/network-direction.asciidoc[] +include::processors/pipeline.asciidoc[] +include::processors/remove.asciidoc[] +include::processors/rename.asciidoc[] +include::processors/script.asciidoc[] +include::processors/set.asciidoc[] +include::processors/set-security-user.asciidoc[] +include::processors/sort.asciidoc[] +include::processors/split.asciidoc[] +include::processors/trim.asciidoc[] +include::processors/uppercase.asciidoc[] +include::processors/url-decode.asciidoc[] +include::processors/uri-parts.asciidoc[] +include::processors/user-agent.asciidoc[] diff --git a/docs/reference/ingest/processors/append.asciidoc b/docs/reference/ingest/processors/append.asciidoc index 839fec7e4eaaa..2aa616c9393e3 100644 --- a/docs/reference/ingest/processors/append.asciidoc +++ b/docs/reference/ingest/processors/append.asciidoc @@ -15,8 +15,8 @@ Accepts a single value or an array of values. [options="header"] |====== | Name | Required | Default | Description -| `field` | yes | - | The field to be appended to. Supports <>. -| `value` | yes | - | The value to be appended. Supports <>. +| `field` | yes | - | The field to be appended to. Supports <>. +| `value` | yes | - | The value to be appended. Supports <>. | `allow_duplicates` | no | true | If `false`, the processor does not append values already present in the field. include::common-options.asciidoc[] diff --git a/docs/reference/ingest/processors/date-index-name.asciidoc b/docs/reference/ingest/processors/date-index-name.asciidoc index e4607a0567cf1..2613bb63b5b73 100644 --- a/docs/reference/ingest/processors/date-index-name.asciidoc +++ b/docs/reference/ingest/processors/date-index-name.asciidoc @@ -135,11 +135,11 @@ understands this to mean `2016-04-01` as is explained in the <>. -| `date_rounding` | yes | - | How to round the date when formatting the date into the index name. Valid values are: `y` (year), `M` (month), `w` (week), `d` (day), `h` (hour), `m` (minute) and `s` (second). Supports <>. +| `index_name_prefix` | no | - | A prefix of the index name to be prepended before the printed date. Supports <>. +| `date_rounding` | yes | - | How to round the date when formatting the date into the index name. Valid values are: `y` (year), `M` (month), `w` (week), `d` (day), `h` (hour), `m` (minute) and `s` (second). Supports <>. | `date_formats` | no | yyyy-MM-dd+++'T'+++HH:mm:ss.SSSXX | An array of the expected date formats for parsing dates / timestamps in the document being preprocessed. 
Can be a java time pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
| `timezone` | no | UTC | The timezone to use when parsing the date and when date math index name expressions are resolved into concrete index names.
| `locale` | no | ENGLISH | The locale to use when parsing the date from the document being preprocessed, relevant when parsing month names or week days.
-| `index_name_format` | no | yyyy-MM-dd | The format to be used when printing the parsed date into the index name. A valid java time pattern is expected here. Supports <>.
+| `index_name_format` | no | yyyy-MM-dd | The format to be used when printing the parsed date into the index name. A valid java time pattern is expected here. Supports <>.
include::common-options.asciidoc[]
|======
diff --git a/docs/reference/ingest/processors/date.asciidoc b/docs/reference/ingest/processors/date.asciidoc
index ae05afa422c5d..805a76dc1a701 100644
--- a/docs/reference/ingest/processors/date.asciidoc
+++ b/docs/reference/ingest/processors/date.asciidoc
@@ -18,8 +18,8 @@ in the same order they were defined as part of the processor definition.
| `field` | yes | - | The field to get the date from.
| `target_field` | no | @timestamp | The field that will hold the parsed date.
| `formats` | yes | - | An array of the expected date formats. Can be a <> or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
-| `timezone` | no | UTC | The timezone to use when parsing the date. Supports <>.
-| `locale` | no | ENGLISH | The locale to use when parsing the date, relevant when parsing month names or week days. Supports <>.
+| `timezone` | no | UTC | The timezone to use when parsing the date. Supports <>.
+| `locale` | no | ENGLISH | The locale to use when parsing the date, relevant when parsing month names or week days. Supports <>.
| `output_format` | no | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | The format to use when writing the date to `target_field`. Can be a <> or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
include::common-options.asciidoc[]
|======
diff --git a/docs/reference/ingest/processors/dot-expand.asciidoc b/docs/reference/ingest/processors/dot-expand.asciidoc
index 13cc6e7214572..4d6eb6106cc31 100644
--- a/docs/reference/ingest/processors/dot-expand.asciidoc
+++ b/docs/reference/ingest/processors/dot-expand.asciidoc
@@ -6,7 +6,7 @@ Expands a field with dots into an object field.
This processor allows fields with dots in the name to be accessible
by other processors in the pipeline.
-Otherwise these <> can't be accessed by any processor.
+Otherwise these fields can't be accessed by any processor.

[[dot-expander-options]]
.Dot Expand Options
diff --git a/docs/reference/ingest/processors/enrich.asciidoc b/docs/reference/ingest/processors/enrich.asciidoc
index 26fb2f1769c64..78d52f1a72dee 100644
--- a/docs/reference/ingest/processors/enrich.asciidoc
+++ b/docs/reference/ingest/processors/enrich.asciidoc
@@ -15,8 +15,8 @@ See <> section for more information about how
|======
| Name | Required | Default | Description
| `policy_name` | yes | - | The name of the enrich policy to use.
-| `field` | yes | - | The field in the input document that matches the policies match_field used to retrieve the enrichment data. Supports <>.
-| `target_field` | yes | - | Field added to incoming documents to contain enrich data. This field contains both the `match_field` and `enrich_fields` specified in the <>. Supports <>.
+| `field` | yes | - | The field in the input document that matches the policy's `match_field`, used to retrieve the enrichment data. Supports <>.
+| `target_field` | yes | - | Field added to incoming documents to contain enrich data. This field contains both the `match_field` and `enrich_fields` specified in the <>. Supports <>.
| `ignore_missing` | no | false | If `true` and `field` does not exist, the processor quietly exits without modifying the document
| `override` | no | true | If `true`, the processor updates fields with pre-existing non-null values. When set to `false`, such fields will not be touched.
| `max_matches` | no | 1 | The maximum number of matched documents to include under the configured target field. The `target_field` will be turned into a JSON array if `max_matches` is higher than 1, otherwise `target_field` will become a JSON object. In order to avoid documents getting too large, the maximum allowed value is 128.
diff --git a/docs/reference/ingest/processors/fail.asciidoc b/docs/reference/ingest/processors/fail.asciidoc
index 4446b941db3e4..991d5de9a5d1c 100644
--- a/docs/reference/ingest/processors/fail.asciidoc
+++ b/docs/reference/ingest/processors/fail.asciidoc
@@ -13,7 +13,7 @@ to the requester.
[options="header"]
|======
| Name | Required | Default | Description
-| `message` | yes | - | The error message thrown by the processor. Supports <>.
+| `message` | yes | - | The error message thrown by the processor. Supports <>.
include::common-options.asciidoc[]
|======
diff --git a/docs/reference/ingest/processors/pipeline.asciidoc b/docs/reference/ingest/processors/pipeline.asciidoc
index a663b7042928f..8c4c6dd3b0f16 100644
--- a/docs/reference/ingest/processors/pipeline.asciidoc
+++ b/docs/reference/ingest/processors/pipeline.asciidoc
@@ -11,7 +11,7 @@ Executes another pipeline.
[options="header"]
|======
| Name | Required | Default | Description
-| `name` | yes | - | The name of the pipeline to execute. Supports <>.
+| `name` | yes | - | The name of the pipeline to execute. Supports <>.
include::common-options.asciidoc[]
|======
diff --git a/docs/reference/ingest/processors/remove.asciidoc b/docs/reference/ingest/processors/remove.asciidoc
index 57e785c2de764..6e9b4f24ff515 100644
--- a/docs/reference/ingest/processors/remove.asciidoc
+++ b/docs/reference/ingest/processors/remove.asciidoc
@@ -11,7 +11,7 @@ Removes existing fields. If one field doesn't exist, an exception will be thrown
[options="header"]
|======
| Name | Required | Default | Description
-| `field` | yes | - | Fields to be removed. Supports <>.
+| `field` | yes | - | Fields to be removed. Supports <>.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist or is `null`, the processor quietly exits without modifying the document
include::common-options.asciidoc[]
|======
diff --git a/docs/reference/ingest/processors/rename.asciidoc b/docs/reference/ingest/processors/rename.asciidoc
index 538cfb048a8e1..9b0eeaa157d55 100644
--- a/docs/reference/ingest/processors/rename.asciidoc
+++ b/docs/reference/ingest/processors/rename.asciidoc
@@ -11,8 +11,8 @@ Renames an existing field. If the field doesn't exist or the new name is already
[options="header"]
|======
| Name | Required | Default | Description
-| `field` | yes | - | The field to be renamed. Supports <>.
-| `target_field` | yes | - | The new name of the field. Supports <>.
+| `field` | yes | - | The field to be renamed. Supports <>.
+| `target_field` | yes | - | The new name of the field. Supports <>.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
include::common-options.asciidoc[]
|======
diff --git a/docs/reference/ingest/processors/set.asciidoc b/docs/reference/ingest/processors/set.asciidoc
index a7c1e7206517d..c9da8e626e6e8 100644
--- a/docs/reference/ingest/processors/set.asciidoc
+++ b/docs/reference/ingest/processors/set.asciidoc
@@ -12,12 +12,12 @@ its value will be replaced with the provided one.
[options="header"]
|======
| Name | Required | Default | Description
-| `field` | yes | - | The field to insert, upsert, or update. Supports <>.
-| `value` | yes* | - | The value to be set for the field. Supports <>. May specify only one of `value` or `copy_from`.
+| `field` | yes | - | The field to insert, upsert, or update. Supports <>.
+| `value` | yes* | - | The value to be set for the field. Supports <>. May specify only one of `value` or `copy_from`.
| `copy_from` | no | - | The origin field to copy to `field`; cannot be used together with `value`. Supported data types are `boolean`, `number`, `array`, `object`, `string`, `date`, etc.
| `override` | no | `true` | If `true`, the processor updates fields with pre-existing non-null values. When set to `false`, such fields will not be touched.
-| `ignore_empty_value` | no | `false` | If `true` and `value` is a <> that evaluates to `null` or the empty string, the processor quietly exits without modifying the document
-| `media_type` | no | `application/json` | The media type for encoding `value`. Applies only when `value` is a <>. Must be one of `application/json`, `text/plain`, or `application/x-www-form-urlencoded`.
+| `ignore_empty_value` | no | `false` | If `true` and `value` is a <> that evaluates to `null` or the empty string, the processor quietly exits without modifying the document
+| `media_type` | no | `application/json` | The media type for encoding `value`. Applies only when `value` is a <>. Must be one of `application/json`, `text/plain`, or `application/x-www-form-urlencoded`.
include::common-options.asciidoc[]
|======
diff --git a/docs/reference/redirects.asciidoc b/docs/reference/redirects.asciidoc
index fda7130b02d07..e78cfe6f454cd 100644
--- a/docs/reference/redirects.asciidoc
+++ b/docs/reference/redirects.asciidoc
@@ -1373,3 +1373,49 @@ include::redirects.asciidoc[tag=legacy-rollup-redirect]
include::redirects.asciidoc[tag=legacy-rollup-redirect]
endif::[]
+
+[role="exclude",id="pipeline"]
+=== Pipeline definition
+
+See <>.
+
+[role="exclude",id="accessing-data-in-pipelines"]
+=== Accessing data in pipelines
+
+See <>, <>, and
+<>.
+
+[role="exclude",id="ingest-conditionals"]
+=== Conditional execution in pipelines
+
+See <>.
+
+[role="exclude",id="ingest-conditional-nullcheck"]
+=== Handling nested fields in conditionals
+
+See <>.
+
+[role="exclude",id="ingest-conditional-complex"]
+=== Complex conditionals
+
+See <>.
+
+[role="exclude",id="conditionals-with-multiple-pipelines"]
+=== Conditionals with the pipeline processor
+
+See <>.
+
+[role="exclude",id="conditionals-with-regex"]
+=== Conditionals with regular expressions
+
+See <>.
+
+[role="exclude",id="handling-failure-in-pipelines"]
+=== Handling failures in pipelines
+
+See <>.
+
+[role="exclude",id="ingest-processors"]
+=== Ingest processors
+
+See <>.
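The `copy_from` and `override` options documented in the `set` processor table above combine naturally: copy a value into a field only when that field is still empty. The following sketch is illustrative only; the pipeline name and field names (`my-copy-pipeline`, `source-field`, `dest-field`) are hypothetical and do not appear in the changed files.

[source,console]
----
PUT _ingest/pipeline/my-copy-pipeline
{
  "description": "Copy source-field into dest-field, keeping any existing non-null dest-field value",
  "processors": [
    {
      "set": {
        "field": "dest-field",
        "copy_from": "source-field",
        "override": false
      }
    }
  ]
}
----

Because `override` defaults to `true`, explicitly setting it to `false` is what preserves pre-existing values; omitting it would silently overwrite them.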
diff --git a/docs/reference/transform/apis/put-transform.asciidoc b/docs/reference/transform/apis/put-transform.asciidoc
index bcf9f5a9fc72e..e1d46370d378a 100644
--- a/docs/reference/transform/apis/put-transform.asciidoc
+++ b/docs/reference/transform/apis/put-transform.asciidoc
@@ -247,7 +247,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=sync-time-field]
+
--
TIP: In general, it’s a good idea to use a field that contains the
-<>. If you use a different field,
+<>. If you use a different field,
you might need to set the `delay` such that it accounts for data transmission
delays.
diff --git a/docs/reference/transform/apis/update-transform.asciidoc b/docs/reference/transform/apis/update-transform.asciidoc
index 25d0e8cadb1ea..795812db725f6 100644
--- a/docs/reference/transform/apis/update-transform.asciidoc
+++ b/docs/reference/transform/apis/update-transform.asciidoc
@@ -195,7 +195,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=sync-time-field]
+
--
TIP: In general, it’s a good idea to use a field that contains the
-<>. If you use a different field,
+<>. If you use a different field,
you might need to set the `delay` such that it accounts for data transmission
delays.
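To make the tip above concrete: one common way to obtain an ingest timestamp field is a `set` processor that records `_ingest.timestamp` at index time, which the transform can then use as its sync field. This is a sketch under assumptions: the pipeline name `add-event-ingested`, the field name `event.ingested`, and the `60s` delay are illustrative choices, not values mandated by the API.

[source,console]
----
PUT _ingest/pipeline/add-event-ingested
{
  "description": "Record the time each document was ingested",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
----

A transform created with the put transform API could then sync on that field. Because the field reflects ingest time rather than event time, a small `delay` usually suffices:

[source,js]
----
"sync": {
  "time": {
    "field": "event.ingested",
    "delay": "60s"
  }
}
----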