From 493df57af658d3aa0491bba05ad36e118cfd11f5 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Fri, 14 May 2021 07:53:52 -0400 Subject: [PATCH 1/7] [DOCS] Moving grok to its own scripting page --- .../reference/ingest/processors/grok.asciidoc | 39 ---------- docs/reference/redirects.asciidoc | 5 ++ docs/reference/scripting/grok-syntax.asciidoc | 73 +++++++++++++++++++ docs/reference/scripting/using.asciidoc | 1 + 4 files changed, 79 insertions(+), 39 deletions(-) create mode 100644 docs/reference/scripting/grok-syntax.asciidoc diff --git a/docs/reference/ingest/processors/grok.asciidoc b/docs/reference/ingest/processors/grok.asciidoc index 97403447d52dd..b04775f56bdcf 100644 --- a/docs/reference/ingest/processors/grok.asciidoc +++ b/docs/reference/ingest/processors/grok.asciidoc @@ -8,8 +8,6 @@ Extracts structured fields out of a single text field within a document. You cho extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular expression that supports aliased expressions that can be reused. -This tool is perfect for syslog logs, apache and other webserver logs, mysql logs, and in general, any log format -that is generally written for humans and not computer consumption. This processor comes packaged with many https://github.com/elastic/elasticsearch/blob/{branch}/libs/grok/src/main/resources/patterns[reusable patterns]. @@ -17,43 +15,6 @@ If you need help building patterns to match your logs, you will find the {kibana-ref}/xpack-grokdebugger.html[Grok Debugger] tool quite useful! The https://grokconstructor.appspot.com[Grok Constructor] is also a useful tool. -[[grok-basics]] -==== Grok Basics - -Grok sits on top of regular expressions, so any regular expressions are valid in grok as well. -The regular expression library is Oniguruma, and you can see the full supported regexp syntax -https://github.com/kkos/oniguruma/blob/master/doc/RE[on the Oniguruma site]. 
- -Grok works by leveraging this regular expression language to allow naming existing patterns and combining them into more -complex patterns that match your fields. - -The syntax for reusing a grok pattern comes in three forms: `%{SYNTAX:SEMANTIC}`, `%{SYNTAX}`, `%{SYNTAX:SEMANTIC:TYPE}`. - -The `SYNTAX` is the name of the pattern that will match your text. For example, `3.44` will be matched by the `NUMBER` -pattern and `55.3.244.1` will be matched by the `IP` pattern. The syntax is how you match. `NUMBER` and `IP` are both -patterns that are provided within the default patterns set. - -The `SEMANTIC` is the identifier you give to the piece of text being matched. For example, `3.44` could be the -duration of an event, so you could call it simply `duration`. Further, a string `55.3.244.1` might identify -the `client` making a request. - -The `TYPE` is the type you wish to cast your named field. `int`, `long`, `double`, `float` and `boolean` are supported types for coercion. - -For example, you might want to match the following text: - -[source,txt] --------------------------------------------------- -3.44 55.3.244.1 --------------------------------------------------- - -You may know that the message in the example is a number followed by an IP address. You can match this text by using the following -Grok expression. - -[source,txt] --------------------------------------------------- -%{NUMBER:duration} %{IP:client} --------------------------------------------------- - [[using-grok]] ==== Using the Grok Processor in a Pipeline diff --git a/docs/reference/redirects.asciidoc b/docs/reference/redirects.asciidoc index 60c92176e9644..10af977a5bf6c 100644 --- a/docs/reference/redirects.asciidoc +++ b/docs/reference/redirects.asciidoc @@ -3,6 +3,11 @@ The following pages have moved or been deleted. +[role="exclude",id="grok-basics"] +=== Grok basics + +See <>. 
+ // [START] Security redirects [role="exclude",id="get-started-users"] diff --git a/docs/reference/scripting/grok-syntax.asciidoc b/docs/reference/scripting/grok-syntax.asciidoc new file mode 100644 index 0000000000000..555fb14308111 --- /dev/null +++ b/docs/reference/scripting/grok-syntax.asciidoc @@ -0,0 +1,73 @@ +[[grok]] +=== Grokking grok +Grok is a regular expression dialect that supports aliased expressions that you +can reuse. Grok works really well with syslog logs, Apache and other webserver +logs, mysql logs, and in general, any log format that is generally written for +humans and not computer consumption. + +Because grok sits on top of regular expressions, any regular expressions are +valid in grok. The regular expression library is Oniguruma, and you can see the +full supported regexp syntax +https://github.com/kkos/oniguruma/blob/master/doc/RE[on the Oniguruma site]. + +Grok uses this regular expression language to allow naming existing patterns +and combining them into more complex patterns that match your fields. + +[[grok-syntax]] +==== Grok syntax +The syntax for reusing a grok pattern comes in three forms: + +* `%{SYNTAX}` +* `%{SYNTAX:SEMANTIC}` +* `%{SYNTAX:SEMANTIC:TYPE}` + +`SYNTAX`:: +The name of the pattern that will match your text. For example, `NUMBER` and +`IP` are both patterns that are provided within the default patterns set. The +`NUMBER` pattern matches data like `3.44`, and the `IP` pattern matches data +like `55.3.244.1`. + +`SEMANTIC`:: +The identifier you give to the piece of text being matched. For example, `3.44` +could be the duration of an event, so you might call it `duration`. The string +`55.3.244.1` might identify the `client` making a request. + +`TYPE`:: +The data type you want to cast your named field. `int`, `long`, `double`, +`float` and `boolean` are supported types. 
+ +For example, let's say you have message data that looks like this: + +[source,txt] +---- +3.44 55.3.244.1 +---- + +You know that the first value is a number, followed by an IP address. You can +match this text by using the following grok expression: + +[source,txt] +---- +%{NUMBER:duration} %{IP:client} +---- + +[[grok-patterns]] +==== Grok patterns +The {elastic-stack} ships with numerous https://github.com/elastic/elasticsearch/blob/master/libs/grok/src/main/resources/patterns/grok-patterns[predefined grok patterns] that simplify working with grok. + +For example, if you're working with Apache log data, you can use the +`%{COMMONAPACHELOG}` syntax, which understands the structure of Apache logs. A +sample document might look like this: + +[source,txt] +---- +{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"} +---- + +To extract the IP address from the `message` field, write a Painless script +that incorporates the `%{COMMONAPACHELOG}` syntax. You can test your script +using the {painless}/painless-execute-api.html#painless-execute-runtime-field-context[field contexts] of the Painless +execute API, or by creating a runtime field that includes the script. + +TIP: If you need help building grok patterns to match your data, use the {kib} +{kibana-ref}/xpack-grokdebugger.html[Grok Debugger] tool. 
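To make the three reference forms concrete, here is a minimal sketch of how `%{SYNTAX}`, `%{SYNTAX:SEMANTIC}`, and `%{SYNTAX:SEMANTIC:TYPE}` could expand into an ordinary regular expression with named groups. It is an illustration under stated assumptions, not the grok implementation: Python's `re` module stands in for Oniguruma, the `NUMBER` and `IP` definitions are simplified placeholders rather than the shipped patterns, and `PATTERNS`, `CASTS`, `compile_grok`, and `grok_match` are hypothetical names.

```python
import re

# Hypothetical, simplified stand-ins for two entries in the shipped pattern
# set; the real NUMBER and IP definitions are considerably more thorough.
PATTERNS = {
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}

# Python approximations of the TYPE casts (int, long, float, double, boolean).
CASTS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.lower() == "true",
}

# Matches %{SYNTAX}, %{SYNTAX:SEMANTIC}, and %{SYNTAX:SEMANTIC:TYPE}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_grok(expression):
    """Expand each grok reference into a (possibly named) regex group."""
    casts = {}

    def expand(match):
        syntax, semantic, type_name = match.groups()
        body = PATTERNS[syntax]
        if semantic is None:
            # %{SYNTAX}: the text must match, but no field is captured.
            return f"(?:{body})"
        if type_name is not None:
            casts[semantic] = CASTS[type_name]
        # %{SYNTAX:SEMANTIC}: capture the match under the SEMANTIC name.
        return f"(?P<{semantic}>{body})"

    return re.compile(REFERENCE.sub(expand, expression)), casts


def grok_match(expression, text):
    """Return a dict of named (and optionally cast) fields, or None."""
    regex, casts = compile_grok(expression)
    match = regex.fullmatch(text)
    if match is None:
        return None
    return {name: casts.get(name, str)(value)
            for name, value in match.groupdict().items()}


print(grok_match("%{NUMBER:duration:float} %{IP:client}", "3.44 55.3.244.1"))
# prints {'duration': 3.44, 'client': '55.3.244.1'}
```

In this sketch the `TYPE` suffix only changes how a captured string is converted, so `%{NUMBER:duration}` without `:float` returns `'3.44'` as a string.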
\ No newline at end of file diff --git a/docs/reference/scripting/using.asciidoc b/docs/reference/scripting/using.asciidoc index e235dcb210aa9..3f29427f0b722 100644 --- a/docs/reference/scripting/using.asciidoc +++ b/docs/reference/scripting/using.asciidoc @@ -567,3 +567,4 @@ DELETE /_ingest/pipeline/my_test_scores_pipeline //// include::common-script-uses.asciidoc[] +include::grok-syntax.asciidoc[] \ No newline at end of file From d3d39dbe575e8630ce8acc59bf37c478f9abd741 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Fri, 14 May 2021 13:29:13 -0400 Subject: [PATCH 2/7] Adding examples --- docs/reference/mapping/runtime.asciidoc | 2 +- docs/reference/scripting/grok-syntax.asciidoc | 176 +++++++++++++++--- 2 files changed, 151 insertions(+), 27 deletions(-) diff --git a/docs/reference/mapping/runtime.asciidoc b/docs/reference/mapping/runtime.asciidoc index 3be0302f9bf77..116e7a4f2b672 100644 --- a/docs/reference/mapping/runtime.asciidoc +++ b/docs/reference/mapping/runtime.asciidoc @@ -1179,7 +1179,7 @@ supports aliased expressions that you can reuse. See <> The script matches on the `%{COMMONAPACHELOG}` log pattern, which understands the structure of Apache logs. If the pattern matches, the script emits the -value matching IP address. If the pattern doesn't match +value of the matching IP address. If the pattern doesn't match (`clientip != null`), the script just returns the field value without crashing. [source,console] diff --git a/docs/reference/scripting/grok-syntax.asciidoc b/docs/reference/scripting/grok-syntax.asciidoc index 555fb14308111..bec24baad7667 100644 --- a/docs/reference/scripting/grok-syntax.asciidoc +++ b/docs/reference/scripting/grok-syntax.asciidoc @@ -1,25 +1,23 @@ [[grok]] === Grokking grok -Grok is a regular expression dialect that supports aliased expressions that you -can reuse. 
Grok works really well with syslog logs, Apache and other webserver -logs, mysql logs, and in general, any log format that is generally written for -humans and not computer consumption. +Grok is a regular expression dialect that supports reusable aliased expressions. Grok works well with syslog logs, Apache and other webserver +logs, MySQL logs, and generally any log format that is written for humans and +not computer consumption. -Because grok sits on top of regular expressions, any regular expressions are -valid in grok. The regular expression library is Oniguruma, and you can see the -full supported regexp syntax -https://github.com/kkos/oniguruma/blob/master/doc/RE[on the Oniguruma site]. - -Grok uses this regular expression language to allow naming existing patterns -and combining them into more complex patterns that match your fields. +Grok sits on top of the https://github.com/kkos/oniguruma/blob/master/doc/RE[Oniguruma] regular expression library, so any regular expressions are +valid in grok. Grok uses this regular expression language to allow naming +existing patterns and combining them into more complex patterns that match your +fields. [[grok-syntax]] -==== Grok syntax -The syntax for reusing a grok pattern comes in three forms: +==== Grok patterns +The {stack} ships with numerous https://github.com/elastic/elasticsearch/blob/master/libs/grok/src/main/resources/patterns/grok-patterns[predefined grok patterns] that simplify working with grok. The syntax for reusing grok patterns +takes one of the following forms: -* `%{SYNTAX}` -* `%{SYNTAX:SEMANTIC}` -* `%{SYNTAX:SEMANTIC:TYPE}` +[%autowidth] +|=== +|`%{SYNTAX}` | `%{SYNTAX:SEMANTIC}` |`%{SYNTAX:SEMANTIC:TYPE}` +|=== `SYNTAX`:: The name of the pattern that will match your text. For example, `NUMBER` and @@ -43,8 +41,8 @@ For example, let's say you have message data that looks like this: 3.44 55.3.244.1 ---- -You know that the first value is a number, followed by an IP address.
You can -match this text by using the following grok expression: +The first value is a number, followed by what appears to be an IP address. You +can match this text by using the following grok expression: [source,txt] ---- @@ -52,8 +50,15 @@ match this text by using the following grok expression: ---- [[grok-patterns]] -==== Grok patterns -The {elastic-stack} ships with numerous https://github.com/elastic/elasticsearch/blob/master/libs/grok/src/main/resources/patterns/grok-patterns[predefined grok patterns] that simplify working with grok. +==== Incorporate grok patterns +You can incorporate predefined grok patterns into Painless scripts to extract +data. To test your script, use either the {painless}/painless-execute-api.html[field contexts] of the Painless execute API or create a runtime field that +includes the script. Runtime fields offer greater flexibility and accept +multiple documents, but the Painless execute API is a great option if you don't +have write access on a cluster where you're testing a script. + +TIP: If you need help building grok patterns to match your data, use the +{kibana-ref}/xpack-grokdebugger.html[Grok Debugger] tool in {kib}. For example, if you're working with Apache log data, you can use the `%{COMMONAPACHELOG}` syntax, which understands the structure of Apache logs. A @@ -61,13 +66,132 @@ sample document might look like this: [source,txt] ---- +"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - +[30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736" +---- + +To extract the IP address from the `message` field, you can write a Painless +script that incorporates the `%{COMMONAPACHELOG}` syntax. You can test this +script using the {painless}/painless-execute-api.html#painless-runtime-ip[`ip` field context] of the Painless execute API, but let's use a runtime field +instead. + +Based on the sample document, index the `@timestamp` and `message` fields. 
To +remain flexible, use `wildcard` as the field type for `message`: + +[source,console] +---- +PUT /my-index-000001/ +{ + "mappings": { + "properties": { + "@timestamp": { + "format": "strict_date_optional_time||epoch_second", + "type": "date" + }, + "message": { + "type": "wildcard" + } + } + } +} +---- + +Next, use the <> to index some log data into +`my-index-000001`. + +[source,console] +---- +POST /my-index-000001/_bulk?refresh +{"index":{}} {"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"} +{"index":{}} +{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"} +{"index":{}} +{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"} +{"index":{}} +{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"} +{"index":{}} +{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"} +{"index":{}} +{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"} +{"index":{}} +{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"} +---- +// TEST[continued] + +Now you can define a runtime field in the mappings that includes your Painless +script and grok pattern. If the pattern matches, the script emits the value of +the matching IP address. If the pattern doesn't match, `clientip` is `null`, so +the `clientip != null` check skips the `emit` call and the script returns no +value for that document instead of crashing.
+ +[source,console] +---- +PUT my-index-000001/_mappings +{ + "runtime": { + "http.clientip": { + "type": "ip", + "script": """ + String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip; + if (clientip != null) emit(clientip); + """ + } + } +} ---- +// TEST[continued] -To extract the IP address from the `message` field, write a Painless script -that incorporates the `%{COMMONAPACHELOG}` syntax. You can test your script -using the {painless}/painless-execute-api.html#painless-execute-runtime-field-context[field contexts] of the Painless -execute API, or by creating a runtime field that includes the script. +[[grok-pattern-results]] +==== Return calculated results +Using the `http.clientip` runtime field, you can define a simple query to run a +search for a specific IP address and return all related fields. The <> parameter on the `_search` API works for all fields, +even those that weren't sent as part of the original `_source`: -TIP: If you need help building grok patterns to match your data, use the {kib} -{kibana-ref}/xpack-grokdebugger.html[Grok Debugger] tool. \ No newline at end of file +[source,console] +---- +GET my-index-000001/_search +{ + "query": { + "match": { + "http.clientip": "40.135.0.0" + } + }, + "fields" : ["http.clientip"] +} +---- +// TEST[continued] +// TEST[s/_search/_search\?filter_path=hits/] + +The response includes the specific IP address indicated in your search query. +The grok pattern within the Painless script extracted this value from the +`message` field at runtime. 
+ +[source,console-result] +---- +{ + "hits" : { + "total" : { + "value" : 1, + "relation" : "eq" + }, + "max_score" : 1.0, + "hits" : [ + { + "_index" : "my-index-000001", + "_id" : "1iN2a3kBw4xTzEDqyYE0", + "_score" : 1.0, + "_source" : { + "timestamp" : "2020-04-30T14:30:17-05:00", + "message" : "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736" + }, + "fields" : { + "http.clientip" : [ + "40.135.0.0" + ] + } + } + ] + } +} +---- +// TESTRESPONSE[s/"_id" : "1iN2a3kBw4xTzEDqyYE0"/"_id": $body.hits.hits.0._id/] From 4cf4516386f76abb909dfa2830d9b87d8dab67ad Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Tue, 18 May 2021 16:35:29 -0400 Subject: [PATCH 3/7] Updating cross link for grok page --- docs/reference/scripting/common-script-uses.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/reference/scripting/common-script-uses.asciidoc b/docs/reference/scripting/common-script-uses.asciidoc index 0c07cfcd10052..00a8a5c1e3a8c 100644 --- a/docs/reference/scripting/common-script-uses.asciidoc +++ b/docs/reference/scripting/common-script-uses.asciidoc @@ -12,7 +12,7 @@ information, but you only want to extract pieces and parts. There are two options at your disposal: -* <> is a regular expression dialect that supports aliased +* <> is a regular expression dialect that supports aliased expressions that you can reuse. Because Grok sits on top of regular expressions (regex), any regular expressions are valid in grok as well. 
* <> extracts structured fields out of text, using From bb80ebacad368836a0be520b27563abe2bf57a46 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Thu, 20 May 2021 16:29:16 -0400 Subject: [PATCH 4/7] Adds same runtime field in a search request for #73262 --- docs/reference/mapping/runtime.asciidoc | 143 +++++++++++------- docs/reference/scripting/grok-syntax.asciidoc | 32 ++++ 2 files changed, 119 insertions(+), 56 deletions(-) diff --git a/docs/reference/mapping/runtime.asciidoc b/docs/reference/mapping/runtime.asciidoc index 116e7a4f2b672..59251e5ebbfeb 100644 --- a/docs/reference/mapping/runtime.asciidoc +++ b/docs/reference/mapping/runtime.asciidoc @@ -91,7 +91,7 @@ calculates the day of the week based on the value of `timestamp`, and uses [source,console] ---- -PUT my-index/ +PUT my-index-000001/ { "mappings": { "runtime": { @@ -130,7 +130,7 @@ the index mapping as runtime fields: [source,console] ---- -PUT my-index +PUT my-index-000001 { "mappings": { "dynamic": "runtime", @@ -152,7 +152,7 @@ exist, the response doesn't include any values for that runtime field. [source,console] ---- -PUT my-index/ +PUT my-index-000001/ { "mappings": { "runtime": { @@ -174,7 +174,7 @@ remove a runtime field from the mappings, set the value of the runtime field to [source,console] ---- -PUT my-index/_mapping +PUT my-index-000001/_mapping { "runtime": { "day_of_week": null @@ -213,7 +213,7 @@ and only within the context of this search request: [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "runtime_mappings": { "day_of_week": { @@ -242,7 +242,7 @@ other runtime fields. 
For example, let's say you bulk index some sensor data: [source,console] ---- -POST my-index/_bulk?refresh=true +POST my-index-000001/_bulk?refresh=true {"index":{}} {"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":"5.2","start": "300","end":"8675309"}} {"index":{}} @@ -265,7 +265,7 @@ your indexed fields and modify the data type: [source,console] ---- -PUT my-index/_mapping +PUT my-index-000001/_mapping { "runtime": { "measures.start": { @@ -292,7 +292,7 @@ Now, you can easily run an [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "aggs": { "avg_start": { @@ -340,7 +340,7 @@ compute statistics over numeric values extracted from the aggregated documents. [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "runtime_mappings": { "duration": { @@ -393,11 +393,11 @@ script, and returns the value as part of the query. Because the runtime field shadows the mapped field, you can override the value returned in search without modifying the mapped field. 
-For example, let's say you indexed the following documents into `my-index`: +For example, let's say you indexed the following documents into `my-index-000001`: [source,console] ---- -POST my-index/_bulk?refresh=true +POST my-index-000001/_bulk?refresh=true {"index":{}} {"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":5.2}} {"index":{}} @@ -422,7 +422,7 @@ If you search for documents where the model number matches `HG537PU`: [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "query": { "match": { @@ -448,7 +448,7 @@ The response includes indexed values for documents matching model number "max_score" : 1.0296195, "hits" : [ { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "F1BeSXYBg_szTodcYCmk", "_score" : 1.0296195, "_source" : { @@ -460,7 +460,7 @@ The response includes indexed values for documents matching model number } }, { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "l02aSXYBkpNf6QRDO62Q", "_score" : 1.0296195, "_source" : { @@ -489,7 +489,7 @@ for documents matching the search request: [source,console] ---- -POST my-index/_search +POST my-index-000001/_search { "runtime_mappings": { "measures.voltage": { @@ -529,7 +529,7 @@ which still returns in the response: "max_score" : 1.0296195, "hits" : [ { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "F1BeSXYBg_szTodcYCmk", "_score" : 1.0296195, "_source" : { @@ -546,7 +546,7 @@ which still returns in the response: } }, { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "l02aSXYBkpNf6QRDO62Q", "_score" : 1.0296195, "_source" : { @@ -587,7 +587,7 @@ the request so that new fields are added to the mapping as runtime fields. 
[source,console] ---- -PUT my-index/ +PUT my-index-000001/ { "mappings": { "dynamic": "runtime", @@ -614,7 +614,7 @@ Let's ingest some sample data, which will result in two indexed fields: [source,console] ---- -POST /my-index/_bulk?refresh +POST /my-index-000001/_bulk?refresh { "index": {}} { "@timestamp": "2020-06-21T15:00:01-05:00", "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"} { "index": {}} @@ -651,7 +651,7 @@ modify the mapping without changing any field values. [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "fields": [ "@timestamp", @@ -668,7 +668,7 @@ the `message` field and will further refine the query: [source,console] ---- -PUT /my-index/_mapping +PUT /my-index-000001/_mapping { "runtime": { "client_ip": { @@ -687,7 +687,7 @@ runtime field: [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "size": 1, "query": { @@ -717,7 +717,7 @@ address. "max_score" : 1.0, "hits" : [ { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "oWs5KXYB-XyJbifr9mrz", "_score" : 1.0, "_source" : { @@ -767,11 +767,11 @@ valves. The connected sensors are only capable of reporting a fraction of the true readings. Rather than outfit the pressure valves with new sensors, you decide to calculate the values based on reported readings. Based on the reported data, you define the following fields in your mapping for -`my-index`: +`my-index-000001`: [source,console] ---- -PUT my-index/ +PUT my-index-000001/ { "mappings": { "properties": { @@ -797,7 +797,7 @@ You then bulk index some sample data from your sensors. 
This data includes [source,console] ---- -POST my-index/_bulk?refresh=true +POST my-index-000001/_bulk?refresh=true {"index":{}} {"timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"} {"index":{}} @@ -820,7 +820,7 @@ voltage and multiplies it by `2`: [source,console] ---- -PUT my-index/_mapping +PUT my-index-000001/_mapping { "runtime": { "voltage_corrected": { @@ -844,7 +844,7 @@ parameter on the `_search` API: [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "fields": [ "voltage_corrected", @@ -869,7 +869,7 @@ GET my-index/_search "max_score" : 1.0, "hits" : [ { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "z4TCrHgBdg9xpPrU6z9k", "_score" : 1.0, "_source" : { @@ -888,7 +888,7 @@ GET my-index/_search } }, { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "0ITCrHgBdg9xpPrU6z9k", "_score" : 1.0, "_source" : { @@ -920,7 +920,7 @@ multiplier for reported sensor data should be `4`. To gain greater performance, you decide to index the `voltage_corrected` runtime field with the new `multiplier` parameter. -In a new index named `my-index-00001`, copy the `voltage_corrected` runtime +In a new index named `my-index-000001`, copy the `voltage_corrected` runtime field definition into the mappings of the new index. It's that simple! You can add an optional parameter named `on_script_error` that determines whether to reject the entire document if the script throws an error at index time @@ -928,7 +928,7 @@ reject the entire document if the script throws an error at index time [source,console] ---- -PUT my-index-00001/ +PUT my-index-000001/ { "mappings": { "properties": { @@ -964,11 +964,11 @@ PUT my-index-00001/ index time. Setting the value to `ignore` will register the field in the document’s `_ignored` metadata field and continue indexing. 
-Bulk index some sample data from your sensors into the `my-index-00001` index: +Bulk index some sample data from your sensors into the `my-index-000001` index: [source,console] ---- -POST my-index-00001/_bulk?refresh=true +POST my-index-000001/_bulk?refresh=true { "index": {}} { "timestamp": 1516729294000, "temperature": 200, "voltage": 5.2, "node": "a"} { "index": {}} @@ -992,7 +992,7 @@ the `_search` API to retrieve the fields you want: [source,console] ---- -POST my-index-00001/_search +POST my-index-000001/_search { "query": { "range": { @@ -1024,7 +1024,7 @@ match the range query, based on the calculated value of the included script: "max_score" : 1.0, "hits" : [ { - "_index" : "my-index-00001", + "_index" : "my-index-000001", "_id" : "yoSLrHgBdg9xpPrUZz_P", "_score" : 1.0, "_source" : { @@ -1043,7 +1043,7 @@ match the range query, based on the calculated value of the included script: } }, { - "_index" : "my-index-00001", + "_index" : "my-index-000001", "_id" : "y4SLrHgBdg9xpPrUZz_P", "_score" : 1.0, "_source" : { @@ -1083,12 +1083,12 @@ time for these fields. ==== Define indexed fields as a starting point You can start with a simple example by adding the `@timestamp` and `message` -fields to the `my-index` mapping as indexed fields. To remain flexible, use +fields to the `my-index-000001` mapping as indexed fields. To remain flexible, use `wildcard` as the field type for `message`: [source,console] ---- -PUT /my-index/ +PUT /my-index-000001/ { "mappings": { "properties": { @@ -1108,7 +1108,7 @@ PUT /my-index/ ==== Ingest some data After mapping the fields you want to retrieve, index a few records from your log data into {es}. The following request uses the <> -to index raw log data into `my-index`. Instead of indexing all of your log +to index raw log data into `my-index-000001`. Instead of indexing all of your log data, you can use a small sample to experiment with runtime fields. 
The final document is not a valid Apache log format, but we can account for @@ -1116,7 +1116,7 @@ that scenario in our script. [source,console] ---- -POST /my-index/_bulk?refresh +POST /my-index-000001/_bulk?refresh {"index":{}} {"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"} {"index":{}} @@ -1138,7 +1138,7 @@ At this point, you can view how {es} stores your raw data. [source,console] ---- -GET /my-index +GET /my-index-000001 ---- // TEST[continued] @@ -1147,7 +1147,7 @@ The mapping contains two fields: `@timestamp` and `message`. [source,console-result] ---- { - "my-index" : { + "my-index-000001" : { "aliases" : { }, "mappings" : { "properties" : { @@ -1167,15 +1167,15 @@ The mapping contains two fields: `@timestamp` and `message`. } } ---- -// TESTRESPONSE[s/\.\.\./"settings": $body.my-index.settings/] +// TESTRESPONSE[s/\.\.\./"settings": $body.my-index-000001.settings/] [[runtime-examples-grok]] ==== Define a runtime field with a grok pattern If you want to retrieve results that include `clientip`, you can add that field as a runtime field in the mapping. The following runtime script defines a -grok pattern that extracts structured fields out of a single text +<> that extracts structured fields out of a single text field within a document. A grok pattern is like a regular expression that -supports aliased expressions that you can reuse. See <> to learn more about grok syntax. +supports aliased expressions that you can reuse. The script matches on the `%{COMMONAPACHELOG}` log pattern, which understands the structure of Apache logs. If the pattern matches, the script emits the @@ -1184,7 +1184,7 @@ value of the matching IP address. 
If the pattern doesn't match [source,console] ---- -PUT my-index/_mappings +PUT my-index-000001/_mappings { "runtime": { "http.clientip": { @@ -1201,6 +1201,37 @@ PUT my-index/_mappings <1> This condition ensures that the script doesn't crash even if the pattern of the message doesn't match. +Alternatively, you can define the same runtime field but in the context of a +search request. The runtime definition and the script are exactly the same as +the one defined previously in the index mapping. Just copy that definition into +the search request under the `runtime_mappings` section and include a query +that matches on the runtime field. This query returns the same results as if +you defined a search query for the `http.clientip` runtime field in your index +mappings, but only in the context of this specific search: + +[source,console] +---- +GET my-index-000001/_search +{ + "runtime_mappings": { + "http.clientip": { + "type": "ip", + "script": """ + String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip; + if (clientip != null) emit(clientip); + """ + } + }, + "query": { + "match": { + "http.clientip": "40.135.0.0" + } + }, + "fields" : ["http.clientip"] +} +---- +// TEST[continued] + [[runtime-examples-grok-ip]] ===== Search for a specific IP address Using the `http.clientip` runtime field, you can define a simple query to run a @@ -1208,7 +1239,7 @@ search for a specific IP address and return all related fields. [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "query": { "match": { @@ -1247,7 +1278,7 @@ data that doesn't match the grok pattern. 
"max_score" : 1.0, "hits" : [ { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "FdLqu3cBhqheMnFKd0gK", "_score" : 1.0, "_source" : { @@ -1281,7 +1312,7 @@ You can also run a <> that operates on the [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "query": { "range": { @@ -1309,7 +1340,7 @@ timestamp falls within the defined range. "max_score" : 1.0, "hits" : [ { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "hdEhyncBRSB6iD-PoBqe", "_score" : 1.0, "_source" : { @@ -1318,7 +1349,7 @@ timestamp falls within the defined range. } }, { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "htEhyncBRSB6iD-PoBqe", "_score" : 1.0, "_source" : { @@ -1348,7 +1379,7 @@ successful dissect patterns. [source,console] ---- -PUT my-index/_mappings +PUT my-index-000001/_mappings { "runtime": { "http.client.ip": { @@ -1367,7 +1398,7 @@ Similarly, you can define a dissect pattern to extract the https://developer.moz [source,console] ---- -PUT my-index/_mappings +PUT my-index-000001/_mappings { "runtime": { "http.response": { @@ -1387,7 +1418,7 @@ You can then run a query to retrieve a specific HTTP response using the [source,console] ---- -GET my-index/_search +GET my-index-000001/_search { "query": { "match": { @@ -1413,7 +1444,7 @@ The response includes a single document where the HTTP response is `304`: "max_score" : 1.0, "hits" : [ { - "_index" : "my-index", + "_index" : "my-index-000001", "_id" : "A2qDy3cBWRMvVAuI7F8M", "_score" : 1.0, "_source" : { diff --git a/docs/reference/scripting/grok-syntax.asciidoc b/docs/reference/scripting/grok-syntax.asciidoc index bec24baad7667..7de78851c3043 100644 --- a/docs/reference/scripting/grok-syntax.asciidoc +++ b/docs/reference/scripting/grok-syntax.asciidoc @@ -141,6 +141,38 @@ PUT my-index-000001/_mappings ---- // TEST[continued] +Alternatively, you can define the same runtime field but in the context of a +search request. 
The runtime definition and the script are exactly the same as +the one defined previously in the index mapping. Just copy that definition into +the search request under the `runtime_mappings` section and include a query +that matches on the runtime field. This query returns the same results as if +you <> for the `http.clientip` +runtime field in your index mappings, but only in the context of this specific +search: + +[source,console] +---- +GET my-index-000001/_search +{ + "runtime_mappings": { + "http.clientip": { + "type": "ip", + "script": """ + String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip; + if (clientip != null) emit(clientip); + """ + } + }, + "query": { + "match": { + "http.clientip": "40.135.0.0" + } + }, + "fields" : ["http.clientip"] +} +---- +// TEST[continued] + [[grok-pattern-results]] ==== Return calculated results Using the `http.clientip` runtime field, you can define a simple query to run a From f26db6e69cd40aa8f433c06adf823cd5b83cb39c Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Thu, 20 May 2021 16:59:57 -0400 Subject: [PATCH 5/7] Clarify titles and shift navigation --- docs/reference/scripting/grok-syntax.asciidoc | 6 ++++-- docs/reference/scripting/using.asciidoc | 4 ++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/reference/scripting/grok-syntax.asciidoc b/docs/reference/scripting/grok-syntax.asciidoc index 7de78851c3043..e4d6b8673da47 100644 --- a/docs/reference/scripting/grok-syntax.asciidoc +++ b/docs/reference/scripting/grok-syntax.asciidoc @@ -50,7 +50,7 @@ can match this text by using the following grok expression: ---- [[grok-patterns]] -==== Incorporate grok patterns +==== Use grok patterns in Painless scripts You can incorporate predefined grok patterns into Painless scripts to extract data. To test your script, use either the {painless}/painless-execute-api.html[field contexts] of the Painless execute API or create a runtime field that includes the script. 
Runtime fields offer greater flexibility and accept @@ -119,6 +119,8 @@ POST /my-index-000001/_bulk?refresh ---- // TEST[continued] +[[grok-patterns-runtime]] +==== Incorporate grok patterns and scripts in runtime fields Now you can define a runtime field in the mappings that includes your Painless script and grok pattern. If the pattern matches, the script emits the value of the matching IP address. If the pattern doesn't match (`clientip != null`), the @@ -146,7 +148,7 @@ search request. The runtime definition and the script are exactly the same as the one defined previously in the index mapping. Just copy that definition into the search request under the `runtime_mappings` section and include a query that matches on the runtime field. This query returns the same results as if -you <> for the `http.clientip` +you <> for the `http.clientip` runtime field in your index mappings, but only in the context of this specific search: diff --git a/docs/reference/scripting/using.asciidoc b/docs/reference/scripting/using.asciidoc index 3f29427f0b722..e322a85158115 100644 --- a/docs/reference/scripting/using.asciidoc +++ b/docs/reference/scripting/using.asciidoc @@ -566,5 +566,5 @@ DELETE /_ingest/pipeline/my_test_scores_pipeline //// -include::common-script-uses.asciidoc[] -include::grok-syntax.asciidoc[] \ No newline at end of file +include::grok-syntax.asciidoc[] +include::common-script-uses.asciidoc[] \ No newline at end of file From c953dc42273a90dd7627a0123e3e7d276d71d745 Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Wed, 26 May 2021 17:33:51 -0400 Subject: [PATCH 6/7] Incorporating review feedback --- docs/reference/scripting/grok-syntax.asciidoc | 23 +++++++++++-------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/docs/reference/scripting/grok-syntax.asciidoc b/docs/reference/scripting/grok-syntax.asciidoc index e4d6b8673da47..17d8beef84ec9 100644 --- a/docs/reference/scripting/grok-syntax.asciidoc +++ 
b/docs/reference/scripting/grok-syntax.asciidoc @@ -16,7 +16,7 @@ takes one of the following forms: [%autowidth] |=== -|`%{SYNTAX}` | `%{SYNTAX:SEMANTIC}` |`%{SYNTAX:SEMANTIC:TYPE}` +|`%{SYNTAX}` | `%{SYNTAX:ID}` |`%{SYNTAX:ID:TYPE}` |=== `SYNTAX`:: @@ -25,7 +25,7 @@ The name of the pattern that will match your text. For example, `NUMBER` and `NUMBER` pattern matches data like `3.44`, and the `IP` pattern matches data like `55.3.244.1`. -`SEMANTIC`:: +`ID`:: The identifier you give to the piece of text being matched. For example, `3.44` could be the duration of an event, so you might call it `duration`. The string `55.3.244.1` might identify the `client` making a request. @@ -64,11 +64,14 @@ For example, if you're working with Apache log data, you can use the `%{COMMONAPACHELOG}` syntax, which understands the structure of Apache logs. A sample document might look like this: -[source,txt] +// Note to contributors that the line break in the following example is +// intentional to promote better readability in the output +[source,js] ---- "timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736" ---- +// NOTCONSOLE To extract the IP address from the `message` field, you can write a Painless script that incorporates the `%{COMMONAPACHELOG}` syntax. You can test this @@ -80,7 +83,7 @@ remain flexible, use `wildcard` as the field type for `message`: [source,console] ---- -PUT /my-index-000001/ +PUT /my-index/ { "mappings": { "properties": { @@ -97,11 +100,11 @@ PUT /my-index-000001/ ---- Next, use the <> to index some log data into -`my-index-000001`. +`my-index`. [source,console] ---- -POST /my-index-000001/_bulk?refresh +POST /my-index/_bulk?refresh {"index":{}} {"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"} {"index":{}} @@ -128,7 +131,7 @@ script just returns the field value without crashing. 
[source,console] ---- -PUT my-index-000001/_mappings +PUT my-index/_mappings { "runtime": { "http.clientip": { @@ -154,7 +157,7 @@ search: [source,console] ---- -GET my-index-000001/_search +GET my-index/_search { "runtime_mappings": { "http.clientip": { @@ -183,7 +186,7 @@ even those that weren't sent as part of the original `_source`: [source,console] ---- -GET my-index-000001/_search +GET my-index/_search { "query": { "match": { @@ -211,7 +214,7 @@ The grok pattern within the Painless script extracted this value from the "max_score" : 1.0, "hits" : [ { - "_index" : "my-index-000001", + "_index" : "my-index", "_id" : "1iN2a3kBw4xTzEDqyYE0", "_score" : 1.0, "_source" : { From e523ed69b62ffe89008decc3c44789ee4687aaff Mon Sep 17 00:00:00 2001 From: Adam Locke Date: Thu, 27 May 2021 07:59:20 -0400 Subject: [PATCH 7/7] Updating cross-link to Painless --- docs/reference/scripting/grok-syntax.asciidoc | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/reference/scripting/grok-syntax.asciidoc b/docs/reference/scripting/grok-syntax.asciidoc index 17d8beef84ec9..03c5811d9594d 100644 --- a/docs/reference/scripting/grok-syntax.asciidoc +++ b/docs/reference/scripting/grok-syntax.asciidoc @@ -52,10 +52,11 @@ can match this text by using the following grok expression: [[grok-patterns]] ==== Use grok patterns in Painless scripts You can incorporate predefined grok patterns into Painless scripts to extract -data. To test your script, use either the {painless}/painless-execute-api.html[field contexts] of the Painless execute API or create a runtime field that -includes the script. Runtime fields offer greater flexibility and accept -multiple documents, but the Painless execute API is a great option if you don't -have write access on a cluster where you're testing a script. +data. 
To test your script, use either the {painless}/painless-execute-api.html#painless-execute-runtime-field-context[field contexts] of the Painless +execute API or create a runtime field that includes the script. Runtime fields +offer greater flexibility and accept multiple documents, but the Painless +execute API is a great option if you don't have write access on a cluster +where you're testing a script. TIP: If you need help building grok patterns to match your data, use the {kibana-ref}/xpack-grokdebugger.html[Grok Debugger] tool in {kib}.