From a8881813491d4a3ff780576608deeb6208d4059f Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Mon, 26 Aug 2019 13:07:01 -0400
Subject: [PATCH 1/9] [DOCS] Reformats analyze API

---
 docs/reference/indices/analyze.asciidoc | 198 ++++++++++++++++++++----
 1 file changed, 171 insertions(+), 27 deletions(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index 50dc88f3711d2..7a5d9f6bc7b07 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -1,15 +1,145 @@
 [[indices-analyze]]
-=== Analyze
+=== Analyze API
+++++
+Analyze
+++++
 
-Performs the analysis process on a text and return the tokens breakdown
-of the text.
+Performs the <> process on a text
+and returns a token breakdown of the text.
 
-Can be used without specifying an index against one of the many built in
-analyzers:
+[source,js]
+--------------------------------------------------
+GET /_analyze
+{
+  "analyzer" : "standard",
+  "text" : "Quick Brown Foxes!"
+}
+--------------------------------------------------
+// CONSOLE
+
+
+[[analyze-api-request]]
+==== {api-request-title}
+
+`GET /_analyze`
+
+`POST /_analyze`
+
+`GET //_analyze`
+
+`POST //_analyze`
+
+
+[[analyze-api-path-params]]
+==== {api-path-parms-title}
+
+``::
++
+--
+(Optional, string)
+Index used to derive the analyzer.
+
+If specified,
+the `analyzer` or `field` parameter overrides this value.
+
+If the `analyzer` and `field` parameters are not specified,
+the analyze API uses the default analyzer for the ``.
+
+If the `` parameter is not specified
+or the `` does not have a default analyzer,
+the analyze API uses the <>.
+--
+
+
+[[analyze-api-query-params]]
+==== {api-query-parms-title}
+
+`analyzer`::
++
+--
+(Optional, string or <>)
+Analyzer used to analyze for the provided `text`.
+
+See <> for a list of built-in analyzers.
+You can also provide a <>.
+
+If this parameter is not specified,
+the analyze API uses the analyzer for the `field`'s mapping.
+
+If the `field` parameter is not specified,
+the analyze API uses the default analyzer for the ``.
+
+If the `` parameter is not specified
+or the `` does not have a default analyzer,
+the analyze API uses the <>.
+--
+
+`attributes`::
+(Optional, array of strings)
+Array of token attributes used to filter the output of the `explain` parameter.
+
+`char_filter`::
+(Optional, array of strings)
+Array of character filters used to preprocess characters before the tokenizer.
+See <> for a list of character filters.
+
+`explain`::
+(Optional, boolean)
+If `true`, the response includes token attributes and additional details.
+Defaults to `false`.
+experimental:[The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.]
+
+`field`::
++
+--
+(Optional, string)
+Field used to derive the analyzer.
+To use this parameter,
+you must specify an ``.
+
+If specified,
+the `analyzer` parameter overrides this value.
+
+If the `field` parameter is not specified,
+the analyze API uses the default analyzer for the ``.
+
+If the `` parameter is not specified
+or the `` does not have a default analyzer,
+the analyze API uses the <>.
+--
+
+`filter`::
+(Optional, Array of strings)
+Array of token filters used to change tokens after the tokenizer.
+See <> for a list of token filters.
+
+`normalizer`::
+(Optional, string)
+Normalizer used to convert text into a single token.
+See <> for a list of normalizers.
+
+`text`::
+(Required, string or array of strings)
+Text to analyze.
+If an array of strings is provided, it is analyzed as a multi-valued field.
+
+`tokenizer`::
+(Optional, string)
+Tokenizer used to convert text into tokens.
+See <> for a list of tokenizers.
+
+[[analyze-api-example]]
+==== {api-examples-title}
+
+[[analyze-api-no-index-ex]]
+===== No index specified
+
+You can use the analyze API without specifying an `` against one of the
+many built-in analyzers:
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "analyzer" : "standard",
   "text" : "this is a test"
@@ -17,11 +147,14 @@ GET _analyze
 --------------------------------------------------
 // CONSOLE
 
-If text parameter is provided as array of strings, it is analyzed as a multi-valued field.
+[[analyze-api-text-array-ex]]
+===== Array of text strings
+
+If the `text` parameter is provided as array of strings, it is analyzed as a multi-valued field.
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "analyzer" : "standard",
   "text" : ["this is a test", "the second text"]
@@ -29,13 +162,16 @@ GET _analyze
 --------------------------------------------------
 // CONSOLE
 
-Or by building a custom transient analyzer out of tokenizers,
-token filters and char filters. Token filters can use the shorter 'filter'
-parameter name:
+[[analyze-api-custom-analyzer-ex]]
+===== Custom analyzer
+
+You can use the analyze API to test a custom transient analyzer built from
+tokenizers, token filters, and char filters. Token filters use the `filter`
+parameter:
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "keyword",
   "filter" : ["lowercase"],
@@ -46,7 +182,7 @@ GET _analyze
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "keyword",
   "filter" : ["lowercase"],
@@ -62,7 +198,7 @@ Custom tokenizers, token filters, and character filters can be specified in the
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "whitespace",
   "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
@@ -71,11 +207,14 @@ GET _analyze
 --------------------------------------------------
 // CONSOLE
 
-It can also run against a specific index:
+[[analyze-api-specific-index-ex]]
+===== Specific index
+
+You can also run the analyze API against a specific index:
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "text" : "this is a test"
 }
@@ -89,7 +228,7 @@ can also be provided to use a different analyzer:
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "analyzer" : "whitespace",
   "text" : "this is a test"
@@ -98,11 +237,14 @@ GET analyze_sample/_analyze
 // CONSOLE
 // TEST[setup:analyze_sample]
 
-Also, the analyzer can be derived based on a field mapping, for example:
+[[analyze-api-field-ex]]
+===== Derive analyzer from a field mapping
+
+The analyzer can be derived based on a field mapping, for example:
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "field" : "obj1.field1",
   "text" : "this is a test"
@@ -114,11 +256,14 @@ GET analyze_sample/_analyze
 Will cause the analysis to happen based on the analyzer configured in the
 mapping for `obj1.field1` (and if not, the default index analyzer).
 
+[[analyze-api-normalizer-ex]]
+===== Normalizer
+
 A `normalizer` can be provided for keyword field with normalizer associated with the `analyze_sample` index.
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "normalizer" : "my_normalizer",
   "text" : "BaR"
@@ -131,7 +276,7 @@ Or by building a custom transient normalizer out of token filters and char filte
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "filter" : ["lowercase"],
   "text" : "BaR"
@@ -140,7 +285,7 @@ GET _analyze
 // CONSOLE
 
 [[explain-analyze-api]]
-==== Explain Analyze
+===== Explain analyze
 
 If you want to get more advanced details, set `explain` to `true` (defaults to `false`). It will output all token attributes for each token.
 You can filter token attributes you want to output by setting `attributes` option.
@@ -149,7 +294,7 @@ NOTE: The format of the additional detail information is labelled as experimenta
 
 [source,js]
 --------------------------------------------------
-GET _analyze
+GET /_analyze
 {
   "tokenizer" : "standard",
   "filter" : ["snowball"],
@@ -210,8 +355,7 @@ The request returns the following result:
 <1> Output only "keyword" attribute, since specify "attributes" in the request.
 
 [[tokens-limit-settings]]
-[float]
-=== Settings to prevent tokens explosion
+===== Settings to prevent tokens explosion
 
 Generating excessive amount of tokens may cause a node to run out of memory.
 The following setting allows to limit the number of tokens that can be produced:
@@ -225,7 +369,7 @@ The following setting allows to limit the number of tokens that can be produced:
 
 [source,js]
 --------------------------------------------------
-PUT analyze_sample
+PUT /analyze_sample
 {
   "settings" : {
     "index.analyze.max_token_count" : 20000
@@ -237,7 +381,7 @@ PUT analyze_sample
 
 [source,js]
 --------------------------------------------------
-GET analyze_sample/_analyze
+GET /analyze_sample/_analyze
 {
   "text" : "this is a test"
 }

From db5d3b1ced7a7fa515164df691117c1ef440e93e Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Tue, 27 Aug 2019 08:31:12 -0400
Subject: [PATCH 2/9] reword intro sentence

---
 docs/reference/indices/analyze.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index 7a5d9f6bc7b07..5e42b7d10e8fc 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -4,8 +4,8 @@
 Analyze
 ++++
 
-Performs the <> process on a text
-and returns a token breakdown of the text.
+Performs <> on a text string
+and returns the resulting tokens.
 
 [source,js]
 --------------------------------------------------
 GET /_analyze

From 0af3ae0361be2af9c720aff38525321ce2b73cf2 Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Tue, 27 Aug 2019 08:32:10 -0400
Subject: [PATCH 3/9] reword field mapping sentence

---
 docs/reference/indices/analyze.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index 5e42b7d10e8fc..d0a6cb91d82dd 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -64,7 +64,7 @@ See <> for a list of built-in analyzers.
 You can also provide a <>.
 
 If this parameter is not specified,
-the analyze API uses the analyzer for the `field`'s mapping.
+the analyze API uses the analyzer defined in the `field`'s mapping.
 
 If the `field` parameter is not specified,
 the analyze API uses the default analyzer for the ``.

From 030fcba666144832370f50053b82e1403330e81c Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Tue, 27 Aug 2019 08:33:00 -0400
Subject: [PATCH 4/9] reword parm

---
 docs/reference/indices/analyze.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index d0a6cb91d82dd..42a882ac307d9 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -45,7 +45,7 @@ the `analyzer` or `field` parameter overrides this value.
 If the `analyzer` and `field` parameters are not specified,
 the analyze API uses the default analyzer for the ``.
 
-If the `` parameter is not specified
+If no `` is specified
 or the `` does not have a default analyzer,
 the analyze API uses the <>.
 --
@@ -69,7 +69,7 @@ the analyze API uses the analyzer defined in the `field`'s mapping.
 If the `field` parameter is not specified,
 the analyze API uses the default analyzer for the ``.
 
-If the `` parameter is not specified
+If no `` is specified
 or the `` does not have a default analyzer,
 the analyze API uses the <>.
 --

From 548c3903366a6fc05881a1d304c69e5e77c8d36c Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Tue, 27 Aug 2019 08:33:32 -0400
Subject: [PATCH 5/9] reword `` parm

---
 docs/reference/indices/analyze.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index 42a882ac307d9..c89944636bcbd 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -100,7 +100,7 @@ you must specify an ``.
 If specified,
 the `analyzer` parameter overrides this value.
 
-If the `field` parameter is not specified,
+If no `` is specified,
 the analyze API uses the default analyzer for the ``.
 
 If the `` parameter is not specified

From 32b2165301dc369f350095cafbcf79b706b6981a Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Tue, 27 Aug 2019 08:36:44 -0400
Subject: [PATCH 6/9] clean up inline parm mentions

---
 docs/reference/indices/analyze.asciidoc | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index c89944636bcbd..826cd38d95d9f 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -40,13 +40,13 @@ GET /_analyze
 Index used to derive the analyzer.
 
 If specified,
-the `analyzer` or `field` parameter overrides this value.
+the `analyzer` or `` parameter overrides this value.
 
-If the `analyzer` and `field` parameters are not specified,
-the analyze API uses the default analyzer for the ``.
+If no analyzer or field are specified,
+the analyze API uses the default analyzer for the index.
 
-If no `` is specified
-or the `` does not have a default analyzer,
+If no index is specified
+or the index does not have a default analyzer,
 the analyze API uses the <>.
 --
 
@@ -64,7 +64,7 @@ See <> for a list of built-in analyzers.
 You can also provide a <>.
 
 If this parameter is not specified,
-the analyze API uses the analyzer defined in the `field`'s mapping.
+the analyze API uses the analyzer defined in the field's mapping.
 
 If the `field` parameter is not specified,
 the analyze API uses the default analyzer for the ``.
@@ -95,15 +95,15 @@ experimental:[The format of the additional detail information is labelled as exp
 (Optional, string)
 Field used to derive the analyzer.
 To use this parameter,
-you must specify an ``.
+you must specify an index.
 
 If specified,
 the `analyzer` parameter overrides this value.
 
-If no `` is specified,
-the analyze API uses the default analyzer for the ``.
+If no field is specified,
+the analyze API uses the default analyzer for the index.
 
-If the `` parameter is not specified
+If no index is specified
 or the `` does not have a default analyzer,
 the analyze API uses the <>.
 --

From eb129cc90bff0f14818fad67a97d86353d0a0835 Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Tue, 27 Aug 2019 08:38:23 -0400
Subject: [PATCH 7/9] iter

---
 docs/reference/indices/analyze.asciidoc | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index 826cd38d95d9f..7f35301651ad4 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -66,11 +66,11 @@ You can also provide a <>.
 If this parameter is not specified,
 the analyze API uses the analyzer defined in the field's mapping.
 
-If the `field` parameter is not specified,
-the analyze API uses the default analyzer for the ``.
+If no field is specified,
+the analyze API uses the default analyzer for the index.
 
-If no `` is specified
-or the `` does not have a default analyzer,
+If no index is specified,
+or the index does not have a default analyzer,
 the analyze API uses the <>.
 --
 
@@ -104,7 +104,7 @@ If no field is specified,
 the analyze API uses the default analyzer for the index.
 
 If no index is specified
-or the `` does not have a default analyzer,
+or the index does not have a default analyzer,
 the analyze API uses the <>.
 --
 
@@ -121,11 +121,11 @@ See <> for a list of normalizers.
 `text`::
 (Required, string or array of strings)
 Text to analyze.
-If an array of strings is provided, it is analyzed as a multi-valued field.
+If an array of strings is provided, it is analyzed as a multi-value field.
 
 `tokenizer`::
 (Optional, string)
-Tokenizer used to convert text into tokens.
+Tokenizer to use to convert text into tokens.
 See <> for a list of tokenizers.
 
 [[analyze-api-example]]
@@ -134,8 +134,8 @@ See <> for a list of tokenizers.
 [[analyze-api-no-index-ex]]
 ===== No index specified
 
-You can use the analyze API without specifying an `` against one of the
-many built-in analyzers:
+You can apply any of the built-in analyzers to the text string without
+specifying an index.
 
 [source,js]
 --------------------------------------------------
@@ -150,7 +150,7 @@ GET /_analyze
 [[analyze-api-text-array-ex]]
 ===== Array of text strings
 
-If the `text` parameter is provided as array of strings, it is analyzed as a multi-valued field.
+If the `text` parameter is provided as array of strings, it is analyzed as a multi-value field.
 
 [source,js]
 --------------------------------------------------
@@ -355,7 +355,7 @@ The request returns the following result:
 <1> Output only "keyword" attribute, since specify "attributes" in the request.
 
 [[tokens-limit-settings]]
-===== Settings to prevent tokens explosion
+===== Setting a token limit
 
 Generating excessive amount of tokens may cause a node to run out of memory.
 The following setting allows to limit the number of tokens that can be produced:

From 2607193d83efcd3eee4d9cf231fc6591056bd58d Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Wed, 28 Aug 2019 08:47:01 -0400
Subject: [PATCH 8/9] Update docs/reference/indices/analyze.asciidoc

Co-Authored-By: debadair

---
 docs/reference/indices/analyze.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index 7f35301651ad4..d940cd663aca7 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -115,7 +115,7 @@ See <> for a list of token filters.
 
 `normalizer`::
 (Optional, string)
-Normalizer used to convert text into a single token.
+Normalizer to use to convert text into a single token.
 See <> for a list of normalizers.
 
 `text`::

From 8d1a617f122f56f04fbce65c689479e09106702f Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Wed, 28 Aug 2019 08:47:08 -0400
Subject: [PATCH 9/9] Update docs/reference/indices/analyze.asciidoc

Co-Authored-By: debadair

---
 docs/reference/indices/analyze.asciidoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/indices/analyze.asciidoc b/docs/reference/indices/analyze.asciidoc
index d940cd663aca7..b48243c6b59f6 100644
--- a/docs/reference/indices/analyze.asciidoc
+++ b/docs/reference/indices/analyze.asciidoc
@@ -110,7 +110,7 @@ the analyze API uses the <>.
 
 `filter`::
 (Optional, Array of strings)
-Array of token filters used to change tokens after the tokenizer.
+Array of token filters used to apply after the tokenizer.
 See <> for a list of token filters.
 
 `normalizer`::