From 0ce5cef1049280c88deb091c45cfa06f743b2015 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Mon, 3 Jun 2019 16:44:17 +0200
Subject: [PATCH 1/5] [Docs] Clarify caveats for phonetic filters replace option

The `replace` option in the phonetic token filter can have surprising side
effects, such as those described in #26921. This PR adds a note to be mindful
of such scenarios and offers alternatives to using the `replace` option.

Closes #26921
---
 docs/plugins/analysis-phonetic.asciidoc | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/plugins/analysis-phonetic.asciidoc b/docs/plugins/analysis-phonetic.asciidoc
index e22f819e1eb3e..f1b768169a89d 100644
--- a/docs/plugins/analysis-phonetic.asciidoc
+++ b/docs/plugins/analysis-phonetic.asciidoc
@@ -65,6 +65,14 @@ GET phonetic_sample/_analyze
 
 <1> Returns: `J`, `joe`, `BLKS`, `bloggs`
 
+It is important to note that `"replace": false` can lead to unexpected behaviour since
+the original and the phonetic version are both kept at the same token location. Some
+queries, e.g. the `match` query with applied fuzzyness, ignore one of these two token
+versions. This can lead to issues that are difficult to diagnose and reason about.
+For this reason, it is often beneficial to use separate fields for analysis with and
+without phonetic filtering. That way searches can be run against both fields with differing
+boosts and trade-offs (e.g. only run fuzzy queries on the original text field, but not the
+phonetic version).
 
 [float]
 ===== Double metaphone settings

From 3d595e26c0efe23f6f00225b077a6f93314379c0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Tue, 4 Jun 2019 10:24:46 +0200
Subject: [PATCH 2/5] Fixing typos

---
 docs/plugins/analysis-phonetic.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/plugins/analysis-phonetic.asciidoc b/docs/plugins/analysis-phonetic.asciidoc
index f1b768169a89d..1af6a39fcf8ee 100644
--- a/docs/plugins/analysis-phonetic.asciidoc
+++ b/docs/plugins/analysis-phonetic.asciidoc
@@ -65,9 +65,9 @@ GET phonetic_sample/_analyze
 
 <1> Returns: `J`, `joe`, `BLKS`, `bloggs`
 
-It is important to note that `"replace": false` can lead to unexpected behaviour since
+It is important to note that `"replace": false` can lead to unexpected behavior since
 the original and the phonetic version are both kept at the same token location. Some
-queries, e.g. the `match` query with applied fuzzyness, ignore one of these two token
+queries, e.g. the `match` query with applied fuzziness, ignore one of these two token
 versions. This can lead to issues that are difficult to diagnose and reason about.
 For this reason, it is often beneficial to use separate fields for analysis with and
 without phonetic filtering. That way searches can be run against both fields with differing

From 7058f49d9a92c865d755b2d429c9b5b25eb509ce Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Tue, 4 Jun 2019 10:42:41 +0200
Subject: [PATCH 3/5] Rephrase

---
 docs/plugins/analysis-phonetic.asciidoc | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/plugins/analysis-phonetic.asciidoc b/docs/plugins/analysis-phonetic.asciidoc
index 1af6a39fcf8ee..80b433eaa9bf7 100644
--- a/docs/plugins/analysis-phonetic.asciidoc
+++ b/docs/plugins/analysis-phonetic.asciidoc
@@ -66,13 +66,13 @@ GET phonetic_sample/_analyze
 <1> Returns: `J`, `joe`, `BLKS`, `bloggs`
 
 It is important to note that `"replace": false` can lead to unexpected behavior since
-the original and the phonetic version are both kept at the same token location. Some
-queries, e.g. the `match` query with applied fuzziness, ignore one of these two token
-versions. This can lead to issues that are difficult to diagnose and reason about.
-For this reason, it is often beneficial to use separate fields for analysis with and
-without phonetic filtering. That way searches can be run against both fields with differing
-boosts and trade-offs (e.g. only run fuzzy queries on the original text field, but not the
-phonetic version).
+the original and the phonetically analyzed version are both kept at the same token position.
+Some queries handle these stacked tokens in special ways. For example, the fuzzy `match`
+query does not apply <> to stacked synonym tokens. This can lead to issues that are
+difficult to diagnose and reason about. For this reason, it is often beneficial to use separate
+fields for analysis with and without phonetic filtering. That way searches can be run against
+both fields with differing boosts and trade-offs (e.g. only run a fuzzy `match` query on the
+original text field, but not on the phonetic version).
 
 [float]
 ===== Double metaphone settings

From 414c55c06dbbd5d04dbe43a5f36f2f98521070a0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Wed, 5 Jun 2019 11:17:39 +0200
Subject: [PATCH 4/5] Add note to match-query docs

---
 docs/reference/query-dsl/match-query.asciidoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/reference/query-dsl/match-query.asciidoc b/docs/reference/query-dsl/match-query.asciidoc
index 4b998d82cda24..4fcb40a76ec9c 100644
--- a/docs/reference/query-dsl/match-query.asciidoc
+++ b/docs/reference/query-dsl/match-query.asciidoc
@@ -75,7 +75,8 @@ rewritten.
 Fuzzy transpositions (`ab` -> `ba`) are allowed by default but can be disabled
 by setting `fuzzy_transpositions` to `false`.
 
-Note that fuzzy matching is not applied to terms with synonyms, as under the hood
+NOTE: Fuzzy matching is not applied to terms with synonyms or in cases where the
+analysis process produces multiple tokens at the same position. Under the hood
 these terms are expanded to a special synonym query that blends term frequencies,
 which does not support fuzzy expansion.
 

From a90185942d0d2e65742ede1762746109f6e31535 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Wed, 5 Jun 2019 11:30:09 +0200
Subject: [PATCH 5/5] Adding cross-book link

---
 docs/plugins/analysis-phonetic.asciidoc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/plugins/analysis-phonetic.asciidoc b/docs/plugins/analysis-phonetic.asciidoc
index 80b433eaa9bf7..3627751670a32 100644
--- a/docs/plugins/analysis-phonetic.asciidoc
+++ b/docs/plugins/analysis-phonetic.asciidoc
@@ -68,11 +68,11 @@ GET phonetic_sample/_analyze
 It is important to note that `"replace": false` can lead to unexpected behavior since
 the original and the phonetically analyzed version are both kept at the same token position.
 Some queries handle these stacked tokens in special ways. For example, the fuzzy `match`
-query does not apply <> to stacked synonym tokens. This can lead to issues that are
-difficult to diagnose and reason about. For this reason, it is often beneficial to use separate
-fields for analysis with and without phonetic filtering. That way searches can be run against
-both fields with differing boosts and trade-offs (e.g. only run a fuzzy `match` query on the
-original text field, but not on the phonetic version).
+query does not apply {ref}/common-options.html#fuzziness[fuzziness] to stacked synonym tokens.
+This can lead to issues that are difficult to diagnose and reason about. For this reason, it
+is often beneficial to use separate fields for analysis with and without phonetic filtering.
+That way searches can be run against both fields with differing boosts and trade-offs (e.g.
+only run a fuzzy `match` query on the original text field, but not on the phonetic version).
 
 [float]
 ===== Double metaphone settings
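
Illustration only, not part of the patch series: the separate-fields approach that the new note recommends could look roughly like the sketch below. It builds two Elasticsearch request bodies as plain Python dictionaries, one that maps a text field with a phonetic sub-field and one that runs a fuzzy `match` query on the original field only. The index name `phonetic_fields_sample`, the analyzer name `my_phonetic_analyzer`, the filter name `my_metaphone`, the field name `name`, and the boost values are all hypothetical, and the typeless mapping layout assumes a 7.x-style cluster with the analysis-phonetic plugin installed.

```python
import json


def build_index_body() -> dict:
    """Index settings and mappings: `name` keeps the original tokens,
    `name.phonetic` holds only the phonetic encodings."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name.
                    "my_phonetic_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_metaphone"],
                    }
                },
                "filter": {
                    # Hypothetical filter name. "replace": true means the
                    # phonetic token replaces the original, so no two tokens
                    # are stacked at the same position.
                    "my_metaphone": {
                        "type": "phonetic",
                        "encoder": "metaphone",
                        "replace": True,
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {
                        "phonetic": {
                            "type": "text",
                            "analyzer": "my_phonetic_analyzer",
                        }
                    },
                }
            }
        },
    }


def build_search_body(text: str) -> dict:
    """Fuzzy match on the original field, plain match on the phonetic one."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Fuzziness only on the original text; boost is arbitrary.
                    {"match": {"name": {"query": text,
                                        "fuzziness": "AUTO",
                                        "boost": 2.0}}},
                    # No fuzziness on the phonetic sub-field.
                    {"match": {"name.phonetic": {"query": text}}},
                ]
            }
        }
    }


# Print the bodies so they can be pasted into a console or HTTP client.
print("PUT phonetic_fields_sample")
print(json.dumps(build_index_body(), indent=2))
print("GET phonetic_fields_sample/_search")
print(json.dumps(build_search_body("Joe Blogs"), indent=2))
```

Keeping the two analyses in separate fields means each `match` clause sees a single token per position, so the fuzzy clause is not affected by the stacked-token handling described in the note.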