[Docs] Clarify caveats for phonetic filters replace option (#42807)
The `replace` option in the phonetic token filter can have surprising side
effects, such as those described in #26921. This PR adds a note to be mindful
of such scenarios and offers alternatives to using the `replace` option.

Closes #26921
Christoph Büscher authored Jun 5, 2019
1 parent 60c8fc1 commit ffc5534
Showing 2 changed files with 10 additions and 1 deletion.
8 changes: 8 additions & 0 deletions docs/plugins/analysis-phonetic.asciidoc
@@ -65,6 +65,14 @@ GET phonetic_sample/_analyze

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`

+It is important to note that `"replace": false` can lead to unexpected behavior since
+the original and the phonetically analyzed version are both kept at the same token position.
+Some queries handle these stacked tokens in special ways. For example, the fuzzy `match`
+query does not apply {ref}/common-options.html#fuzziness[fuzziness] to stacked synonym tokens.
+This can lead to issues that are difficult to diagnose and reason about. For this reason, it
+is often beneficial to use separate fields for analysis with and without phonetic filtering.
+That way searches can be run against both fields with differing boosts and trade-offs (e.g.
+only run a fuzzy `match` query on the original text field, but not on the phonetic version).
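
The separate-fields approach described in this note could look roughly like the following. This is a minimal sketch, not part of the commit: it only builds the request bodies as Python dictionaries, and the field name `name`, the sub-field `name.phonetic`, the analyzer name `phonetic_only`, the boost value, and the typeless 7.x-style mapping are all assumptions chosen for illustration.

```python
import json

# Index body: the original text stays in "name" (default analysis), while the
# multi-field "name.phonetic" holds only the phonetic codes ("replace": True),
# so no field ends up with stacked tokens at the same position.
# All names below are hypothetical.
index_body = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "my_metaphone": {
                        "type": "phonetic",
                        "encoder": "metaphone",
                        "replace": True,
                    }
                },
                "analyzer": {
                    "phonetic_only": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_metaphone"],
                    }
                },
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "phonetic": {"type": "text", "analyzer": "phonetic_only"}
                },
            }
        }
    },
}


def build_query(text, phonetic_boost=0.5):
    """Fuzzy match on the original field, plain match on the phonetic one."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Fuzziness only where the field has one token per position.
                    {"match": {"name": {"query": text, "fuzziness": "AUTO"}}},
                    # Phonetic field: no fuzziness, lower (assumed) boost.
                    {
                        "match": {
                            "name.phonetic": {
                                "query": text,
                                "boost": phonetic_boost,
                            }
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(json.dumps(build_query("Joe Bloggs"), indent=2))
```

The two printed bodies would be sent as the index-creation and `_search` requests respectively; the boost is a starting point to tune, not a recommended value.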

[float]
===== Double metaphone settings
3 changes: 2 additions & 1 deletion docs/reference/query-dsl/match-query.asciidoc
@@ -75,7 +75,8 @@ rewritten.
Fuzzy transpositions (`ab` -> `ba`) are allowed by default but can be disabled
by setting `fuzzy_transpositions` to `false`.

-Note that fuzzy matching is not applied to terms with synonyms, as under the hood
+NOTE: Fuzzy matching is not applied to terms with synonyms or in cases where the
+analysis process produces multiple tokens at the same position. Under the hood
these terms are expanded to a special synonym query that blends term frequencies,
which does not support fuzzy expansion.
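
To check whether an analyzer produces multiple tokens at the same position, one option is to inspect the `position` values in an `_analyze` response. The helper below is a hypothetical sketch; the sample response is hand-written and abbreviated (real responses also carry offsets and token types), using the tokens quoted in the phonetic example earlier in this commit.

```python
from collections import defaultdict


def stacked_positions(analyze_response):
    """Return {position: [tokens]} for positions holding more than one token."""
    by_position = defaultdict(list)
    for tok in analyze_response.get("tokens", []):
        by_position[tok["position"]].append(tok["token"])
    return {pos: toks for pos, toks in by_position.items() if len(toks) > 1}


# Hand-written, abbreviated stand-in for an `_analyze` response in which the
# phonetic filter keeps the original token next to its phonetic code.
sample = {
    "tokens": [
        {"token": "J", "position": 0},
        {"token": "joe", "position": 0},
        {"token": "BLKS", "position": 1},
        {"token": "bloggs", "position": 1},
    ]
}

if __name__ == "__main__":
    print(stacked_positions(sample))
```

A non-empty result suggests that a fuzzy `match` query on that field may silently skip fuzzy expansion for the affected terms.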
