[DOCS] Reformat ASCII folding token filter docs #48143

Merged 6 commits on Oct 23, 2019
@@ -8,7 +8,7 @@ Strips all characters after an apostrophe, including the apostrophe itself.

This filter is included in {es}'s built-in <<turkish-analyzer,Turkish language
analyzer>>. It uses Lucene's
https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
which was built for the Turkish language.


107 changes: 97 additions & 10 deletions docs/reference/analysis/tokenfilters/asciifolding-tokenfilter.asciidoc
@@ -1,10 +1,83 @@
[[analysis-asciifolding-tokenfilter]]
=== ASCII Folding Token Filter
=== ASCII folding token filter
++++
<titleabbrev>ASCII folding</titleabbrev>
++++

A token filter of type `asciifolding` that converts alphabetic, numeric,
and symbolic Unicode characters which are not in the first 127 ASCII
characters (the "Basic Latin" Unicode block) into their ASCII
equivalents, if one exists. Example:
Converts alphabetic, numeric, and symbolic characters that are not in the Basic
Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if
one exists. For example, the filter changes `à` to `a`.

This filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].

[[analysis-asciifolding-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request uses the `asciifolding`
filter to drop the diacritical marks in `açaí à la carte`:

[source,console]
--------------------------------------------------
GET /_analyze
{
"tokenizer" : "standard",
"filter" : ["asciifolding"],
"text" : "açaí à la carte"
}
--------------------------------------------------

The filter produces the following tokens:

[source,text]
--------------------------------------------------
[ acai, a, la, carte ]
--------------------------------------------------
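
As a rough illustration of what folding does to the tokens above, the following Python sketch approximates the behavior outside {es}. It is not Lucene's implementation: the helper names `fold_to_ascii` and `analyze` are hypothetical, whitespace splitting stands in for the `standard` tokenizer, and Unicode decomposition is assumed to be a good enough substitute for Lucene's folding table. Characters without a decomposition, such as `ø` or `ß`, are not handled by this sketch.

```python
import unicodedata
from typing import List


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (assumption: not Lucene's lookup table)."""
    # NFKD splits a character such as "à" into "a" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[str]:
    # Whitespace splitting is a stand-in for the `standard` tokenizer.
    return [fold_to_ascii(token) for token in text.split()]


print(analyze("açaí à la carte"))
```

For this input, the sketch yields the same four tokens as the analyze API example: `acai`, `a`, `la`, and `carte`.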

/////////////////////
[source,console-result]
--------------------------------------------------
{
"tokens" : [
{
"token" : "acai",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "a",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "la",
"start_offset" : 7,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "carte",
"start_offset" : 10,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
--------------------------------------------------
/////////////////////

[[analysis-asciifolding-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the
`asciifolding` filter to configure a new
<<analysis-custom-analyzer,custom analyzer>>.

[source,console]
--------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
"settings" : {
"analysis" : {
"analyzer" : {
"default" : {
"standard_asciifolding" : {
"tokenizer" : "standard",
"filter" : ["asciifolding"]
}
@@ -23,9 +96,23 @@ PUT /asciifold_example
}
--------------------------------------------------

Accepts `preserve_original` setting which defaults to false but if true
will keep the original token as well as emit the folded token. For
example:
[[analysis-asciifolding-tokenfilter-configure-parms]]
==== Configurable parameters

`preserve_original`::
(Optional, boolean)
If `true`, emit both original tokens and folded tokens.
Defaults to `false`.
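
To make the effect of `preserve_original` concrete, here is a minimal Python sketch of the idea. It is an approximation under stated assumptions, not the filter's actual code: `fold_to_ascii` and `fold_tokens` are hypothetical helpers, folding is approximated with Unicode decomposition, and the sketch assumes the original token is emitted after the folded one and only when folding changed it.

```python
import unicodedata
from typing import List


def fold_to_ascii(text: str) -> str:
    # Approximation: decompose, then strip combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_tokens(tokens: List[str], preserve_original: bool = False) -> List[str]:
    out: List[str] = []
    for token in tokens:
        folded = fold_to_ascii(token)
        out.append(folded)
        # Assumption: the original is kept only if it differs from the
        # folded form, and it follows the folded token in the output.
        if preserve_original and folded != token:
            out.append(token)
    return out


tokens = ["açaí", "à", "la", "carte"]
print(fold_tokens(tokens))                          # folded tokens only
print(fold_tokens(tokens, preserve_original=True))  # folded plus originals
```

With `preserve_original` enabled, a search for either `acai` or `açaí` could match the same indexed text, at the cost of indexing extra tokens.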

[[analysis-asciifolding-tokenfilter-customize]]
==== Customize

To customize the `asciifolding` filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

For example, the following request creates a custom `asciifolding` filter with
`preserve_original` set to true:

[source,console]
--------------------------------------------------
@@ -34,7 +121,7 @@ PUT /asciifold_example
"settings" : {
"analysis" : {
"analyzer" : {
"default" : {
"standard_asciifolding" : {
"tokenizer" : "standard",
"filter" : ["my_ascii_folding"]
}