[DOCS] Reformat ASCII folding token filter docs #48143

Merged
merged 6 commits into from
Oct 23, 2019
Changes from 3 commits
@@ -8,7 +8,7 @@ Strips all characters after an apostrophe, including the apostrophe itself.

This filter is included in {es}'s built-in <<turkish-analyzer,Turkish language
analyzer>>. It uses Lucene's
-https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html[ApostropheFilter],
which was built for the Turkish language.
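The apostrophe filter strips all characters after an apostrophe, including the apostrophe itself. As a rough illustration of that rule, here is a minimal Python sketch. It is not Lucene's `ApostropheFilter`: the function name `strip_after_apostrophe` is hypothetical, and treating U+2019 (right single quotation mark) as an apostrophe alongside U+0027 is an assumption of the sketch.

```python
# Minimal sketch of "strip everything from the first apostrophe onward".
# Not Lucene's ApostropheFilter. Assumption: both U+0027 and U+2019
# count as apostrophes.
APOSTROPHES = ("'", "\u2019")


def strip_after_apostrophe(token: str) -> str:
    """Return the token up to, but not including, its first apostrophe."""
    cut = len(token)
    for mark in APOSTROPHES:
        index = token.find(mark)
        if index != -1:
            cut = min(cut, index)
    return token[:cut]


for word in ["Istanbul'a", "Ankara\u2019da", "Izmir"]:
    print(word, "->", strip_after_apostrophe(word))
```

Tokens without an apostrophe pass through unchanged.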

docs/reference/analysis/tokenfilters/asciifolding-tokenfilter.asciidoc (104 changes: 94 additions, 10 deletions)
@@ -1,10 +1,83 @@
[[analysis-asciifolding-tokenfilter]]
-=== ASCII Folding Token Filter
=== ASCII folding token filter
++++
<titleabbrev>ASCII folding</titleabbrev>
++++

-A token filter of type `asciifolding` that converts alphabetic, numeric,
-and symbolic Unicode characters which are not in the first 127 ASCII
-characters (the "Basic Latin" Unicode block) into their ASCII
-equivalents, if one exists. Example:
Converts alphabetic, numeric, and symbolic Unicode characters outside the
"Basic Latin" Unicode block (U+0000 to U+007F, the ASCII range) into their
ASCII equivalents, if such equivalents exist. For example, the filter changes
`à` to `a`.

This filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html[ASCIIFoldingFilter].
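To make the folding step concrete, the following Python sketch approximates it with the standard library's `unicodedata` module: it decomposes text with NFKD and drops nonspacing combining marks. This is a swapped-in technique, not Lucene's implementation, which uses an explicit character mapping; the function name `ascii_fold` is hypothetical. Characters with no NFKD decomposition, such as `ø` or `ß`, pass through this sketch unchanged, even though Lucene's filter maps them.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Approximate ASCII folding by stripping combining marks.

    Decomposes each character with NFKD, then drops nonspacing combining
    marks (Unicode category "Mn"). Characters without a decomposition,
    such as "ø" or "ß", are returned unchanged, unlike Lucene's table.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(ascii_fold("\u00e0"))          # à -> a
print(ascii_fold("a\u00e7a\u00ed"))  # açaí -> acai
```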

[[analysis-asciifolding-tokenfilter-analyze-ex]]
==== Example

The following <<indices-analyze,analyze API>> request demonstrates how the
ASCII folding token filter works.

[source,console]
--------------------------------------------------
GET /_analyze
{
  "tokenizer" : "standard",
  "filter" : ["asciifolding"],
  "text" : "açaí à la carte"
}
--------------------------------------------------

The filter produces the following tokens:

[source,text]
--------------------------------------------------
[ acai, a, la, carte ]
--------------------------------------------------

/////////////////////
[source,console-result]
--------------------------------------------------
{
  "tokens" : [
    {
      "token" : "acai",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "a",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "la",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "carte",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}
--------------------------------------------------
/////////////////////
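The token stream from the analyze request can be approximated with the following Python sketch. It assumes a regular-expression split on word characters as a stand-in for the standard tokenizer and approximates folding with NFKD, so it is not Elasticsearch's implementation; the function names are hypothetical. It shows that folding changes each token's text but not its offsets, which still point into the original string.

```python
import re
import unicodedata


def ascii_fold(text: str) -> str:
    # Approximation: NFKD decomposition, then drop nonspacing marks ("Mn").
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def analyze(text: str) -> list[dict]:
    """Split on runs of word characters, then fold each token.

    The regex split is a rough stand-in for the standard tokenizer.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": ascii_fold(match.group()),
            # Offsets refer to the original, unfolded text.
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for t in analyze("a\u00e7a\u00ed \u00e0 la carte"):  # "açaí à la carte"
    print(t)
```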

[[analysis-asciifolding-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the
ASCII folding token filter to configure a new
<<analysis-custom-analyzer,custom analyzer>>.

[source,console]
--------------------------------------------------
@@ -13,7 +86,7 @@ PUT /asciifold_example
  "settings" : {
    "analysis" : {
      "analyzer" : {
-        "default" : {
        "standard_asciifolding" : {
          "tokenizer" : "standard",
          "filter" : ["asciifolding"]
        }
@@ -23,9 +96,20 @@ PUT /asciifold_example
}
--------------------------------------------------
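A custom analyzer runs its tokenizer and then applies each token filter in order. The following Python sketch composes hypothetical stand-ins for the standard tokenizer and the ASCII folding filter into a function named after the `standard_asciifolding` analyzer. It is an illustration under those assumptions, not Elasticsearch code. Because the analyzer includes no lowercase filter, folded tokens keep their original case.

```python
import re
import unicodedata
from collections.abc import Callable, Iterable

Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]


def word_tokenizer(text: str) -> list[str]:
    # Rough stand-in for the standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def asciifolding(tokens: list[str]) -> list[str]:
    # Approximate folding: NFKD decomposition, then drop nonspacing marks.
    return [
        "".join(c for c in unicodedata.normalize("NFKD", t)
                if unicodedata.category(c) != "Mn")
        for t in tokens
    ]


def custom_analyzer(tokenizer: Tokenizer,
                    filters: Iterable[TokenFilter]) -> Callable[[str], list[str]]:
    """Compose a tokenizer with token filters that run in the given order."""
    chain = list(filters)

    def analyze(text: str) -> list[str]:
        tokens = tokenizer(text)
        for token_filter in chain:
            tokens = token_filter(tokens)
        return tokens

    return analyze


# Hypothetical Python counterpart of the "standard_asciifolding" analyzer.
standard_asciifolding = custom_analyzer(word_tokenizer, [asciifolding])
print(standard_asciifolding("Cr\u00e8me br\u00fbl\u00e9e"))  # ['Creme', 'brulee']
```

To also normalize case, a lowercase filter would need to be added to the analyzer's filter list.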

-Accepts `preserve_original` setting which defaults to false but if true
-will keep the original token as well as emit the folded token. For
-example:
[[analysis-asciifolding-tokenfilter-configure-parms]]
==== Configurable parameters

`preserve_original`::
(Optional, boolean)
If `true`, emit both the original token and the folded token. Defaults to
`false`.
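The following Python sketch models the token stream with `preserve_original` enabled. It assumes, based on our reading of Lucene's `ASCIIFoldingFilter`, that the folded token is emitted first and the original token follows at the same position. Folding is approximated with NFKD, the sketch keeps the original only when folding changed the token, and the function names `fold` and `fold_tokens` are hypothetical.

```python
import unicodedata


def fold(token: str) -> str:
    # Approximation: NFKD decomposition with nonspacing marks removed.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def fold_tokens(tokens: list[str],
                preserve_original: bool = False) -> list[tuple[str, int]]:
    """Return (token, position) pairs after folding.

    With preserve_original, a token changed by folding is emitted twice:
    folded form first, then the original at the same position.
    """
    output = []
    for position, token in enumerate(tokens):
        folded = fold(token)
        output.append((folded, position))
        if preserve_original and folded != token:
            output.append((token, position))
    return output


tokens = ["a\u00e7a\u00ed", "\u00e0", "la", "carte"]  # açaí à la carte
print(fold_tokens(tokens, preserve_original=True))
# [('acai', 0), ('açaí', 0), ('a', 1), ('à', 1), ('la', 2), ('carte', 3)]
```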

[[analysis-asciifolding-tokenfilter-customize]]
==== Customize

To customize the ASCII folding token filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.

[source,console]
--------------------------------------------------
@@ -34,7 +118,7 @@ PUT /asciifold_example
  "settings" : {
    "analysis" : {
      "analyzer" : {
-        "default" : {
        "standard_asciifolding" : {
          "tokenizer" : "standard",
          "filter" : ["my_ascii_folding"]
        }