[DOCS] Reformat trim token filter docs #51649

Merged
merged 1 commit into from
Mar 2, 2020
105 changes: 104 additions & 1 deletion docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc
@@ -4,4 +4,107 @@
<titleabbrev>Trim</titleabbrev>
++++

The `trim` token filter trims the whitespace surrounding a token.
Removes leading and trailing whitespace from each token in a stream.

The `trim` filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].

[TIP]
====
Many commonly used tokenizers, such as the
<<analysis-standard-tokenizer,`standard`>> or
<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
default. When using these tokenizers, you don't need to add a separate `trim`
filter.
====
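
The point in the tip above can be illustrated outside {es} with a rough
Python sketch. It assumes `str.split()` is an acceptable stand-in for a
whitespace-style tokenizer, which is only an approximation of the real
tokenizers. The `whitespace_tokenize` helper is hypothetical and not part of
any {es} or Lucene API.

```python
# Rough stand-in for a whitespace-style tokenizer (an assumption, not the
# real implementation): str.split() with no arguments splits on runs of
# whitespace and discards them.
def whitespace_tokenize(text):
    return text.split()


tokens = whitespace_tokenize("  quick   fox ")

# Trimming tokens that already carry no surrounding whitespace changes nothing.
trimmed = [t.strip() for t in tokens]
```

Because the tokens come out without surrounding whitespace, the trimming step
leaves them unchanged, which is why a separate `trim` filter adds nothing in
that case.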

[[analysis-trim-tokenfilter-analyze-ex]]
==== Example

To see how the `trim` filter works, you first need to produce a token
containing whitespace.

The following <<indices-analyze,analyze API>> request uses the
<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for
`" fox "`.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}
----

The API returns the following response. Note the `" fox "` token contains
the original text's whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

To remove the whitespace, add the `trim` filter to the previous analyze API
request.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}
----

The API returns the following response. The returned `fox` token does not
include any leading or trailing whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----
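
In the two responses above, the token text changes from `" fox "` to `fox`
while `start_offset` and `end_offset` stay at `0` and `5`. The following
Python sketch mimics that behavior as a minimal illustration. The `Token`
class and the `keyword_tokenize` and `trim` functions are hypothetical names
chosen for this example, not {es} or Lucene APIs. Python's `str.strip()` may
also treat a slightly different set of characters as whitespace than Lucene
does.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Field names mirror the analyze API response shown above.
    token: str
    start_offset: int
    end_offset: int
    type: str = "word"
    position: int = 0


def keyword_tokenize(text):
    """Emit the whole input as a single token, as the keyword tokenizer does."""
    return [Token(text, 0, len(text))]


def trim(tokens):
    """Strip surrounding whitespace from each token's text.

    Offsets are copied unchanged, matching the responses shown above.
    """
    return [
        Token(t.token.strip(), t.start_offset, t.end_offset, t.type, t.position)
        for t in tokens
    ]


result = trim(keyword_tokenize(" fox "))
```

In this sketch, `result` holds one token with the text `fox` and the offsets
of the original untrimmed input.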

[[analysis-trim-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the `trim`
filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.

[source,console]
----
PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
----