[DOCS] Reformat trim token filter docs (#51649)
Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
jrodewig authored Mar 2, 2020
1 parent 340c08a commit 996ec0d
Showing 1 changed file with 104 additions and 1 deletion.
105 changes: 104 additions & 1 deletion docs/reference/analysis/tokenfilters/trim-tokenfilter.asciidoc
@@ -4,4 +4,107 @@
<titleabbrev>Trim</titleabbrev>
++++

The `trim` token filter trims the whitespace surrounding a token.
Removes leading and trailing whitespace from each token in a stream.

The `trim` filter uses Lucene's
https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html[TrimFilter].

[TIP]
====
Many commonly used tokenizers, such as the
<<analysis-standard-tokenizer,`standard`>> or
<<analysis-whitespace-tokenizer,`whitespace`>> tokenizer, remove whitespace by
default. When using these tokenizers, you don't need to add a separate `trim`
filter.
====
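
As a quick check of the tip above, the following analyze API request is a
minimal sketch that runs text with leading and trailing whitespace through the
`whitespace` tokenizer, with no `trim` filter. The sample text `" fox "` is an
illustrative assumption. The request should return a single `fox` token with no
surrounding whitespace.

[source,console]
----
GET _analyze
{
  "tokenizer" : "whitespace",
  "text" : " fox "
}
----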

[[analysis-trim-tokenfilter-analyze-ex]]
==== Example

To see how the `trim` filter works, you first need to produce a token
containing whitespace.

The following <<indices-analyze,analyze API>> request uses the
<<analysis-keyword-tokenizer,`keyword`>> tokenizer to produce a token for
`" fox "`.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}
----

The API returns the following response. Note the `" fox "` token contains
the original text's whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

To remove the whitespace, add the `trim` filter to the previous analyze API
request.

[source,console]
----
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}
----

The API returns the following response. The returned `fox` token does not
include any leading or trailing whitespace.

[source,console-result]
----
{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
----

[[analysis-trim-tokenfilter-analyzer-ex]]
==== Add to an analyzer

The following <<indices-create-index,create index API>> request uses the `trim`
filter to configure a new <<analysis-custom-analyzer,custom analyzer>>.

[source,console]
----
PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
----
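
To check the new analyzer, you could run sample text through it with the
analyze API. The following request is a sketch that assumes the `trim_example`
index from the previous request exists. The sample text is illustrative. Because
`keyword_trim` combines the `keyword` tokenizer with the `trim` filter, the
request should return a single `fox` token.

[source,console]
----
GET trim_example/_analyze
{
  "analyzer" : "keyword_trim",
  "text" : " fox "
}
----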
