diff --git a/_analyzers/custom-analyzer.md b/_analyzers/custom-analyzer.md
index b808268f66..c456f3d826 100644
--- a/_analyzers/custom-analyzer.md
+++ b/_analyzers/custom-analyzer.md
@@ -1,7 +1,7 @@
---
layout: default
title: Creating a custom analyzer
-nav_order: 90
+nav_order: 40
parent: Analyzers
---
diff --git a/_analyzers/language-analyzers/index.md b/_analyzers/language-analyzers/index.md
index 89a4a42254..cc53c1cdac 100644
--- a/_analyzers/language-analyzers/index.md
+++ b/_analyzers/language-analyzers/index.md
@@ -1,7 +1,7 @@
---
layout: default
title: Language analyzers
-nav_order: 100
+nav_order: 140
parent: Analyzers
has_children: true
has_toc: true
diff --git a/_analyzers/fingerprint.md b/_analyzers/supported-analyzers/fingerprint.md
similarity index 98%
rename from _analyzers/fingerprint.md
rename to _analyzers/supported-analyzers/fingerprint.md
index dd8027f037..267e16c039 100644
--- a/_analyzers/fingerprint.md
+++ b/_analyzers/supported-analyzers/fingerprint.md
@@ -1,7 +1,8 @@
---
layout: default
title: Fingerprint analyzer
-nav_order: 110
+parent: Analyzers
+nav_order: 60
---
# Fingerprint analyzer
diff --git a/_analyzers/supported-analyzers/index.md b/_analyzers/supported-analyzers/index.md
index 43e41b8d6a..b54660478f 100644
--- a/_analyzers/supported-analyzers/index.md
+++ b/_analyzers/supported-analyzers/index.md
@@ -18,14 +18,14 @@ The following table lists the built-in analyzers that OpenSearch provides. The l
Analyzer | Analysis performed | Analyzer output
:--- | :--- | :---
-**Standard** (default) | - Parses strings into tokens at word boundaries
- Removes most punctuation
- Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-**Simple** | - Parses strings into tokens on any non-letter character
- Removes non-letter characters
- Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
-**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
-**Stop** | - Parses strings into tokens on any non-letter character
- Removes non-letter characters
- Removes stop words
- Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
-**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
-**Pattern** | - Parses strings into tokens using regular expressions
- Supports converting strings to lowercase
- Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
+[**Standard**]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/standard/) (default) | - Parses strings into tokens at word boundaries
- Removes most punctuation
- Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
+[**Simple**]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/simple/) | - Parses strings into tokens on any non-letter character
- Removes non-letter characters
- Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `to`, `opensearch`]
+[**Whitespace**]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/whitespace/) | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`, `brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
+[**Stop**]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/stop/) | - Parses strings into tokens on any non-letter character
- Removes non-letter characters
- Removes stop words
- Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
+[**Keyword**]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/keyword/) (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
+[**Pattern**]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/pattern/) | - Parses strings into tokens using regular expressions
- Supports converting strings to lowercase
- Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/index/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
-**Fingerprint** | - Parses strings on any non-letter character
- Normalizes characters by converting them to ASCII
- Converts tokens to lowercase
- Sorts, deduplicates, and concatenates tokens into a single token
- Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`]
Note that the apostrophe was converted to its ASCII counterpart.
+[**Fingerprint**]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/fingerprint/) | - Parses strings on any non-letter character
- Normalizes characters by converting them to ASCII
- Converts tokens to lowercase
- Sorts, deduplicates, and concatenates tokens into a single token
- Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`]
Note that the apostrophe was converted to its ASCII counterpart.
## Language analyzers
@@ -37,5 +37,5 @@ The following table lists the additional analyzers that OpenSearch supports.
| Analyzer | Analysis performed |
|:---------------|:---------------------------------------------------------------------------------------------------------|
-| `phone` | An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) for parsing phone numbers. |
-| `phone-search` | A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) for parsing phone numbers. |
+| [`phone`]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/phone-analyzers/#the-phone-analyzer) | An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) for parsing phone numbers. |
+| [`phone-search`]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/phone-analyzers/#the-phone-search-analyzer) | A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) for parsing phone numbers. |
diff --git a/_analyzers/keyword.md b/_analyzers/supported-analyzers/keyword.md
similarity index 98%
rename from _analyzers/keyword.md
rename to _analyzers/supported-analyzers/keyword.md
index 3aec99d1d4..00c314d0c4 100644
--- a/_analyzers/keyword.md
+++ b/_analyzers/supported-analyzers/keyword.md
@@ -1,6 +1,7 @@
---
layout: default
title: Keyword analyzer
+parent: Analyzers
nav_order: 80
---
diff --git a/_analyzers/pattern.md b/_analyzers/supported-analyzers/pattern.md
similarity index 99%
rename from _analyzers/pattern.md
rename to _analyzers/supported-analyzers/pattern.md
index 0d67999b82..bc3cb9a306 100644
--- a/_analyzers/pattern.md
+++ b/_analyzers/supported-analyzers/pattern.md
@@ -1,6 +1,7 @@
---
layout: default
title: Pattern analyzer
+parent: Analyzers
nav_order: 90
---
diff --git a/_analyzers/supported-analyzers/phone-analyzers.md b/_analyzers/supported-analyzers/phone-analyzers.md
index f24b7cf328..d94bfe192f 100644
--- a/_analyzers/supported-analyzers/phone-analyzers.md
+++ b/_analyzers/supported-analyzers/phone-analyzers.md
@@ -1,6 +1,6 @@
---
layout: default
-title: Phone number
+title: Phone number analyzers
parent: Analyzers
nav_order: 140
---
diff --git a/_analyzers/simple.md b/_analyzers/supported-analyzers/simple.md
similarity index 98%
rename from _analyzers/simple.md
rename to _analyzers/supported-analyzers/simple.md
index edfa7f58a6..29f8f9a533 100644
--- a/_analyzers/simple.md
+++ b/_analyzers/supported-analyzers/simple.md
@@ -1,7 +1,8 @@
---
layout: default
title: Simple analyzer
-nav_order: 50
+parent: Analyzers
+nav_order: 100
---
# Simple analyzer
diff --git a/_analyzers/standard.md b/_analyzers/supported-analyzers/standard.md
similarity index 98%
rename from _analyzers/standard.md
rename to _analyzers/supported-analyzers/standard.md
index e4a7a70fbc..d5c3650d5d 100644
--- a/_analyzers/standard.md
+++ b/_analyzers/supported-analyzers/standard.md
@@ -1,7 +1,8 @@
---
layout: default
title: Standard analyzer
-nav_order: 40
+parent: Analyzers
+nav_order: 50
---
# Standard analyzer
diff --git a/_analyzers/stop.md b/_analyzers/supported-analyzers/stop.md
similarity index 99%
rename from _analyzers/stop.md
rename to _analyzers/supported-analyzers/stop.md
index 68dc554473..df62c7fe58 100644
--- a/_analyzers/stop.md
+++ b/_analyzers/supported-analyzers/stop.md
@@ -1,7 +1,8 @@
---
layout: default
title: Stop analyzer
-nav_order: 70
+parent: Analyzers
+nav_order: 110
---
# Stop analyzer
diff --git a/_analyzers/whitespace.md b/_analyzers/supported-analyzers/whitespace.md
similarity index 98%
rename from _analyzers/whitespace.md
rename to _analyzers/supported-analyzers/whitespace.md
index 67fee61295..4691b4f733 100644
--- a/_analyzers/whitespace.md
+++ b/_analyzers/supported-analyzers/whitespace.md
@@ -1,7 +1,8 @@
---
layout: default
title: Whitespace analyzer
-nav_order: 60
+parent: Analyzers
+nav_order: 120
---
# Whitespace analyzer
diff --git a/_analyzers/token-filters/index.md b/_analyzers/token-filters/index.md
index b06489c805..875e94db5a 100644
--- a/_analyzers/token-filters/index.md
+++ b/_analyzers/token-filters/index.md
@@ -63,5 +63,5 @@ Token filter | Underlying Lucene token filter| Description
[`truncate`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/truncate/) | [TruncateTokenFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html) | Truncates tokens with lengths exceeding the specified character limit.
[`unique`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/unique/) | N/A | Ensures that each token is unique by removing duplicate tokens from a stream.
[`uppercase`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/uppercase/) | [UpperCaseFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/core/LowerCaseFilter.html) | Converts tokens to uppercase.
-[`word_delimiter`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/word-delimiter/) | [WordDelimiterFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html) | Splits tokens at non-alphanumeric characters and performs normalization based on the specified rules.
-[`word_delimiter_graph`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/word-delimiter-graph/) | [WordDelimiterGraphFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/WordDelimiterGraphFilter.html) | Splits tokens at non-alphanumeric characters and performs normalization based on the specified rules. Assigns a `positionLength` attribute to multi-position tokens.
+[`word_delimiter`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/word-delimiter/) | [WordDelimiterFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html) | Splits tokens on non-alphanumeric characters and performs normalization based on the specified rules.
+[`word_delimiter_graph`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/word-delimiter-graph/) | [WordDelimiterGraphFilter](https://lucene.apache.org/core/9_10_0/analysis/common/org/apache/lucene/analysis/miscellaneous/WordDelimiterGraphFilter.html) | Splits tokens on non-alphanumeric characters and performs normalization based on the specified rules. Assigns a `positionLength` attribute to multi-position tokens.
diff --git a/_analyzers/token-filters/word-delimiter-graph.md b/_analyzers/token-filters/word-delimiter-graph.md
index ac734bebeb..b901f5a0e5 100644
--- a/_analyzers/token-filters/word-delimiter-graph.md
+++ b/_analyzers/token-filters/word-delimiter-graph.md
@@ -7,7 +7,7 @@ nav_order: 480
# Word delimiter graph token filter
-The `word_delimiter_graph` token filter is used to split tokens at predefined characters and also offers optional token normalization based on customizable rules.
+The `word_delimiter_graph` token filter is used to split tokens on predefined characters and also offers optional token normalization based on customizable rules.
The `word_delimiter_graph` filter is used to remove punctuation from complex identifiers like part numbers or product IDs. In such cases, it is best used with the `keyword` tokenizer. For hyphenated words, use the `synonym_graph` token filter instead of the `word_delimiter_graph` filter because users frequently search for these terms both with and without hyphens.
{: .note}
@@ -44,7 +44,7 @@ Parameter | Required/Optional | Data type | Description
`split_on_case_change` | Optional | Boolean | Splits tokens where consecutive letters have different cases (one is lowercase and the other is uppercase). For example, `"OpenSearch"` becomes `[ Open, Search ]`. Default is `true`.
`split_on_numerics` | Optional | Boolean | Splits tokens where there are consecutive letters and numbers. For example `"v8engine"` will become `[ v, 8, engine ]`. Default is `true`.
`stem_english_possessive` | Optional | Boolean | Removes English possessive endings, such as `'s`. Default is `true`.
-`type_table` | Optional | Array of strings | A custom map that specifies how to treat characters and whether to treat them as delimiters, which avoids unwanted splitting. For example, to treat a hyphen (`-`) as an alphanumeric character, specify `["- => ALPHA"]` so that words are not split at hyphens. Valid types are:
- `ALPHA`: alphabetical
- `ALPHANUM`: alphanumeric
- `DIGIT`: numeric
- `LOWER`: lowercase alphabetical
- `SUBWORD_DELIM`: non-alphanumeric delimiter
- `UPPER`: uppercase alphabetical
+`type_table` | Optional | Array of strings | A custom map that specifies how to treat characters and whether to treat them as delimiters, which avoids unwanted splitting. For example, to treat a hyphen (`-`) as an alphanumeric character, specify `["- => ALPHA"]` so that words are not split on hyphens. Valid types are:
- `ALPHA`: alphabetical
- `ALPHANUM`: alphanumeric
- `DIGIT`: numeric
- `LOWER`: lowercase alphabetical
- `SUBWORD_DELIM`: non-alphanumeric delimiter
- `UPPER`: uppercase alphabetical
`type_table_path` | Optional | String | Specifies a path (absolute or relative to the config directory) to a file containing a custom character map. The map specifies how to treat characters and whether to treat them as delimiters, which avoids unwanted splitting. For valid types, see `type_table`.
## Example
diff --git a/_analyzers/token-filters/word-delimiter.md b/_analyzers/token-filters/word-delimiter.md
index d820fae2a0..77a71f28fb 100644
--- a/_analyzers/token-filters/word-delimiter.md
+++ b/_analyzers/token-filters/word-delimiter.md
@@ -7,7 +7,7 @@ nav_order: 470
# Word delimiter token filter
-The `word_delimiter` token filter is used to split tokens at predefined characters and also offers optional token normalization based on customizable rules.
+The `word_delimiter` token filter is used to split tokens on predefined characters and also offers optional token normalization based on customizable rules.
We recommend using the `word_delimiter_graph` filter instead of the `word_delimiter` filter whenever possible because the `word_delimiter` filter sometimes produces invalid token graphs. For more information about the differences between the two filters, see [Differences between the `word_delimiter_graph` and `word_delimiter` filters]({{site.url}}{{site.baseurl}}/analyzers/token-filters/word-delimiter-graph/#differences-between-the-word_delimiter_graph-and-word_delimiter-filters).
{: .important}
@@ -45,7 +45,7 @@ Parameter | Required/Optional | Data type | Description
`split_on_case_change` | Optional | Boolean | Splits tokens where consecutive letters have different cases (one is lowercase and the other is uppercase). For example, `"OpenSearch"` becomes `[ Open, Search ]`. Default is `true`.
`split_on_numerics` | Optional | Boolean | Splits tokens where there are consecutive letters and numbers. For example `"v8engine"` will become `[ v, 8, engine ]`. Default is `true`.
`stem_english_possessive` | Optional | Boolean | Removes English possessive endings, such as `'s`. Default is `true`.
-`type_table` | Optional | Array of strings | A custom map that specifies how to treat characters and whether to treat them as delimiters, which avoids unwanted splitting. For example, to treat a hyphen (`-`) as an alphanumeric character, specify `["- => ALPHA"]` so that words are not split at hyphens. Valid types are:
- `ALPHA`: alphabetical
- `ALPHANUM`: alphanumeric
- `DIGIT`: numeric
- `LOWER`: lowercase alphabetical
- `SUBWORD_DELIM`: non-alphanumeric delimiter
- `UPPER`: uppercase alphabetical
+`type_table` | Optional | Array of strings | A custom map that specifies how to treat characters and whether to treat them as delimiters, which avoids unwanted splitting. For example, to treat a hyphen (`-`) as an alphanumeric character, specify `["- => ALPHA"]` so that words are not split on hyphens. Valid types are:
- `ALPHA`: alphabetical
- `ALPHANUM`: alphanumeric
- `DIGIT`: numeric
- `LOWER`: lowercase alphabetical
- `SUBWORD_DELIM`: non-alphanumeric delimiter
- `UPPER`: uppercase alphabetical
`type_table_path` | Optional | String | Specifies a path (absolute or relative to the config directory) to a file containing a custom character map. The map specifies how to treat characters and whether to treat them as delimiters, which avoids unwanted splitting. For valid types, see `type_table`.
## Example
diff --git a/_analyzers/tokenizers/index.md b/_analyzers/tokenizers/index.md
index 1f9e49c855..f5b5ff0f25 100644
--- a/_analyzers/tokenizers/index.md
+++ b/_analyzers/tokenizers/index.md
@@ -56,7 +56,7 @@ Tokenizer | Description | Example
`keyword` | - No-op tokenizer
- Outputs the entire string unchanged
- Can be combined with token filters, like lowercase, to normalize terms | `My repo`
becomes
`My repo`
`pattern` | - Uses a regular expression pattern to parse text into terms on a word separator or to capture matching text as terms
- Uses [Java regular expressions](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) | `https://opensearch.org/forum`
becomes
[`https`, `opensearch`, `org`, `forum`] because by default the tokenizer splits terms at word boundaries (`\W+`)
Can be configured with a regex pattern
`simple_pattern` | - Uses a regular expression pattern to return matching text as terms
- Uses [Lucene regular expressions](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/util/automaton/RegExp.html)
- Faster than the `pattern` tokenizer because it uses a subset of the `pattern` tokenizer regular expressions | Returns an empty array by default
Must be configured with a pattern because the pattern defaults to an empty string
-`simple_pattern_split` | - Uses a regular expression pattern to split the text at matches rather than returning the matches as terms
- Uses [Lucene regular expressions](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/util/automaton/RegExp.html)
- Faster than the `pattern` tokenizer because it uses a subset of the `pattern` tokenizer regular expressions | No-op by default
Must be configured with a pattern
+`simple_pattern_split` | - Uses a regular expression pattern to split the text on matches rather than returning the matches as terms
- Uses [Lucene regular expressions](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/util/automaton/RegExp.html)
- Faster than the `pattern` tokenizer because it uses a subset of the `pattern` tokenizer regular expressions | No-op by default
Must be configured with a pattern
`char_group` | - Parses on a set of configurable characters
- Faster than tokenizers that run regular expressions | No-op by default
Must be configured with a list of characters
`path_hierarchy` | - Parses text on the path separator (by default, `/`) and returns a full path to each component in the tree hierarchy | `one/two/three`
becomes
[`one`, `one/two`, `one/two/three`]
diff --git a/_analyzers/tokenizers/pattern.md b/_analyzers/tokenizers/pattern.md
index f422d8c805..036dd9050f 100644
--- a/_analyzers/tokenizers/pattern.md
+++ b/_analyzers/tokenizers/pattern.md
@@ -11,7 +11,7 @@ The `pattern` tokenizer is a highly flexible tokenizer that allows you to split
## Example usage
-The following example request creates a new index named `my_index` and configures an analyzer with a `pattern` tokenizer. The tokenizer splits text at `-`, `_`, or `.` characters:
+The following example request creates a new index named `my_index` and configures an analyzer with a `pattern` tokenizer. The tokenizer splits text on `-`, `_`, or `.` characters:
```json
PUT /my_index
@@ -102,7 +102,7 @@ Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`pattern` | Optional | String | The pattern used to split text into tokens, specified using a [Java regular expression](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Default is `\W+`.
`flags` | Optional | String | Configures pipe-separated [flags](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary) to apply to the regular expression, for example, `"CASE_INSENSITIVE|MULTILINE|DOTALL"`.
-`group` | Optional | Integer | Specifies the capture group to be used as a token. Default is `-1` (split at a match).
+`group` | Optional | Integer | Specifies the capture group to be used as a token. Default is `-1` (split on a match).
## Example using a group parameter
diff --git a/_analyzers/tokenizers/simple-pattern-split.md b/_analyzers/tokenizers/simple-pattern-split.md
index 1fd130082e..25367f25b5 100644
--- a/_analyzers/tokenizers/simple-pattern-split.md
+++ b/_analyzers/tokenizers/simple-pattern-split.md
@@ -13,7 +13,7 @@ The tokenizer uses the matched parts of the input text (based on the regular exp
## Example usage
-The following example request creates a new index named `my_index` and configures an analyzer with a `simple_pattern_split` tokenizer. The tokenizer is configured to split text at hyphens:
+The following example request creates a new index named `my_index` and configures an analyzer with a `simple_pattern_split` tokenizer. The tokenizer is configured to split text on hyphens:
```json
PUT /my_index
diff --git a/_analyzers/tokenizers/whitespace.md b/_analyzers/tokenizers/whitespace.md
index 604eeeb6a0..fb168304a7 100644
--- a/_analyzers/tokenizers/whitespace.md
+++ b/_analyzers/tokenizers/whitespace.md
@@ -7,7 +7,7 @@ nav_order: 160
# Whitespace tokenizer
-The `whitespace` tokenizer splits text at white space characters, such as spaces, tabs, and new lines. It treats each word separated by white space as a token and does not perform any additional analysis or normalization like lowercasing or punctuation removal.
+The `whitespace` tokenizer splits text on white space characters, such as spaces, tabs, and new lines. It treats each word separated by white space as a token and does not perform any additional analysis or normalization like lowercasing or punctuation removal.
## Example usage