AD Enhancements in Version 2.15 #7388

Merged
merged 20 commits into from
Jun 17, 2024
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -82,7 +82,7 @@ Follow these steps to set up your local copy of the repository:

```
curl -sSL https://get.rvm.io | bash -s stable
rvm install 3.2
rvm install 3.2.4
ruby -v
```

52 changes: 47 additions & 5 deletions _observing-your-data/ad/index.md
@@ -30,7 +30,38 @@ A detector is an individual anomaly detection task. You can define multiple dete
- Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector.
1. Specify the data source.
- For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes.
- (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query.
- (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported in query DSL.

**Example filter in DSL**
The following query matches documents whose `urlPath.keyword` field is equal to one of the following values:
- /domain/{id}/short
- /sub_dir/{id}/short
- /abcd/123/{id}/xyz

```json
{
"bool": {
"should": [
{
"term": {
"urlPath.keyword": "/domain/{id}/short"
}
},
{
"term": {
"urlPath.keyword": "/sub_dir/{id}/short"
}
},
{
"term": {
"urlPath.keyword": "/abcd/123/{id}/xyz"
}
}
]
}
}
```
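
When the list of values is long, the same filter can be generated rather than written by hand. The following is a minimal Python sketch, not part of the plugin: the helper name `build_url_path_filter` is hypothetical, and the field name and paths are taken from the example above. It only builds the JSON body that you would paste into **Use query DSL**.

```python
import json


def build_url_path_filter(url_paths, field="urlPath.keyword"):
    """Return a Boolean query with one `term` clause per path under `should`.

    Hypothetical helper for illustration only; it builds a plain dict and
    does not contact a cluster.
    """
    return {
        "bool": {
            "should": [{"term": {field: path}} for path in url_paths]
        }
    }


paths = ["/domain/{id}/short", "/sub_dir/{id}/short", "/abcd/123/{id}/xyz"]
data_filter = build_url_path_filter(paths)

# Print the JSON body for the data filter.
print(json.dumps(data_filter, indent=2))
```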

1. Specify a timestamp.
- Select the **Timestamp field** in your index.
1. Define operation settings.
@@ -45,22 +76,33 @@ A detector is an individual anomaly detection task. You can define multiple dete
- This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay.
- For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time.
1. Specify custom result index.
- If you want to store the anomaly detection results in your own index, choose **Enable custom result index** and specify the custom index to store the result. The anomaly detection plugin adds an `opensearch-ad-plugin-result-` prefix to the index name that you input. For example, if you input `abc` as the result index name, the final index name is `opensearch-ad-plugin-result-abc`.
- If you want to store the anomaly detection results in your own index, choose **Enable custom result index** and specify the custom index to store the result. The Anomaly Detection plugin automatically prefixes the index name you input with `opensearch-ad-plugin-result-`. For example, if you enter `abc` as the result index name, the final alias name will be `opensearch-ad-plugin-result-abc`. This alias points to an index with a name that includes the date and a sequence number, such as `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`.

You can use a hyphen (`-`) to separate namespaces in the index name so that you can manage custom result index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the result index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area.
{: .note }
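
The naming scheme described above can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, and `fnmatchcase` from the standard library is used as a stand-in for wildcard index patterns in a role, which is an assumption and not how the Security plugin evaluates patterns internally.

```python
from fnmatch import fnmatchcase

# Prefix that the documentation says the plugin adds to the name you enter.
RESULT_INDEX_PREFIX = "opensearch-ad-plugin-result-"


def result_alias_name(user_input: str) -> str:
    """Return the final alias name for a user-supplied result index name."""
    return RESULT_INDEX_PREFIX + user_input


def pattern_covers(pattern: str, index_name: str) -> bool:
    """Approximate a role's wildcard index pattern with shell-style matching."""
    return fnmatchcase(index_name, pattern)


alias = result_alias_name("financial-us-group1")
role_pattern = "opensearch-ad-plugin-result-financial-us-*"

print(alias)
print(pattern_covers(role_pattern, alias))
print(pattern_covers(role_pattern, result_alias_name("financial-eu-group1")))
```

Because the alias points to backing indexes with names such as `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, a pattern ending in `*` also covers those backing index names in this sketch.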

- If the custom index you specify doesn’t already exist, the Anomaly Detection plugin creates this index when you create the detector and start your real-time or historical analysis.
- If the custom index already exists, the plugin checks if the index mapping of the custom index matches the anomaly result file. You need to make sure the custom index has valid mapping as shown here: [anomaly-results.json](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json).
- Using a custom result index allows you to build customized dashboards. When the Security plugin (also known as fine-grained access control) is enabled, the default result index becomes a system index. As a result, the default result index is not accessible through the standard index or search APIs. You must use the Anomaly Detection RESTful API or the Dashboard to access its content. Consequently, you cannot build a customized dashboard using the default result index if the Security plugin is enabled.
Contributor:
What does "the Dashboard" refer to? Do you mean "use OpenSearch Dashboards?"

Contributor Author:
yes, I meant the Anomaly detection Dashboards.

- If the custom index you specify doesn’t already exist, the Anomaly Detection plugin creates this index when you create the detector and start your real-time or historical analysis.
- If the custom index already exists, the plugin checks if the index mapping of the custom index matches the anomaly result file. You need to make sure the custom index has valid mapping as shown here: [anomaly-results.json](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json).
- To use the custom result index option, you need the following permissions:
- `indices:admin/create` - If the custom index already exists, you don't need this.
- `indices:admin/create` - Required for the Anomaly Detection plugin to create and roll over the custom index.
- `indices:admin/aliases` - Required for the Anomaly Detection plugin to create and access an alias for the custom index.
- `indices:data/write/index` - You need the `write` permission for the Anomaly Detection plugin to write results into the custom index for a single-entity detector.
- `indices:data/read/search` - You need the `search` permission because the Anomaly Detection plugin needs to search custom result indexes to show results on the anomaly detection UI.
- `indices:data/write/delete` - Because the detector might generate a large number of anomaly results, you need the `delete` permission to delete old data and save disk space.
- `indices:data/write/bulk*` - You need the `bulk*` permission because the Anomaly Detection plugin uses the bulk API to write results into the custom index.
- Managing the custom result index:
- The anomaly detection dashboard queries all detectors’ results from all custom result indexes. Having too many custom result indexes might impact the performance of the Anomaly Detection plugin.
- You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to roll over old result indexes. You can also manually delete or archive any old result indexes. We recommend reusing a custom result index for multiple detectors.
- The Anomaly Detection plugin can also be used to manage the lifecycle of custom indexes. It rolls an alias over to a new index when the custom result index meets any of the following conditions:


Parameter | Description | Type | Unit | Example | Required
:--- | :--- |:--- |:--- |:--- |:---
`result_index_min_size` | Specifies the minimum size of total primary shard storage (excluding replicas) required to roll over the index. For example, if `result_index_min_size` is set to 100 GiB and the index has 5 primary shards and 5 replica shards of 20 GiB each, the total size of all primary shards is 100 GiB, triggering the rollover. | `integer` | `MB` | `51200` | No
`result_index_min_age` | Specifies the minimum age of the index required to roll over. The index age is calculated from its creation time to the current time. | `integer` |`day` | `7` | No
`result_index_ttl` | Specifies the minimum age required to permanently delete rolled-over indexes. | `integer` | `day` | `60` | No
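
The "any of the following conditions" semantics of the rollover parameters can be sketched as follows. This is not the plugin's implementation: the function names are hypothetical, the conversion of 1 GiB to 1024 of the table's `MB` unit is an assumption, and how the plugin measures the age of a rolled-over index for `result_index_ttl` is not specified in the table.

```python
from typing import Optional

MB_PER_GIB = 1024  # Assumption: treat the table's MB unit as MiB.


def should_roll_over(
    primary_size_mb: int,
    age_days: int,
    min_size_mb: Optional[int] = None,
    min_age_days: Optional[int] = None,
) -> bool:
    """Return True if any configured rollover condition is met.

    Only primary shard storage counts toward the size; replicas are excluded.
    A condition that is not configured (None) is ignored.
    """
    size_met = min_size_mb is not None and primary_size_mb >= min_size_mb
    age_met = min_age_days is not None and age_days >= min_age_days
    return size_met or age_met


def should_delete(rolled_over_age_days: int, ttl_days: Optional[int] = None) -> bool:
    """Return True if a rolled-over index has reached the configured TTL."""
    return ttl_days is not None and rolled_over_age_days >= ttl_days


# Example from the table: 5 primary shards of 20 GiB each, replicas ignored.
primary_total_mb = sum([20 * MB_PER_GIB] * 5)
print(should_roll_over(primary_total_mb, age_days=1, min_size_mb=100 * MB_PER_GIB))
```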

1. Choose **Next**.

After you define the detector, the next step is to configure the model.
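
As a rough illustration of the window delay described in the operation settings step, the following Python sketch computes the time range that a detector run would query. The function name `query_window` and the calendar date are hypothetical; only the arithmetic (shifting the interval back by the window delay) comes from the example in the text.

```python
from datetime import datetime, timedelta


def query_window(run_time, interval, window_delay=timedelta(0)):
    """Return the (start, end) of the data window for a detector run.

    The window has the length of the detector interval and is shifted back
    by the window delay to allow for late-arriving data.
    """
    end = run_time - window_delay
    start = end - interval
    return start, end


run = datetime(2024, 6, 12, 2, 0)  # The detector runs at 2:00.

# Without a window delay, the window is 1:50 to 2:00.
no_delay = query_window(run, timedelta(minutes=10))

# With a 1-minute window delay, the window shifts to 1:49 to 1:59.
start, end = query_window(run, timedelta(minutes=10), timedelta(minutes=1))
print(start.time(), end.time())
```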
3 changes: 2 additions & 1 deletion _observing-your-data/ad/settings.md
@@ -49,4 +49,5 @@ plugins.anomaly_detection.dedicated_cache_size | 10 | If the real-time analysis
plugins.anomaly_detection.max_concurrent_preview | 2 | The maximum number of concurrent previews. You can use this setting to limit resource usage.
plugins.anomaly_detection.model_max_size_percent | 0.1 | The upper bound of the memory percentage for a model.
plugins.anomaly_detection.door_keeper_in_cache.enabled | False | When set to `true`, OpenSearch places a bloom filter in front of an inactive entity cache to filter out items that are not likely to appear more than once.
plugins.anomaly_detection.hcad_cold_start_interpolation.enabled | False | When set to `true`, enables interpolation in high-cardinality anomaly detection (HCAD) cold start.
plugins.anomaly_detection.hcad_cold_start_interpolation.enabled | False | When set to `true`, enables interpolation in high-cardinality anomaly detection (HCAD) cold start.
plugins.anomaly_detection.jvm_heap_usage_threshold | 95 | The JVM memory usage threshold at which anomaly detectors are disabled. Defaults to 95% of the JVM heap.