This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 36
1.9 backport #220
Merged
yizheliu-amazon
merged 33 commits into
opendistro-for-elasticsearch:opendistro-1.9
from
yizheliu-amazon:1.9-backport
Aug 28, 2020
Merged
1.9 backport #220
yizheliu-amazon
merged 33 commits into
opendistro-for-elasticsearch:opendistro-1.9
from
yizheliu-amazon:1.9-backport
Aug 28, 2020
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Add result indices retention period Currently we never delete the result index, even though customers have deleted the detector. An increasing amount of result indices can use significant disk space, as well as memory pressure due to the creation of rolled over indices. This PR adds retention period to anomaly results. We delete result indices when they are older than a retention period, which is 90 days by default. We use 90 days because that's the maximum days we allow users to view results on Kibana. Users can configure the retention period via the setting opendistro.anomaly_detection.ad_result_history_retention_period dynamically. Also, previously we roll over empty result indices. This PR fixes that by removing the max age condition of result indices. So we only roll over the result index when the maximum number of documents in the index is reached. Testing done: * manually tested result indices would be deleted after passing retention period.
…h#175) * Add dyanmic setting for memory threshold We have the following two memory threshold: * a detector's model size per node cannot exceed 10% of heap memory * the total model size per node cannot exceed 10% of heap memory. If users use the ES cluster mainly for anomaly detection, they should be able to increase the memory for AD.This PR makes the threshold dynamically adjustable via the setting opendistro.anomaly_detection.model_max_size_percent. Testing done: * Manually verified I can dynamically adjust the memory threshold setting.
…search#182) * Support security plugin enabled testing in CI This change is to support integration testing against remote security-enabled clustering in our CI. Currently, we only support the integration testing without a security plugin and have to remove the security plugin in CI. The critical change is to create an https client with basic auth in ODFERestTestCase and let our rest test case inherit from ODFERestTestCase. With default security configuration, the password for the admin user is also “admin.” For ssl certification, we choose to disable it since it’s the default demo cert used here. Testing done: * ran the scripts on CI/CD workflow on Mac and tested it works. * gradle build still passes.
* Create RFC for AD CLI This is the RFC for the Anomaly Detection CLI feature to be built on AD.
…arch#164) * Adds initialization progress to profile API This PR adds init_progress to profile API. init_progress helps users track initialization percentage, needed shingles, and estimated time to go if the future data stream is continuous (no missing data). Initialization percentage measures how far away we can observe RCF models emitting scores. The implementation fetches the RCF model's total updates while the AD job is running and materializes the value to the newly added index .opendistro-anomaly-info. Total updates record the number of times this RCF model has been updated. Initialization percent is computed as x/128: * if total updates > 128, x = 128. Otherwise, x is the total updates * 128 is our output after the number in RCF. After observing 128 samples, RCF starts emitting scores. Needed shingles are computed as 128 -x. Estimated minutes to go is computed as needed shingles * detector interval This PR also materializes the error message in the most recent run to speed up profile API's error fetching. During each AD execution, we also check if a checkpoint is there (the result is saved and maintained as other AD states), if yes, we cold start immediately. Testing done: 1. adds unit tests 2. run e2e tests to verify init_progress number makes sense.
…a full shingle (opendistro-for-elasticsearch#176) Adds feature to query data from the index when insufficient data in buffer to form a full shingle. Query results are cached.
ESAD Version 0.1 to interact with AD Plugin through cli.
* Workflow changes for AD release Update release workflow to be triggered only, when a new tag is pushed ( starts with v ) * Add badge for testing and code coverage Added flag: cli for cli project Added flag: plugin for java * Include threshold * Include flag in code covereage
…or-elasticsearch/ad-multinode Adding multinode integration test support
Change table from right aligned to left aligned.
Remove esad.
Instead of displaying headers with empty list, show message with hints.
For readability, move constants based on alphabetically.
…rch#203) Allows the model shingle size to be set per detector.
…pendistro-for-elasticsearch#210) * Change profile API to not return estimated minutes remaining until cold start is finished Currently, the progress bar on AD Kibana will show the estimated time remaining to initialize a detector. This can be confusing if this message is displayed before cold start is finished, where the actual initialization time may be much shorter if sufficient historical data exists. This PR changes the profile api to not return any estimated time left until the cold start is finished to prevent this. Testing done: 1. Manually verified the problem is fixed. 2. added unit test for the issue.
* Change to use callbacks in cold start This PR changes the code path of cold start in the transport layer to use callbacks. Previously, I created AD’s ExecutorService that has one thread for cold starts in ColdStartRunner. When we need to trigger a cold start, we can submit a task in the ExecutorService, and consult a hash map (keyed by detector Id) that cached the results of recent cold start results. Since I have to invoke the cold start thread in various callbacks, I created a cold start thread pool and put the cold start result in the transport state. This PR also handles new exceptions like invalid queries introduced by recent changes on ModelManager and FeatureManager. This PR lowers the severity of a couple of log messages in HashRing and RCFPollingTransportAction to avoid overwhelming readers of log files. These log messages are common. This PR corrects typos and updates known causes of EndRunException in comments. Testing done: 1. Simulated cold start failures: Exceptions of cold starts can be seen by the transport layer. EndRunException can cause AD jobs to be terminated. 2. Happy case of a cold start still works.
Every exported go method should have comments for readability and usage. Add appropriate commands and sample data is copied from https://opendistro.github.io/for-elasticsearch-docs/docs/ad/api
Added function and test to check get response and regenerated mocks as well
Used by GetDetectorController to deserialize response from gateway to type.
Mapper to convert gateway response to structure.
GetDetector will call gateway to get configuration GetDetectorByName will first call SearchDetector to get ID from Name pattern. Later, it will call GetDetector to get details.
Handler for cat command, this will get list of detector based on id and name pattern
Created command to accept list of detectors to concatenate and print detectors. This is similar to cat command in unix.
goimports fixes imports order and also formats like gofmt.
Codecov Report
@@ Coverage Diff @@
## opendistro-1.9 #220 +/- ##
=================================================
Coverage ? 72.36%
Complexity ? 1288
=================================================
Files ? 139
Lines ? 6073
Branches ? 469
=================================================
Hits ? 4395
Misses ? 1465
Partials ? 213
Flags with carried forward coverage won't be shown. Click here to find out more. |
CLI workflow fails. |
@yizheliu-amazon can you include this commit(9df7b59) as well to fix workflow failure? |
Thanks for pointing it out. Cherry-picked it. |
After cherry-picking new changes, all workflows pass. |
ylwu-amzn
approved these changes
Aug 28, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the change.
kaituo
approved these changes
Aug 28, 2020
VijayanB
approved these changes
Aug 28, 2020
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue #, if available:
Description of changes:
Backport changes from PR #178 (c062263, commit after ODFE 1.9 upgrade) to PR #214 (942a10a, commit before ODFE 1.10 upgrade)
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.