This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Fix long initialization issues in anomaly detection #133

Closed
ylwu-amzn opened this issue May 13, 2020 · 12 comments
Labels
AnomalyDetection Item related to Anomaly Detection and AD Kibana plugin enhancement Enhance current feature for better performance, user experience, etc

Comments

@ylwu-amzn
Contributor

ylwu-amzn commented May 13, 2020

The detector initialization process needs at least 6 data points across 8 consecutive intervals to complete the shingle process. If there is no data, or not enough data, the user may experience a long initialization period. We should tune the error message to show something like "no data found" or "not enough data", so the user knows why initialization is taking a long time.

We can query feature data when creating a detector to make sure there is enough data, and we can add a maximum empty-query limit to stop the detector.
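As a rough illustration of that pre-flight check (the function and its shape are hypothetical, not the plugin's actual API): bucket the raw document timestamps by detector interval and verify that each of the most recent consecutive intervals has data, mirroring the shingle requirement described above.

```python
from collections import Counter

def has_enough_data(timestamps, interval_minutes, now_ms, required=8):
    """Illustrative sketch: return True if each of the last `required`
    detector intervals (ending at now_ms) contains at least one data point.
    `timestamps` are document timestamps in epoch milliseconds."""
    interval_ms = interval_minutes * 60 * 1000
    # Bucket each timestamp by how many intervals back from now_ms it falls.
    counts = Counter((now_ms - t) // interval_ms for t in timestamps if t <= now_ms)
    # All of the most recent `required` intervals must be non-empty.
    return all(counts[i] > 0 for i in range(required))
```

A real implementation would run this as a date-histogram aggregation against the source index rather than pulling raw timestamps, but the pass/fail condition is the same.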

@ylwu-amzn ylwu-amzn added the enhancement Enhance current feature for better performance, user experience, etc label May 13, 2020
@ylwu-amzn ylwu-amzn changed the title Tune callout message for long initialization Long initialization May 13, 2020
@ylwu-amzn
Contributor Author

We should add real auto refresh. Currently the initialization state just shows a loading spinner and does not send requests to query the latest state.

@yizheliu-amazon
Contributor

Currently we just show a message in the UI if long initialization is found, but don't stop the detector. In the long term, I think we should stop the detector from the backend if we identify long initialization, and treat it as an initialization failure if possible.

@epotocko

Could existing data in the Elasticsearch index be used to initialize the detector? From my usage, I've only seen records created while the detector is running used for anomaly detection. I would have expected it to use all data available in the index, or at least a subset of recent data.

@ylwu-amzn
Contributor Author

> Could existing data in the ElasticSearch index be used to initialize the detector? From my usage, I've only seen records created while the detector is running used for anomaly detection. I would have expected it to use all data available in the index or at least a subset of recent data.

Good question. @wnbts , can you help explain?

@wnbts
Contributor

wnbts commented May 20, 2020

Hi @epotocko, the current "initialization" page is misleading/overloaded. The system does use existing data to complete model training on the backend. However, the real-time data stream might be the root cause. If you use the REST API, do you see results being produced?

@epotocko

Which API call are you referring to? Shortly after creating and starting a detector with 4+ months of data points and a 60-minute detector interval:
The _preview API returns: { "anomaly_result" : [ ]..........
The _profile API returns: { "state" : "INIT" }

I can reproduce this consistently with different data sets. The _profile API will always return an INIT state until enough "new" records are received.

@wnbts
Contributor

wnbts commented May 21, 2020

@epotocko I was referring to the get anomaly results API. Do you see anomaly results and feature values since the start of the detector? Also, share maybe 10~20 recent results if you can, so we can see whether there are any errors. That will help us tell whether it's a data stream issue or a system issue. Thanks!

@epotocko

@wnbts Immediately after creating the detector, the get anomaly results API returns 0 hits. I checked about an hour later, and there was one hit with the error "No full shingle in current detection window".

I have the detector interval set to 60 minutes. I checked the Elasticsearch data, and every 60-minute period has at least 50 records. Let me know if that sounds like the expected behavior.

@wnbts
Contributor

wnbts commented May 21, 2020

@epotocko thanks so much, I understand the situation much better now. It is functioning as expected: the system is currently trying to collect 8 points from the real-time stream before it can actually produce results. With your configuration, that will take roughly 6~8 hours. We do have an ongoing discussion about using indexed data to speed up that data collection process; I am going to create an issue to detail this behavior. Also, keep me posted on what the results look like after 8 hours.
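The back-of-envelope arithmetic behind that estimate (illustrative only): one real-time point arrives per detector interval, so the upper bound on the wait is simply points needed times interval length.

```python
def init_wait_hours(interval_minutes, points_needed=8):
    """Upper-bound estimate of initialization wait: the detector collects
    one real-time data point per interval, and needs `points_needed` points
    before it can produce results."""
    return interval_minutes * points_needed / 60

# With a 60-minute interval and 8 required points, the wait is up to 8 hours,
# consistent with the "roughly 6~8 hours" figure above.
print(init_wait_hours(60))
```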

yizheliu-amazon added a commit that referenced this issue May 22, 2020
* Add proper message in case of long initialization. Issue:#133

* Remove 'sufficient' to avoid confusion
ohltyler pushed a commit that referenced this issue May 27, 2020
* Add proper message in case of long initialization. Issue:#133

* Remove 'sufficient' to avoid confusion
@sean-zheng-amazon sean-zheng-amazon added the AnomalyDetection Item related to Anomaly Detection and AD Kibana plugin label Jun 10, 2020
@yizheliu-amazon
Contributor

> We should add real auto refresh. Currently the initialization state just show a loading spinner, will not send request to query latest state.

Added in PR: #232
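For reference, the auto-refresh idea amounts to a polling loop: instead of a static spinner, re-query the detector `_profile` state until it leaves INIT. A minimal sketch (illustrative Python with an injected fetcher, not the actual Kibana plugin code):

```python
import time

def wait_for_init(fetch_state, poll_seconds=30, max_polls=20, sleep=time.sleep):
    """Poll the detector state until it leaves INIT or we give up.
    `fetch_state` is a callable that returns the current _profile state
    string (e.g. "INIT", "RUNNING"); injected so the loop is testable."""
    for _ in range(max_polls):
        state = fetch_state()
        if state != "INIT":
            return state
        sleep(poll_seconds)
    return "INIT"  # still initializing after max_polls attempts
```

In the UI the same shape is implemented with a timer rather than a blocking loop, but the key change is identical: each tick issues a fresh state request instead of showing a spinner indefinitely.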

@sean-zheng-amazon sean-zheng-amazon changed the title Long initialization Fix long initialization Jun 24, 2020
@anirudha anirudha changed the title Fix long initialization Fix long initialization issues in anomaly detection Jul 8, 2020
@ohltyler
Contributor

With PRs #248 and #253 merged, how do we feel about marking this as complete, or closing it and creating a new, updated issue?

@ohltyler
Contributor

Closing this issue because of the different initialization callouts and progress percentage changes that have been added.


6 participants