not return estimated minutes remaining until cold start is finished #210

kaituo · 2020-08-12T16:52:21Z

Issue #, if available:
#198

Description of changes:
Currently, the progress bar in AD Kibana shows the estimated time remaining to initialize a detector. This can be confusing if the message is displayed before cold start has finished, because the actual initialization time may be much shorter when sufficient historical data exists. To prevent this, this PR changes the profile API to not return any estimated time left until cold start has finished.

Testing done:

1. Manually verified that the problem is fixed.
2. Added a unit test for the issue.

Before the PR:
[email protected]: ~
% curl -XGET "http://localhost:9200/_opendistro/_anomaly_detection/detectors/F5nQ4HMBALosA6jhE7Ni/_profile?_all=true&pretty"
{
  "state" : "INIT",
  "shingle_size" : 0,
  "coordinating_node" : "",
  "total_size_in_bytes" : 0,
  "init_progress" : {
    "percentage" : "0%",
    "estimated_minutes_left" : 128,
    "needed_shingles" : 128
  }
}

After the PR:
[email protected]: ~
% curl -XGET "http://localhost:9200/_opendistro/_anomaly_detection/detectors/FO7a4HMBTu_4lKyQCdgs/_profile?_all=true&pretty"
{
  "state" : "INIT",
  "shingle_size" : 0,
  "coordinating_node" : "",
  "total_size_in_bytes" : 0,
  "init_progress" : {
    "percentage" : "0%",
    "needed_shingles" : 128
  }
}
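
The difference between the two responses can be sketched as a conditional field. This is a minimal illustration, not the plugin's code: the class `InitProgressSketch` and the method `build` are hypothetical names, and only the JSON field names and the `hideMinutesLeft` flag come from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; class and method names are assumptions, not plugin code.
final class InitProgressSketch {
    private InitProgressSketch() {}

    /**
     * Builds the fields of "init_progress". The "estimated_minutes_left"
     * entry is omitted while hideMinutesLeft is true, i.e. while cold start
     * has not finished yet.
     */
    static Map<String, Object> build(
        int percentage,
        long neededShingles,
        long estimatedMinutesLeft,
        boolean hideMinutesLeft
    ) {
        Map<String, Object> progress = new LinkedHashMap<>();
        progress.put("percentage", percentage + "%");
        if (!hideMinutesLeft) {
            progress.put("estimated_minutes_left", estimatedMinutesLeft);
        }
        progress.put("needed_shingles", neededShingles);
        return progress;
    }

    public static void main(String[] args) {
        // Cold start still running: no time estimate in the output.
        System.out.println(build(0, 128, 128, true));
        // Cold start finished: the estimate is included again.
        System.out.println(build(0, 128, 128, false));
    }
}
```

A caller that needs the estimate has to treat the field as optional and check for its presence instead of assuming a default value.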

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov · 2020-08-12T16:53:44Z

Codecov Report

Merging #210 into master will decrease coverage by 0.02%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master     #210      +/-   ##
============================================
- Coverage     72.18%   72.16%   -0.03%     
- Complexity     1275     1282       +7     
============================================
  Files           140      140              
  Lines          5914     5938      +24     
  Branches        463      468       +5     
============================================
+ Hits           4269     4285      +16     
- Misses         1441     1444       +3     
- Partials        204      209       +5

Flag	Coverage Δ	Complexity Δ
#cli	`79.74% <ø> (ø)`	`0.00 <ø> (ø)`
#plugin	`71.27% <100.00%> (-0.03%)`	`1282.00 <1.00> (+7.00)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ	Complexity Δ
...elasticsearch/ad/AnomalyDetectorProfileRunner.java	`69.94% <100.00%> (+1.26%)`	`38.00 <1.00> (+3.00)`
...istroforelasticsearch/ad/util/ColdStartRunner.java	`80.00% <0.00%> (-16.67%)`	`7.00% <0.00%> (-2.00%)`
...asticsearch/ad/cluster/ADClusterEventListener.java	`88.00% <0.00%> (-4.00%)`	`13.00% <0.00%> (-1.00%)`
...search/ad/rest/RestIndexAnomalyDetectorAction.java	`54.05% <0.00%> (-1.51%)`	`3.00% <0.00%> (ø%)`
...sticsearch/ad/indices/AnomalyDetectionIndices.java	`61.59% <0.00%> (-0.73%)`	`23.00% <0.00%> (-1.00%)`
...stroforelasticsearch/ad/AnomalyDetectorPlugin.java	`93.61% <0.00%> (ø)`	`10.00% <0.00%> (ø%)`
...ticsearch/ad/settings/AnomalyDetectorSettings.java	`100.00% <0.00%> (ø)`	`1.00% <0.00%> (ø%)`
...est/handler/IndexAnomalyDetectorActionHandler.java	`0.00% <0.00%> (ø)`	`0.00% <0.00%> (ø%)`
...opendistroforelasticsearch/ad/ml/ModelManager.java	`92.05% <0.00%> (+0.01%)`	`112.00% <0.00%> (ø%)`
...stroforelasticsearch/ad/model/AnomalyDetector.java	`88.88% <0.00%> (+0.75%)`	`46.00% <0.00%> (+5.00%)`
... and 1 more

wnbts

deleted comment

wnbts · 2020-08-12T23:23:13Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/AnomalyDetectorProfileRunner.java

@@ -391,19 +399,26 @@ private void processInitResponse(
        String detectorId,
        Set<ProfileName> profilesToCollect,
        MultiResponsesDelegateActionListener<DetectorProfile> listener,
-        long totalUpdates
+        long totalUpdates,
+        boolean hideMinutesLeft


This should be hide estimate, which affects both minutes left and data points needed.

You mean the variable name should be hideEstimate? It only affects minutes left, not data points needed.

I see the data needed and the time needed as belonging together. Before model training is done, they should be hidden together behind a message such as "Model training is starting soon". Once that is done, the detailed information on needed data and time is shown, such as "need x data and y time".

It'd be confusing to see "need x data and no time". What does the ux look like in this case?

As far as ux goes, we will probably add an intermediate callout saying something like "attempting to initialize with existing/historical data", and show that callout until we can retrieve minutesLeft from the backend profile call. Then we show the regular progress bar callout, which includes the "need x data and y time" info. See Kibana issue here.

I guess it's somewhat arbitrary whether we hide minutes left and/or data points left, but we need some way to indicate on the frontend that the cold start process isn't finished.

The initial temporary callout is informative enough. Before a model is trained, the data and the time that the (non-existent) model will need to wait for should be equally unknown.

If we have a separate callout, both of these would be ignored as far as frontend is concerned. Wouldn't we want a different callout in this case? Seems that we wouldn't want to show 0% progress either, when that may immediately go to 100% and disappear if there is sufficient historical data.

The call out for the case before model training may just be "attempting to initialize with existing/historical data"/"Model training is starting soon", no data/time/progress needed to cause confusion or panic (yet).
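
The callout choice discussed above could be driven by the presence of the estimate in the profile response. The sketch below is only an illustration under that assumption: the real frontend is the Kibana plugin, Java is used here purely to match the language of this PR, and `ColdStartCalloutSketch`, `Callout`, and `choose` are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch of the frontend decision; not code from either plugin.
final class ColdStartCalloutSketch {
    private ColdStartCalloutSketch() {}

    enum Callout {
        /** "Attempting to initialize with existing/historical data". */
        COLD_START_IN_PROGRESS,
        /** Regular progress bar with "need x data and y time". */
        INIT_PROGRESS_BAR
    }

    /**
     * Picks the callout from the parsed "init_progress" object. The
     * intermediate callout is used until the backend starts returning
     * "estimated_minutes_left".
     */
    static Callout choose(Map<String, Object> initProgress) {
        return initProgress.containsKey("estimated_minutes_left")
            ? Callout.INIT_PROGRESS_BAR
            : Callout.COLD_START_IN_PROGRESS;
    }
}
```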

First, the UX does not show needed shingles; it only looks at progress and estimated time. Second, as an API user, I don't feel it causes confusion when we show needed shingles. For example, one minute ago we needed 128 shingles. One minute later, we need 0, which means we found enough training data. At that particular moment, the needed shingles are accurate. If cold start is stuck or retrying, the number also clarifies the state.

Why not apply more consistent logic to make the code more maintainable and understandable? That is, before a model is created, the time and data it needs to wait for are either both unknown or both guessed. Or, what's the benefit of having mixed results from different logic for each field?

This is a question, not a blocker.

The data it needs to wait for is not guessed. The time is. Without a checkpoint, it is accurate to say we need 128 shingles.

The benefit is to provide transparency instead of guessing.

Thanks for not blocking.
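
The distinction drawn in this thread, an exact count versus a guessed duration, can be sketched as two separate computations. The formulas below are assumptions made for illustration, not the plugin's implementation; `InitEstimateSketch`, its method names, and the `detectorIntervalMinutes` parameter are hypothetical. The "Before the PR" response (128 needed shingles, 128 minutes left) would be consistent with one shingle per one-minute interval.

```java
// Hypothetical sketch; the formulas are assumptions, not the plugin's code.
final class InitEstimateSketch {
    private InitEstimateSketch() {}

    /** Exact at the moment of the call: samples still missing. */
    static long neededShingles(long requiredSamples, long totalUpdates) {
        return Math.max(0L, requiredSamples - totalUpdates);
    }

    /**
     * Only a guess: assumes every missing shingle arrives from live data at
     * one per detector interval. A successful cold start on historical data
     * can fill all of them at once, which is why the estimate is hidden
     * until cold start has finished.
     */
    static long estimatedMinutesLeft(long neededShingles, long detectorIntervalMinutes) {
        return neededShingles * detectorIntervalMinutes;
    }
}
```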

wnbts · 2020-08-13T23:05:27Z

src/main/java/com/amazon/opendistroforelasticsearch/ad/AnomalyDetectorProfileRunner.java

+        if (requiredSamples <= 0) {
+            throw new IllegalArgumentException("required samples should be a positive number, but was " + requiredSamples);


Testing: a test for this condition is missing.
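
A test for this guard could look roughly like the sketch below. It uses plain Java instead of the project's test framework so that it stays self-contained; `RequiredSamplesCheckSketch`, `validate`, and `rejects` are hypothetical names, and the `long` parameter type is an assumption. Only the condition and the exception message are taken from the diff.

```java
// Hypothetical sketch of the guard and a check that it rejects bad input.
final class RequiredSamplesCheckSketch {
    private RequiredSamplesCheckSketch() {}

    /** Same condition and message as the guard added in the diff. */
    static long validate(long requiredSamples) {
        if (requiredSamples <= 0) {
            throw new IllegalArgumentException(
                "required samples should be a positive number, but was " + requiredSamples
            );
        }
        return requiredSamples;
    }

    /** Returns true if validate rejects the given value. */
    static boolean rejects(long requiredSamples) {
        try {
            validate(requiredSamples);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Zero and negative values must be rejected; positive values pass.
        if (!rejects(0) || !rejects(-1) || rejects(128)) {
            throw new AssertionError("guard does not behave as expected");
        }
        System.out.println("requiredSamples guard behaves as expected");
    }
}
```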

yizheliu-amazon

Looks good to me.

kaituo requested review from yizheliu-amazon and ohltyler August 12, 2020 16:52

kaituo added the enhancement label Aug 12, 2020

wnbts reviewed Aug 12, 2020

ohltyler reviewed Aug 13, 2020

throw exception if passed in requiredSamples is <= 0

db10243

ohltyler approved these changes Aug 13, 2020

wnbts reviewed Aug 13, 2020

yizheliu-amazon approved these changes Aug 14, 2020

Added test case for invalid required samples

84f7cd1

kaituo merged commit fed3b78 into opendistro-for-elasticsearch:master Aug 18, 2020

ohltyler mentioned this pull request Aug 24, 2020

Add intermediate callout message during cold start opendistro-for-elasticsearch/anomaly-detection-kibana-plugin#283

Merged

wnbts Aug 12, 2020

kaituo Aug 13, 2020

wnbts Aug 13, 2020

ohltyler Aug 13, 2020

wnbts Aug 13, 2020

ohltyler Aug 13, 2020

wnbts Aug 14, 2020

kaituo Aug 17, 2020

wnbts Aug 17, 2020

kaituo Aug 18, 2020

wnbts Aug 13, 2020

kaituo Aug 17, 2020
