not return estimated minutes remaining until cold start is finished #210
…ld start is finished Currently, the progress bar on AD Kibana will show the estimated time remaining to initialize a detector. This can be confusing if this message is displayed before cold start is finished, where the actual initialization time may be much shorter if sufficient historical data exists. This PR changes the profile api to not return any estimated time left until the cold start is finished to prevent this. Testing done: 1. Manually verified the problem is fixed. 2. added unit test for the issue.
Codecov Report
@@ Coverage Diff @@
## master #210 +/- ##
============================================
- Coverage 72.18% 72.16% -0.03%
- Complexity 1275 1282 +7
============================================
Files 140 140
Lines 5914 5938 +24
Branches 463 468 +5
============================================
+ Hits 4269 4285 +16
- Misses 1441 1444 +3
- Partials 204 209 +5
@@ -391,19 +399,26 @@ private void processInitResponse(
     String detectorId,
     Set<ProfileName> profilesToCollect,
     MultiResponsesDelegateActionListener<DetectorProfile> listener,
-    long totalUpdates
+    long totalUpdates,
+    boolean hideMinutesLeft
This should be "hide estimate", which affects both minutes left and data points needed.
Do you mean the variable name should be hideEstimate? It only affects minutes left, not data points needed.
I see the data needed and time needed together. Before model training is done, they should be hidden together by a message such as "Model training is starting soon". Once that is done, the detailed information on needed data and time is now shown, such as "need x data and y time".
It'd be confusing to see "need x data and no time". What does the ux look like in this case?
As far as UX goes, we will probably add an intermediate callout saying something like "attempting to initialize with existing/historical data", and show that callout until we can retrieve minutesLeft from the backend profile call. Then we show the regular progress bar callout, which includes the "need x data and y time" info. See the Kibana issue.
I guess it's somewhat arbitrary whether we hide minutes left, data points left, or both, but we need some way to indicate on the frontend that the cold start process isn't finished.
The initial temporary callout is informative enough. Before a model is trained, both the data and the time that the (non-existent) model will need to wait for should be equally unknown.
If we have a separate callout, both of these would be ignored as far as frontend is concerned. Wouldn't we want a different callout in this case? Seems that we wouldn't want to show 0% progress either, when that may immediately go to 100% and disappear if there is sufficient historical data.
The call out for the case before model training may just be "attempting to initialize with existing/historical data"/"Model training is starting soon", no data/time/progress needed to cause confusion or panic (yet).
First, the UX does not show needed shingles; it only looks at progress and estimated time. Second, as an API user, I don't find it confusing when we show needed shingles. For example, one minute ago we needed 128 shingles; one minute later we need 0, which means we found enough training data. At that particular moment, the needed-shingles count is accurate. If cold start is stuck or retrying, the number also clarifies the state.
Why not apply more consistent logic to make the code more maintainable and understandable? That is, before a model is created, the time and the data it needs to wait for are either both unknown or both guessed. Otherwise, what is the benefit of having mixed results from different logic for each field?
This is a question, not a blocker.
The data it needs to wait for is not guessed; the time is. Without a checkpoint, it is accurate to say we need 128 shingles.
The benefit is to provide transparency instead of guessing.
Thanks for not blocking.
src/main/java/com/amazon/opendistroforelasticsearch/ad/AnomalyDetectorProfileRunner.java
if (requiredSamples <= 0) {
    throw new IllegalArgumentException("required samples should be a positive number, but was " + requiredSamples);
Testing: a test for this condition is missing.
added
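For readers who want to see what a test of that guard might look like, here is a minimal sketch. The class `RequiredSamplesCheckSketch` and its methods are hypothetical stand-ins, not the plugin's actual code or the test that was added in this PR; only the exception type and message come from the excerpt above.

```java
// Hypothetical stand-in for the code under review; names are illustrative only.
final class RequiredSamplesCheckSketch {

    private RequiredSamplesCheckSketch() {}

    // Returns init progress as a whole percentage, rejecting a non-positive sample requirement.
    static long percentage(long totalUpdates, long requiredSamples) {
        if (requiredSamples <= 0) {
            throw new IllegalArgumentException("required samples should be a positive number, but was " + requiredSamples);
        }
        return Math.min(100, 100 * totalUpdates / requiredSamples);
    }

    // Returns true when the guard rejects the value with the expected message.
    static boolean rejects(long requiredSamples) {
        try {
            percentage(0, requiredSamples);
            return false;
        } catch (IllegalArgumentException e) {
            return e.getMessage().endsWith("but was " + requiredSamples);
        }
    }

    public static void main(String[] args) {
        System.out.println("rejects 0:   " + rejects(0));
        System.out.println("rejects -1:  " + rejects(-1));
        System.out.println("rejects 128: " + rejects(128));
    }
}
```

In a real test suite the same check would more likely be written with the project's test framework (for example an expected-exception assertion) rather than a hand-rolled `rejects` helper.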
Looks good to me.
…210) * Change profile API to not return estimated minutes remaining until cold start is finished Currently, the progress bar on AD Kibana will show the estimated time remaining to initialize a detector. This can be confusing if this message is displayed before cold start is finished, where the actual initialization time may be much shorter if sufficient historical data exists. This PR changes the profile api to not return any estimated time left until the cold start is finished to prevent this. Testing done: 1. Manually verified the problem is fixed. 2. added unit test for the issue.
Issue #, if available:
#198
Description of changes:
Currently, the progress bar on AD Kibana will show the estimated time remaining to initialize a detector. This can be confusing if this message is displayed before cold start is finished, where the actual initialization time may be much shorter if sufficient historical data exists. This PR changes the profile api to not return any estimated time left until the cold start is finished to prevent this.
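The behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's implementation: the class `InitProgressSketch`, the `build` method, and the `intervalMinutes` parameter are hypothetical, and the formulas (needed shingles as required minus seen, minutes as needed shingles times the detector interval) are guesses that happen to match the sample responses in this PR. Only the field names and the `hideMinutesLeft` flag come from the PR itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the plugin's actual class.
final class InitProgressSketch {

    private InitProgressSketch() {}

    /**
     * Builds the fields of an "init_progress" object.
     * Assumes requiredSamples > 0 and 0 <= totalUpdates.
     *
     * @param requiredSamples shingles the model needs before it can emit results
     * @param totalUpdates    shingles the model has consumed so far
     * @param intervalMinutes detector interval in minutes (assumed parameter)
     * @param hideMinutesLeft true while cold start has not finished
     */
    static Map<String, Object> build(long requiredSamples, long totalUpdates,
                                     long intervalMinutes, boolean hideMinutesLeft) {
        long neededShingles = Math.max(0, requiredSamples - totalUpdates);
        long percent = Math.min(100, 100 * totalUpdates / requiredSamples);

        Map<String, Object> progress = new LinkedHashMap<>();
        progress.put("percentage", percent + "%");
        if (!hideMinutesLeft) {
            // The time estimate is only a guess, so it is omitted until cold start is done.
            progress.put("estimated_minutes_left", neededShingles * intervalMinutes);
        }
        // The shingle count is exact, so it is always reported.
        progress.put("needed_shingles", neededShingles);
        return progress;
    }

    public static void main(String[] args) {
        System.out.println(build(128, 0, 1, true));  // cold start still running
        System.out.println(build(128, 0, 1, false)); // cold start finished
    }
}
```

Omitting the key, rather than returning a placeholder such as 0 or -1, lets a caller distinguish "no estimate yet" from a real estimate without a sentinel value.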
Testing done:
Before the PR:
% curl -XGET "http://localhost:9200/_opendistro/_anomaly_detection/detectors/F5nQ4HMBALosA6jhE7Ni/_profile?_all=true&pretty"
{
  "state" : "INIT",
  "shingle_size" : 0,
  "coordinating_node" : "",
  "total_size_in_bytes" : 0,
  "init_progress" : {
    "percentage" : "0%",
    "estimated_minutes_left" : 128,
    "needed_shingles" : 128
  }
}
After the PR:
% curl -XGET "http://localhost:9200/_opendistro/_anomaly_detection/detectors/FO7a4HMBTu_4lKyQCdgs/_profile?_all=true&pretty"
{
  "state" : "INIT",
  "shingle_size" : 0,
  "coordinating_node" : "",
  "total_size_in_bytes" : 0,
  "init_progress" : {
    "percentage" : "0%",
    "needed_shingles" : 128
  }
}
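A caller of the profile API might react to the missing field along these lines. This is a sketch only: the class `InitCalloutSketch`, the `callout` method, and the second message's wording are hypothetical, the first message text is borrowed from the review discussion, and the `init_progress` object is assumed to be already parsed into a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper; not part of the plugin or the Kibana frontend.
final class InitCalloutSketch {

    private InitCalloutSketch() {}

    // Chooses a callout message from an already-parsed "init_progress" object.
    static String callout(Map<String, ?> initProgress) {
        Object minutes = initProgress.get("estimated_minutes_left");
        if (minutes == null) {
            // No estimate yet: cold start has not finished.
            return "Attempting to initialize with existing/historical data";
        }
        return "Initializing: " + initProgress.get("percentage")
            + " complete, about " + minutes + " minutes left";
    }

    public static void main(String[] args) {
        Map<String, Object> afterPr = new LinkedHashMap<>();
        afterPr.put("percentage", "0%");
        afterPr.put("needed_shingles", 128);
        System.out.println(callout(afterPr));

        Map<String, Object> withEstimate = new LinkedHashMap<>(afterPr);
        withEstimate.put("estimated_minutes_left", 128);
        System.out.println(callout(withEstimate));
    }
}
```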
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.