This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Add missing feature alert if recent feature data is missing #248

Contributor

@yizheliu-amazon yizheliu-amazon commented Jul 7, 2020

Issue #, if available:

Description of changes:
Add missing feature alert if recent feature data is missing

If the latest 2 data points of any feature are missing, we show a yellow warning like the one below:
Screen Shot 2020-07-06 at 10 22 20 PM

If the latest 3 data points of any feature are missing, we show a red alert like the one below:
Screen Shot 2020-07-07 at 10 21 11 AM

Regardless of whether the alert is yellow or red, as long as any feature data is missing, an annotation is shown on the feature chart.

Screen Shot 2020-07-06 at 10 25 17 PM
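
The threshold rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the type `MissingFeatureAlertLevel` and the functions `countLatestMissing` and `getAlertLevel` are hypothetical names, and the input is assumed to be one boolean flag per expected interval, oldest first.

```typescript
// Hypothetical names for illustration; not taken from the PR's source.
type MissingFeatureAlertLevel = 'none' | 'warning' | 'danger';

// `isMissing` has one flag per expected interval, oldest first.
// Counts how many of the most recent consecutive intervals have no data.
const countLatestMissing = (isMissing: boolean[]): number => {
  let count = 0;
  for (let i = isMissing.length - 1; i >= 0 && isMissing[i]; i--) {
    count++;
  }
  return count;
};

// Thresholds follow the description: latest 2 missing -> yellow warning,
// latest 3 (or more) missing -> red alert.
const getAlertLevel = (
  missingPerFeature: boolean[][]
): MissingFeatureAlertLevel => {
  // "Any feature" means the worst feature decides the level.
  const worst = missingPerFeature.reduce(
    (max, flags) => Math.max(max, countLatestMissing(flags)),
    0
  );
  if (worst >= 3) return 'danger';
  if (worst >= 2) return 'warning';
  return 'none';
};
```

Only trailing gaps count in this sketch: a gap that has already recovered does not raise the level, although it would still be annotated on the chart.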

Meanwhile, I will check with the tech writer on the wording change. >> Done

When a data point is missing, the following is shown:
Screen Shot 2020-07-09 at 2 44 39 PM

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ylwu-amzn
Contributor

Regarding "If latest 2 feature data points of any feature are missing, we show yellow warning":

Why not show the yellow warning when just 1 feature data point is missing? That way the user can learn earlier that data ingestion may have a problem.

@yizheliu-amazon
Contributor Author

It may be too sensitive to alert when only the latest 1 data point is missing and ingestion recovers soon. This can happen if data ingestion is occasionally delayed. If the latest 2 consecutive data points are missing, it can be considered consistent behavior, and we should make the user aware of it.

@yizheliu-amazon
Contributor Author

Issue associated with this: #249

@ohltyler ohltyler linked an issue Jul 9, 2020 that may be closed by this pull request
@ohltyler ohltyler added the enhancement Enhance current feature for better performance, user experience, etc label Jul 9, 2020
@ohltyler
Contributor

Also, a general question: is there an easy way to remove the line that connects the feature data points across a gap where data is missing? It feels like it shouldn't be drawn.
Screen Shot 2020-07-09 at 7 03 07 PM

@yizheliu-amazon
Contributor Author

Actually, this is a corner case. My local ES process somehow stopped during that period; ideally that should not happen. The connecting line is drawn automatically by the Elastic chart library. Let me explore whether it is possible to skip it.

@yizheliu-amazon
Contributor Author

After exploring, I didn't find a good way to remove the connecting line. Issue created: #250

@ohltyler
Contributor

Yeah, not a big deal, thanks for looking into it!

Contributor

@ohltyler ohltyler left a comment

LGTM

props.detectorInterval.interval,
getFeatureMissingAnnotationDateRange(
props.dateRange,
props.detectorEnabledTime
Contributor

So we only show missing-data alerts starting from the latest detector enabled time? Is it possible to show alerts for data before that?

Contributor Author

Yes, only since the latest enabled time. Ideally, we should show alerts starting from the time the detector was first enabled, and we should not show data for periods when the detector was disabled. However, there is no way to keep the first enabled time, and no way to track the periods during which the detector was disabled. If the detector is disabled and re-enabled multiple times, it is hard for us to ignore the disabled periods. That's why we only show alerts since the latest enabled time.
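
As a rough sketch of the behavior described in this reply, the annotation range can be thought of as the chart's date range clamped to start no earlier than the latest enabled time. The `DateRange` shape and the helper name `clampRangeToEnabledTime` are assumptions made for illustration; the diff only shows that `getFeatureMissingAnnotationDateRange` receives `props.dateRange` and `props.detectorEnabledTime`, not how it is implemented.

```typescript
// Assumed shape: both bounds are epoch milliseconds.
interface DateRange {
  startDate: number;
  endDate: number;
}

// Hypothetical helper: restricts the range in which missing-data
// annotations are computed to the period since the latest enabled time.
const clampRangeToEnabledTime = (
  dateRange: DateRange,
  detectorEnabledTime?: number
): DateRange => {
  if (detectorEnabledTime === undefined) {
    return dateRange;
  }
  // Never start before the detector was (last) enabled, and never let
  // the start pass the end of the visible range.
  const startDate = Math.min(
    Math.max(dateRange.startDate, detectorEnabledTime),
    dateRange.endDate
  );
  return { startDate, endDate: dateRange.endDate };
};
```

If the detector was enabled after the end of the visible range, the sketch returns an empty range, so no annotations would be produced.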

// If array size is 100K, `findAnomalyWithMaxAnomalyGrade`
// takes less than 2ms by average, while `Array#reduce`
// takes about 16ms by average and`Array#sort`
const calculateSampleWindowsWithMaxDataPoints = (
Contributor

Feature data points are already sampled; should we sample again for the missing data points? Can we show a missing-data annotation for every feature data point?

Contributor Author

Using the sampled feature data points would cause some missing data points to go undetected. When we sample anomaly results, we collect all the data within a fixed time range; if that range contains both missing and existing data points, we cannot catch the missing ones.
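
A small illustration of this point, under assumed names: `emptyBuckets` is a hypothetical helper that reports which fixed-size buckets contain no data point. Checked per detector interval, the gap is visible; checked per larger sample window, the same gap disappears because each window still contains at least one existing point.

```typescript
const ONE_MINUTE_MS = 60 * 1000;

// Assumed data: one point expected per minute; minutes 2 and 3 are missing.
const existingPointTimes = [0, 1, 4, 5].map(m => m * ONE_MINUTE_MS);

// Hypothetical helper: returns the start time of every bucket in
// [start, end) that contains no data point.
const emptyBuckets = (
  times: number[],
  start: number,
  end: number,
  bucketMs: number
): number[] => {
  const result: number[] = [];
  for (let s = start; s < end; s += bucketMs) {
    const hasPoint = times.some(t => t >= s && t < s + bucketMs);
    if (!hasPoint) {
      result.push(s);
    }
  }
  return result;
};

// Per detector interval (1 minute): minutes 2 and 3 are reported missing.
const missingPerInterval = emptyBuckets(
  existingPointTimes, 0, 6 * ONE_MINUTE_MS, ONE_MINUTE_MS
);

// Per sample window (3 minutes): both windows contain data, so the gap
// is not detected.
const missingPerSampleWindow = emptyBuckets(
  existingPointTimes, 0, 6 * ONE_MINUTE_MS, 3 * ONE_MINUTE_MS
);
```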

) {
const isExisting = findTimeExistsInWindow(
existingTimes,
getRoundedTimeInMin(currentTime),
Contributor

@ylwu-amzn ylwu-amzn Jul 10, 2020

Why round the time to the minute? Are the existingTimes rounded as well? Do the feature data points shown on the feature breakdown chart use rounded time? As I remember, only the live chart on the AD result page uses rounded time.

Contributor Author

Good catch. I guess I somehow changed the code to use getFloorPlotTime for existingTimes; it should use getRoundedTimeInMin instead. The AD result page uses rounded numbers for anomalies. The reason to use rounded time is that some data points are not spaced at strict intervals; there is usually a small offset in the timestamp. That offset can cause a data point to fall outside its expected time period, and we would then wrongly conclude that data is missing in that period.
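
To illustrate the offset problem, here is a minimal sketch of rounding a timestamp to the nearest minute. The helper `roundToNearestMinute` is a hypothetical stand-in; how `getRoundedTimeInMin` is actually implemented is not shown in this thread.

```typescript
const MINUTE_MS = 60 * 1000;

// Hypothetical stand-in for a "rounded time in minutes" helper:
// snaps an epoch-millisecond timestamp to the nearest whole minute.
const roundToNearestMinute = (epochMs: number): number =>
  Math.round(epochMs / MINUTE_MS) * MINUTE_MS;

// Expected data point time: 2020-07-10 12:00:00 UTC.
const expectedTime = Date.UTC(2020, 6, 10, 12, 0, 0);

// Actual data points arrive with a small offset in either direction.
const slightlyLate = expectedTime + 350;
const slightlyEarly = expectedTime - 200;

// After rounding, both offsets map back to the expected minute, so an
// exact comparison against the expected time succeeds.
const lateMatches = roundToNearestMinute(slightlyLate) === expectedTime;
const earlyMatches = roundToNearestMinute(slightlyEarly) === expectedTime;
```

Whatever normalization is chosen, it has to be applied to both sides of the comparison (the expected interval times and the existing data point times); otherwise existing points can still be reported as missing.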

Contributor

Will we use rounded or floored time in both the AD result chart and the feature breakdown chart? If a user dives into an anomaly result by zooming in on the AD result chart and then checks the raw data in Kibana Discover, it may cause confusion because the AD result chart uses rounded time. The user may find that the raw data doesn't match the AD result chart.

Contributor Author

Sorry for the confusion. Currently, rounding is applied to the anomaly grade and confidence on the AD result page, not to the timestamp. Source of Math.round usage. The rounded time in this PR is used for the alert annotation timestamp only, so it won't cause any mismatch.

: featureData
.map(feature => getFloorPlotTime(feature.startTime))
.filter(featureTime => featureTime != undefined);
for (
Contributor

@ylwu-amzn ylwu-amzn Jul 10, 2020

The time complexity of this block is O(m*n), where m is the number of intervals and n is the number of existingTimes. Have you tested the performance? I suggest using binary search in findTimeExistsInWindow, since existingTimes is ordered. If the performance is OK, we can just add a TODO for now.

Contributor Author

Currently, with 7 days of data, performance seems OK. I will change findTimeExistsInWindow to use binary search or make other improvements.
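
For reference, the binary search suggested in this thread could look roughly like the sketch below. It assumes the times are sorted in ascending order and that the window is half-open, `[windowStart, windowEnd)`; the name `timeExistsInSortedWindow` and its signature are hypothetical and do not reflect the actual `findTimeExistsInWindow` in this PR.

```typescript
// Hypothetical helper: returns true if `sortedTimes` (ascending) contains
// a value t with windowStart <= t < windowEnd. Runs in O(log n).
const timeExistsInSortedWindow = (
  sortedTimes: number[],
  windowStart: number,
  windowEnd: number
): boolean => {
  // Lower-bound search: find the first index whose value is >= windowStart.
  let lo = 0;
  let hi = sortedTimes.length;
  while (lo < hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (sortedTimes[mid] < windowStart) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  // The window contains a point only if that first candidate is before
  // the end of the window.
  return lo < sortedTimes.length && sortedTimes[lo] < windowEnd;
};
```

With m intervals and n existing times, calling this once per interval brings the overall cost down from O(m*n) to O(m log n).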

Contributor

@ylwu-amzn ylwu-amzn left a comment

LGTM. Thanks for the change!

@yizheliu-amazon yizheliu-amazon merged commit 5ae09a9 into opendistro-for-elasticsearch:master Jul 15, 2020
@yizheliu-amazon yizheliu-amazon deleted the feature-missing-dev branch July 15, 2020 17:41
Labels
enhancement Enhance current feature for better performance, user experience, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show annotation for missing feature data
3 participants