[DOCS] Anomaly detection: Visualize delayed data (#75098)
lcawl authored Jul 14, 2021
1 parent 4f30201 commit c8c7f0e
Showing 3 changed files with 20 additions and 11 deletions.
@@ -18,15 +18,15 @@ cluster.

== Why worry about delayed data?

-This is a particularly prescient question. If data are delayed randomly (and
-consequently are missing from analysis), the results of certain types of
-functions are not really affected. In these situations, it all comes out okay in
-the end as the delayed data is distributed randomly. An example would be a `mean`
-metric for a field in a large collection of data. In this case, checking for
-delayed data may not provide much benefit. If data are consistently delayed,
-however, {anomaly-jobs} with a `low_count` function may provide false positives.
-In this situation, it would be useful to see if data comes in after an anomaly is
-recorded so that you can determine a next course of action.
+If data are delayed randomly (and consequently are missing from analysis), the
+results of certain types of functions are not really affected. In these
+situations, it all comes out okay in the end as the delayed data is distributed
+randomly. An example would be a `mean` metric for a field in a large collection
+of data. In this case, checking for delayed data may not provide much benefit.
+If data are consistently delayed, however, {anomaly-jobs} with a `low_count`
+function may provide false positives. In this situation, it would be useful to
+see if data comes in after an anomaly is recorded so that you can determine a
+next course of action.

== How do we detect delayed data?

@@ -40,7 +40,16 @@ of the associated {anomaly-job}. The `doc_count` of those buckets are then
compared with the job's finalized analysis buckets to see whether any data has
arrived since the analysis. If there is indeed missing data due to their ingest
delay, the end user is notified. For example, you can see annotations in {kib}
-for the periods where these delays occur.
+for the periods where these delays occur:
+
+[role="screenshot"]
+image::images/ml-annotations.png["Delayed data annotations in the Single Metric Viewer"]
+
+There is another tool for visualizing the delayed data on the *Annotations* tab
+in the {anomaly-detect} job management page:
+
+[role="screenshot"]
+image::images/ml-datafeed-chart.png["Delayed data in the {dfeed} chart"]

== What to do about delayed data?

@@ -50,4 +59,4 @@ delayed data is too great or the situation calls for it, the next course of
action to consider is to increase the `query_delay` of the datafeed. This
increased delay allows more time for data to be indexed. If you have real-time
constraints, however, an increased delay might not be desirable. In which case,
-you would have to {ref}/tune-for-indexing-speed.html[tune for better indexing speed].
+you would have to {ref}/tune-for-indexing-speed.html[tune for better indexing speed].
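
Reader's note on the "How do we detect delayed data?" hunk above: the comparison it describes (the current `doc_count` of each bucket versus the count the job saw when it finalized that bucket) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual {ml} implementation. The function name `find_delayed_buckets`, the dictionary layout, and the sample counts are all hypothetical.

```python
from typing import Dict, List


def find_delayed_buckets(
    finalized_counts: Dict[int, int],
    current_counts: Dict[int, int],
) -> List[dict]:
    """Report buckets that gained documents after the job analyzed them.

    Both arguments map a bucket start time (epoch seconds, hypothetical
    layout) to a document count. `finalized_counts` is what the job saw
    at analysis time; `current_counts` is what a fresh date histogram
    over the same window returns now.
    """
    delayed = []
    for bucket_start in sorted(finalized_counts):
        seen = finalized_counts[bucket_start]
        # If the bucket is absent from the fresh histogram, assume no change.
        now = current_counts.get(bucket_start, seen)
        missing = now - seen
        if missing > 0:
            delayed.append({"bucket_start": bucket_start, "missing_docs": missing})
    return delayed


# Made-up counts for three 15-minute buckets.
finalized = {0: 100, 900: 80, 1800: 95}
current = {0: 100, 900: 92, 1800: 95}
print(find_delayed_buckets(finalized, current))
# prints [{'bucket_start': 900, 'missing_docs': 12}]
```

In this sketch only the second bucket is flagged, because 12 documents arrived after the job had finalized it. Per the documentation text, such a bucket is where an annotation would appear.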
Binary file added docs/reference/ml/images/ml-annotations.png
Binary file added docs/reference/ml/images/ml-datafeed-chart.png
