[DOCS] Adds text about data types and categorization to Anomaly Detection overview page #809

Closed
wants to merge 10 commits into from
22 changes: 22 additions & 0 deletions docs/en/stack/ml/anomaly-detection/categorization-data.asciidoc
@@ -0,0 +1,22 @@
[role="xpack"]
[[ml-datatypes-categorization]]
=== Data types and categorization

Categorization is a {ml} process that observes the static parts of the data,
clusters similar data together, and classifies them into categories. However,
categorization doesn't work equally well on all data types. It works
best on machine-written messages and application output, typically on data that
consists of repeated elements, such as log messages used for system
troubleshooting. Log categorization groups unstructured log messages into
categories. You can then use {anomaly-detect} to model and identify rare or
unusual counts of log message categories. For more information about the
process, see
{ml-docs}/ml-configuring-categories.html[Categorizing log messages].
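
As a rough illustration of the idea (not the algorithm that the {ml} features use), the following Python sketch reduces each log message to its static tokens and then counts messages per resulting category. The `to_category` helper, the rule of masking any token that contains a digit, and the sample messages are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical masking rule: treat any token that contains a digit as variable.
VARIABLE_TOKEN = re.compile(r"\S*\d\S*")


def to_category(message: str) -> str:
    """Keep the static tokens of a message and mask the variable ones."""
    return VARIABLE_TOKEN.sub("*", message)


# Made-up sample log messages.
logs = [
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.7 closed after 120 ms",
    "Disk /dev/sda1 is 91% full",
    "Connection from 10.0.0.9 closed after 8 ms",
]

# Count how many messages fall into each category.
counts = Counter(to_category(m) for m in logs)
for category, count in counts.most_common():
    print(count, category)
```

In this toy data set, the three connection messages collapse into a single category, while the disk message forms a category of its own. Counts per category like these are the kind of signal that can then be modeled for rare or unusual values.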

Categorization works best on data like log messages because they
have structural similarities that can be recognized easily by the {ml} model.
Complete sentences in human communication or literary text (for example, emails,
wiki pages, prose, or other human-generated content) can be extremely diverse in
structure. Consequently, categorization may produce poor results on such data.
For example, the categorization job might create so many categories that they
can't be handled effectively.
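
The difference between the two kinds of data can be sketched with a toy measurement. The following Python example is only an illustration under simple assumptions, not the categorization that the {ml} features perform: the `category_ratio` helper, its digit-masking rule, and both sample data sets are hypothetical.

```python
import re


def category_ratio(messages):
    """Return distinct categories per message (1.0 means nothing was grouped)."""
    # Hypothetical rule: mask every token that contains a digit.
    categories = {re.sub(r"\S*\d\S*", "*", m) for m in messages}
    return len(categories) / len(messages)


# Made-up machine-written messages with a repeated structure.
logs = [
    "User 1041 logged in from 10.0.0.4",
    "User 2210 logged in from 10.0.0.9",
    "User 873 logged in from 10.0.0.2",
    "Job 17 finished in 340 ms",
]

# Made-up human-written sentences with no shared structure.
prose = [
    "Thanks for the update, I will review it tomorrow.",
    "Could you send me the slides from the meeting?",
    "The weather was lovely, so we walked to the office.",
    "Let me know whether Friday works for you.",
]

print(category_ratio(logs))
print(category_ratio(prose))
```

With these samples, the log messages collapse into fewer categories than there are messages, whereas every prose sentence ends up in its own category. On a real data set, that second pattern would mean the number of categories grows with the number of messages.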
1 change: 1 addition & 0 deletions docs/en/stack/ml/anomaly-detection/overview.asciidoc
@@ -6,3 +6,4 @@ include::analyzing.asciidoc[]

include::forecasting.asciidoc[]

include::categorization-data.asciidoc[]