elastic · szabosteve · Jan 15, 2020
diff --git a/docs/en/stack/ml/anomaly-detection/categorization-data.asciidoc b/docs/en/stack/ml/anomaly-detection/categorization-data.asciidoc
@@ -0,0 +1,22 @@
+[role="xpack"]
+[[ml-datatypes-categorization]]
+=== Data types and categorization
+
+Categorization is a {ml} process that observes the static parts of the data, 
+clusters similar data together, and classifies them into categories. However,
+categorization doesn't work equally well on all data types. It works
+best on machine-written messages and application outputs, typically on data that 
+consists of repeated elements, for example, log messages used for
+system troubleshooting. Log categorization groups unstructured log messages into
+categories; you can then use {anomaly-detect} to model and identify rare or
+unusual counts of log message categories. For more information about the 
+process, see 
+{ml-docs}/ml-configuring-categories.html[Categorizing log messages].
+
+Categorization works best on data like log messages because such messages
+have structural similarities that the {ml} model can recognize easily.
+Complete sentences in human communication or literary text (for example, emails,
+wiki pages, prose, or other human-generated content) can be extremely diverse in
+structure; consequently, categorization may provide poor results on such data.
+For example, the categorization job might create so many categories that
+they couldn't be handled effectively.
diff --git a/docs/en/stack/ml/anomaly-detection/overview.asciidoc b/docs/en/stack/ml/anomaly-detection/overview.asciidoc
@@ -6,3 +6,4 @@ include::analyzing.asciidoc[]
 
 include::forecasting.asciidoc[]
 
+include::categorization-data.asciidoc[]