elastic · szabosteve · Jan 15, 2020
diff --git a/docs/en/stack/ml/anomaly-detection/categorization-data.asciidoc b/docs/en/stack/ml/anomaly-detection/categorization-data.asciidoc
@@ -0,0 +1,22 @@
+[role="xpack"]
+[[ml-datatypes-categorization]]
+=== Data types and categorization
+
+Categorization is a {ml} process that tokenizes a text field, clusters similar
+data together, and classifies it into categories. Categorization does not,
+however, work equally well on all data types. It works
+best on machine-written messages and application output, typically on data that
+consists of repeated elements, for example, log messages for the purpose of
+system troubleshooting. Log categorization groups unstructured log messages into
+categories; you can then use {anomaly-detect} to model and identify rare or
+unusual counts of log message categories. For more information about the 
+process, see 
+{ml-docs}/ml-configuring-categories.html[Categorizing log messages].
+
+Categorization is tuned to work best on data like log messages by taking token
+order into account, not considering synonyms, and including stop words in its analysis.
+Complete sentences in human communication or literary text (for example, emails,
+wiki pages, prose, or other human-generated content) can be extremely diverse in
+structure. Because categorization is tuned for machine data, it gives poor results on such human-generated data.
+For example, the categorization job would create so many categories that they
+couldn't be handled effectively. Categorization is _not_ natural language processing (NLP).
diff --git a/docs/en/stack/ml/anomaly-detection/overview.asciidoc b/docs/en/stack/ml/anomaly-detection/overview.asciidoc
@@ -6,3 +6,4 @@ include::analyzing.asciidoc[]
 
 include::forecasting.asciidoc[]
 
+include::categorization-data.asciidoc[]