Add mlcategory and result_type to category definition docs #60108
Pinging @elastic/ml-core (:ml)
benwtrent added a commit that referenced this issue Oct 7, 2020
…#63326) To easy correlation between anomaly results and category definitions, this commit adds a new keyword mapped field `mlcategory`. This field is always the same as the `category_id` field (which is mapped as a long). But since anomaly results store the `mlcategory` as a keyword, it simplifies queries if category_definitions also had this field as a keyword. The stored JSON is a `string`. Additionally, this commit adds a `result_type: category_definition` entry to category definition documents. This will help simplify and unify result queries in the future. closes #60108
benwtrent added a commit that referenced this issue Oct 8, 2020
…#63326) (#63412) To easy correlation between anomaly results and category definitions, this commit adds a new keyword mapped field `mlcategory`. This field is always the same as the `category_id` field (which is mapped as a long). But since anomaly results store the `mlcategory` as a keyword, it simplifies queries if category_definitions also had this field as a keyword. The stored JSON is a `string`. Additionally, this commit adds a `result_type: category_definition` entry to category definition documents. This will help simplify and unify result queries in the future. closes #60108 Co-authored-by: Elastic Machine <[email protected]>
When you run an ML anomaly detection job that also does categorization, you end up with category definition results and anomaly results.

The category definitions have a `category_id` field of type `long` that stores the unique category ID within the job. The anomalies contain a `keyword` field `mlcategory` that stores the unique category ID that the anomaly relates to. The reason this is a `keyword` is that all by/over/partition fields are added to anomaly results as keywords; the by/over/partition fields are not strongly typed within the core analytics code.

This discrepancy in how the category ID is stored makes it harder to use generic Kibana functionality to tie the two types of document together. It would be nicer if either the anomaly results had a `category_id` field or the category definitions had an `mlcategory` field.

There is a further complication. Many documents we write to the ML results have a `result_type` field that indicates the document type. However, some do not. Category definitions are one such type of ML result. The way category definition documents are found is to do an `exists` query on the `category_id` field. Since this practice is widely used, it would be a bad idea to include the `category_id` field in any other type of ML result.

As a result, the way to allow easier joining of category definitions and anomaly results is to add a field `mlcategory` of type `keyword` to category definition documents. This can easily be added to all category definition documents for both new jobs and pre-existing jobs when they are updated for any reason. Only jobs created in the version where the functionality is added or higher could guarantee the presence of `mlcategory` in category definition documents, but older jobs that are actively running would acquire it over time.

While this change is being made, a further change should be made to make querying ML results easier and more consistent in the future. We should add a `result_type` field with value `category_definition` to category definition documents. We will not be able to take advantage of this for a long time - we'll have to stick with querying for the existence of `category_id`. But by adding the `result_type` now we will create the possibility to simplify things further in a few years' time, say in version 9.