Add mlcategory and result_type to category definition docs #60108

Closed
droberts195 opened this issue Jul 23, 2020 · 1 comment · Fixed by #63326
Labels
>enhancement :ml Machine learning

@droberts195

When you run an ML anomaly detection job that also does categorization, you end up with both category definition results and anomaly results.

The category definitions have a category_id field of type long that stores the unique category ID within the job. The anomalies contain a keyword field mlcategory that stores the unique category ID that the anomaly relates to. The reason this is a keyword is that all by/over/partition fields are added to anomaly results as keywords; the by/over/partition fields are not strongly typed within the core analytics code.

This discrepancy in how the category ID is stored makes it harder to use generic Kibana functionality to tie the two types of document together. It would be nicer if either the anomaly results had a category_id field or the category definitions had a mlcategory field.
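To illustrate the mismatch, here is a minimal Python sketch of joining the two document types on the client side. The sample documents and the function name `join_anomalies_to_categories` are hypothetical, and the sketch assumes each anomaly carries `mlcategory` as a single string; only the field names `category_id` and `mlcategory` and their types come from the description above.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


def join_anomalies_to_categories(
    anomalies: List[Doc], category_definitions: List[Doc]
) -> List[Tuple[Doc, Optional[Doc]]]:
    """Pair each anomaly with its category definition, or None if absent."""
    # category_id is a long and mlcategory is a keyword (string), so one
    # side has to be converted before the two can be compared.
    by_id = {str(c["category_id"]): c for c in category_definitions}
    return [(a, by_id.get(a.get("mlcategory"))) for a in anomalies]


# Hypothetical sample documents, not real job output.
categories = [{"job_id": "job-1", "category_id": 3}]
anomalies = [
    {"job_id": "job-1", "mlcategory": "3"},
    {"job_id": "job-1", "mlcategory": "7"},
]
pairs = join_anomalies_to_categories(anomalies, categories)
```

The explicit `str(...)` conversion is the extra step that generic tooling cannot be expected to perform, which is why a shared keyword field is attractive.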

There is a further complication. Many documents we write to the ML results have a result_type field that indicates the document type. However, some do not. Category definitions are one such type of ML result. The way category definition documents are found is to do an exists query on the category_id field. Since this practice is widely used, it would be a bad idea to include the category_id field in any other type of ML result.
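The lookup described above can be sketched as an Elasticsearch query body built in Python. This is an illustration only: the helper name `category_definitions_query` is hypothetical, and filtering on a `job_id` field is an assumption about how results are scoped to one job. The `exists` clause on `category_id` is the part taken from the text.

```python
from typing import Any, Dict


def category_definitions_query(job_id: str) -> Dict[str, Any]:
    """Build a query body that matches category definitions for one job."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumption: result documents carry a job_id field.
                    {"term": {"job_id": job_id}},
                    # Category definitions have no result_type, so they are
                    # identified by the presence of category_id.
                    {"exists": {"field": "category_id"}},
                ]
            }
        }
    }


body = category_definitions_query("job-1")
```

Because the `exists` clause is the only discriminator, adding `category_id` to any other result type would make that document match this query as well.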

As a result, the best way to allow easier joining of category definitions and anomaly results is to add an mlcategory field of type keyword to category definition documents. This can easily be added to all category definition documents, for both new jobs and pre-existing jobs, whenever the documents are updated for any reason. Only jobs created in the version where the functionality is added, or a later version, could guarantee the presence of mlcategory in category definition documents, but older jobs that are actively running would acquire it over time.
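A minimal sketch of the proposed enrichment, assuming category definitions are handled as plain dictionaries. The function name `with_mlcategory` is hypothetical and this is not the actual implementation; it only shows that the new field is the string form of the existing ID.

```python
from typing import Any, Dict


def with_mlcategory(category_definition: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document with mlcategory derived from category_id."""
    updated = dict(category_definition)
    # mlcategory is the keyword (string) form of the long category_id.
    updated["mlcategory"] = str(updated["category_id"])
    return updated


# An older document without the field gains it the next time it is rewritten.
old_doc = {"job_id": "job-1", "category_id": 3}
new_doc = with_mlcategory(old_doc)
```

Applying the function to a document that already has `mlcategory` simply rewrites the same value, so it is safe to run on every update.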

While this change is being made, a further change should be made to make querying ML results easier and more consistent in the future. We should add a result_type field with value category_definition to category definition documents. We will not be able to take advantage of this for a long time - we'll have to stick with querying for the existence of category_id. But by adding the result_type now, we create the possibility of simplifying things further in a few years' time, say in version 9.
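During the transition period, a client-side check would have to accept both old and new documents. The following sketch assumes documents are plain dictionaries; the function name `is_category_definition` is hypothetical.

```python
from typing import Any, Dict


def is_category_definition(doc: Dict[str, Any]) -> bool:
    """Classify a result document, tolerating documents without result_type."""
    result_type = doc.get("result_type")
    if result_type is not None:
        # New-style documents state their type explicitly.
        return result_type == "category_definition"
    # Old-style category definitions have no result_type; fall back to
    # checking for the presence of category_id.
    return "category_id" in doc


old_style = {"job_id": "job-1", "category_id": 3}
new_style = {"job_id": "job-1", "category_id": 3, "result_type": "category_definition"}
anomaly = {"job_id": "job-1", "result_type": "record", "mlcategory": "3"}
```

Once every supported job is guaranteed to write `result_type`, the fallback branch could be dropped and the check reduced to a single comparison.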

@droberts195 droberts195 added >enhancement :ml Machine learning labels Jul 23, 2020
@elasticmachine

Pinging @elastic/ml-core (:ml)

benwtrent added a commit that referenced this issue Oct 7, 2020
…#63326)

To ease correlation between anomaly results and category definitions, this commit adds a new keyword-mapped field `mlcategory`.

This field always has the same value as the `category_id` field (which is mapped as a long). But since anomaly results store
the `mlcategory` as a keyword, queries are simpler if category definitions also have this field as a keyword.

The stored JSON value is a `string`.

Additionally, this commit adds a `result_type: category_definition` entry to category definition documents.

This will help simplify and unify result queries in the future.

closes #60108
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Oct 7, 2020
…elastic#63326)

benwtrent added a commit that referenced this issue Oct 8, 2020
…#63326) (#63412)
