[ML] Add categorizer stats ML result type #57978

droberts195 · 2020-06-11T10:23:47Z

This type of result will store stats about how well categorization
is performing. When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.

This PR is a minimal implementation so that the C++ side changes can
be made. More Java side changes related to per-partition
categorization will be in follow-up PRs. However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats. Like forecast request stats, the
categorizer stats can be read directly from the job's results alias.
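As a rough illustration of reading the stats directly from the results alias, the sketch below only builds a search path and request body; it does not send anything. It assumes the alias follows the `.ml-anomalies-<job_id>` naming, that the documents carry `result_type: categorizer_stats`, and that per-partition documents can be filtered on a `partition_field_value` field. All three are assumptions, and `CategorizerStatsQuery` is a hypothetical helper, not part of this PR.

```java
import java.util.Optional;

// Hypothetical helper: builds (but does not send) a search against a job's results alias.
public final class CategorizerStatsQuery {

    // Assumed alias naming convention for ML job results.
    static String searchPath(String jobId) {
        return "/.ml-anomalies-" + jobId + "/_search";
    }

    // Assumed field names: result_type, job_id, partition_field_value.
    // Values are NOT JSON-escaped, so this is for illustration only.
    static String searchBody(String jobId, Optional<String> partitionValue) {
        StringBuilder filters = new StringBuilder()
            .append("{\"term\":{\"result_type\":\"categorizer_stats\"}},")
            .append("{\"term\":{\"job_id\":\"").append(jobId).append("\"}}");
        // Narrow to a single partition when per-partition categorization is in use.
        partitionValue.ifPresent(p ->
            filters.append(",{\"term\":{\"partition_field_value\":\"").append(p).append("\"}}"));
        return "{\"query\":{\"bool\":{\"filter\":[" + filters + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println("GET " + searchPath("my-job"));
        System.out.println(searchBody("my-job", Optional.of("web-01")));
    }
}
```

Omitting the partition value returns the stats documents for every partition, which is how one would compare partitions against each other.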


elasticmachine · 2020-06-11T10:23:48Z

Pinging @elastic/ml-core (:ml)

benwtrent

Very minor things. Overall looks good.


benwtrent · 2020-06-11T11:42:33Z

...main/java/org/elasticsearch/xpack/core/ml/job/process/autodetect/state/CategorizerStats.java

+        this.categorizedDocCount = categorizedDocCount;
+        this.totalCategoryCount = totalCategoryCount;
+        this.frequentCategoryCount = frequentCategoryCount;
+        this.rareCategoryCount = rareCategoryCount;
+        this.deadCategoryCount = deadCategoryCount;
+        this.failedCategoryCount = failedCategoryCount;


We don't check that all of these are non-negative (>= 0). This is probably OK since we are getting this data from C++ and there is an implicit trust.

Yes, good point. This is dodgy when we're using VLong serialization. It means someone can DoS us by updating a document to contain a negative number.

This same problem applies to most of our other results classes too, so I think it's best to deal with it in a separate PR.

Like you say, the C++ won't send negative values for unsigned counters, so it's not a likely problem.
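The check deferred to a separate PR could look something like the sketch below. It is not the class from this PR: `CategorizerStatsSketch`, `requireNonNegative` and `encodeVLong` are hypothetical names, the six counters are taken from the constructor diff above, and the encoder is a generic 7-bits-per-byte variable-length scheme standing in for VLong serialization, not Elasticsearch's own `StreamOutput` code. It shows why an unchecked negative counter is awkward for such an encoding (`-1` takes ten bytes instead of one), while validating in the constructor rejects the value up front.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical value class; field names mirror the constructor diff in this PR.
public final class CategorizerStatsSketch {

    final long categorizedDocCount;
    final long totalCategoryCount;
    final long frequentCategoryCount;
    final long rareCategoryCount;
    final long deadCategoryCount;
    final long failedCategoryCount;

    CategorizerStatsSketch(long categorizedDocCount, long totalCategoryCount, long frequentCategoryCount,
                           long rareCategoryCount, long deadCategoryCount, long failedCategoryCount) {
        // The C++ side sends unsigned counters, so a negative value implies the stored document was altered.
        this.categorizedDocCount = requireNonNegative("categorizedDocCount", categorizedDocCount);
        this.totalCategoryCount = requireNonNegative("totalCategoryCount", totalCategoryCount);
        this.frequentCategoryCount = requireNonNegative("frequentCategoryCount", frequentCategoryCount);
        this.rareCategoryCount = requireNonNegative("rareCategoryCount", rareCategoryCount);
        this.deadCategoryCount = requireNonNegative("deadCategoryCount", deadCategoryCount);
        this.failedCategoryCount = requireNonNegative("failedCategoryCount", failedCategoryCount);
    }

    static long requireNonNegative(String field, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("[" + field + "] must be non-negative, got [" + value + "]");
        }
        return value;
    }

    // Generic variable-length encoding: 7 payload bits per byte, high bit set when more bytes follow.
    static byte[] encodeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            // Unsigned shift: a negative value keeps emitting bytes until all 64 bits are consumed.
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeVLong(5).length);  // 1
        System.out.println(encodeVLong(-1).length); // 10
        try {
            new CategorizerStatsSketch(10, 3, 1, 1, 0, -1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // [failedCategoryCount] must be non-negative, got [-1]
        }
    }
}
```

Validating in the constructor means a tampered document fails when it is parsed, rather than later when the object is serialized between nodes.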

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/results/AutodetectResult.java


droberts195 added >non-issue :ml Machine learning v8.0.0 v7.9.0 labels Jun 11, 2020

benwtrent self-requested a review June 11, 2020 11:30

benwtrent approved these changes Jun 11, 2020


Apply suggestions from code review

8c0bc5a

Co-authored-by: Benjamin Trent <[email protected]>

droberts195 merged commit 355958f into elastic:master Jun 11, 2020

droberts195 deleted the add_categorizer_stats branch June 11, 2020 16:59

droberts195 mentioned this pull request Jun 11, 2020

[7.x][ML] Add categorizer stats ML result type #58001

Merged

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Jun 11, 2020

[ML] Remove unnecessary BWC constants

bec7607

After merging elastic#58001 the BWC constants added in elastic#57978 are no longer needed.

droberts195 mentioned this pull request Jun 11, 2020

[ML] Remove unnecessary BWC constants #58002

Merged

droberts195 added a commit that referenced this pull request Jun 12, 2020

[ML] Remove unnecessary BWC constants (#58002)

990ce0d

After merging #58001 the BWC constants added in #57978 are no longer needed.

droberts195 mentioned this pull request Jul 16, 2020

[Logs UI] Include the dataset information in categorization warning message elastic/kibana#60392

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
