Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Add audit warning for 1000 categories found early in job #51146

Merged

Conversation

droberts195
Copy link
Contributor

@droberts195 droberts195 commented Jan 17, 2020

If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created. (This will cause a yellow warning triangle in the
ML UI's jobs list.)

Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.

Relates #50749

If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created.  (This will cause a yellow warning triangle in the
ML UI's jobs list.)

Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left a few minor suggestions.

@@ -135,6 +135,8 @@
"Adjust the analysis_limits.model_memory_limit setting to ensure all data is analyzed";
public static final String JOB_AUDIT_MEMORY_STATUS_HARD_LIMIT_PRE_7_2 = "Job memory status changed to hard_limit at {0}; adjust the " +
"analysis_limits.model_memory_limit setting to ensure all data is analyzed";
public static final String JOB_AUDIT_EXCESSIVE_EARLY_CATEGORIES = "{0} categories observed in the first [{1}] buckets." +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have square brackets about the second number in this message but not the first. Should we make it consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I didn't put the first one in square brackets is that it won't vary (at least for a particular version of the product). The first number will always be 1000 unless somebody changes the code, whereas the second number can vary between different occurrences of the audit message.

@@ -87,7 +90,9 @@
private final FlushListener flushListener;
private volatile boolean processKilled;
private volatile boolean failed;
private int bucketCount; // only used from the process() thread, so doesn't need to be volatile
private long priorRunsBucketCount;
private int currentRunBucketCount; // only used from the process() thread, so doesn't need to be volatile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we take this chance and make this a long as well? It seems like something that could hit overflow problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a 1 second bucket span it will take 68 years to overflow, so pretty unlikely. But maybe I can avoid some casting by making it long, which will make the code cleaner. If so I'll change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There wasn't much casting but I changed it anyway just so both variables have the same type.

@@ -122,6 +127,7 @@ public AutodetectResultProcessor(Client client,
this.bulkResultsPersister = persister.bulkPersisterBuilder(jobId, this::isAlive);
this.timingStatsReporter = new TimingStatsReporter(timingStats, bulkResultsPersister);
this.deleteInterimRequired = true;
this.priorRunsBucketCount = timingStats.getBucketCount();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering where we'd get this from but it's cool we already pass it in!

@@ -225,6 +231,18 @@ void processResult(AutodetectResult result) {
CategoryDefinition categoryDefinition = result.getCategoryDefinition();
if (categoryDefinition != null) {
persister.persistCategoryDefinition(categoryDefinition, this::isAlive);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that all this has become more complex than a single call to the persister, I'd be tempted to extract this in a handleCategoryDefinition method.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@droberts195 droberts195 merged commit 160a212 into elastic:master Jan 17, 2020
@droberts195 droberts195 deleted the excessive_categories_audit_message branch January 17, 2020 16:25
droberts195 added a commit that referenced this pull request Jan 17, 2020
If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created.  (This will cause a yellow warning triangle in the
ML UI's jobs list.)

Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.
droberts195 added a commit that referenced this pull request Jan 17, 2020
If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created.  (This will cause a yellow warning triangle in the
ML UI's jobs list.)

Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
…c#51146)

If 1000 different category definitions are created for a job in
the first 100 buckets it processes then an audit warning will now
be created.  (This will cause a yellow warning triangle in the
ML UI's jobs list.)

Such a large number of categories suggests that the field that
categorization is working on is not well suited to the ML
categorization functionality.
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Feb 11, 2020
In elastic#51146 a rudimentary check for poor categorization was added to
7.6.

This change replaces that warning based on a Java-side check with
a new one based on the categorization_status field that the ML C++
sets.  categorization_status was added in 7.7 and above by elastic#51879,
so this new warning based on more advanced conditions will also be
in 7.7 and above.

Closes elastic#50749
droberts195 added a commit that referenced this pull request Feb 11, 2020
…2195)

In #51146 a rudimentary check for poor categorization was added to
7.6.

This change replaces that warning based on a Java-side check with
a new one based on the categorization_status field that the ML C++
sets.  categorization_status was added in 7.7 and above by #51879,
so this new warning based on more advanced conditions will also be
in 7.7 and above.

Closes #50749
droberts195 added a commit that referenced this pull request Feb 11, 2020
…2195)

In #51146 a rudimentary check for poor categorization was added to
7.6.

This change replaces that warning based on a Java-side check with
a new one based on the categorization_status field that the ML C++
sets.  categorization_status was added in 7.7 and above by #51879,
so this new warning based on more advanced conditions will also be
in 7.7 and above.

Closes #50749
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants