Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Add 'model_prune_window' field to AD job config #75741

Merged
merged 7 commits into from
Aug 3, 2021

Conversation

edsavage
Copy link
Contributor

Add configuration for pruning dead split fields in anomaly detection jobs via the model_prune_window field for both the job creation and update APIs.

No default value is provided for this new configuration field. If not present then no 'aggressive' pruning is performed.

Relates to elastic/ml-cpp#1962

Add configuration for pruning dead split fields in anomaly detection
jobs via the `model_prune_window` field for both the job creation and
update APIs.

Relates to ml-cpp/elastic#1962
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jul 27, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have some yaml integration tests for PUT and UPDATE

Comment on lines 532 to 534
if (modelPruneWindow != null) {
TimeUtils.checkNonNegativeMultiple(modelPruneWindow, TimeUnit.SECONDS, MODEL_PRUNE_WINDOW);
}
Copy link
Member

@benwtrent benwtrent Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this always be greater than the BucketSpan?

I see that the C++ side secretly does a Math.max(bucketspan, modelPruneWindow type of thing. I think it would be better to not allow a window to be too small.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two times the bucket span is the smallest value that makes sense, so that should be enforced here.

Since this gets converted into a number of buckets in the C++ code, we should possibly also enforce that the value specified is an exact number of buckets. That then means the functions in TimeUtils aren't much use for enforcing the value. It would need a bespoke check and error message.

@@ -142,6 +149,7 @@ public AnalysisConfig(StreamInput in) throws IOException {
influencers = Collections.unmodifiableList(in.readStringList());

multivariateByFields = in.readOptionalBoolean();
modelPruneWindow = in.readOptionalTimeValue();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add BWC serialization branches here and in the stream out

Comment on lines 532 to 534
if (modelPruneWindow != null) {
TimeUtils.checkNonNegativeMultiple(modelPruneWindow, TimeUnit.SECONDS, MODEL_PRUNE_WINDOW);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two times the bucket span is the smallest value that makes sense, so that should be enforced here.

Since this gets converted into a number of buckets in the C++ code, we should possibly also enforce that the value specified is an exact number of buckets. That then means the functions in TimeUtils aren't much use for enforcing the value. It would need a bespoke check and error message.

@@ -701,6 +718,7 @@ public void testVerify_GivenComplexPerPartitionCategorizationConfig() {
AnalysisConfig.Builder analysisConfig = new AnalysisConfig.Builder(detectors);
analysisConfig.setBucketSpan(TimeValue.timeValueHours(1));
analysisConfig.setLatency(TimeValue.ZERO);
analysisConfig.setModelPruneWindow(TimeValue.ZERO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero shouldn't be valid, as it's null that means "don't prune after a fixed time".

@@ -710,14 +728,15 @@ public void testVerify_GivenComplexPerPartitionCategorizationConfig() {
AnalysisConfig.Builder analysisConfig = new AnalysisConfig.Builder(Collections.singletonList(detector.build()));
analysisConfig.setBucketSpan(TimeValue.timeValueHours(1));
analysisConfig.setLatency(TimeValue.ZERO);
analysisConfig.setModelPruneWindow(TimeValue.ZERO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero shouldn't be valid, as it's null that means "don't prune after a fixed time".

Also added some documentation for the new field...
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from one minor nit

@@ -185,6 +185,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=latency]
(Boolean)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=multivariate-by-fields]

`model_prune_window`:::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go above multivariate_by_fields for alphabetical order.

Comment on lines +576 to +579
if (modelPruneWindowSecs % bucketSpanSecs != 0) {
throw ExceptionsHelper.badRequestException(MODEL_PRUNE_WINDOW.getPreferredName() + " [" + modelPruneWindow.toString() + "]"
+ " must be a multiple of " + BUCKET_SPAN.getPreferredName() + " [" + bucketSpan.toString() + "]");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a thing we are enforcing, it makes sense to me to make this model_prune_window_buckets and take a whole, positive value indicating the number of buckets in the past to look at.

Forcing somebody to supply a timevalue when we require that value to always be a bucket multiple doesn't make sense to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@droberts195 ^ ? What say you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided in a team meeting about a month ago that we'd just have one setting that was an amount of time rather than a number of buckets.

I think it's quite unlikely this error will be thrown in practice when you consider sensible values for pruning are of the order of days, not minutes, and most bucket spans divide exactly into a day.

So I think it's OK as-is.

I definitely don't think we should change this retention period to be a number of buckets. All the other retention periods are measured in time, not buckets.

What we could do is just enforce >= 2 * bucket span and not an exact multiple. The C++ now rounds up so everything will work out OK.

But, like I said, I suspect in practice the detail here won't make any difference to anybody, because real users always choose nice round numbers even when they're not forced to.

@@ -164,6 +168,7 @@ public JobUpdate(StreamInput in) throws IOException {
}
allowLazyOpen = in.readOptionalBoolean();
blocked = in.readOptionalWriteable(Blocked::new);
modelPruneWindow = in.readOptionalTimeValue();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BWC serialization here.

@@ -206,6 +211,7 @@ public void writeTo(StreamOutput out) throws IOException {
}
out.writeOptionalBoolean(allowLazyOpen);
out.writeOptionalWriteable(blocked);
out.writeOptionalTimeValue(modelPruneWindow);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BWC serialization here.

Also fix failing jobs_cruf yaml tests
Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember, before backporting, the BWC tests should be muted :D.

I have forgotten that myself many times and caused build failures :)

@edsavage
Copy link
Contributor Author

edsavage commented Aug 2, 2021

Remember, before backporting, the BWC tests should be muted :D.

I have forgotten that myself many times and caused build failures :)

Cheers Ben

@edsavage edsavage merged commit 5651215 into elastic:master Aug 3, 2021
edsavage added a commit to edsavage/elasticsearch that referenced this pull request Aug 3, 2021
Temporarily mute BWC tests to allow elastic#75999 to be merged

Relates elastic#75741
edsavage added a commit that referenced this pull request Aug 3, 2021
Temporarily mute BWC tests to allow #75999 to be merged

Relates #75741
edsavage added a commit to edsavage/elasticsearch that referenced this pull request Aug 3, 2021
Unmute the the BWC tests and alter the BWC version for the
model_prune_window field to be 7_15_0

Relates elastic#75741, elastic#76003
@edsavage edsavage mentioned this pull request Aug 3, 2021
edsavage added a commit that referenced this pull request Aug 3, 2021
…75999)

Add configuration for pruning dead split fields in anomaly detection
jobs via the `model_prune_window` field for both the job creation and
update APIs.

Relates to ml-cpp/#1962
Backports #75741
edsavage added a commit that referenced this pull request Aug 3, 2021
Unmute the the BWC tests and alter the BWC version for the
model_prune_window field to be 7_15_0

Relates #75741, #76003
@edsavage edsavage deleted the prune_dead_splits branch August 3, 2021 13:43
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Aug 3, 2021
Add configuration for pruning dead split fields in anomaly detection
jobs via the `model_prune_window` field for both the job creation and
update APIs.

Relates to ml-cpp/elastic#1962
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Aug 3, 2021
Temporarily mute BWC tests to allow elastic#75999 to be merged

Relates elastic#75741
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Aug 3, 2021
Unmute the the BWC tests and alter the BWC version for the
model_prune_window field to be 7_15_0

Relates elastic#75741, elastic#76003
@droberts195 droberts195 added the Team:Clients Meta label for clients team label Aug 4, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

droberts195 added a commit to droberts195/kibana that referenced this pull request Aug 5, 2021
Adds the model_prune_window setting added in elastic/elasticsearch#75741
to all Security jobs that use functions that support model pruning.
The "rare" function does not support model pruning, so jobs that use
the "rare" function are not modified.
droberts195 added a commit to elastic/kibana that referenced this pull request Aug 10, 2021
)

Adds the model_prune_window setting added in elastic/elasticsearch#75741
to all Security jobs that use functions that support model pruning.
This means that the models for split field values that are not seen for
30 days will be dropped. If those split field values are subsequently seen
again then new models will be created like for completely new entities.
The "rare" function does not support model pruning, so jobs that use
the "rare" function are not modified.
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Aug 10, 2021
…tic#107752)

Adds the model_prune_window setting added in elastic/elasticsearch#75741
to all Security jobs that use functions that support model pruning.
This means that the models for split field values that are not seen for
30 days will be dropped. If those split field values are subsequently seen
again then new models will be created like for completely new entities.
The "rare" function does not support model pruning, so jobs that use
the "rare" function are not modified.
kibanamachine added a commit to elastic/kibana that referenced this pull request Aug 10, 2021
) (#108058)

Adds the model_prune_window setting added in elastic/elasticsearch#75741
to all Security jobs that use functions that support model pruning.
This means that the models for split field values that are not seen for
30 days will be dropped. If those split field values are subsequently seen
again then new models will be created like for completely new entities.
The "rare" function does not support model pruning, so jobs that use
the "rare" function are not modified.

Co-authored-by: David Roberts <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement >feature :ml Machine learning Team:Clients Meta label for clients team Team:ML Meta label for the ML team v7.15.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants