[ML] Add 'model_prune_window' field to AD job config #75741
Conversation
Add configuration for pruning dead split fields in anomaly detection jobs via the `model_prune_window` field for both the job creation and update APIs. Relates to ml-cpp/elastic#1962
Pinging @elastic/ml-core (Team:ML)
It would be good to have some yaml integration tests for PUT and UPDATE.
```java
if (modelPruneWindow != null) {
    TimeUtils.checkNonNegativeMultiple(modelPruneWindow, TimeUnit.SECONDS, MODEL_PRUNE_WINDOW);
}
```
Shouldn't this always be greater than the bucket span? I see that the C++ side secretly does a `Math.max(bucketSpan, modelPruneWindow)` type of thing. I think it would be better to not allow a window that is too small.
Two times the bucket span is the smallest value that makes sense, so that should be enforced here. Since this gets converted into a number of buckets in the C++ code, we should possibly also enforce that the value specified is an exact number of buckets. That then means the functions in `TimeUtils` aren't much use for enforcing the value. It would need a bespoke check and error message.
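A bespoke check along the lines the reviewers describe could be sketched as follows. This is a hypothetical standalone version (`checkModelPruneWindow` and the use of `java.time.Duration` are invented for illustration; the real code works with Elasticsearch's `TimeValue` inside `AnalysisConfig`):

```java
import java.time.Duration;

public class ModelPruneWindowCheck {
    // Hypothetical sketch of the bespoke validation the reviewers suggest:
    // a null window means "do not prune after a fixed time"; otherwise the
    // window must be at least twice the bucket span and an exact multiple of it.
    static void checkModelPruneWindow(Duration modelPruneWindow, Duration bucketSpan) {
        if (modelPruneWindow == null) {
            return; // null disables time-based pruning
        }
        long windowSecs = modelPruneWindow.getSeconds();
        long spanSecs = bucketSpan.getSeconds();
        if (windowSecs < 2 * spanSecs) {
            throw new IllegalArgumentException("model_prune_window [" + windowSecs
                + "s] must be at least twice the bucket_span [" + spanSecs + "s]");
        }
        if (windowSecs % spanSecs != 0) {
            throw new IllegalArgumentException("model_prune_window [" + windowSecs
                + "s] must be a multiple of bucket_span [" + spanSecs + "s]");
        }
    }

    public static void main(String[] args) {
        checkModelPruneWindow(Duration.ofDays(30), Duration.ofHours(1)); // valid: 720 buckets
        try {
            checkModelPruneWindow(Duration.ofMinutes(90), Duration.ofHours(1)); // not a multiple
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```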
```diff
@@ -142,6 +149,7 @@ public AnalysisConfig(StreamInput in) throws IOException {
     influencers = Collections.unmodifiableList(in.readStringList());

     multivariateByFields = in.readOptionalBoolean();
+    modelPruneWindow = in.readOptionalTimeValue();
```
You need to add BWC serialization branches here and in the stream out
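The version-gated ("BWC") pattern being asked for can be sketched with plain `java.io` streams. Everything here is a toy stand-in: the version constant and helper names are invented, and the real code uses Elasticsearch's `StreamInput`/`StreamOutput` version checks instead:

```java
import java.io.*;

public class BwcSketch {
    // Invented stand-in for an Elasticsearch transport version constant.
    static final int V_7_15_0 = 71500;

    // Serialize the optional field only when the remote node is new enough
    // to understand it; older nodes never receive the extra bytes.
    static byte[] writeOptional(int remoteVersion, Long value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            if (remoteVersion >= V_7_15_0) {
                out.writeBoolean(value != null);
                if (value != null) {
                    out.writeLong(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read the field only when the sender was new enough to have written it;
    // otherwise leave it null, matching the older wire format.
    static Long readOptional(int remoteVersion, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            if (remoteVersion >= V_7_15_0 && in.readBoolean()) {
                return in.readLong();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] newWire = writeOptional(V_7_15_0, 86400L);
        System.out.println(readOptional(V_7_15_0, newWire)); // prints 86400
        byte[] oldWire = writeOptional(71400, 86400L);       // old node: field omitted
        System.out.println(readOptional(71400, oldWire));    // prints null
    }
}
```

Without the read-side gate, a new node talking to an old node would try to consume bytes the old node never wrote, corrupting the stream, which is why both branches must match.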
```diff
@@ -701,6 +718,7 @@ public void testVerify_GivenComplexPerPartitionCategorizationConfig() {
     AnalysisConfig.Builder analysisConfig = new AnalysisConfig.Builder(detectors);
     analysisConfig.setBucketSpan(TimeValue.timeValueHours(1));
     analysisConfig.setLatency(TimeValue.ZERO);
+    analysisConfig.setModelPruneWindow(TimeValue.ZERO);
```
Zero shouldn't be valid, as it's null that means "don't prune after a fixed time".
```diff
@@ -710,14 +728,15 @@ public void testVerify_GivenComplexPerPartitionCategorizationConfig() {
     AnalysisConfig.Builder analysisConfig = new AnalysisConfig.Builder(Collections.singletonList(detector.build()));
     analysisConfig.setBucketSpan(TimeValue.timeValueHours(1));
     analysisConfig.setLatency(TimeValue.ZERO);
+    analysisConfig.setModelPruneWindow(TimeValue.ZERO);
```
Zero shouldn't be valid, as it's null that means "don't prune after a fixed time".
Also added some documentation for the new field...
LGTM apart from one minor nit
```diff
@@ -185,6 +185,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=latency]
 (Boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=multivariate-by-fields]

+`model_prune_window`:::
```
This should go above `multivariate_by_fields` for alphabetical order.
```java
if (modelPruneWindowSecs % bucketSpanSecs != 0) {
    throw ExceptionsHelper.badRequestException(MODEL_PRUNE_WINDOW.getPreferredName() + " [" + modelPruneWindow.toString() + "]"
        + " must be a multiple of " + BUCKET_SPAN.getPreferredName() + " [" + bucketSpan.toString() + "]");
}
```
If this is a thing we are enforcing, it makes sense to me to make this `model_prune_window_buckets` and take a whole, positive value indicating the number of buckets in the past to look at. Forcing somebody to supply a time value when we require that value to always be a bucket multiple doesn't make sense to me.
@droberts195 ^ ? What say you?
We decided in a team meeting about a month ago that we'd just have one setting that was an amount of time rather than a number of buckets.
I think it's quite unlikely this error will be thrown in practice when you consider sensible values for pruning are of the order of days, not minutes, and most bucket spans divide exactly into a day.
So I think it's OK as-is.
I definitely don't think we should change this retention period to be a number of buckets. All the other retention periods are measured in time, not buckets.
What we could do is just enforce >= 2 * bucket span and not an exact multiple. The C++ now rounds up so everything will work out OK.
But, like I said, I suspect in practice the detail here won't make any difference to anybody, because real users always choose nice round numbers even when they're not forced to.
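The conversion described above (the C++ side turning the time window into a whole number of buckets, rounding up, with two buckets as the effective floor) can be sketched like this; the helper name is invented for illustration:

```java
public class PruneWindowBuckets {
    // Hypothetical sketch of converting a time-based model_prune_window into
    // a whole number of buckets: round up to the next full bucket, then clamp
    // to the two-bucket minimum discussed in the review.
    static long windowInBuckets(long windowSecs, long bucketSpanSecs) {
        long buckets = (windowSecs + bucketSpanSecs - 1) / bucketSpanSecs; // ceiling division
        return Math.max(2, buckets);
    }

    public static void main(String[] args) {
        long day = 86_400, hour = 3_600;
        System.out.println(windowInBuckets(30 * day, hour)); // 30 days of 1h buckets -> 720
        System.out.println(windowInBuckets(90 * 60, hour));  // 1.5h rounds up to 2 buckets
        System.out.println(windowInBuckets(hour, hour));     // clamped to the 2-bucket minimum
    }
}
```

Because of the round-up, a window that is not an exact bucket multiple still yields a sensible bucket count, which is why the strict multiple check is arguably optional.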
```diff
@@ -164,6 +168,7 @@ public JobUpdate(StreamInput in) throws IOException {
     }
     allowLazyOpen = in.readOptionalBoolean();
     blocked = in.readOptionalWriteable(Blocked::new);
+    modelPruneWindow = in.readOptionalTimeValue();
```
BWC serialization here.
```diff
@@ -206,6 +211,7 @@ public void writeTo(StreamOutput out) throws IOException {
     }
     out.writeOptionalBoolean(allowLazyOpen);
     out.writeOptionalWriteable(blocked);
+    out.writeOptionalTimeValue(modelPruneWindow);
```
BWC serialization here.
Also fix failing jobs_crud yaml tests.
Remember, before backporting, the BWC tests should be muted :D.
I have forgotten that myself many times and caused build failures :)
Cheers Ben
Temporarily mute BWC tests to allow elastic#75999 to be merged Relates elastic#75741
Unmute the BWC tests and alter the BWC version for the model_prune_window field to be 7_15_0. Relates elastic#75741, elastic#76003
Pinging @elastic/clients-team (Team:Clients)
Adds the model_prune_window setting added in elastic/elasticsearch#75741 to all Security jobs that use functions that support model pruning. This means that the models for split field values that are not seen for 30 days will be dropped. If those split field values are subsequently seen again then new models will be created as for completely new entities. The "rare" function does not support model pruning, so jobs that use the "rare" function are not modified. Co-authored-by: David Roberts <[email protected]>
Add configuration for pruning dead split fields in anomaly detection jobs via the `model_prune_window` field for both the job creation and update APIs. No default value is provided for this new configuration field; if it is not present then no 'aggressive' pruning is performed.
Relates to elastic/ml-cpp#1962
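As an illustration of the resulting API surface, a job creation request using the new field might look like the following (job name, detector, and field values are invented; consult the create anomaly detection jobs API docs for the authoritative shape):

```
PUT _ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      { "function": "count", "partition_field_name": "host" }
    ],
    "model_prune_window": "30d"
  },
  "data_description": { "time_field": "@timestamp" }
}
```

Here models for `host` values unseen for 30 days would be pruned; omitting `model_prune_window` keeps the pre-existing behaviour of no time-based pruning.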