[ML] Add 'model_prune_window' field to AD job config #75741
Conversation
Add configuration for pruning dead split fields in anomaly detection jobs via the `model_prune_window` field for both the job creation and update APIs. Relates to ml-cpp/elastic#1962
Pinging @elastic/ml-core (Team:ML)
It would be good to have some yaml integration tests for PUT and UPDATE.
```java
if (modelPruneWindow != null) {
    TimeUtils.checkNonNegativeMultiple(modelPruneWindow, TimeUnit.SECONDS, MODEL_PRUNE_WINDOW);
}
```
Shouldn't this always be greater than the bucket span? I see that the C++ side secretly does a `Math.max(bucketSpan, modelPruneWindow)` type of thing. I think it would be better to not allow a window that is too small.
Two times the bucket span is the smallest value that makes sense, so that should be enforced here. Since this gets converted into a number of buckets in the C++ code, we should possibly also enforce that the value specified is an exact number of buckets. That then means the functions in `TimeUtils` aren't much use for enforcing the value. It would need a bespoke check and error message.
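A bespoke check along the lines the reviewers describe could be sketched as follows. This is a hypothetical standalone version (`checkModelPruneWindow` and the use of `java.time.Duration` are invented for illustration; the real code works with Elasticsearch's `TimeValue` inside `AnalysisConfig`):

```java
import java.time.Duration;

public class ModelPruneWindowCheck {
    // Hypothetical sketch of the bespoke validation the reviewers suggest:
    // a null window means "do not prune after a fixed time"; otherwise the
    // window must be at least twice the bucket span and an exact multiple of it.
    static void checkModelPruneWindow(Duration modelPruneWindow, Duration bucketSpan) {
        if (modelPruneWindow == null) {
            return; // null disables time-based pruning
        }
        long windowSecs = modelPruneWindow.getSeconds();
        long spanSecs = bucketSpan.getSeconds();
        if (windowSecs < 2 * spanSecs) {
            throw new IllegalArgumentException("model_prune_window [" + windowSecs
                + "s] must be at least twice the bucket_span [" + spanSecs + "s]");
        }
        if (windowSecs % spanSecs != 0) {
            throw new IllegalArgumentException("model_prune_window [" + windowSecs
                + "s] must be a multiple of bucket_span [" + spanSecs + "s]");
        }
    }

    public static void main(String[] args) {
        checkModelPruneWindow(Duration.ofDays(30), Duration.ofHours(1)); // valid: 720 buckets
        try {
            checkModelPruneWindow(Duration.ofMinutes(90), Duration.ofHours(1)); // not a multiple
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```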
```diff
@@ -142,6 +149,7 @@ public AnalysisConfig(StreamInput in) throws IOException {
     influencers = Collections.unmodifiableList(in.readStringList());

     multivariateByFields = in.readOptionalBoolean();
+    modelPruneWindow = in.readOptionalTimeValue();
```
You need to add BWC serialization branches here and in the stream out
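The version-gated ("BWC") pattern being asked for can be sketched with plain `java.io` streams. Everything here is a toy stand-in: the version constant and helper names are invented, and the real code uses Elasticsearch's `StreamInput`/`StreamOutput` version checks instead:

```java
import java.io.*;

public class BwcSketch {
    // Invented stand-in for an Elasticsearch transport version constant.
    static final int V_7_15_0 = 71500;

    // Serialize the optional field only when the remote node is new enough
    // to understand it; older nodes never receive the extra bytes.
    static byte[] writeOptional(int remoteVersion, Long value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            if (remoteVersion >= V_7_15_0) {
                out.writeBoolean(value != null);
                if (value != null) {
                    out.writeLong(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read the field only when the sender was new enough to have written it;
    // otherwise leave it null, matching the older wire format.
    static Long readOptional(int remoteVersion, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            if (remoteVersion >= V_7_15_0 && in.readBoolean()) {
                return in.readLong();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] newWire = writeOptional(V_7_15_0, 86400L);
        System.out.println(readOptional(V_7_15_0, newWire)); // prints 86400
        byte[] oldWire = writeOptional(71400, 86400L);       // old node: field omitted
        System.out.println(readOptional(71400, oldWire));    // prints null
    }
}
```

Without the read-side gate, a new node talking to an old node would try to consume bytes the old node never wrote, corrupting the stream, which is why both branches must match.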
```diff
@@ -701,6 +718,7 @@ public void testVerify_GivenComplexPerPartitionCategorizationConfig() {
     AnalysisConfig.Builder analysisConfig = new AnalysisConfig.Builder(detectors);
     analysisConfig.setBucketSpan(TimeValue.timeValueHours(1));
     analysisConfig.setLatency(TimeValue.ZERO);
+    analysisConfig.setModelPruneWindow(TimeValue.ZERO);
```
Zero shouldn't be valid, as it's null that means "don't prune after a fixed time".
```diff
@@ -710,14 +728,15 @@ public void testVerify_GivenComplexPerPartitionCategorizationConfig() {
     AnalysisConfig.Builder analysisConfig = new AnalysisConfig.Builder(Collections.singletonList(detector.build()));
     analysisConfig.setBucketSpan(TimeValue.timeValueHours(1));
     analysisConfig.setLatency(TimeValue.ZERO);
+    analysisConfig.setModelPruneWindow(TimeValue.ZERO);
```
Zero shouldn't be valid, as it's null that means "don't prune after a fixed time".
Also added some documentation for the new field...
LGTM apart from one minor nit
```diff
@@ -185,6 +185,10 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=latency]
 (Boolean)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=multivariate-by-fields]

+`model_prune_window`:::
```
This should go above `multivariate_by_fields` for alphabetical order.
```java
if (modelPruneWindowSecs % bucketSpanSecs != 0) {
    throw ExceptionsHelper.badRequestException(MODEL_PRUNE_WINDOW.getPreferredName() + " [" + modelPruneWindow.toString() + "]"
        + " must be a multiple of " + BUCKET_SPAN.getPreferredName() + " [" + bucketSpan.toString() + "]");
}
```
If this is a thing we are enforcing, it makes sense to me to make this `model_prune_window_buckets` and take a whole, positive value indicating the number of buckets in the past to look at. Forcing somebody to supply a time value when we require that value to always be a bucket multiple doesn't make sense to me.
@droberts195 ^ ? What say you?
We decided in a team meeting about a month ago that we'd just have one setting that was an amount of time rather than a number of buckets.
I think it's quite unlikely this error will be thrown in practice when you consider sensible values for pruning are of the order of days, not minutes, and most bucket spans divide exactly into a day.
So I think it's OK as-is.
I definitely don't think we should change this retention period to be a number of buckets. All the other retention periods are measured in time, not buckets.
What we could do is just enforce >= 2 * bucket span and not an exact multiple. The C++ now rounds up so everything will work out OK.
But, like I said, I suspect in practice the detail here won't make any difference to anybody, because real users always choose nice round numbers even when they're not forced to.
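The conversion described above (the C++ side turning the time window into a whole number of buckets, rounding up, with two buckets as the effective floor) can be sketched like this; the helper name is invented for illustration:

```java
public class PruneWindowBuckets {
    // Hypothetical sketch of converting a time-based model_prune_window into
    // a whole number of buckets: round up to the next full bucket, then clamp
    // to the two-bucket minimum discussed in the review.
    static long windowInBuckets(long windowSecs, long bucketSpanSecs) {
        long buckets = (windowSecs + bucketSpanSecs - 1) / bucketSpanSecs; // ceiling division
        return Math.max(2, buckets);
    }

    public static void main(String[] args) {
        long day = 86_400, hour = 3_600;
        System.out.println(windowInBuckets(30 * day, hour)); // 30 days of 1h buckets -> 720
        System.out.println(windowInBuckets(90 * 60, hour));  // 1.5h rounds up to 2 buckets
        System.out.println(windowInBuckets(hour, hour));     // clamped to the 2-bucket minimum
    }
}
```

Because of the round-up, a window that is not an exact bucket multiple still yields a sensible bucket count, which is why the strict multiple check is arguably optional.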
```diff
@@ -164,6 +168,7 @@ public JobUpdate(StreamInput in) throws IOException {
     }
     allowLazyOpen = in.readOptionalBoolean();
     blocked = in.readOptionalWriteable(Blocked::new);
+    modelPruneWindow = in.readOptionalTimeValue();
```
BWC serialization here.
```diff
@@ -206,6 +211,7 @@ public void writeTo(StreamOutput out) throws IOException {
     }
     out.writeOptionalBoolean(allowLazyOpen);
     out.writeOptionalWriteable(blocked);
+    out.writeOptionalTimeValue(modelPruneWindow);
```
BWC serialization here.
Also fix failing jobs_crud yaml tests.
Remember, before backporting, the BWC tests should be muted :D.
I have forgotten that myself many times and caused build failures :)
Cheers Ben
Temporarily mute BWC tests to allow elastic#75999 to be merged Relates elastic#75741
Unmute the BWC tests and alter the BWC version for the model_prune_window field to be 7_15_0. Relates elastic#75741, elastic#76003
Pinging @elastic/clients-team (Team:Clients)
Adds the model_prune_window setting added in elastic/elasticsearch#75741 to all Security jobs that use functions that support model pruning. This means that the models for split field values that are not seen for 30 days will be dropped. If those split field values are subsequently seen again then new models will be created as for completely new entities. The "rare" function does not support model pruning, so jobs that use the "rare" function are not modified. Co-authored-by: David Roberts <[email protected]>
Add configuration for pruning dead split fields in anomaly detection jobs via the `model_prune_window` field for both the job creation and update APIs. No default value is provided for this new configuration field; if it is not present then no 'aggressive' pruning is performed.
Relates to elastic/ml-cpp#1962
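As an illustration of the resulting API surface, a job creation request using the new field might look like the following (job name, detector, and field values are invented; consult the create anomaly detection jobs API docs for the authoritative shape):

```
PUT _ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      { "function": "count", "partition_field_name": "host" }
    ],
    "model_prune_window": "30d"
  },
  "data_description": { "time_field": "@timestamp" }
}
```

Here models for `host` values unseen for 30 days would be pruned; omitting `model_prune_window` keeps the pre-existing behaviour of no time-based pruning.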