Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] More advanced model snapshot retention options #56125

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2519,10 +2519,12 @@ private static Job buildJobForExpiredDataTests(String jobId) {
.setFunction("count")
.setDetectorDescription(randomAlphaOfLength(10))
.build();
AnalysisConfig.Builder configBuilder = new AnalysisConfig.Builder(Arrays.asList(detector));
AnalysisConfig.Builder configBuilder = new AnalysisConfig.Builder(Collections.singletonList(detector));
//should not be random, see:https://github.com/elastic/ml-cpp/issues/208
configBuilder.setBucketSpan(new TimeValue(1, TimeUnit.HOURS));
builder.setAnalysisConfig(configBuilder);
builder.setModelSnapshotRetentionDays(1L);
builder.setDailyModelSnapshotRetentionAfterDays(1L);

DataDescription.Builder dataDescription = new DataDescription.Builder();
dataDescription.setTimeFormat(DataDescription.EPOCH_MS);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -141,11 +141,17 @@ public static Job.Builder createRandomizedJobBuilder() {
if (randomBoolean()) {
builder.setBackgroundPersistInterval(TimeValue.timeValueHours(randomIntBetween(1, 24)));
}
Long modelSnapshotRetentionDays = null;
if (randomBoolean()) {
builder.setModelSnapshotRetentionDays(randomNonNegativeLong());
modelSnapshotRetentionDays = randomNonNegativeLong();
builder.setModelSnapshotRetentionDays(modelSnapshotRetentionDays);
}
if (randomBoolean()) {
builder.setDailyModelSnapshotRetentionAfterDays(randomNonNegativeLong());
if (modelSnapshotRetentionDays != null) {
builder.setDailyModelSnapshotRetentionAfterDays(randomLongBetween(0, modelSnapshotRetentionDays));
} else {
builder.setDailyModelSnapshotRetentionAfterDays(randomNonNegativeLong());
}
}
if (randomBoolean()) {
builder.setResultsRetentionDays(randomNonNegativeLong());
Expand Down
3 changes: 2 additions & 1 deletion docs/reference/ml/anomaly-detection/apis/get-job.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,8 @@ The API returns the following results:
"model_plot_config" : {
"enabled" : true
},
"model_snapshot_retention_days" : 1,
"model_snapshot_retention_days" : 10,
"daily_model_snapshot_retention_after_days" : 1,
"custom_settings" : {
"created_by" : "ml-module-sample",
...
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,8 @@ This is a possible response:
},
"model_memory_limit" : "1gb",
"categorization_examples_limit" : 4,
"model_snapshot_retention_days" : 1
"model_snapshot_retention_days" : 10,
"daily_model_snapshot_retention_after_days" : 1
},
"datafeeds" : {
"scroll_size" : 1000
Expand Down
7 changes: 6 additions & 1 deletion docs/reference/ml/anomaly-detection/apis/put-job.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,10 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=custom-settings]
include::{docdir}/ml/ml-shared.asciidoc[tag=data-description]
//End data_description

`daily_model_snapshot_retention_after_days`::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=daily-model-snapshot-retention-after-days]

`description`::
(Optional, string) A description of the job.

Expand Down Expand Up @@ -320,7 +324,8 @@ When the job is created, you receive the following results:
"time_field" : "timestamp",
"time_format" : "epoch_ms"
},
"model_snapshot_retention_days" : 1,
"model_snapshot_retention_days" : 10,
"daily_model_snapshot_retention_after_days" : 1,
"results_index_name" : "shared",
"allow_lazy_open" : false
}
Expand Down
4 changes: 4 additions & 0 deletions docs/reference/ml/anomaly-detection/apis/update-job.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,10 @@ close the job, then reopen the job and restart the {dfeed} for the changes to ta
(object)
include::{docdir}/ml/ml-shared.asciidoc[tag=custom-settings]

`daily_model_snapshot_retention_after_days`::
(long)
include::{docdir}/ml/ml-shared.asciidoc[tag=daily-model-snapshot-retention-after-days]

`description`::
(string) A description of the job.

Expand Down
16 changes: 14 additions & 2 deletions docs/reference/ml/ml-shared.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -361,6 +361,18 @@ example, it can contain custom URL information as shown in
{ml-docs}/ml-configuring-url.html[Adding custom URLs to {ml} results].
end::custom-settings[]

tag::daily-model-snapshot-retention-after-days[]
Advanced configuration option. Specifies a number of days between 0 and the
value of `model_snapshot_retention_days`. After this period of time, only the first
model snapshot per day is retained for this job. Age is calculated relative to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Age is calculated relative to the timestamp of the newest model snapshot...

Is this sentence needed? If yes, it's unclear to me what this age affects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something is needed. Not sure how best to phrase it though.

Suppose you have the following model snapshots:

  1. Tuesday 9:04 am
  2. Monday 10:39 am
  3. Sunday 9:03 am
  4. Saturday 10:46 pm
  5. Friday 8:02 pm
  6. Friday 7:49 pm
  7. Friday 8:59 am
  8. Thursday 9:04 am

And suppose model_snapshot_retention_days is 4 and daily_model_snapshot_retention_after_days is 2.

Then we will delete all snapshots older than Friday 9:04 am (= Tuesday 9:04 am - 4 days). So we will delete snapshots 7 and 8 by this rule. And for snapshots at or after Friday 9:04 am (= Tuesday 9:04 am - 4 days) but before Sunday 9:04 am (= Tuesday 9:04 am - 2 days) we will keep the first per 24 hour period. So we will keep snapshots 6 and 4 by this rule and delete snapshots 5 and 3.

You can see that this is all highly dependent on the timestamp of the latest model snapshot, because everything is measured relative to its timestamp. So this is what "Age is calculated relative to the timestamp of the newest model snapshot" was trying to say. Can you recommend a good way to explain it? I will probably do this docs change in a followup PR though if the PR build goes green soon as I need to get the main code change merged before the 7.8 branch is split.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created elastic/stack-docs#1047 to update the Model snapshots (https://www.elastic.co/guide/en/machine-learning/current/ml-model-snapshots.html) page and can revisit the API details thereafter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lisa!

the timestamp of the newest model snapshot. For new jobs, the default value is
`1`, which means that all snapshots are retained for one day. Older snapshots
are thinned out such that only one per day is retained. For jobs that were
created before this setting was available, the default value matches the
`model_snapshot_retention_days` value, which preserves the original behavior
and no thinning out of model snapshots occurs.
end::daily-model-snapshot-retention-after-days[]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs should be updated for tag::model-snapshot-retention-days[] as it says the default value is 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that change in a commit.


tag::data-description[]
The data description defines the format of the input data when you send data to
the job by using the <<ml-post-data,post data>> API. Note that when configure
Expand Down Expand Up @@ -997,8 +1009,8 @@ end::model-snapshot-id[]
tag::model-snapshot-retention-days[]
Advanced configuration option. The period of time (in days) that model snapshots
are retained. Age is calculated relative to the timestamp of the newest model
snapshot. The default value is `1`, which means snapshots that are one day
(twenty-four hours) older than the newest snapshot are deleted.
snapshot. The default value is `10`, which means snapshots that are ten days
older than the newest snapshot are deleted.
end::model-snapshot-retention-days[]

tag::model-timestamp[]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,8 @@ public class Job extends AbstractDiffable<Job> implements Writeable, ToXContentO
*/
public static final ByteSizeValue PROCESS_MEMORY_OVERHEAD = new ByteSizeValue(10, ByteSizeUnit.MB);

public static final long DEFAULT_MODEL_SNAPSHOT_RETENTION_DAYS = 1;
public static final long DEFAULT_MODEL_SNAPSHOT_RETENTION_DAYS = 10;
public static final long DEFAULT_DAILY_MODEL_SNAPSHOT_RETENTION_AFTER_DAYS = 1;

private static ObjectParser<Builder, Void> createParser(boolean ignoreUnknownFields) {
ObjectParser<Builder, Void> parser = new ObjectParser<>("job_details", ignoreUnknownFields, Builder::new);
Expand Down Expand Up @@ -808,6 +809,10 @@ public Builder setModelSnapshotRetentionDays(Long modelSnapshotRetentionDays) {
return this;
}

public Long getModelSnapshotRetentionDays() {
return modelSnapshotRetentionDays;
}

public Builder setDailyModelSnapshotRetentionAfterDays(Long dailyModelSnapshotRetentionAfterDays) {
this.dailyModelSnapshotRetentionAfterDays = dailyModelSnapshotRetentionAfterDays;
return this;
Expand Down Expand Up @@ -1043,9 +1048,6 @@ public void validateInputFields() {

checkValidBackgroundPersistInterval();
checkValueNotLessThan(0, RENORMALIZATION_WINDOW_DAYS.getPreferredName(), renormalizationWindowDays);
checkValueNotLessThan(0, MODEL_SNAPSHOT_RETENTION_DAYS.getPreferredName(), modelSnapshotRetentionDays);
checkValueNotLessThan(0, DAILY_MODEL_SNAPSHOT_RETENTION_AFTER_DAYS.getPreferredName(),
dailyModelSnapshotRetentionAfterDays);
checkValueNotLessThan(0, RESULTS_RETENTION_DAYS.getPreferredName(), resultsRetentionDays);

if (!MlStrings.isValidId(id)) {
Expand All @@ -1055,6 +1057,8 @@ public void validateInputFields() {
throw new IllegalArgumentException(Messages.getMessage(Messages.JOB_CONFIG_ID_TOO_LONG, MlStrings.ID_LENGTH_LIMIT));
}

validateModelSnapshotRetentionSettings();

validateGroups();

// Results index name not specified in user input means use the default, so is acceptable in this validation
Expand All @@ -1076,6 +1080,37 @@ public void validateAnalysisLimitsAndSetDefaults(@Nullable ByteSizeValue maxMode
AnalysisLimits.DEFAULT_MODEL_MEMORY_LIMIT_MB);
}

/**
* This is meant to be called when a new job is created.
* It sets {@link #dailyModelSnapshotRetentionAfterDays} to the default value if it is not set and the default makes sense.
*/
public void validateModelSnapshotRetentionSettingsAndSetDefaults() {
validateModelSnapshotRetentionSettings();
if (dailyModelSnapshotRetentionAfterDays == null &&
modelSnapshotRetentionDays != null &&
modelSnapshotRetentionDays > DEFAULT_DAILY_MODEL_SNAPSHOT_RETENTION_AFTER_DAYS) {
dailyModelSnapshotRetentionAfterDays = DEFAULT_DAILY_MODEL_SNAPSHOT_RETENTION_AFTER_DAYS;
}
}

/**
* Validates that {@link #modelSnapshotRetentionDays} and {@link #dailyModelSnapshotRetentionAfterDays} make sense,
* both individually and in combination.
*/
public void validateModelSnapshotRetentionSettings() {

checkValueNotLessThan(0, MODEL_SNAPSHOT_RETENTION_DAYS.getPreferredName(), modelSnapshotRetentionDays);
checkValueNotLessThan(0, DAILY_MODEL_SNAPSHOT_RETENTION_AFTER_DAYS.getPreferredName(),
dailyModelSnapshotRetentionAfterDays);

if (modelSnapshotRetentionDays != null &&
dailyModelSnapshotRetentionAfterDays != null &&
dailyModelSnapshotRetentionAfterDays > modelSnapshotRetentionDays) {
throw new IllegalArgumentException(Messages.getMessage(Messages.JOB_CONFIG_MODEL_SNAPSHOT_RETENTION_SETTINGS_INCONSISTENT,
dailyModelSnapshotRetentionAfterDays, modelSnapshotRetentionDays));
}
}

private void validateGroups() {
for (String group : this.groups) {
if (MlStrings.isValidId(group) == false) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
package org.elasticsearch.xpack.core.ml.job.messages;

import org.elasticsearch.xpack.core.ml.MachineLearningField;
import org.elasticsearch.xpack.core.ml.job.config.Job;

import java.text.MessageFormat;
import java.util.Locale;
Expand Down Expand Up @@ -212,6 +213,9 @@ public final class Messages {
"This job would cause a mapping clash with existing field [{0}] - avoid the clash by assigning a dedicated results index";
public static final String JOB_CONFIG_TIME_FIELD_NOT_ALLOWED_IN_ANALYSIS_CONFIG =
"data_description.time_field may not be used in the analysis_config";
public static final String JOB_CONFIG_MODEL_SNAPSHOT_RETENTION_SETTINGS_INCONSISTENT =
"The value of '" + Job.DAILY_MODEL_SNAPSHOT_RETENTION_AFTER_DAYS + "' [{0}] cannot be greater than '" +
Job.MODEL_SNAPSHOT_RETENTION_DAYS + "' [{1}]";

public static final String JOB_AND_GROUP_NAMES_MUST_BE_UNIQUE =
"job and group names must be unique but job [{0}] and group [{0}] have the same name";
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -231,6 +231,7 @@ public final class ReservedFieldNames {
Job.RENORMALIZATION_WINDOW_DAYS.getPreferredName(),
Job.BACKGROUND_PERSIST_INTERVAL.getPreferredName(),
Job.MODEL_SNAPSHOT_RETENTION_DAYS.getPreferredName(),
Job.DAILY_MODEL_SNAPSHOT_RETENTION_AFTER_DAYS.getPreferredName(),
Job.RESULTS_RETENTION_DAYS.getPreferredName(),
Job.MODEL_SNAPSHOT_ID.getPreferredName(),
Job.MODEL_SNAPSHOT_MIN_VERSION.getPreferredName(),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,9 @@
"type" : "object",
"enabled" : false
},
"daily_model_snapshot_retention_after_days" : {
"type" : "long"
},
"data_description" : {
"properties" : {
"field_delimiter" : {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ public void testConstructor_GivenEmptyJobConfiguration() {
assertNull(job.getModelPlotConfig());
assertNull(job.getRenormalizationWindowDays());
assertNull(job.getBackgroundPersistInterval());
assertThat(job.getModelSnapshotRetentionDays(), equalTo(1L));
assertThat(job.getModelSnapshotRetentionDays(), equalTo(10L));
assertNull(job.getDailyModelSnapshotRetentionAfterDays());
assertNull(job.getResultsRetentionDays());
assertNotNull(job.allInputFields());
Expand Down Expand Up @@ -168,7 +168,7 @@ public void testEquals_GivenDifferentIds() {
Job job1 = builder.build();
builder.setId("bar");
Job job2 = builder.build();
assertFalse(job1.equals(job2));
assertNotEquals(job1, job2);
}

public void testEquals_GivenDifferentRenormalizationWindowDays() {
Expand All @@ -183,7 +183,7 @@ public void testEquals_GivenDifferentRenormalizationWindowDays() {
jobDetails2.setRenormalizationWindowDays(4L);
jobDetails2.setAnalysisConfig(createAnalysisConfig());
jobDetails2.setCreateTime(date);
assertFalse(jobDetails1.build().equals(jobDetails2.build()));
assertNotEquals(jobDetails1.build(), jobDetails2.build());
}

public void testEquals_GivenDifferentBackgroundPersistInterval() {
Expand All @@ -198,7 +198,7 @@ public void testEquals_GivenDifferentBackgroundPersistInterval() {
jobDetails2.setBackgroundPersistInterval(TimeValue.timeValueSeconds(8000L));
jobDetails2.setAnalysisConfig(createAnalysisConfig());
jobDetails2.setCreateTime(date);
assertFalse(jobDetails1.build().equals(jobDetails2.build()));
assertNotEquals(jobDetails1.build(), jobDetails2.build());
}

public void testEquals_GivenDifferentModelSnapshotRetentionDays() {
Expand All @@ -213,7 +213,7 @@ public void testEquals_GivenDifferentModelSnapshotRetentionDays() {
jobDetails2.setModelSnapshotRetentionDays(8L);
jobDetails2.setAnalysisConfig(createAnalysisConfig());
jobDetails2.setCreateTime(date);
assertFalse(jobDetails1.build().equals(jobDetails2.build()));
assertNotEquals(jobDetails1.build(), jobDetails2.build());
}

public void testEquals_GivenDifferentResultsRetentionDays() {
Expand All @@ -228,7 +228,7 @@ public void testEquals_GivenDifferentResultsRetentionDays() {
jobDetails2.setResultsRetentionDays(4L);
jobDetails2.setAnalysisConfig(createAnalysisConfig());
jobDetails2.setCreateTime(date);
assertFalse(jobDetails1.build().equals(jobDetails2.build()));
assertNotEquals(jobDetails1.build(), jobDetails2.build());
}

public void testEquals_GivenDifferentCustomSettings() {
Expand All @@ -240,7 +240,7 @@ public void testEquals_GivenDifferentCustomSettings() {
Map<String, Object> customSettings2 = new HashMap<>();
customSettings2.put("key2", "value2");
jobDetails2.setCustomSettings(customSettings2);
assertFalse(jobDetails1.build().equals(jobDetails2.build()));
assertNotEquals(jobDetails1.build(), jobDetails2.build());
}

// JobConfigurationTests:
Expand Down Expand Up @@ -397,6 +397,30 @@ public void testVerify_GivenNegativeModelSnapshotRetentionDays() {
assertEquals(errorMessage, e.getMessage());
}

public void testVerify_GivenNegativeDailyModelSnapshotRetentionAfterDays() {
String errorMessage =
Messages.getMessage(Messages.JOB_CONFIG_FIELD_VALUE_TOO_LOW, "daily_model_snapshot_retention_after_days", 0, -1);
Job.Builder builder = buildJobBuilder("foo");
builder.setDailyModelSnapshotRetentionAfterDays(-1L);
IllegalArgumentException e = expectThrows(IllegalArgumentException.class, builder::build);

assertEquals(errorMessage, e.getMessage());
}

public void testVerify_GivenInconsistentModelSnapshotRetentionSettings() {
long dailyModelSnapshotRetentionAfterDays = randomLongBetween(1, Long.MAX_VALUE);
long modelSnapshotRetentionDays = randomLongBetween(0, dailyModelSnapshotRetentionAfterDays - 1);
String errorMessage =
Messages.getMessage(Messages.JOB_CONFIG_MODEL_SNAPSHOT_RETENTION_SETTINGS_INCONSISTENT,
dailyModelSnapshotRetentionAfterDays, modelSnapshotRetentionDays);
Job.Builder builder = buildJobBuilder("foo");
builder.setDailyModelSnapshotRetentionAfterDays(dailyModelSnapshotRetentionAfterDays);
builder.setModelSnapshotRetentionDays(modelSnapshotRetentionDays);
IllegalArgumentException e = expectThrows(IllegalArgumentException.class, builder::build);

assertEquals(errorMessage, e.getMessage());
}

public void testVerify_GivenLowBackgroundPersistInterval() {
String errorMessage = Messages.getMessage(Messages.JOB_CONFIG_FIELD_VALUE_TOO_LOW, "background_persist_interval", 3600, 3599);
Job.Builder builder = buildJobBuilder("foo");
Expand Down Expand Up @@ -628,7 +652,11 @@ public static Job createRandomizedJob() {
builder.setModelSnapshotRetentionDays(randomNonNegativeLong());
}
if (randomBoolean()) {
builder.setDailyModelSnapshotRetentionAfterDays(randomNonNegativeLong());
if (builder.getModelSnapshotRetentionDays() != null) {
builder.setDailyModelSnapshotRetentionAfterDays(randomLongBetween(0, builder.getModelSnapshotRetentionDays()));
} else {
builder.setDailyModelSnapshotRetentionAfterDays(randomNonNegativeLong());
}
}
if (randomBoolean()) {
builder.setResultsRetentionDays(randomNonNegativeLong());
Expand Down
Loading