-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] More advanced model snapshot retention options #56125
[ML] More advanced model snapshot retention options #56125
Conversation
This PR implements the following changes to make ML model snapshot retention more flexible in advance of adding a UI for the feature in an upcoming release. - The default for `model_snapshot_retention_days` for new jobs is now 10 instead of 1 - There is a new job setting, `daily_model_snapshot_retention_after_days`, that defaults to 1 for new jobs and `model_snapshot_retention_days` for pre-7.8 jobs - For days that are older than `model_snapshot_retention_days`, all model snapshots are deleted as before - For days that are in between `daily_model_snapshot_retention_after_days` and `model_snapshot_retention_days` all but the first model snapshot for that day are deleted - The `retain` setting of model snapshots is still respected to allow selected model snapshots to be retained indefinitely Closes elastic#52150
Pinging @elastic/ml-core (:ml) |
setting was available, the default is the same as that job's | ||
`model_snapshot_retention_days` setting, to preserve the original behavior | ||
that no thinning out of model snapshots is done. | ||
end::daily-model-snapshot-retention-after-days[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs should be updated for tag::model-snapshot-retention-days[]
as it says the default value is 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added that change in a commit.
import static org.hamcrest.Matchers.is; | ||
import static org.hamcrest.Matchers.notNullValue; | ||
|
||
public class ModelSnapshotRetentionIT extends MlNativeAutodetectIntegTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#nit, do these tests NEED to be the native-multi-node-tests
? These take a while to spin-up and run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't NEED to be multi-node tests, but I think they need to be external cluster tests.
I initially tried adding the integration test as an internal cluster test, but that didn't work because delete-by-query didn't work in the internal cluster tests.
We have a single node external cluster test suite too but I am not sure there is a huge performance benefit in moving the test here. Also, over the years we've found numerous bugs in our multi-node clusters caused by differences between requests getting sent directly to the master node and requests getting sent to a non-master node, so I don't think it hurts to have a bit more coverage of inter-node communications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
tag::daily-model-snapshot-retention-after-days[] | ||
Advanced configuration option. Specifies a number of days between 0 and the | ||
value of `model_snapshot_retention_days`. After this period of time, only one | ||
model snapshot per day is retained for this job. Age is calculated relative to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Age is calculated relative to the timestamp of the newest model snapshot...
Is this sentence needed? If yes, it's unclear to me what this age affects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something is needed. Not sure how best to phrase it though.
Suppose you have the following model snapshots:
- Tuesday 9:04 am
- Monday 10:39 am
- Sunday 9:03 am
- Saturday 10:46 pm
- Friday 8:02 pm
- Friday 7:49 pm
- Friday 8:59 am
- Thursday 9:04 am
And suppose model_snapshot_retention_days
is 4 and daily_model_snapshot_retention_after_days
is 2.
Then we will delete all snapshots older than Friday 9:04 am (= Tuesday 9:04 am - 4 days). So we will delete snapshots 7 and 8 by this rule. And for snapshots at or after Friday 9:04 am (= Tuesday 9:04 am - 4 days) but before Sunday 9:04 am (= Tuesday 9:04 am - 2 days) we will keep the first per 24 hour period. So we will keep snapshots 6 and 4 by this rule and delete snapshots 5 and 3.
You can see that this is all highly dependent on the timestamp of the latest model snapshot, because everything is measured relative to its timestamp. So this is what "Age is calculated relative to the timestamp of the newest model snapshot" was trying to say. Can you recommend a good way to explain it? I will probably do this docs change in a followup PR though if the PR build goes green soon as I need to get the main code change merged before the 7.8 branch is split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created elastic/stack-docs#1047 to update the Model snapshots (https://www.elastic.co/guide/en/machine-learning/current/ml-model-snapshots.html) page and can revisit the API details thereafter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Lisa!
@@ -84,15 +87,15 @@ private WrappedBatchedJobsIterator newJobIterator() { | |||
return new WrappedBatchedJobsIterator(jobsIterator); | |||
} | |||
|
|||
abstract void calcCutoffEpochMs(String jobId, long retentionDays, ActionListener<Long> listener); | |||
abstract void calcCutoffEpochMs(String jobId, long retentionDays, ActionListener<Tuple<Long, Long>> listener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth replacing the tuple with a struct-like class that better explains what each value is.
*/ | ||
protected static final class CutoffDetails { | ||
|
||
public long latestTimeMs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make those two members final
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR implements the following changes to make ML model snapshot retention more flexible in advance of adding a UI for the feature in an upcoming release. - The default for `model_snapshot_retention_days` for new jobs is now 10 instead of 1 - There is a new job setting, `daily_model_snapshot_retention_after_days`, that defaults to 1 for new jobs and `model_snapshot_retention_days` for pre-7.8 jobs - For days that are older than `model_snapshot_retention_days`, all model snapshots are deleted as before - For days that are in between `daily_model_snapshot_retention_after_days` and `model_snapshot_retention_days` all but the first model snapshot for that day are deleted - The `retain` setting of model snapshots is still respected to allow selected model snapshots to be retained indefinitely Backport of elastic#56125
This PR implements the following changes to make ML model snapshot retention more flexible in advance of adding a UI for the feature in an upcoming release. - The default for `model_snapshot_retention_days` for new jobs is now 10 instead of 1 - There is a new job setting, `daily_model_snapshot_retention_after_days`, that defaults to 1 for new jobs and `model_snapshot_retention_days` for pre-7.8 jobs - For days that are older than `model_snapshot_retention_days`, all model snapshots are deleted as before - For days that are in between `daily_model_snapshot_retention_after_days` and `model_snapshot_retention_days` all but the first model snapshot for that day are deleted - The `retain` setting of model snapshots is still respected to allow selected model snapshots to be retained indefinitely Backport of #56125
Relates: elastic/elasticsearch#56125, #4803 This commit adds the DailyModelSnapshotRetentionAfterDays to ML jobs. It also updates the XML comment for ModelSnapshotRetentionDays to align with the new default in Elasticsearch 7.8.
Relates: elastic/elasticsearch#56125, #4803 This commit adds the DailyModelSnapshotRetentionAfterDays to ML jobs. It also updates the XML comment for ModelSnapshotRetentionDays to align with the new default in Elasticsearch 7.8.
Relates: elastic/elasticsearch#56125, #4803 This commit adds the DailyModelSnapshotRetentionAfterDays to ML jobs. It also updates the XML comment for ModelSnapshotRetentionDays to align with the new default in Elasticsearch 7.8.
Relates: elastic/elasticsearch#56125, #4803 This commit adds the DailyModelSnapshotRetentionAfterDays to ML jobs. It also updates the XML comment for ModelSnapshotRetentionDays to align with the new default in Elasticsearch 7.8.
Relates: elastic/elasticsearch#56125, #4803 This commit adds the DailyModelSnapshotRetentionAfterDays to ML jobs. It also updates the XML comment for ModelSnapshotRetentionDays to align with the new default in Elasticsearch 7.8. Co-authored-by: Russ Cam <[email protected]>
Relates: elastic/elasticsearch#56125, #4803 This commit adds the DailyModelSnapshotRetentionAfterDays to ML jobs. It also updates the XML comment for ModelSnapshotRetentionDays to align with the new default in Elasticsearch 7.8. Co-authored-by: Russ Cam <[email protected]>
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.
model_snapshot_retention_days
for new jobs is now10 instead of 1
daily_model_snapshot_retention_after_days
,that defaults to 1 for new jobs and
model_snapshot_retention_days
for pre-7.8 jobs
model_snapshot_retention_days
, allmodel snapshots are deleted as before
daily_model_snapshot_retention_after_days
and
model_snapshot_retention_days
all but the first model snapshotfor that day are deleted
retain
setting of model snapshots is still respected to allowselected model snapshots to be retained indefinitely
Closes #52150