Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] More advanced model snapshot retention options #56125

Merged

Conversation

droberts195
Copy link
Contributor

This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

  • The default for model_snapshot_retention_days for new jobs is now
    10 instead of 1
  • There is a new job setting, daily_model_snapshot_retention_after_days,
    that defaults to 1 for new jobs and model_snapshot_retention_days
    for pre-7.8 jobs
  • For days that are older than model_snapshot_retention_days, all
    model snapshots are deleted as before
  • For days that are in between daily_model_snapshot_retention_after_days
    and model_snapshot_retention_days all but the first model snapshot
    for that day are deleted
  • The retain setting of model snapshots is still respected to allow
    selected model snapshots to be retained indefinitely

Closes #52150

This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Closes elastic#52150
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

setting was available, the default is the same as that job's
`model_snapshot_retention_days` setting, to preserve the original behavior
that no thinning out of model snapshots is done.
end::daily-model-snapshot-retention-after-days[]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs should be updated for tag::model-snapshot-retention-days[] as it says the default value is 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that change in a commit.

import static org.hamcrest.Matchers.is;
import static org.hamcrest.Matchers.notNullValue;

public class ModelSnapshotRetentionIT extends MlNativeAutodetectIntegTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#nit, do these tests NEED to be the native-multi-node-tests? These take a while to spin-up and run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't NEED to be multi-node tests, but I think they need to be external cluster tests.

I initially tried adding the integration test as an internal cluster test, but that didn't work because delete-by-query didn't work in the internal cluster tests.

We have a single node external cluster test suite too but I am not sure there is a huge performance benefit in moving the test here. Also, over the years we've found numerous bugs in our multi-node clusters caused by differences between requests getting sent directly to the master node and requests getting sent to a non-master node, so I don't think it hurts to have a bit more coverage of inter-node communications.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

tag::daily-model-snapshot-retention-after-days[]
Advanced configuration option. Specifies a number of days between 0 and the
value of `model_snapshot_retention_days`. After this period of time, only one
model snapshot per day is retained for this job. Age is calculated relative to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Age is calculated relative to the timestamp of the newest model snapshot...

Is this sentence needed? If yes, it's unclear to me what this age affects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something is needed. Not sure how best to phrase it though.

Suppose you have the following model snapshots:

  1. Tuesday 9:04 am
  2. Monday 10:39 am
  3. Sunday 9:03 am
  4. Saturday 10:46 pm
  5. Friday 8:02 pm
  6. Friday 7:49 pm
  7. Friday 8:59 am
  8. Thursday 9:04 am

And suppose model_snapshot_retention_days is 4 and daily_model_snapshot_retention_after_days is 2.

Then we will delete all snapshots older than Friday 9:04 am (= Tuesday 9:04 am - 4 days). So we will delete snapshots 7 and 8 by this rule. And for snapshots at or after Friday 9:04 am (= Tuesday 9:04 am - 4 days) but before Sunday 9:04 am (= Tuesday 9:04 am - 2 days) we will keep the first per 24 hour period. So we will keep snapshots 6 and 4 by this rule and delete snapshots 5 and 3.

You can see that this is all highly dependent on the timestamp of the latest model snapshot, because everything is measured relative to its timestamp. So this is what "Age is calculated relative to the timestamp of the newest model snapshot" was trying to say. Can you recommend a good way to explain it? I will probably do this docs change in a followup PR though if the PR build goes green soon as I need to get the main code change merged before the 7.8 branch is split.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created elastic/stack-docs#1047 to update the Model snapshots (https://www.elastic.co/guide/en/machine-learning/current/ml-model-snapshots.html) page and can revisit the API details thereafter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lisa!

@@ -84,15 +87,15 @@ private WrappedBatchedJobsIterator newJobIterator() {
return new WrappedBatchedJobsIterator(jobsIterator);
}

abstract void calcCutoffEpochMs(String jobId, long retentionDays, ActionListener<Long> listener);
abstract void calcCutoffEpochMs(String jobId, long retentionDays, ActionListener<Tuple<Long, Long>> listener);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth replacing the tuple with a struct-like class that better explains what each value is.

*/
protected static final class CutoffDetails {

public long latestTimeMs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make those two members final?

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@droberts195 droberts195 merged commit c99021c into elastic:master May 5, 2020
@droberts195 droberts195 deleted the daily_model_snapshot_retention branch May 5, 2020 11:55
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request May 5, 2020
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Backport of elastic#56125
droberts195 added a commit that referenced this pull request May 5, 2020
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Backport of #56125
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jun 24, 2020
Relates: elastic/elasticsearch#56125, #4803

This commit adds the DailyModelSnapshotRetentionAfterDays
to ML jobs. It also updates the XML comment for
ModelSnapshotRetentionDays to align with the new default in
Elasticsearch 7.8.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jun 30, 2020
Relates: elastic/elasticsearch#56125, #4803

This commit adds the DailyModelSnapshotRetentionAfterDays
to ML jobs. It also updates the XML comment for
ModelSnapshotRetentionDays to align with the new default in
Elasticsearch 7.8.
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Jun 30, 2020
Relates: elastic/elasticsearch#56125, #4803

This commit adds the DailyModelSnapshotRetentionAfterDays
to ML jobs. It also updates the XML comment for
ModelSnapshotRetentionDays to align with the new default in
Elasticsearch 7.8.
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this pull request Jun 30, 2020
Relates: elastic/elasticsearch#56125, #4803

This commit adds the DailyModelSnapshotRetentionAfterDays
to ML jobs. It also updates the XML comment for
ModelSnapshotRetentionDays to align with the new default in
Elasticsearch 7.8.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jun 30, 2020
Relates: elastic/elasticsearch#56125, #4803

This commit adds the DailyModelSnapshotRetentionAfterDays
to ML jobs. It also updates the XML comment for
ModelSnapshotRetentionDays to align with the new default in
Elasticsearch 7.8.

Co-authored-by: Russ Cam <[email protected]>
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jun 30, 2020
Relates: elastic/elasticsearch#56125, #4803

This commit adds the DailyModelSnapshotRetentionAfterDays
to ML jobs. It also updates the XML comment for
ModelSnapshotRetentionDays to align with the new default in
Elasticsearch 7.8.

Co-authored-by: Russ Cam <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] More advanced model snapshot retention policy
7 participants