-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix delete_expired_data/nightly maintenance when many model snapshots need deleting #57041
Conversation
Pinging @elastic/ml-core (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
All my comments on the code are minor nits.
My main comment is that we should consider this a bug rather than an enhancement and backport it to 7.8.0 and 7.7.1. From the perspective of an end user who is suffering from this the product doesn't work.
} else { | ||
value[0] = 0L; | ||
} | ||
value[0] = TimeUtils.parseToEpochMs((String)value[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indenting is out by 1 space by the look of it
* @param epoch The | ||
* @return The epoch value. | ||
*/ | ||
public static Long parseToEpochMs(String epoch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to return long
? In the case where the caller wants Long
it can be boxed just as efficiently on the caller's side. And if the caller wants long
then there's no boxing and unboxing.
long timestampMs = modelSnapshot.getTimestamp().getTime(); | ||
String timestamp = stringFieldValueOrNull(hit, ModelSnapshot.TIMESTAMP.getPreferredName()); | ||
if (timestamp == null) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should log an error here, as I don't think this should never happen unless a document is corrupt. So if it does happen it would be useful to know about when debugging why results haven't been deleted.
if (timestamp == null) { | ||
continue; | ||
} | ||
Long timestampMs = TimeUtils.parseToEpochMs(timestamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use long
instead of Long
and avoid unboxing in the loop.
I changed the PR title so affected users can find it more easily in the release notes. |
*/ | ||
public static Long parseToEpochMs(String epoch) { | ||
int dotPos = epoch.indexOf('.'); | ||
long epochMs = 0L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could get rid of epochMs
and use return
statement in each of 3 if
branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that does read better
* JSON. So String is really the only format that could be used, but the consumers of time | ||
* are expecting a number. | ||
* | ||
* @param epoch The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write sth here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😄
8a7366d
to
a69f90a
Compare
@@ -84,7 +78,11 @@ public void remove(float requestsPerSec, ActionListener<Boolean> listener, Suppl | |||
.filter(QueryBuilders.termQuery(Result.RESULT_TYPE.getPreferredName(), ForecastRequestStats.RESULT_TYPE_VALUE)) | |||
.filter(QueryBuilders.existsQuery(ForecastRequestStats.EXPIRY_TIME.getPreferredName()))); | |||
source.size(MAX_FORECASTS); | |||
source.trackTotalHits(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will removing this make:
if (hits.getTotalHits().value > MAX_FORECASTS) {
LOGGER.info("More than [{}] forecasts were found. This run will only delete [{}] of them", MAX_FORECASTS, MAX_FORECASTS);
}
less useful? (Around line 141/145.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot I couldn't see where it was checked so removed it.
The benefit of setting trackTotalHits(true)
as the only purpose is to log this message saying there are more forecasts to be deleted later. However, in the interest of making this change as small as possible I've reverted this line
… need deleting (elastic#57041) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.
…pshots need deleting (#57041) (#57136) Fix delete_expired_data/nightly maintenance when many model snapshots need deleting (#57041) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.
… need deleting (elastic#57041) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted. # Conflicts: # x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/retention/ExpiredForecastsRemover.java # x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/retention/ExpiredModelSnapshotsRemover.java # x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/job/retention/MlDataRemover.java # x-pack/plugin/ml/src/test/java/org/elasticsearch/xpack/ml/job/retention/ExpiredModelSnapshotsRemoverTests.java
… need deleting (elastic#57041) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.
… need deleting (#57041) (#57145) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.
…pshots need deleting (#57041) (#57146) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.
The queries performed by the expired data removers pull back entire documents where only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000.
If the user is suffering with many accumulated snapshots that were not cleaned up due to #47103 the size of this search response could be very large. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.