-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] rewrite deprecated datafeed configs to use new date_histogram interval fields #52538
[ML] rewrite deprecated datafeed configs to use new date_histogram interval fields #52538
Conversation
Pinging @elastic/ml-core (:ml) |
…:benwtrent/elasticsearch into feature/ml-datafeed-date_histo-migration
run elasticsearch-ci/bwc |
1 similar comment
run elasticsearch-ci/bwc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good first step. I had a couple of questions.
"Put job and datafeed with aggs in old cluster - deprecated interval with warning": | ||
"Put job and datafeed with aggs in old cluster - deprecated interval with warning before 8.0.0": | ||
- skip: | ||
version: "7.99.99 - " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be version: "8.0.0 - "
for 8.0.0 and above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#fixed
@@ -23,7 +26,9 @@ setup: | |||
--- | |||
"Test old cluster datafeed with aggs": | |||
- skip: | |||
features: "warnings" | |||
features: warnings | |||
version: "all" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment this test is unconditionally skipped, so I wondered about deleting it completely. But I guess the reason for keeping it is that eventually it can be reenabled. You could add a comment # TODO: remove skip section from master when version is bumped to 9.0.0
here to prompt us to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
ElasticsearchException e = expectThrows(ElasticsearchException.class, | ||
() -> createDatafeedWithDateHistogram("8d")); | ||
|
||
assertThat(e.getMessage(), containsString("When specifying a date_histogram calendar interval [8d]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should still be a valid test. It's testing the code in
Lines 179 to 189 in 290c8b8
if (interval.days() > 7) { | |
throw ExceptionsHelper.badRequestException(invalidDateHistogramCalendarIntervalMessage(calendarInterval)); | |
} | |
return interval.millis(); | |
} | |
private static String invalidDateHistogramCalendarIntervalMessage(String interval) { | |
throw ExceptionsHelper.badRequestException("When specifying a date_histogram calendar interval [" | |
+ interval + "], ML does not accept intervals longer than a week because of " + | |
"variable lengths of periods greater than a week"); | |
} |
Did this test fail after the other changes in this PR? If so that might be a sign of a problem.
Even though histograms in general can plot variable length time buckets (e.g. months), it's still reasonable for ML to prohibit such bucket spans as they would make the modelling much more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the test back and fixed it
- is_false: datafeeds.0.node | ||
|
||
--- | ||
"Put job and datafeed with aggs in old cluster - deprecated interval with warning after 8.0.0": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will run into a problem with this in the future, because at some point before 8.0.0-beta1 is released the master branch will start to treat interval
as a syntax error. This test seems to be designed for the day when there is a version beyond 8.0.0 and the old cluster is 8.0.0. But when this combination of versions is being tested this test will be forced to use "fixed_interval" : "30s"
instead of "interval" : "30s"
, so will not test the rewrite functionality. Since it's also testing other things it's worth keeping the test, but you might as well change it to say "fixed_interval" : "30s"
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to be designed for the day when there is a version beyond 8.0.0 and the old cluster is 8.0.0.
Not exactly, the rolling upgrade tests use an "old cluster" of 8.0.0, even if the "new cluster" version is 8.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is effectively testing the rewrite all in the same cluster version. This is actually covered when we "GET" the old datafeed in the new cluster tests.
I will adjust the PUT
to be a fixed_interval
and not check for warnings at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I think we need another PR to update the configs in the index at an appropriate time, but that doesn't need to be done immediately. This is a great step towards solving the problem.
interval
was deprecated back in 7.2.0. It was replaced withcalendar_interval
andfixed_interval
.This change automatically rewrites datafeed configurations that contain the old
interval
field. The rewrite occurs when the configuration is read from the index. It is NOT then written back to the index.This PR also re-enables the long disabled BWC tests for datafeeds. I did my best to cover all the flakiness. Definitely want to run through multiple times locally once #52383 is fixed.
Partially addresses: #51606