Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] rewrite deprecated datafeed configs to use new date_histogram interval fields #52538

Merged

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Feb 19, 2020

interval was deprecated back in 7.2.0. It was replaced with calendar_interval and fixed_interval.

This change automatically rewrites datafeed configurations that contain the old interval field. The rewrite occurs when the configuration is read from the index. It is NOT then written back to the index.

This PR also re-enables the long disabled BWC tests for datafeeds. I did my best to cover all the flakiness. Definitely want to run through multiple times locally once #52383 is fixed.

Partially addresses: #51606

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent
Copy link
Member Author

run elasticsearch-ci/bwc

1 similar comment
@benwtrent
Copy link
Member Author

run elasticsearch-ci/bwc

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good first step. I had a couple of questions.

"Put job and datafeed with aggs in old cluster - deprecated interval with warning":
"Put job and datafeed with aggs in old cluster - deprecated interval with warning before 8.0.0":
- skip:
version: "7.99.99 - "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be version: "8.0.0 - " for 8.0.0 and above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#fixed

@@ -23,7 +26,9 @@ setup:
---
"Test old cluster datafeed with aggs":
- skip:
features: "warnings"
features: warnings
version: "all"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment this test is unconditionally skipped, so I wondered about deleting it completely. But I guess the reason for keeping it is that eventually it can be reenabled. You could add a comment # TODO: remove skip section from master when version is bumped to 9.0.0 here to prompt us to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

ElasticsearchException e = expectThrows(ElasticsearchException.class,
() -> createDatafeedWithDateHistogram("8d"));

assertThat(e.getMessage(), containsString("When specifying a date_histogram calendar interval [8d]"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still be a valid test. It's testing the code in

if (interval.days() > 7) {
throw ExceptionsHelper.badRequestException(invalidDateHistogramCalendarIntervalMessage(calendarInterval));
}
return interval.millis();
}
private static String invalidDateHistogramCalendarIntervalMessage(String interval) {
throw ExceptionsHelper.badRequestException("When specifying a date_histogram calendar interval ["
+ interval + "], ML does not accept intervals longer than a week because of " +
"variable lengths of periods greater than a week");
}

Did this test fail after the other changes in this PR? If so that might be a sign of a problem.

Even though histograms in general can plot variable length time buckets (e.g. months), it's still reasonable for ML to prohibit such bucket spans as they would make the modelling much more complex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the test back and fixed it

- is_false: datafeeds.0.node

---
"Put job and datafeed with aggs in old cluster - deprecated interval with warning after 8.0.0":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will run into a problem with this in the future, because at some point before 8.0.0-beta1 is released the master branch will start to treat interval as a syntax error. This test seems to be designed for the day when there is a version beyond 8.0.0 and the old cluster is 8.0.0. But when this combination of versions is being tested this test will be forced to use "fixed_interval" : "30s" instead of "interval" : "30s", so will not test the rewrite functionality. Since it's also testing other things it's worth keeping the test, but you might as well change it to say "fixed_interval" : "30s" now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to be designed for the day when there is a version beyond 8.0.0 and the old cluster is 8.0.0.

Not exactly, the rolling upgrade tests use an "old cluster" of 8.0.0, even if the "new cluster" version is 8.0.0

Copy link
Member Author

@benwtrent benwtrent Feb 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is effectively testing the rewrite all in the same cluster version. This is actually covered when we "GET" the old datafeed in the new cluster tests.

I will adjust the PUT to be a fixed_interval and not check for warnings at all.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I think we need another PR to update the configs in the index at an appropriate time, but that doesn't need to be done immediately. This is a great step towards solving the problem.

@benwtrent benwtrent merged commit b49b8db into elastic:master Feb 24, 2020
@benwtrent benwtrent deleted the feature/ml-datafeed-date_histo-migration branch February 24, 2020 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants