Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Automatically map interval to fixed/calendar interval in datafeed aggs #51606

Closed
droberts195 opened this issue Jan 29, 2020 · 6 comments · Fixed by #55678
Closed

[ML] Automatically map interval to fixed/calendar interval in datafeed aggs #51606

droberts195 opened this issue Jan 29, 2020 · 6 comments · Fixed by #55678
Assignees
Labels
:ml Machine learning >upgrade

Comments

@droberts195
Copy link
Contributor

Since #33727 went into 7.2 interval is deprecated in date histograms. New date histograms should use fixed_interval or calendar_interval instead. In 8.x it is likely that date histograms containing interval will be considered invalid.

ML datafeed configs are persisted and are likely to contain interval rather than fixed_interval or calendar_interval; they definitely will if they were first created in a version earlier than 7.2. Such datafeed configs will be invalid in 8.0, so prior to 8.0 we must do something about this, but there is no reason we cannot start sooner.

The best course of action would seem to be to silently rewrite datafeed configs that contain a date_histogram aggregation with an interval setting to have whichever of fixed_interval or calendar_interval preserves the current behaviour.

We can do this unconditionally providing the oldest node in the cluster is on version 7.2 or above.

Rollups already does this silent behaviour-preserving rewriting, although rollup configs are stored in cluster state which makes this considerably easier than for the ML configs that are stored in an index.

The code that does the translation for rollups is in:

PARSER = new ConstructingObjectParser<>(NAME, a -> {
DateHistogramInterval oldInterval = (DateHistogramInterval) a[1];
DateHistogramInterval calendarInterval = (DateHistogramInterval) a[2];
DateHistogramInterval fixedInterval = (DateHistogramInterval) a[3];
if (oldInterval != null) {
if (calendarInterval != null || fixedInterval != null) {
throw new IllegalArgumentException("Cannot use [interval] with [fixed_interval] or [calendar_interval] " +
"configuration options.");
}
return fromUnknownTimeUnit((String) a[0], oldInterval, (DateHistogramInterval) a[4], (String) a[5]);
} else if (calendarInterval != null && fixedInterval == null) {
return new CalendarInterval((String) a[0], calendarInterval, (DateHistogramInterval) a[4], (String) a[5]);
} else if (calendarInterval == null && fixedInterval != null) {
return new FixedInterval((String) a[0], fixedInterval, (DateHistogramInterval) a[4], (String) a[5]);
} else if (calendarInterval != null && fixedInterval != null) {
throw new IllegalArgumentException("Cannot set both [fixed_interval] and [calendar_interval] at the same time");
} else {
throw new IllegalArgumentException("An interval is required. Use [fixed_interval] or [calendar_interval].");
}
});

and:

private static DateHistogramGroupConfig fromUnknownTimeUnit(String field, DateHistogramInterval interval,
DateHistogramInterval delay, String timeZone) {
if (DateHistogramAggregationBuilder.DATE_FIELD_UNITS.get(interval.toString()) != null) {
return new CalendarInterval(field, interval, delay, timeZone);
} else {
return new FixedInterval(field, interval, delay, timeZone);
}
}

That should be considered as the spec for how to do a silent behaviour-preserving rewrite.

The extra complexity in ML is deciding where to do the migration for datafeed configs that wouldn't otherwise be rewritten. If we do it in the datafeed config parser then that ensures that newly submitted configs and datafeed configs that are updated will get translated. Then we would just need to find an appropriate place to do it for existing datafeed configs - maybe on master node election providing the oldest node in the cluster is version 7.2 or above?

There are also some currently muted tests that should be reenabled when the work is done:

  1. @AwaitsFix(bugUrl = "Needs ML to look at and fix. Unclear how this should be handled, interval is not an optional param")
  2. @AwaitsFix(bugUrl = "Tests need to be updated to use calendar/fixed interval explicitly")
@droberts195 droberts195 added >upgrade :ml Machine learning labels Jan 29, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent benwtrent self-assigned this Feb 10, 2020
@benwtrent
Copy link
Member

We can do this unconditionally providing the oldest node in the cluster is on version 7.2 or above.

I think we can do this unconditionally regardless of minimum node version. If we have a mixed cluster version, the date_histogram aggregation must follow the same BWC guarantees that we do. I think we would have a bigger problem if a date_histogram that specifies a fixed/calendar interval is not BWC with older nodes.

@droberts195
Copy link
Contributor Author

I think we can do this unconditionally regardless of minimum node version. If we have a mixed cluster version, the date_histogram aggregation must follow the same BWC guarantees that we do. I think we would have a bigger problem if a date_histogram that specifies a fixed/calendar interval is not BWC with older nodes.

For wire serialization this is true, but the extra problem ML is adding is to store the aggregations in a document in an index that might be read by an older node in a mixed version cluster. Usually aggregation DSL is a transient thing that only exists for the duration of one search so the wire serialization mappings are enough. We have lenient parsing when loading configs from indices, so parsing the config on load will not cause an error. But the older node will then see a date_histogram aggregation that doesn't contain an interval, and error at the point of trying to run it within the datafeed. This is why I think we should only do the rewrites when the minimum node version is high enough for all nodes in the cluster to understand fixed_interval and calendar_interval.

@benwtrent
Copy link
Member

benwtrent commented Feb 18, 2020

@droberts195

For wire serialization this is true, but the extra problem ML is adding is to store the aggregations in a document in an index that might be read by an older node in a mixed version cluster.

Isn't this already a problem then?

  • Somebody PUT or _update a datafeed and specifically sets the histogram setting to fixed|calendar
  • the doc is updated
  • the doc is read by an older node

I thought this type of situation was why our PUT and _update requests are master node actions. This way the request that gets parsed from XContent is the same version of as the master node. Which should be upgraded last anyways.

NOTE: With the automatic re-write, this will probably be exacerbated. To your point.

@droberts195
Copy link
Contributor Author

the master node. Which should be upgraded last anyways.

This is not a hard requirement. The docs just say:

It is best to upgrade the master-eligible nodes in your cluster after all of the other nodes.

Also, even if that advice is followed, when there are multiple master-eligible nodes the actual master node at some time during the process might be on a version ahead of some other master-eligible nodes.

Somebody PUT or _update a datafeed and specifically sets the histogram setting to fixed|calendar

There's an assumption that users don't use new features of a particular version until the whole cluster is on that version. I agree there could be problems if this assumption isn't followed.

Basically what I am saying is that our internal upgrade code should follow that same assumption - don't start adding new syntax into configs until the whole cluster is on the version that supports that syntax.

If a user got themselves into a mess by trying to use new version functionality in a mixed version cluster we could justifiably push back and say they should have waited until the whole cluster is upgraded, and that they need to complete their upgrade before our functionality will work again. But code we write that runs automatically breaks their mixed version cluster then they have a more justified grievance against us.

@benwtrent
Copy link
Member

benwtrent commented Feb 18, 2020

Current plan:

  • Have _update automatically fix the interval if the min node version of the cluster is > 7.2.0
  • The datafeed job builder should update the lazily parsed map before validating or running the datafeed. This way we handle the situation where a datafeed has continued to run, but has the old interval settings

As for updating existing datafeeds that are not in use, i am not 100% the best way to go about this.

The options I can think of are:

  • provide an easy way for folks to update them (via an empty call to _update) if they want to use the datafeed again later. This lines up with the "upgrade assistant" format.
  • always update the interval when we parse the XContent from the index. v8 is BWC only with 7.last. So, in v8.0.0 we can do this without worry. But, this would mean the stored document could always have the old interval. Might cause issues if somebody decides to delete line of code that does the rewrite...
  • Have some service that reads and rewrites them (I dislike this option...)

benwtrent added a commit that referenced this issue Feb 24, 2020
…terval fields (#52538)

`interval` was deprecated back in 7.2.0. It was replaced with `calendar_interval` and `fixed_interval`. 

This change automatically rewrites datafeed configurations that contain the old `interval` field. The rewrite occurs when the configuration is read from the index. It is NOT then written back to the index.

This PR also re-enables the long disabled BWC tests for datafeeds. 

Partially addresses: #51606
benwtrent added a commit that referenced this issue Apr 28, 2020
This creates an auto update service. The first automatic update is rewriting datafeed aggregations if they exist. This is necessary as `date_histogram` is deprecating and removing the `interval` key. To aid users who have the `interval` key defined, we can automatically upgrade these aggregation definitions to the appropriate `fixed_interval` or `calendar_interval`.

closes #51606
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning >upgrade
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants